rfc_metadata_in_the_v3_era

Differences

This shows you the differences between two versions of the page.

rfc_metadata_in_the_v3_era [2019/09/10 16:47]
arusso created
rfc_metadata_in_the_v3_era [2019/09/11 15:42] (current)
arusso
Line 3: Line 3:
   * New XSD is in place (10 Sept 2019): https://www.rfc-editor.org/in-notes/rfc-index.xsd
   * rfc-index.txt, .html, .xml (and similar files) are slightly different from before, this is due to:
-    *  add new publication formats and source format.
+    *  added new publication formats and source format.
     * removed file sizes. (Byte count is no longer included as metadata for each RFC.)
   * There is only one page count per RFC, even if it is available in multiple file formats.
-    * For pre-v3 docs, it is the page count of the .txt file.
+    * As a result, in rfc-index.xml, page-count has been "pulled up" a level.
+  * For page count, source of data will change.
+    * For pre-v3 docs, it is the page count of the .txt file (except for a handful of old RFCs where there is no .txt file).
     * For v3 docs, it will be the page count of the PDF file.
  
-PDF naming conventions
+=== PDF naming conventions ===
   * Externally, PDFs are listed simply as "PDF" (whether v3 or otherwise).
-  * Internally, the db holds "v3PDF" to v3 output; "PDF" is used for pre-v3.
+  * Internally, the db holds "v3PDF" to mean v3 output; "PDF" is used for pre-v3.
   * As before, .txt.pdf files are not listed in index files.
  
-TEXT naming conventions
+=== TEXT naming conventions ===
   * Externally, .txt format (whether pre-v3 or not) is listed as "TEXT".\\ Exception: rfc-index.xml: will display "ASCII" (for pre-v3) and "TEXT" (afterwards). Rationale: other index files were already using the "TEXT" instead of "ASCII", so they weren't changed to start differentiating.
-  * Internally, the db holds "TEXT" to mean v3 output with utf-8 encoding and BOM; "ASCII" is used for pre-v3.
+  * Internally, the db holds "TEXT" to mean v3 output; "ASCII" is used for pre-v3.
+
+=== New resource: JSON files of RFC metadata ===
+  * One JSON file per RFC was made available 11 September 2019.
+  * Each file contains data similar to an entry in rfc-index.xml.
+  * Sample file: https://www.rfc-editor.org/rfc/rfc5234.json
+
+=== Examples ===
  
 Example: RFC4254
Line 33: Line 42:
  
     <format>
-        <file-format>TEXT</file-format>
+        <file-format>ASCII</file-format>
         <file-format>HTML</file-format>
     </format>
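
Illustration of the naming and page-count notes in the revision above: a minimal Python sketch, not the RFC Editor's own tooling. It reads one simplified entry, maps the rfc-index.xml label "ASCII" to the "TEXT" label used by the other index files, and reads the single page count from the entry level. Only `<format>`, `<file-format>`, and page-count appear in the notes; the `<rfc-entry>` and `<doc-id>` wrappers, the page-count value, and the function names are assumptions made for the example, and the real file's XML namespace is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified entry. The real rfc-index.xml uses an XML namespace
# and many more fields; the wrapper elements and the page-count value here are
# placeholders chosen for illustration only.
SAMPLE_ENTRY = """
<rfc-entry>
    <doc-id>RFC4254</doc-id>
    <format>
        <file-format>ASCII</file-format>
        <file-format>HTML</file-format>
    </format>
    <page-count>24</page-count>
</rfc-entry>
"""


def external_label(xml_label: str) -> str:
    """Map an rfc-index.xml format label to the label the other index files use."""
    # Per the notes: rfc-index.xml shows "ASCII" for pre-v3 .txt files, while
    # the other index files list every .txt file as "TEXT".
    return "TEXT" if xml_label == "ASCII" else xml_label


def summarize(entry_xml: str) -> dict:
    """Collect format labels and the single per-RFC page count from one entry."""
    entry = ET.fromstring(entry_xml)
    labels = [el.text for el in entry.findall("./format/file-format")]
    return {
        "doc_id": entry.findtext("doc-id"),
        "xml_formats": labels,
        "external_formats": [external_label(label) for label in labels],
        # "ASCII" marks a pre-v3 .txt file; an entry with no .txt file at all
        # would not be flagged by this check.
        "is_pre_v3": "ASCII" in labels,
        # One page count per RFC, read from the entry rather than per format.
        "page_count": int(entry.findtext("page-count")),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_ENTRY))
```

To run this against the published index, the namespace declared in rfc-index.xsd would have to be passed to `findall` and `findtext`; that is omitted here to keep the sketch short.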
rfc_metadata_in_the_v3_era.1568159246.txt.gz · Last modified: 2019/09/10 16:47 by arusso