
Notes on RFC metadata in the v3 era

  • New XSD is in place (10 Sept 2019): https://www.rfc-editor.org/in-notes/rfc-index.xsd
  • rfc-index.txt, .html, .xml (and similar files) differ slightly from before, for two reasons:
    • New publication formats and the source format were added.
    • File sizes were removed. (Byte count is no longer included as metadata for each RFC.)
  • There is only one page count per RFC, even if it is available in multiple file formats.
    • As a result, in rfc-index.xml, page-count has been “pulled up” a level.
  • For page count, the source of the data will change.
    • For pre-v3 docs, it is the page count of the .txt file (except for a handful of old RFCs where there is no .txt file).
    • For v3 docs, it will be the page count of the PDF file.
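
To illustrate how page-count was “pulled up” a level, here is a minimal Python sketch that reads the page count from both the old and the new layout. The snippets are the RFC4254 fragments from the Examples section; the enclosing rfc-entry element and the function names are assumptions made for this illustration. The real index may also declare an XML namespace, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Old layout: page-count is nested inside a <format> element.
# The <rfc-entry> wrapper is assumed here so the fragment parses on its own.
OLD_ENTRY = """<rfc-entry>
    <format>
        <file-format>ASCII</file-format>
        <char-count>50338</char-count>
        <page-count>24</page-count>
    </format>
    <format>
        <file-format>HTML</file-format>
    </format>
</rfc-entry>"""

# New layout: one <format> element lists all formats; page-count is a sibling.
NEW_ENTRY = """<rfc-entry>
    <format>
        <file-format>ASCII</file-format>
        <file-format>HTML</file-format>
    </format>
    <page-count>24</page-count>
</rfc-entry>"""


def read_page_count(entry_xml):
    """Return the page count as an int, or None if the entry has none."""
    entry = ET.fromstring(entry_xml)
    # New layout first: page-count is a direct child of the entry.
    node = entry.find("page-count")
    if node is None:
        # Fall back to the old layout: page-count inside <format>.
        node = entry.find("format/page-count")
    if node is None or not node.text:
        return None
    return int(node.text)


def read_formats(entry_xml):
    """Return every file-format value, in document order."""
    entry = ET.fromstring(entry_xml)
    return [node.text for node in entry.iter("file-format")]


print(read_page_count(OLD_ENTRY), read_formats(OLD_ENTRY))
print(read_page_count(NEW_ENTRY), read_formats(NEW_ENTRY))
```

Checking the new location first and falling back to the old one lets a single reader handle index files generated before and after the change.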

PDF naming conventions

  • Externally, PDFs are listed simply as “PDF” (whether v3 or otherwise).
  • Internally, the db holds “v3PDF” to mean v3 output; “PDF” is used for pre-v3.
  • As before, .txt.pdf files are not listed in index files.

TEXT naming conventions

  • Externally, .txt format (whether pre-v3 or not) is listed as “TEXT”.
    Exception: rfc-index.xml displays “ASCII” (for pre-v3) and “TEXT” (afterwards). Rationale: the other index files were already using “TEXT” instead of “ASCII”, so they were not changed to start differentiating.
  • Internally, the db holds “TEXT” to mean v3 output; “ASCII” is used for pre-v3.
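
The PDF and TEXT naming conventions above amount to a small mapping from internal labels to external ones. The following Python sketch is one way to express it. The function name, the table name, and the use of the index file name as a parameter are hypothetical and are not taken from the actual database or generation code.

```python
# Internal label -> label shown in most index files, per the conventions above.
# Hypothetical table for illustration; not the actual database schema.
INTERNAL_TO_EXTERNAL = {
    "v3PDF": "PDF",   # v3 PDF output
    "PDF": "PDF",     # pre-v3 PDF
    "TEXT": "TEXT",   # v3 text output
    "ASCII": "TEXT",  # pre-v3 text
}


def external_label(internal_label, index_file="rfc-index.txt"):
    """Return the label an index file would display for an internal label."""
    # rfc-index.xml is the one exception: it keeps "ASCII" for pre-v3 text.
    if index_file == "rfc-index.xml" and internal_label == "ASCII":
        return "ASCII"
    # Labels not covered by the conventions (e.g. HTML) pass through unchanged.
    return INTERNAL_TO_EXTERNAL.get(internal_label, internal_label)


print(external_label("v3PDF"))                   # PDF
print(external_label("ASCII"))                   # TEXT
print(external_label("ASCII", "rfc-index.xml"))  # ASCII
```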

New resource: JSON files of RFC metadata
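
As a rough illustration of reading such a record, the Python sketch below parses the JSON fragment shown in the RFC8888 example. Only the "format" and "page_count" keys come from this page. The surrounding braces and the function name are assumptions added so the fragment can be parsed on its own.

```python
import json

# Fragment copied from the RFC8888 example on this page.
FRAGMENT = '"format":["TEXT","HTML","PDF","XML"],"page_count":"48"'


def parse_record(fragment):
    """Parse a record fragment into a list of formats and an integer page count."""
    # Braces are added because the page shows only part of the JSON object.
    record = json.loads("{" + fragment + "}")
    return {
        "formats": list(record["format"]),
        # page_count is a JSON string in the examples, so convert it explicitly.
        "page_count": int(record["page_count"]),
    }


parsed = parse_record(FRAGMENT)
print(parsed["formats"], parsed["page_count"])
```

Note that page_count is quoted in the examples, so a consumer that needs a number has to convert it.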

Examples

Example: RFC4254

-- OLD

    <format>
        <file-format>ASCII</file-format>
        <char-count>50338</char-count>
        <page-count>24</page-count>
    </format>
    <format>
        <file-format>HTML</file-format>
    </format>

-- NEW

    <format>
        <file-format>ASCII</file-format>
        <file-format>HTML</file-format>
    </format>
    <page-count>24</page-count>

For comparison, the JSON record includes (among other data):

"format":["ASCII","HTML"],"page_count":"24"

Example: RFC8888 (v3 era)

-- NEW

    <format>
        <file-format>TEXT</file-format>
        <file-format>HTML</file-format>
        <file-format>PDF</file-format>
        <file-format>XML</file-format>
    </format>
    <page-count>48</page-count>

For comparison, the JSON record would include (among other data):

"format":["TEXT","HTML","PDF","XML"],"page_count":"48"
rfc_metadata_in_the_v3_era.txt · Last modified: 2023/02/14 09:52 by jmahoney