The HTML has to render correctly on the following:
These requirements are expected to change over time: HTML rendering will be required for current versions of browsers and platforms, while ideally continuing to render correctly on earlier versions.
The HTML document must preserve all semantic information that is in the canonical XML document. One use case is that preformatted text that has different tags in the XML will also be distinguishable in the HTML, making it trivial to extract all of the (for example) ABNF in an RFC with a simple program. Another use case is that someone who wants to write programs that extract information from an RFC can do so equally well with the XML and the HTML, and can choose tools that take either format as input.
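As an illustration of the "simple program" use case, here is a minimal sketch in Python that pulls preformatted blocks out of the HTML by class. The class names used here (`lang-abnf`, `sourcecode`, `artwork`) and the helper name `extract_blocks` are assumptions made for the example; this document does not yet define how the XML distinctions will be expressed in the HTML.

```python
from html.parser import HTMLParser


class PreExtractor(HTMLParser):
    """Collect the text of every <pre> element that carries a given class."""

    def __init__(self, wanted_class):
        super().__init__(convert_charrefs=True)
        self.wanted_class = wanted_class  # hypothetical class name, e.g. "lang-abnf"
        self.blocks = []                  # finished blocks, in document order
        self._buffer = None               # pieces of the block being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            classes = (dict(attrs).get("class") or "").split()
            if self.wanted_class in classes:
                self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "pre" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def extract_blocks(html, wanted_class):
    """Return the text content of each <pre> whose class list includes wanted_class."""
    parser = PreExtractor(wanted_class)
    parser.feed(html)
    parser.close()
    return parser.blocks
```

The same approach only works if every kind of preformatted text that is distinguishable in the XML gets its own marker in the HTML, which is the point of the requirement.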
A non-requirement is that the HTML document have any non-semantic information such as comments and processing instructions. (This non-requirement should be removed if they are not allowed in the XML.)
The HTML document must come with a default internal set of CSS formatting. This will allow for a mostly-consistent display of RFCs across browsers. It will also allow for the HTML file to be moved over different transports (such as mail) and have the result look the same.
The HTML must display well in at least one text-based browser.
The HTML document must allow easy local override of the default CSS formatting. This will allow users who have a different visual style that they prefer to make RFCs display with that style without having to alter the contents of the HTML document. This might also be valuable for allowing people with specific accessibility needs to have custom CSS.
No HTML tags in the document may have style information. All style information must be done through “class” and “id” attributes, with the style for those represented in the CSS alone.
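A requirement like this can be checked mechanically before publication. The following is a minimal sketch in Python, assuming a hypothetical helper named `find_inline_styles`; it flags any tag that carries a `style` attribute and ignores the `<style>` element itself, which is where the default internal CSS would live.

```python
from html.parser import HTMLParser


class StyleAttributeFinder(HTMLParser):
    """Record every start tag that carries a 'style' attribute."""

    def __init__(self):
        super().__init__()
        self.offenders = []  # list of (tag name, (line, column)) pairs

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names, so "STYLE" is caught too.
        if any(name == "style" for name, _value in attrs):
            self.offenders.append((tag, self.getpos()))


def find_inline_styles(html):
    """Return the tags in html that have a style attribute, with positions."""
    finder = StyleAttributeFinder()
    finder.feed(html)
    finder.close()
    return finder.offenders
```

If the requirement is later relaxed to allow `style` on an explicit list of tags, the check could take that list as a parameter and skip those tags.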
The HTML must make it easy to separate chunks into separate files. This will make creating EPUB documents easier in the future.
The output needs to be HTML5. Language extensions might be acceptable after further discussion. The RFC Editor will need to use an automated validating tool before publishing the HTML. This requirement is not important for viewing with browsers, but is important for programs that will use the HTML format as input for processing.
All sections, subsections, figures, and paragraphs should have stable numbered link anchors. Additionally, anchors expressed in the source XML should be exposed as anchors in the HTML as well.
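One way to picture numbered anchors for sections is to derive them from each section's position in the outline. The sketch below is in Python; the `section-` prefix, the function name `number_sections`, and the nested-tuple outline format are all assumptions made for illustration, since the prefixes for autogenerated anchors have not been decided.

```python
def number_sections(sections, prefix="section-", parent=""):
    """Return (anchor, title) pairs for a nested outline, in document order.

    `sections` is a list of (title, children) tuples, where `children`
    is a list of the same shape. The prefix is a placeholder, not a
    decided convention.
    """
    anchors = []
    for index, (title, children) in enumerate(sections, start=1):
        number = f"{parent}{index}"
        anchors.append((prefix + number, title))
        # Subsections extend the parent's number: 2 -> 2.1, 2.2, ...
        anchors.extend(number_sections(children, prefix, number + "."))
    return anchors
```

Anchors built this way stay stable only because a published RFC does not change; author-provided anchors from the XML would need to be carried over separately.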
The abstract must be marked up or tagged in a way that search engines will extract it as a summary.
Answers added where possible, per RFC Format Design Team call 16-September-2014
Scope of the HTML document
There are sections that are really requirements for people writing
xml2rfc v3. Those should be teased out into the grammar draft or
draft talking about submission restrictions on the v3 format.
Section 3.2.13 has what looks like instructions to authors: “If the
quote needs a citation”… That should be in a document that talks in
terms of the XML. What should be here is what the HTML is expected to
reflect from the XML. What XML do we expect to be input to result in the
HTML example in that section?
“The only tags that may contain a 'style' attribute are ” (and
give an explicit list).
Basic HTML comments
It would be good to double-check that the currently deployed browsers
treat that input as expected (at least those we list as requirements).
Section 3.1 disallows points like U+0009. Section 4 talks about
compressing instances of U+0009 (and other disallowed points) into a
Section 3.1 requires text-containing elements to be serialized as a single
unwrapped line (to help with diffs).
There is a separate requirement to indent children. (Thus there is an
implicit requirement to start children on a new line.)
Which of the following do we want to happen?
<p>For more information see <xref target='foo'>[I-D.draft-foo-bar]</xref>. Your mileage may vary.</p>
<p>For more information see
Your mileage may vary.</p>
Either way, the text needs to be made consistent and some examples would
Making the text consistent while dealing with nested lists may get gnarly.
There are a couple of places where the document talks about non-div
tags. I think this comes from working out how we're going to place author-provided
and autogenerated ids. I don't think the distinction helps, and
we can just say “tags” wherever the text says “non-div tags”. (But I
also suspect we should be pulling out a section explicitly discussing
What prefixes to use for autogenerated tags?
Should an RFC style document encourage authors to use common tags for
things like “Security Considerations,” “IANA Considerations,” etc., to
help solve for the problem of intuitive pointers to common sections in
What reference to use (if any) for HTML5?
Where is the line between indicating what the XML should do within the HTML
for things like ASCII art, packet diagrams, etc, and what is appropriately
just information for the XML draft?
Using classes instead of ids to aid with styling.
This is a very good point. If others agree, I would propose that the current
draft be changed from “<div id='abstract'>” to “<div class='abstract'>” and
that we specify classes for all sections that seem to have special meanings.
It's not clear what “same logic” in “Paragraphs are wrapped in <div>s using
the same logic as sections” means. Is this intending to talk about how id
attributes get placed, or something else?
The document says “Additionally, anchors expressed in the source XML should
be exposed as anchors in the HTML as well.” I suggested in my nits message
striking “Additionally,” and “as well”. But it occurs to me that the document
needs to be more specific and reflect _how_ the source XML allows anchors to
be expressed, and how those will be translated into the HTML. This falls, I
think, into being clearer about author-provided and autogenerated ids.
There's text that says to wrap section numbers in an
<a class='self-ref' href=…>. I suspect this should apply to other <a href>
that are generated?
The paragraph that begins “For other block items, such as <figure>, <t>,
and <texttable>” is talking about XML, not HTML. Can it be rewritten more
specifically in terms of input and output?