This page is for keeping thoughts about the expected output formats *other than* XML.
The formats discussed so far are:
Initial proposal: A strong design goal is that the conversion from canonical XML to HTML should be round-trippable; that is, it should be possible to convert the HTML back to XML with literally zero loss of semantic content. Conversion from and to the canonical XML might be done with XSLT.
Response: For the example of counter="requirement", there are ways the information could be propagated, such as into a class name like list_counter_requirement, but that is somewhat ugly and subject to problems when the characters allowed in the two namespaces differ. (What do you do with a counter name that contains spaces or non-alphanumeric characters?) If the requirement is that *all* semantic content survive the round trip, each of these edge cases would need to be nailed down. But is this the type of semantic information that *needs* to be propagated? I really don't think so.
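To illustrate what nailing down one of these edge cases involves, here is a minimal Python sketch of a reversible mapping from a counter name to an HTML class token. The list_counter_ prefix comes from the example above; the _XXXX_ escape convention and the function names counter_to_class and class_to_counter are hypothetical, invented for this sketch rather than taken from any existing converter.

```python
import re

PREFIX = "list_counter_"  # class-name prefix from the example in the text

def counter_to_class(counter: str) -> str:
    """Encode a counter name as one HTML class token (hypothetical scheme).

    ASCII letters, digits and '-' are copied as-is; every other character,
    including '_', spaces and non-ASCII, becomes _XXXX_ (its hex code point),
    so the encoding can be reversed exactly.
    """
    parts = []
    for ch in counter:
        if ch.isascii() and (ch.isalnum() or ch == "-"):
            parts.append(ch)
        else:
            parts.append(f"_{ord(ch):04X}_")
    return PREFIX + "".join(parts)

def class_to_counter(cls: str) -> str:
    """Invert counter_to_class; raises ValueError for unrelated class names."""
    if not cls.startswith(PREFIX):
        raise ValueError(f"not a counter class: {cls!r}")
    body = cls[len(PREFIX):]
    # A literal '_' never survives encoding, so every '_' opens an escape.
    return re.sub(r"_([0-9A-F]{4,6})_", lambda m: chr(int(m.group(1), 16)), body)
```

For example, counter_to_class("my counter") gives "list_counter_my_0020_counter". Even this small sketch has to pick rules for underscores, spaces, and non-ASCII characters, which is exactly the kind of per-case decision the response argues is not worth mandating.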
As of 2013-10-09, it is not clear whether the text output will be ASCII or UTF-8. The following assumes ASCII; if the format is UTF-8, the following does not apply.
The text-only format must have the same character-set limitations as the current RFC format. For new RFCs that have non-ASCII characters in them, each such character must be represented as [*U+xxxx*], where xxxx is a 4- or 6-character hex value. The use case here is that it must be possible to convert every encoded non-ASCII character in the text-only document back exactly to the correct character in the canonical document. The choice of [*U+xxxx*] was made because that sequence is extremely unlikely to appear in a normal RFC, even one that talks about Unicode code points by their hex values. For example, an author's name that is represented in the canonical format as “Martin Dürst” would be represented in the text-only format as “Martin D[*U+00FC*]rst”. This means that lines in the text-only format may be longer than 80 columns if they contain non-ASCII characters.
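A minimal Python sketch of the proposed [*U+xxxx*] encoding and its inverse, assuming code points up to U+FFFF are written with 4 hex digits and higher ones with 6 (zero-padded). The function names to_ascii_text and from_ascii_text are hypothetical. Like the proposal, the decoder assumes the literal sequence [*U+ never occurs in ordinary RFC text, so it does not escape it.

```python
import re

# Matches [*U+XXXX*] or [*U+XXXXXX*] with uppercase hex digits.
_ESCAPE = re.compile(r"\[\*U\+([0-9A-F]{4}|[0-9A-F]{6})\*\]")

def to_ascii_text(s: str) -> str:
    """Replace every non-ASCII character with its [*U+xxxx*] form."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                  # plain ASCII passes through
        elif cp <= 0xFFFF:
            out.append(f"[*U+{cp:04X}*]")   # 4-digit form
        else:
            out.append(f"[*U+{cp:06X}*]")   # 6-digit form, zero-padded
    return "".join(out)

def from_ascii_text(s: str) -> str:
    """Restore the original characters from their [*U+xxxx*] forms."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)
```

With this sketch, to_ascii_text("Martin Dürst") returns "Martin D[*U+00FC*]rst", matching the example above, and from_ascii_text reverses it exactly.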
Dave thinks: I disagree with the above paragraph. I'm leaning towards a separate UTF-8 text version (e.g. .utf8). And in either version, I don't think any U+ sequence should appear in a person's name.
Paul thinks: if there are two versions, the .txt should be UTF-8 and the ASCII version should be .asc. If there is an all-ASCII version, we need to ask the authors how they want their names (mis)spelled in ASCII.
Initial proposal: There should be multiple text outputs: ASCII-only with page breaks, ASCII-only without page breaks, UTF-8 with page breaks, UTF-8 without page breaks.
Response: Limit the .txt output to a single option, as similar as is reasonable to what is available today: plain text, ASCII art only (with links to images), and page breaks with headers and footers.
The paginated text format needs to deal with paragraphs or art that would be split across a page break.
[PH] One way to eliminate the problem is simply to accept leaving extra white space at the bottom of paginated pages. If a single paragraph or figure is too large to fit on a page (the tool should warn about this every time it emits paginated text output), the Production Center can break the paragraph or split the figure in two.
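The [PH] approach amounts to a greedy paginator that never splits a block. Here is a minimal Python sketch, assuming each paragraph or figure has already been rendered to a list of text lines and ignoring page headers and footers; paginate is a hypothetical name. Oversized blocks are reported rather than split, leaving the fix to the Production Center.

```python
from typing import List, Tuple

Block = List[str]  # one paragraph or figure, already rendered as lines

def paginate(blocks: List[Block], page_len: int) -> Tuple[List[List[str]], List[str]]:
    """Fill pages with whole blocks, leaving white space rather than splitting.

    Returns the pages and a list of warnings for blocks longer than a page;
    such a block is placed alone on an overlong page for manual fixing.
    """
    pages: List[List[str]] = []
    current: List[str] = []
    problems: List[str] = []
    for i, block in enumerate(blocks):
        if len(block) > page_len:
            problems.append(f"block {i} has {len(block)} lines; a page holds {page_len}")
        if current and len(current) + len(block) > page_len:
            pages.append(current)  # rest of this page is left blank
            current = []
        current.extend(block)
    if current:
        pages.append(current)
    return pages, problems
```

The design choice is simply to trade vertical space for never breaking a block, so the only case needing human attention is a block that cannot fit on any page.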
[TH] (A widow is the last line of a paragraph that ends up alone in the next column/page; an orphan is the first line of a paragraph that is separated from the rest of the paragraph by a column/page break.) In most cases, both can be eliminated by not limiting yourself to a strict number of lines (N) per page, but allowing yourself to go to N+1. If the paragraph is exactly 3 lines long, a page length of N+2 can eliminate both the widow and the orphan.
If you must limit the page size to a maximum of N lines, then you can use a page length of N-1 lines to force another line onto the top of the next page. If headings occur prior to the orphan, then they must be moved to the next page as well. Paragraphs exactly 3 lines long that have been split in either direction would just be moved to the next page, along with any headings.
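The hard-limit variant in the second [TH] paragraph can be expressed as a per-paragraph decision about how many lines stay on the current page. Here is a minimal Python sketch; lines_to_keep is a hypothetical name, remaining is the number of free lines left on the page, and moving any preceding headings along with the paragraph is left to the caller.

```python
def lines_to_keep(remaining: int, para_len: int) -> int:
    """Return how many lines of a paragraph stay on the current page.

    Hard-limit rule from the [TH] note: pages never exceed N lines, so
    widows and orphans are avoided by moving lines to the next page.
    """
    if para_len <= remaining:
        return para_len        # fits entirely
    if remaining <= 0:
        return 0               # page already full
    if para_len <= 3:
        return 0               # short paragraph split either way: move it whole
    if remaining == 1:
        return 0               # would leave an orphan: move the whole paragraph
    if para_len - remaining == 1:
        return remaining - 1   # would leave a widow: end the page one line early
    return remaining           # ordinary split, at least 2 lines on each side
```

For instance, a 5-line paragraph with 4 free lines keeps 3 lines and carries 2 to the next page, instead of leaving a single widowed line there.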
Initial proposal: The document needs to include live links.
- For linking between RFCs, pointers to RFCs published before the format switchover will point to the TXT version.
- For linking between RFCs, pointers to RFCs published after the format switchover will point to the PDF version and will allow for pointers to specific sections within a document.
- The PDF version will include the standard front page header and page numbers.
- The PDF version will be sized for ???
Response: With HTML as an option, there is no compelling case for requiring links in the PDF. One use case described was that of the IESG, several of whose members print out the PDF version for review. Links would not provide enough (any?) additional value to justify adding them. The team suggests that the requirements for PDF do not actually need to change from what they are today: the PDF is a direct copy of the TXT format, with the addition of graphics.
The team also briefly considered how a tool such as PrinceXML could generate PDF from HTML. That discussion went too far into implementation, so it was left as an example of something that might be possible and that could limit the number of tools that need to be modified or created.
If we also want to produce MOBI (the native Amazon format), we might consider running Amazon's free-but-closed-source program (http://www.amazon.com/gp/feature.html?ie=UTF8&docId=1000765211). If people are hesitant about us even partially supporting that program, we could instead put a pointer to the program on an advisory page on rfc-editor.org.
If the HTML output is designed well, it can be used to create EPUB output with few, if any, additional requirements.
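As a rough illustration of how little EPUB adds on top of good HTML, here is a minimal Python sketch that wraps an XHTML body in an EPUB 3 container using only the standard library. It assumes the HTML output is well-formed XHTML, omits images and stylesheets, and uses hypothetical file names (OEBPS/doc.xhtml and so on) and a hypothetical build_epub function; it is not a validated EPUB producer.

```python
import io
import zipfile
from datetime import datetime, timezone
from xml.sax.saxutils import escape

CONTAINER = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

def build_epub(xhtml_body: str, title: str, identifier: str, lang: str = "en") -> bytes:
    """Return the bytes of a minimal single-document EPUB 3 file (sketch)."""
    t = escape(title)
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">{escape(identifier)}</dc:identifier>
    <dc:title>{t}</dc:title>
    <dc:language>{escape(lang)}</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="doc" href="doc.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine><itemref idref="doc"/></spine>
</package>
"""
    nav = f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{t}</title></head>
<body><nav epub:type="toc"><ol><li><a href="doc.xhtml">{t}</a></li></ol></nav></body>
</html>
"""
    doc = f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>{t}</title></head>
<body>{xhtml_body}</body>
</html>
"""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        # EPUB requires the mimetype entry first and stored uncompressed.
        z.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        for name, data in [("META-INF/container.xml", CONTAINER),
                           ("OEBPS/package.opf", opf),
                           ("OEBPS/nav.xhtml", nav),
                           ("OEBPS/doc.xhtml", doc)]:
            z.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

Everything except the body is fixed boilerplate, which supports the point that a well-designed HTML output leaves few additional requirements. Running the result through a checker such as EPUBCheck would be the natural next step before relying on it.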