Thoughts on Non-Canonical Formats

This page is for keeping thoughts about the expected output formats *other than* XML.

The formats discussed so far are:

  • Well-structured HTML
  • Unpaginated text
  • Paginated text
  • PDF
  • EPUB

Well-structured HTML

A strong design goal is that the conversion from canonical XML to HTML be round-trippable: it should be possible to convert the HTML back to XML with zero loss of semantic content. Conversion to and from the canonical XML might be done with XSLT.
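To make the round-trip goal concrete, here is a minimal Python sketch of a lossless mapping and its inverse. It is only an illustration, not the proposed design: the convention of storing the XML element name in a `class="xml-..."` attribute and XML attributes in `data-*` attributes is an assumption made for this example, as are the function names `to_html` and `to_xml`. A real conversion would emit proper HTML elements and might be written in XSLT as noted above.

```python
import xml.etree.ElementTree as ET

def to_html(elem):
    """Map an XML element to a <div>, keeping its name and attributes (assumed convention)."""
    html = ET.Element("div", {"class": "xml-" + elem.tag})
    for name, value in elem.attrib.items():
        html.set("data-" + name, value)
    html.text = elem.text
    for child in elem:
        html_child = to_html(child)
        html_child.tail = child.tail  # text that follows the child element
        html.append(html_child)
    return html

def to_xml(html):
    """Invert to_html: recover the element name and attributes."""
    elem = ET.Element(html.get("class")[len("xml-"):])
    for name, value in html.attrib.items():
        if name.startswith("data-"):
            elem.set(name[len("data-"):], value)
    elem.text = html.text
    for child in html:
        xml_child = to_xml(child)
        xml_child.tail = child.tail
        elem.append(xml_child)
    return elem

# Hypothetical input document, used only to exercise the round trip.
original = ET.fromstring(
    '<section title="Intro"><t>Hello <xref target="RFC2119"/> world</t></section>'
)
restored = to_xml(to_html(original))
round_trip_ok = ET.tostring(restored) == ET.tostring(original)
```

A check like `round_trip_ok` could serve as a regression test for whatever conversion is eventually chosen.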

Joe says more here.

Unpaginated Text

Paginated Text

This is text with headers, footers, and page break characters.

Avoiding Bad Breaks in Paginated Text

The paginated text format needs to handle paragraphs or artwork that would otherwise be split across a page break.

[PH] One way to eliminate the problem is simply to be willing to leave extra white space at the bottom of paginated pages. If a single paragraph or figure is too large to fit on one page (the tool should warn about this every time it emits paginated text output), the Production Center can break the paragraph or split the figure into two.
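The [PH] approach above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not a specification: each block (paragraph or figure) is modeled as a list of lines, blocks on the same page are separated by one blank line, and the name `paginate_blocks` is hypothetical. Oversized blocks are split mechanically and reported so that they can be repaired by hand.

```python
def paginate_blocks(blocks, page_len):
    """Place whole blocks on pages, leaving white space rather than splitting.

    blocks: list of blocks, each a list of text lines.
    Returns (pages, warnings), where each page is a list of lines.
    """
    pages, current, warnings = [], [], []
    for index, block in enumerate(blocks):
        if len(block) > page_len:
            # Too large for any page: warn, then split mechanically.
            warnings.append(
                f"block {index} has {len(block)} lines, page holds {page_len}"
            )
            if current:
                pages.append(current)
            chunks = [block[i:i + page_len] for i in range(0, len(block), page_len)]
            pages.extend(chunks[:-1])
            current = list(chunks[-1])
            continue
        needed = len(block) + (1 if current else 0)  # blank separator line
        if len(current) + needed > page_len:
            pages.append(current)  # leave the rest of this page empty
            current = []
        if current:
            current.append("")
        current.extend(block)
    if current:
        pages.append(current)
    return pages, warnings

pages, warnings = paginate_blocks(
    [["a1", "a2", "a3"], ["b1", "b2", "b3"], ["c1"]], page_len=5
)
```

In this example the second block does not fit in the two lines left on the first page, so it starts a new page and the first page ends short.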

[TH] (Widow: the bottom line of a paragraph that winds up on the next column or page. Orphan: the top line of a paragraph that is separated from the rest of the paragraph by a column or page break.) In most cases, both can be eliminated by not holding to a strict number of lines (N) per page and allowing a page to run to N+1 lines. If the paragraph is exactly 3 lines long, a page length of N+2 can eliminate both the widow and the orphan.

If you must limit the page size to a maximum of N lines, you can instead use a page length of N-1 lines to force another line onto the top of the next page. If headings occur prior to the orphan, they must be moved to the next page as well. Paragraphs exactly 3 lines long that would be split in either direction are simply moved to the next page, along with any headings.
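The [TH] rules above can be sketched in Python as follows. This is a simplified model, offered as an illustration only: the document is reduced to a list of paragraph lengths, blank lines and headings are not modeled (so the rule about moving headings is not implemented), and the function names are hypothetical. With `strict=False` a page may stretch to N+1 or N+2 lines; with `strict=True` it may shrink to N-1 or N-2 lines.

```python
def line_positions(paragraph_lengths):
    """Return one (position_in_paragraph, paragraph_length) pair per line."""
    return [(pos, length) for length in paragraph_lengths for pos in range(length)]

def is_bad_break(positions, i):
    """True if starting a new page at line i leaves a widow or an orphan."""
    if i >= len(positions):
        return False  # break falls at the end of the document
    pos, length = positions[i]
    if pos == 0:
        return False  # break falls between two paragraphs
    orphan = pos == 1          # only the top line stays on the old page
    widow = length - pos == 1  # only the bottom line lands on the new page
    return orphan or widow

def paginate(paragraph_lengths, n, strict=False):
    """Return the number of lines placed on each page."""
    if n < 3:
        raise ValueError("page length must be at least 3 lines")
    positions = line_positions(paragraph_lengths)
    candidates = (n, n - 1, n - 2) if strict else (n, n + 1, n + 2)
    page_sizes, start = [], 0
    while start < len(positions):
        # Take the first candidate length whose break is acceptable.
        size = next(
            (k for k in candidates if not is_bad_break(positions, start + k)), n
        )
        size = min(size, len(positions) - start)
        page_sizes.append(size)
        start += size
    return page_sizes

stretched = paginate([4, 3, 5], n=5)             # pages may grow to N+2
shrunk = paginate([4, 3, 5], n=5, strict=True)   # pages may shrink to N-2
```

In the example, a 5-line page would cut the 3-line paragraph after its first line. The stretched layout keeps that paragraph on the first page by running to N+2 lines, while the strict layout ends the first page one line early and moves the whole paragraph to the next page.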

PDF requirements

  • The document needs to include live links
    • For linking between RFCs, pointers to RFCs published before the format switchover will point to the TXT version
    • For linking between RFCs, pointers to RFCs published after the format switchover will point to the PDF version and will allow for pointers to specific sections within a document
  • The PDF version will include the standard front page header and include page numbers
  • The PDF version will be sized for ???
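The linking rule in the list above can be sketched as a small Python function. Almost everything concrete here is an assumption made for illustration: the switchover is modeled as an RFC number threshold passed in by the caller (the real switchover point is not decided on this page), and the `.pdf` URL pattern and the `#section-` fragment form are hypothetical. Only the distinction between TXT targets for older RFCs and PDF targets with optional section pointers for newer ones comes from the requirements.

```python
def rfc_link(number, switchover, section=None):
    """Build a link target for a reference to another RFC.

    number:     RFC number being referenced.
    switchover: first RFC number published in the new format (placeholder).
    section:    optional section label, used only for PDF targets.
    """
    base = f"https://www.rfc-editor.org/rfc/rfc{number}"
    if number < switchover:
        # Published before the switchover: point to the TXT version.
        return base + ".txt"
    # Published after the switchover: point to the PDF version.
    url = base + ".pdf"  # assumed URL pattern
    if section is not None:
        url += f"#section-{section}"  # assumed fragment form
    return url

# 9000 is a placeholder threshold, not a real switchover point.
old_link = rfc_link(2119, switchover=9000)
new_link = rfc_link(9100, switchover=9000, section="3.2")
```

A section label passed for a pre-switchover RFC is ignored, since the requirements mention section pointers only for PDF targets.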

We have talked about using PrinceXML to generate PDF.

EPUB

If we also want to do MOBI (the native Amazon format), we might consider running the free-but-closed-source program from Amazon (http://www.amazon.com/gp/feature.html?ie=UTF8&docId=1000765211). If people are hesitant about us even partially supporting that program, we could consider having a pointer to the program on an advisory page on rfc-editor.org.

design/formats.1380696170.txt.gz · Last modified: 2013/10/01 23:42 by rsewikiadmin