[rfc-i] issue: canonical formats

Andrew G. Malis agmalis at gmail.com
Fri Jun 1 04:47:22 PDT 2012


John,

While this makes a lot of sense, one could also imagine that, along
with a canonical source format, there is a canonical rendering
engine that produces a canonical display format. A canonical RFC
would be the combination of a canonical input file and the
corresponding canonical display file. Just a thought.

Cheers,
Andy

On Thu, May 31, 2012 at 10:36 PM, John Levine <johnl at taugh.com> wrote:
> As I read the wiki page, I see notes about a canonical source version
> and a canonical display version.  I would like to suggest that there
> be only one canonical version, and whatever it is, it should be a
> source version that has structure and metadata at roughly the level that
> xml2rfc does.
>
> That is, I would like a canonical version that makes it possible to
> mechanically (i.e., using algorithms, not heuristics) identify that
> this part is the abstract, that part is a paragraph of text, and this
> other part is the second author's postal code.  It doesn't have to be
> xml2rfc; a constrained HTML or XHTML subset could do the job.
>
> The problem with output formats is that they are simultaneously
> overconstrained and undertagged.  Something like PDF/A prints nicely,
> but it is full of stuff like fonts and line and page breaks that
> aren't relevant to the semantics of the document, while missing the
> metadata about the abstract and the postcode.
>
> So I'd rather that a form that carries metadata but doesn't attempt
> layout be canonical, and that any other derived format be considered
> correct to the extent that it correctly represents the contents of the
> canonical version.  (This is not all that unusual.  Try figuring out
> which of the umpteen translations of a European Union law or
> regulation is the canonical one.)
>
> For long-term stability, I'd also want the canonical format to be well
> specified, and possible for a reasonably motivated person to interpret
> without complex tools.  So XML or HTML, which you can look at in any
> text editor and visually identify the text and the markup, would be
> better than, say, PostScript, which you can look at in the editor but
> typically can't decode the text of without running a lot of code in
> your head, or PDF or Word, which need a hex editor if you don't happen
> to have a rendering engine handy.
>
> R's,
> John

