[rfc-i] issue: canonical formats

Iljitsch van Beijnum iljitsch at muada.com
Sun Jun 3 03:11:24 PDT 2012

On 1 Jun 2012, at 4:36, John Levine wrote:

> As I read the wiki page, I see notes about a canonical source version
> and a canonical display version.  I would like to suggest that there
> be only one canonical version, and whatever it is, it should be a
> source version that has structure and metadata at roughly the level that
> xml2rfc does.

I agree that having multiple canonical versions is problematic, especially canonical source and display versions. At some point the mechanism that generates the latter from the former will change, and from then on it will no longer be possible to regenerate the canonical display version from the canonical source version. Where does that leave us?

Ultimately, the source is not important. What we need is a stable format that is both useful for humans to read and for tools to process. This is the version where we manually check that every comma is where it needs to be and that every non-breaking space indeed doesn't break. This is the version we archive.

Of course it makes sense that authors would submit drafts to the drafts repository and to the RFC Editor in this format, too, although there may be reasons to also allow such submissions in different formats.

I have no problem with looking at the metadata and markup that XML2RFC supports right now, but in my opinion, XML2RFC took some things too far: its structure is too complex and too rigid for human authors to create by hand, so we should be careful about what we import from XML2RFC.

> this other part is the second author's postal code.

Why not include the author's latitude and longitude rather than trying to attach meaning to postal codes?

> The problem with output formats is that they are simultaneously
> overconstrained and undertagged.

> So I'd rather that a form with metadata but that doesn't attempt to do
> layout be canonical

I think with HTML we can have the tagging and with CSS we can have usable layout. Yes, the tagging will be less rich than in XML and the layout will be less precise than in PDF, but we do get the enormous advantage that we don't need to maintain two forms plus the tools to convert from one to the other.

People are getting a bit annoyed with this canonical stuff, and yes, if the tools all do their job it doesn't really matter because all the versions will be the same in all aspects that we care about. But strange things happen from time to time, so unless we want to pay humans to make sure different versions are in fact semantically identical, we need to pick one version that we know is correct so we can always go back to that version when confusion arises.
