[rfc-i] issue: canonical formats

John R Levine johnl at taugh.com
Fri Jun 1 08:39:52 PDT 2012

>> The problem with output formats is that they are simultaneously
>> overconstrained and undertagged.  Something like PDF/A prints nicely,
>> but it is full of stuff like fonts and line and page breaks that
>> aren't relevant to the semantics of the document, while missing the
>> metadata about the abstract and the postcode.
> I would actually argue, especially in light of the other thread on
> non-US-ASCII characters, that the Font program is EXTREMELY IMPORTANT, not
> only for visual display but also for semantic relevance (as it could/would
> incorporate encoding information).

I'll let others comment on how good an idea it would be to have standards 
whose semantics would depend on which fonts were used to render UTF-8 
text.  In the discussion so far, one of the few points on which we appear 
to have complete agreement is that the encoding for non-ASCII material is 
UTF-8.

> Additionally, it should be recognized that PDF supports a VERY RICH
> object-level (as well as document level) metadata model.  One can
> associate either simplistic "name/value pairs" or an entire chunk of XMP
> (an XML/RDF-based standard (ISO 16684)) with any set of graphic objects.

So I hear, but the tool support is a bit thin compared to support for 
things like XML, HTML, and JSON.  Particularly the support in tools that 
don't require buying software from your employer.

John Levine, johnl at taugh.com, Taughannock Networks, Trumansburg NY
"I dropped the toothpaste", said Tom, crestfallenly.
