[rfc-i] UTF-8 and Unicode examples

Bob Braden braden at ISI.EDU
Mon May 3 11:04:25 PDT 2004


  *> 
  *> Increasingly, protocols use UTF-8 as their 'native' format. If a 
  *> document wants to present an example, it cannot, due to the US-ASCII 
  *> rule, use the character itself. A possible solution is to use the common 
  *> U+1234 notation for Unicode instead, or some specific notation for UTF-8 
  *> in ASCII.
  *> 
  *> The same problem exists for names, e.g., in acknowledgements, albeit 
  *> less urgently.
  *> 
  *> It would be nice to find a general solution rather than each author 
  *> winging it. (The inability to specify non-ASCII examples might well 
  *> contribute to implementor laziness as well, as implementors code from 
  *> examples and thus simply consider the UTF-8 thing some political 
  *> correctness item that they can safely ignore :-))
  *> 
  *> Henning

Henning,

The issue of extended character sets has been on the back burner at the
RFC Editor for the past several years.  Yes, it "would be nice".
Unfortunately, it appears to us that this path is full of deep, deep
pits of non-interoperability!  There are no widely available tools for
reading, searching, comparing, editing, or printing documents
containing UTF-8 and friends, as far as we know. (We would be happy to
be proven wrong.)

Bob Braden
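
As a rough illustration of the two options raised in the quoted message
(the U+1234 notation for Unicode, or a notation for UTF-8 in ASCII), here
is a minimal Python sketch. The function names and the angle-bracket
wrapper are assumptions made for this example only; neither message
proposes a specific syntax.

```python
def to_ascii_notation(text: str) -> str:
    """Replace each non-ASCII character with a <U+XXXX> placeholder.

    The <U+XXXX> wrapper is a hypothetical convention for this sketch.
    """
    parts = []
    for ch in text:
        if ord(ch) < 128:
            parts.append(ch)  # plain US-ASCII passes through unchanged
        else:
            parts.append("<U+%04X>" % ord(ch))  # code point in hex
    return "".join(parts)


def utf8_hex(text: str) -> str:
    """Show the UTF-8 encoding of text as space-separated hex octets."""
    return " ".join("%02X" % b for b in text.encode("utf-8"))


if __name__ == "__main__":
    sample = "na\u00efve"  # escape keeps this source file pure ASCII
    print(to_ascii_notation(sample))
    print(utf8_hex("\u00ef"))
```

The first function shows the character-level view (code points), the
second the octet-level view (what actually appears on the wire), which
are the two kinds of example an ASCII-only document might need.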


