[rfc-i] UTF-8 and Unicode examples

Bob Braden braden at ISI.EDU
Mon May 3 16:30:10 PDT 2004


  *> 
  *> There are several plausible solutions:
  *> 
  *> (1) Simply state that the U+ notation is to be used, even though the 
  *> actual (UTF-8) encoding will not consist of two octets. For example, one 
  *> might write
  *> 
  *> M+00FCnchen

That is unpleasantly ambiguous-looking.  A computer knows to look for
exactly four hex digits, but a human finds it harder to parse.  Maybe:

	M+00FC'nchen? or M'00FC'nchen?

And of course you have to be able to escape the + itself.
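Here is a minimal Python sketch of the first variant. It rests on conventions that do not come from this thread: `+` opens an escape, one to six hex digits name the code point, `'` closes it, and a doubled `++` stands for a literal plus sign. The function names `decode_delimited` and `encode_delimited` are hypothetical. Because the closing `'` marks where the hex ends, the decoder does not need a fixed width, so code points above U+FFFF also fit.

```python
import re

# Hypothetical escape syntax (an assumption, not a standard):
#   +HEX'  -> the character U+HEX, with 1 to 6 hex digits closed by '
#   ++     -> a literal '+'
_TOKEN = re.compile(r"\+\+|\+([0-9A-Fa-f]{1,6})'")


def decode_delimited(text: str) -> str:
    """Replace +HEX' escapes with characters and ++ with '+'."""
    def repl(m: re.Match) -> str:
        if m.group(1) is None:  # the "++" alternative matched
            return "+"
        # chr() raises ValueError for values above 0x10FFFF.
        return chr(int(m.group(1), 16))
    return _TOKEN.sub(repl, text)


def encode_delimited(text: str) -> str:
    """Escape '+' as ++ and every non-ASCII character as +HEX'."""
    out = []
    for ch in text:
        if ch == "+":
            out.append("++")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            out.append(f"+{ord(ch):04X}'")
    return "".join(out)


if __name__ == "__main__":
    print(decode_delimited("M+00FC'nchen"))     # München
    print(encode_delimited("1+1 in München"))   # 1++1 in M+00FC'nchen
```

Under these assumptions, decoding `1++1 in M+00FC'nchen` returns the original string. Using `++` as the escape keeps the rule down to one special character. A different escape, such as a backslash, would work just as well.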

Bob


