[rfc-i] Another example of a draft with non-ASCII characters (draft-ietf-iri-3987bis-12.txt)

Joe Hildebrand (jhildebr) jhildebr at cisco.com
Tue Jul 17 07:04:10 PDT 2012


On 7/17/12 6:53 AM, "Brian E Carpenter" <brian.e.carpenter at gmail.com>
wrote:


>It seems that Wordpad handles UTF16 correctly, but not UTF8. If you
>do "Save As Unicode" from Notepad, Wordpad can read it, but there are some
>unexpected changes of font.

If you put a Byte Order Mark
(http://en.wikipedia.org/wiki/Byte_order_mark) at the front, Wordpad will
probably do just fine. It has no context to go on, so it has to sniff out
the encoding, guessing from the first few bytes of the file.
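
For illustration, here is a minimal Python sketch of that kind of sniffing.
It shows the general technique only, not what Wordpad actually does; the
names sniff_bom and decode_with_bom and the UTF-8 fallback are assumptions
made up for the example.

```python
import codecs

# Known BOM signatures. The UTF-32 entries come first because the UTF-32 LE
# BOM starts with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM,
    otherwise (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode data using its BOM if present; otherwise guess `fallback`
    (an assumption, and exactly the guess that can go wrong)."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or fallback)
```

Without a BOM the reader is back to guessing, which is the case where an
editor can pick the wrong encoding.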

As Julian has said, this is one of several reasons why just saying "make
the current .txt format UTF-8 and stop" is not an adequate solution to the
problems at hand. HTML and XML have ways of declaring their encoding
definitively inside the file format, so processors don't have to guess.
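
As a rough sketch of what "declaring inside the file" buys you, the Python
below reads the encoding name from an XML declaration. It is deliberately
simplified: the function name declared_xml_encoding is hypothetical, and
the regex only handles documents whose declaration is readable as ASCII
bytes (so not UTF-16 input), which a real XML parser deals with properly.

```python
import re

# Matches an XML declaration at the very start of the data and captures the
# value of its encoding pseudo-attribute, e.g. <?xml version="1.0"
# encoding="ISO-8859-1"?>.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_xml_encoding(data: bytes):
    """Return the encoding named in the XML declaration, or None if the
    data has no declaration or the declaration names no encoding."""
    match = _XML_DECL.match(data)
    return match.group(1).decode("ascii") if match else None
```

A plain .txt file has no equivalent place to put that information, which
is why a reader of one can only sniff and guess.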

-- 
Joe Hildebrand



