[rfc-i] Unicode or UTF-8

"Martin J. Dürst" duerst at it.aoyama.ac.jp
Thu Mar 29 04:17:05 PDT 2012


On 2012/03/29 1:47, Dave Thaler wrote:
> Iljitsch van Beijnum writes:
> [...]
>> If we want to go beyond ASCII, UTF-8 is a no-brainer, because there is no
>> difference between a file that is in US ASCII and a file that is in UTF-8 but just
>> happens to have no code points > 127
> [...]
>
> Not entirely true.  A file that is in UTF-8 may start with a 3-byte BOM (EF BB BF)
> that identifies it as being encoded in UTF-8.

The BOM is written by Notepad, so if there's a significant constituency
of people who use Notepad to create their Internet Drafts, we'll have to
accept it. But I'd suggest we try to reject it first, and if that's not
practical, we chop it off when we get the files to the server.
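
As a rough illustration of the "reject, or else chop off" idea, here is a
minimal Python sketch. The function names are hypothetical and not part of
any existing submission tool; the only fact relied on is that the UTF-8 BOM
is the three bytes EF BB BF at the very start of the file.

```python
import codecs

# The UTF-8 BOM is the byte sequence EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the file content starts with a UTF-8 BOM."""
    return data.startswith(UTF8_BOM)


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM, if present; otherwise return data unchanged."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte is below 0x80, i.e. the file is also valid US-ASCII."""
    return all(b < 0x80 for b in data)
```

A server-side check (hypothetical) could call has_utf8_bom() to reject a
submission, or strip_utf8_bom() to accept it silently. After stripping, a
file with no code points above 127 is byte-for-byte identical to its ASCII
form, which is_pure_ascii() can confirm.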

The alternative would be to say we consistently use it to distinguish 
plain old ASCII .txt files and the new ones, but I would bet that the 
havoc it creates with general tooling is not worth the advantage of the 
distinction.

Regards,   Martin.

