[rfc-i] Unicode or UTF-8
"Martin J. Dürst"
duerst at it.aoyama.ac.jp
Thu Mar 29 04:17:05 PDT 2012
On 2012/03/29 1:47, Dave Thaler wrote:
> Iljitsch van Beijnum writes:
> [...]
>> If we want to go beyond ASCII, UTF-8 is a no-brainer, because there is no
>> difference between a file that is in US ASCII and a file that is in UTF-8 but just
>> happens to have no code points > 127
> [...]
>
> Not entirely true. A file that is in UTF-8 may start with a 3-byte BOM (EF BB BF)
> that identifies it as being encoded in UTF-8.
It's used by Notepad, so if there's a significant constituency of people
who use Notepad to create their Internet Drafts, we'll have to accept
it. But I'd suggest we try to reject it first, and if that's not
practical, we chop it off when we get the files to the server.
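As a rough sketch of what "reject or chop off" could look like on the server side (Python; the function names here are made up for illustration and do not refer to any existing submission tool):

```python
UTF8_BOM = b"\xef\xbb\xbf"  # the 3-byte sequence EF BB BF


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the file contents start with a UTF-8 BOM."""
    return data.startswith(UTF8_BOM)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the contents without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(UTF8_BOM):]
    return data


def is_plain_ascii(data: bytes) -> bool:
    """Return True if every byte is below 0x80, i.e. the file is plain ASCII."""
    return all(byte < 0x80 for byte in data)
```

A server that rejects would test has_utf8_bom() and refuse the upload; one that chops would store strip_utf8_bom(data) instead. Note that an otherwise pure-ASCII file with a BOM in front fails is_plain_ascii() until the BOM is removed.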
The alternative would be to say we consistently use it to distinguish
plain old ASCII .txt files and the new ones, but I would bet that the
havoc it creates with general tooling is not worth the advantage of the
distinction.
Regards, Martin.