[rfc-i] Unicode or UTF-8

Joe Touch touch at isi.edu
Thu Mar 29 12:15:21 PDT 2012



On 3/29/2012 4:17 AM, "Martin J. Dürst" wrote:
> On 2012/03/29 1:47, Dave Thaler wrote:
...
>> Not entirely true. A file that is in UTF-8 may start with a 3-byte BOM
>> (EF BB BF)
>> that identifies it as being encoded in UTF-8.
>
> It's used by Notepad, so if there's a significant constituency of people
> who use Notepad to create their Internet Drafts, we'll have to accept
> it. But I'd suggest we try to reject it first, and if that's not
> practical, we chop it off when we get the files to the server.

Regardless of what our tools generate, it wouldn't be hard to include a
minor post-processing step at the ID submission site, e.g., dropping
the BOM, or adding it for that matter.
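
As a rough illustration of how small such a step could be, here is a
minimal sketch in Python. The function names strip_bom and add_bom are
hypothetical, and this is not a description of what the submission tool
actually does; it only assumes the file arrives as raw bytes in UTF-8.

```python
import codecs

# The 3-byte UTF-8 BOM (EF BB BF), taken from the standard library.
UTF8_BOM = codecs.BOM_UTF8


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


def add_bom(data: bytes) -> bytes:
    """Return data with exactly one leading UTF-8 BOM."""
    if data.startswith(UTF8_BOM):
        return data
    return UTF8_BOM + data
```

Both functions leave the rest of the file untouched, so either could be
applied per posted variant without changing the submitted text.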

I.e., whether the BOM is put there isn't as important as whether it's 
expected by readers. That affects the variants we post for viewing on 
different devices.

Ultimately, this too has no relationship to the archival format, IMO.

Joe

