[rfc-i] Unicode or UTF-8
Joe Touch
touch at isi.edu
Thu Mar 29 12:15:21 PDT 2012
On 3/29/2012 4:17 AM, "Martin J. Dürst" wrote:
> On 2012/03/29 1:47, Dave Thaler wrote:
...
>> Not entirely true. A file that is in UTF-8 may start with a 3-byte BOM
>> (EF BB BF) that identifies it as being encoded in UTF-8.
>
> It's used by Notepad, so if there's a significant constituency of people
> who use Notepad to create their Internet Drafts, we'll have to accept
> it. But I'd suggest we try to reject it first, and if that's not
> practical, we chop it off when we get the files to the server.
Regardless of what our tools generate, it's not too bad to include a
step at the ID submission site that does minor post-processing, e.g.,
dropping the BOM, or adding it for that matter.
I.e., whether the BOM is put there isn't as important as whether it's
expected by readers. That affects the variants we post for viewing on
different devices.
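
As a rough illustration of how small that step could be, here is a minimal
sketch in Python. The function names strip_bom and add_bom are hypothetical,
chosen only for this example; nothing here reflects what the submission tool
actually does. It assumes the file is handled as raw bytes and only looks at
the UTF-8 BOM (EF BB BF) mentioned above.

```python
import codecs

# The 3-byte UTF-8 BOM: b"\xef\xbb\xbf"
UTF8_BOM = codecs.BOM_UTF8


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


def add_bom(data: bytes) -> bytes:
    """Return data with a leading UTF-8 BOM, without doubling an existing one."""
    return UTF8_BOM + strip_bom(data)
```

Either function could be applied per posted variant, depending on what the
readers for that variant expect.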
Ultimately, this too has no relationship to the archival format, IMO.
Joe