[rfc-i] Byte order marks
paul.hoffman at vpnc.org
Tue Nov 4 16:20:36 PST 2008
At 12:17 AM +0100 11/5/08, Simon Josefsson wrote:
>Paul Hoffman <paul.hoffman at vpnc.org> writes:
>> Further, I don't see the problem with concatenating. The BOM will
>> either not print (if the display/print device understands that it is a
>> zero-width character) or it will show as a single blurp on a blank
>> line. What is the problem with that?
>That is not a big problem. However, if there is no BOM, you won't have
>either of those two warts.
...and you won't have auto-detection, either. We consider aiding auto-detection to be more valuable than avoiding slight visual warts for some current readers.
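To make the concatenation case concrete, here is a small illustrative sketch (Python, not part of any proposal). It assumes two files that each begin with the UTF-8 BOM (EF BB BF) are joined byte-for-byte. Decoding with Python's "utf-8-sig" codec strips only a leading BOM, so the second file's BOM survives inside the text as U+FEFF (ZERO WIDTH NO-BREAK SPACE). A device that understands it renders nothing there; one that does not shows a stray glyph.

```python
# Illustrative only: concatenating two BOM-prefixed UTF-8 files byte-for-byte.
BOM = b"\xef\xbb\xbf"

part1 = BOM + "First file\n".encode("utf-8")
part2 = BOM + "Second file\n".encode("utf-8")
combined = part1 + part2

# "utf-8-sig" removes a single BOM, and only at the very start of the data.
text = combined.decode("utf-8-sig")

# The second BOM is still there, as U+FEFF at the start of the second
# file's first line.
lines = text.splitlines()
print(lines)  # ['First file', '\ufeffSecond file']
```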
> >>2) BOM was not designed for UTF-8 auto-detection.
>> Um, so? It works fine for UTF-8 auto-detection.
>We disagree. I believe UTF-8 BOM works poorly as an auto-detection
>mechanism since it introduces new complexity that ultimately will lead
>to poor user experiences.
Do you have a better proposed solution? The problem arises with current software, not with software as we would like it to be developed in the future. If it were the latter, we could just say "software should scan the whole file and, if it is valid UTF-8, make that the character set".
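For comparison, a rough sketch of the two detection approaches being argued about: checking for a leading BOM, and scanning the whole file to see whether it is valid UTF-8. The function name guess_encoding and its return labels are hypothetical, chosen only for this illustration; real readers may apply these checks differently or not at all.

```python
BOM = b"\xef\xbb\xbf"

def guess_encoding(data: bytes) -> str:
    """Hypothetical heuristic for text that is either US-ASCII or UTF-8."""
    # 1. Signature check: a leading UTF-8 BOM settles the question cheaply,
    #    without reading the rest of the file.
    if data.startswith(BOM):
        return "UTF-8 (BOM)"
    # 2. No BOM and only 7-bit bytes: plain US-ASCII.
    if data.isascii():
        return "US-ASCII"
    # 3. Otherwise scan the whole file. Text in other 8-bit encodings
    #    rarely happens to be valid UTF-8, so a clean decode is a strong hint.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown 8-bit"
    return "UTF-8 (validated)"

print(guess_encoding(BOM + "Dürst".encode("utf-8")))  # UTF-8 (BOM)
print(guess_encoding(b"RFC 3629"))                    # US-ASCII
print(guess_encoding("Dürst".encode("utf-8")))        # UTF-8 (validated)
print(guess_encoding("Dürst".encode("latin-1")))      # unknown 8-bit
```

The trade-off shows up in step 3: the full scan needs no BOM but must read the entire file, which is the behavior current software generally does not implement.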
> >>3) As far as I can tell, RFC 3629 section 6 says that protocols SHOULD
>>> forbid use of BOM signatures when the data format is specified to be
>>> UTF-8, which seems to hold here. See:
>> Right. But the data format proposed here is not UTF-8. It is "either
>> US-ASCII or UTF-8". That's the whole point.
>Quoting your document:
> The proposed change is to change the encoding of Internet Drafts and
> RFCs to UTF-8.
> Upon publication of this document as an RFC, all existing RFCs and
> Internet Drafts will be considered to be encoded in UTF-8.
Ah, good point. We were a bit too pedantic there. I'll fix that in the next version to make it clear that the encoding is UTF-8 when non-ASCII characters are needed.
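The "either US-ASCII or UTF-8" wording works because US-ASCII is a strict subset of UTF-8: an ASCII-only file is byte-for-byte identical under both encodings. A quick illustrative check (Python, not from the draft) also shows that prefixing a BOM makes a file non-ASCII, so it can then only be read as UTF-8.

```python
ascii_text = "Internet Drafts and RFCs\n"
raw = ascii_text.encode("ascii")

# Same bytes either way: ASCII-only text is already valid UTF-8.
print(raw == ascii_text.encode("utf-8"))  # True

# With a BOM in front, the file is no longer US-ASCII.
with_bom = b"\xef\xbb\xbf" + raw
print(with_bom.isascii())  # False
try:
    with_bom.decode("ascii")
except UnicodeDecodeError as err:
    print("not US-ASCII:", err.reason)  # not US-ASCII: ordinal not in range(128)
```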
--Paul Hoffman, Director