[rfc-i] Fwd: I-D ACTION:draft-hoffman-utf8-rfcs-03.txt

Martin Duerst duerst at it.aoyama.ac.jp
Sun Oct 19 17:56:43 PDT 2008

At 06:54 08/10/20, Julian Reschke wrote:
>Tim Bray wrote:
>> On Sun, Oct 19, 2008 at 2:44 AM, Julian Reschke <julian.reschke at gmx.de> wrote:
>>> As the IETF itself requires all new work to allow non-ASCII characters,
>>> and the UTF-8 spec is a full standard, we really should eat our own dog
>>> food. Therefore, I'd like the UTF-8 proposal to move forward, with the
>>> problems pointed out being fixed (FF currently disallowed), and
>>> potentially requiring the UTF-8 BOM.
>> I would be in favor of recommending but not requiring a UTF-8 BOM.
>> Requiring it would be quite onerous for some authors, as many popular
>> authoring tools don't generate one.
>We may not have to require it from the submitter, but the posting 
>process surely could add the BOM automatically (when not present and the 
>content contains non-ASCII characters)...

Very much agreed with Julian here. Be liberal in what you accept,...,
anybody :-? (assuming we go for a BOM, which I have my doubts about)
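
As a rough sketch of what such a posting step could look like (the
function name is made up for illustration, and this is not an existing
IETF tool), assuming the submission arrives as raw bytes:

```python
import codecs

def add_bom_if_non_ascii(data: bytes) -> bytes:
    """Prepend the UTF-8 BOM only if it is absent and the content is not pure ASCII."""
    if data.startswith(codecs.BOM_UTF8):
        return data                      # already has a BOM, leave untouched
    if data.isascii():
        return data                      # ASCII-only drafts stay byte-identical
    data.decode("utf-8")                 # raises UnicodeDecodeError if not valid UTF-8
    return codecs.BOM_UTF8 + data
```

ASCII-only submissions pass through unchanged, and running the step
twice has the same effect as running it once.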

>I think the more important question is: will the presence of the BOM 
>negatively affect any clients?

Yes. Less and less, but there are still clients around that don't
like the BOM.

More fundamentally, the BOM destroys the ASCII-compatibility.
Without the BOM, a UTF-8 file is just a file with some occasional
8-bit data, but otherwise looks exactly like an Internet-Draft or RFC.
With the BOM, there is a discontinuity between ASCII-only and
"contains some UTF-8" documents. 

There are three possible outcomes:
- Everything works fine.
- Some strange character(s) printed at the top left of a document.
- Some more serious failure.
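
To illustrate the first two outcomes, here is a small sketch in Python;
the sample text is only an example. The UTF-8 BOM is the three bytes
EF BB BF, and what a client shows depends on how it decodes them:

```python
import codecs

doc = codecs.BOM_UTF8 + b"Network Working Group"

# A BOM-aware client drops the BOM silently.
aware = doc.decode("utf-8-sig")

# A UTF-8 client that does not know about the BOM keeps U+FEFF as the first character.
unaware = doc.decode("utf-8")

# A client that assumes Latin-1 shows three stray characters at the top left.
legacy = doc.decode("latin-1")
```

Here `aware` starts with "N", `unaware` starts with U+FEFF, and `legacy`
starts with the three characters "ï»¿".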

Windows applications by now probably don't have problems with a BOM,
because Notepad has been adding it for a long time. But I don't know
about other OSes. What I think also has to be examined closely are
tool chains. A typical RFC starts with a few empty lines, then
"Network Working Group",... This will change, so it's an easy guess
that some regular expressions in tools will break. Also, simple
concatenation is no longer simple.
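
A sketch of both problems, in Python. The regular expression is a
made-up example of the kind of check a tool might use, not taken from
any actual tool, and the helper names are likewise only illustrative:

```python
import codecs
import re

BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf"

# Hypothetical tool check: optional blank lines, then the usual first header line.
HEADER = re.compile(rb"\A\s*Network Working Group")

def strip_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM if there is one."""
    return data[len(BOM):] if data.startswith(BOM) else data

def concat(parts):
    """Join byte strings, keeping at most one BOM, at the very start."""
    parts = list(parts)
    if not parts:
        return b""
    return parts[0] + b"".join(strip_bom(p) for p in parts[1:])

plain = b"\n\n\nNetwork Working Group"
with_bom = BOM + plain
```

The pattern matches `plain` but not `with_bom`, because the BOM bytes
are not whitespace. Plain `with_bom + with_bom` leaves a second BOM in
the middle of the result, which is why concatenation needs something
like `concat` above.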

Regards,   Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     
