[rfc-i] Fwd: I-D ACTION:draft-hoffman-utf8-rfcs-03.txt
duerst at it.aoyama.ac.jp
Sun Oct 19 17:56:43 PDT 2008
At 06:54 08/10/20, Julian Reschke wrote:
>Tim Bray wrote:
>> On Sun, Oct 19, 2008 at 2:44 AM, Julian Reschke <julian.reschke at gmx.de> wrote:
>>> As the IETF itself requires all new work to allow non-ASCII characters,
>>> and the UTF-8 spec is a full standard, we really should eat our own dog
>>> food. Therefore, I'd like the UTF-8 proposal to move forward, with the
>>> problems pointed out being fixed (FF currently disallowed), and
>>> potentially requiring the UTF-8 BOM.
>> I would be in favor of recommending but not requiring a UTF-8 BOM.
>> Requiring it would be quite onerous for some authors, as many popular
>> authoring tools don't generate one.
>We may not have to require it from the submitter, but the posting
>process surely could add the BOM automatically (when not present and the
>content contains non-ASCII characters)...
Very much agreed with Julian here. Be liberal in what you accept,...,
anybody :-? (assuming we go for a BOM, which I have my doubts about)
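A minimal sketch of the automatic step Julian describes, assuming the posting process handles the submission as raw bytes that are already valid UTF-8. The function name add_bom_if_needed is hypothetical and not part of any existing posting tool:

```python
import codecs


def add_bom_if_needed(data: bytes) -> bytes:
    """Prepend a UTF-8 BOM only if the data has non-ASCII bytes and no BOM yet.

    Assumption: `data` is valid UTF-8; no validation is done here.
    """
    if data.startswith(codecs.BOM_UTF8):
        # Already has a BOM, leave it alone.
        return data
    if all(byte < 0x80 for byte in data):
        # Pure ASCII, so the document stays byte-identical to a classic RFC.
        return data
    return codecs.BOM_UTF8 + data
```

An ASCII-only draft passes through unchanged, and running the function a second time on its own output does not add a second BOM.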
>I think the more important question is: will the presence of the BOM
>negatively affect any clients?
Yes. Less and less, but there are still clients around that don't
like the BOM.
More fundamentally, the BOM destroys the ASCII-compatibility.
Without the BOM, a UTF-8 file is just a file with some occasional
8-bit data, but otherwise looks exactly like an Internet-Draft or RFC.
With the BOM, there is a discontinuity between ASCII-only and
"contains some UTF-8" documents.
There are three possible outcomes:
- Everything works fine.
- Some strange character(s) printed at the top left of a document.
- Some more serious failure.
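To make the ASCII-compatibility point and the "strange characters" outcome concrete, here is a small sketch. It assumes a client that decodes the bytes as Latin-1; the sample text is only an illustration:

```python
import codecs

ascii_text = "Network Working Group\n"

# ASCII-only text is byte-identical whether encoded as ASCII or BOM-less UTF-8.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# With a BOM, the file gains three leading bytes (EF BB BF).
with_bom = codecs.BOM_UTF8 + ascii_text.encode("utf-8")
extra_bytes = len(with_bom) - len(ascii_text.encode("utf-8"))

# A client that assumes Latin-1 renders those bytes as three stray characters
# at the top left of the document.
stray = with_bom.decode("latin-1")[:3]

print(same_bytes, extra_bytes, repr(stray))
```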
Windows applications by now probably don't have problems with a BOM,
because Notepad has been adding it for a long time. But I don't know
about other OSes. What I think also has to be examined closely are
tool chains. A typical RFC starts with a few empty lines, then
"Network Working Group",... With a BOM in front, this will change, so
it's an easy guess that some regular expressions in tools will break.
Also, simple concatenation is no longer simple.
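Both tool-chain problems can be sketched as follows. The pattern HEADER and the helper concat are hypothetical stand-ins for whatever existing tools actually do, not taken from any real RFC tool:

```python
import codecs
import re

# Hypothetical tool check: optional blank lines, then the usual header.
HEADER = re.compile(rb"\A\s*Network Working Group")

plain = b"\n\n\nNetwork Working Group\n"
with_bom = codecs.BOM_UTF8 + plain

matches_plain = HEADER.match(plain) is not None
# The BOM bytes are not whitespace, so the anchored pattern no longer matches.
matches_bom = HEADER.match(with_bom) is not None

# Naive concatenation leaves a second BOM in the middle of the result.
naive = with_bom + with_bom


def concat(parts):
    """Join byte strings, keeping at most one BOM, at the very start."""
    bom = codecs.BOM_UTF8
    had_bom = any(part.startswith(bom) for part in parts)
    body = b"".join(
        part[len(bom):] if part.startswith(bom) else part for part in parts
    )
    return bom + body if had_bom else body
```

Without a BOM, plain byte concatenation of two documents is already correct, which is the simplicity that would be lost.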
#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst at it.aoyama.ac.jp