[rfc-i] Fwd: I-D ACTION:draft-hoffman-utf8-rfcs-03.txt

Julian Reschke julian.reschke at gmx.de
Sun Oct 19 02:44:32 PDT 2008


It seems we're exhausted from the discussion :-)

I'd like to recap a few key points from the previous threads:

Some of the issues that were discovered with the UTF-8 proposal were:

1) "text/plain; charset=utf-8" doesn't work well once the file is stored 
locally in the file system, and the encoding information is lost. The 
proper fix for this is to use a UTF-8 BOM, which enables at least the 
standard Windows applications (Notepad/Wordpad) to do the right thing.

2) Just because applications can understand UTF-8 doesn't necessarily 
mean they can display all characters.

3) It's hard to print text/plain with FF characters indicating form 
feeds. Using UTF-8 as encoding doesn't really change this, but it may 
reduce the choice of programs that are actually able to do it.
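
To make point 1) concrete, here is a minimal Python sketch of what "use a UTF-8 BOM" means at the byte level. The helper names (add_bom, read_text) are made up for illustration and are not part of the draft; the only library pieces used are codecs.BOM_UTF8 and the "utf-8-sig" codec from the Python standard library.

```python
import codecs


def add_bom(data: bytes) -> bytes:
    """Prefix UTF-8 encoded bytes with the UTF-8 BOM (EF BB BF) unless present."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


def read_text(data: bytes) -> str:
    """Decode bytes as UTF-8, dropping a leading BOM if there is one."""
    # "utf-8-sig" accepts input both with and without the BOM.
    return data.decode("utf-8-sig")


sample = "Zürich".encode("utf-8")  # hypothetical file content
marked = add_bom(sample)
```

With the marker in place, a reader that has lost the "charset=utf-8" label can still recognize the encoding from the first three bytes of the file.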

My observations:

- So far nobody has made a competing proposal based on text/plain that 
would work better.

- The file format has no impact on what fonts the reader's operating 
system has; thus non-ASCII characters should be used carefully. In 
particular, where no alternate (all-ASCII) form is present, font 
availability needs to be considered: for instance, protocol examples 
could use characters that are likely to be displayable everywhere 
(such as characters from Latin-1).

- Displaying text/plain; charset=utf-8 is no problem as long as the 
content is served via HTTP and displayed in a browser.

- There is disagreement about whether the ability to print the specification 
text as-is is important. It *is* possible to provide alternative 
versions that can be printed from browsers nicely (actually, this is 
already done).

- All the problems that were reported could be solved by moving to a 
different format, such as a simple subset of text/html (for instance one 
<pre> element per page). That would solve the printing problem, but 
would introduce new issues for people who do not want to use browsers to 
read the spec, and probably for tools that expect no markup in the spec.
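
As a rough illustration of the "one <pre> element per page" idea, the following Python sketch splits a paginated text on FF characters and wraps each page in its own <pre>. This is only a sketch under assumptions: the function name pages_to_html, the default title, and the print stylesheet (the CSS page-break-after property) are my own choices, not something the draft specifies.

```python
import html

FORM_FEED = "\f"

# Assumption: ask the browser to start a new printed page after each <pre>.
STYLE = "<style>pre { page-break-after: always; }</style>"


def pages_to_html(text: str, title: str = "draft") -> str:
    """Wrap each FF-separated page of a plain-text spec in its own <pre>."""
    pages = text.split(FORM_FEED)
    # Escape <, > and & so the spec text cannot be mistaken for markup.
    body = "\n".join("<pre>" + html.escape(page) + "</pre>" for page in pages)
    head = (
        '<head><meta charset="utf-8"><title>'
        + html.escape(title)
        + "</title>"
        + STYLE
        + "</head>"
    )
    return "<!DOCTYPE html>\n<html>\n" + head + "\n<body>\n" + body + "\n</body>\n</html>\n"


out = pages_to_html("a < b\fpage 2")  # hypothetical two-page input
```

The encoding is declared inside the file itself, so it survives being saved locally, but the output is no longer plain text, which is exactly the trade-off described above.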

My proposal:

As the IETF itself requires all new work to allow non-ASCII characters, 
and the UTF-8 spec is a full standard, we really should eat our own dog 
food. Therefore, I'd like the UTF-8 proposal to move forward, with the 
problems pointed out being fixed (FF currently disallowed), and 
potentially requiring the UTF-8 BOM.

BR, Julian
