[rfc-i] Fwd: I-D ACTION:draft-hoffman-utf8-rfcs-03.txt
julian.reschke at gmx.de
Sun Oct 19 02:44:32 PDT 2008
It seems we're exhausted from the discussion :-)
I'd like to recap a few key points from the previous threads.
Some of the issues that were discovered with the UTF-8 proposal were:
1) "text/plain; charset=utf-8" doesn't work well once the file is stored
locally in the file system, and the encoding information is lost. The
proper fix for this is to use a UTF-8 BOM, which enables at least the
standard Windows applications (Notepad/Wordpad) to do the right thing.
2) Just because applications can understand UTF-8 doesn't necessarily
mean they can display all characters.
3) It's hard to print text/plain with FF characters indicating form
feeds. Using UTF-8 as encoding doesn't really change this, but it may
reduce the choice of programs that are actually able to do it.
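
To make point 1 concrete, here is a minimal sketch of what "use a UTF-8 BOM" means at the byte level. The function names add_bom and decode_draft are hypothetical, chosen only for this illustration; the codecs.BOM_UTF8 constant and the "utf-8-sig" codec are part of the Python standard library.

```python
import codecs


def add_bom(data: bytes) -> bytes:
    """Prefix UTF-8 encoded bytes with the BOM (EF BB BF) unless already present."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


def decode_draft(data: bytes) -> str:
    """Decode UTF-8 bytes; the 'utf-8-sig' codec drops a leading BOM if there is one."""
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    raw = "na\u00efve example\n".encode("utf-8")
    marked = add_bom(raw)
    print(marked[:3].hex())            # first three bytes of the marked file
    print(decode_draft(marked), end="")
```

A file saved this way carries its encoding hint with it, so the information survives even when the "charset=utf-8" parameter from the HTTP response is lost on disk.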
- So far nobody has made a competing proposal based on text/plain that
would work better.
- The file format has no impact on what fonts the reader's operating
system has; thus non-ASCII characters should be used carefully; in
particular, when used where no alternate (all-ASCII) form is present,
font availability needs to be considered. For instance, protocol
examples could use characters that are likely to be displayable
everywhere (such as characters from Latin-1).
- Displaying text/plain; charset=utf-8 is no problem as long as the
content is served via HTTP and displayed in a browser.
- There is disagreement whether the ability to print the specification
text as-is is important. It *is* possible to provide alternative
versions that can be printed from browsers nicely (actually, this is
- All the problems that were reported could be solved by moving to a
different format, such as a simple subset of text/html (for instance one
<pre> element per page). That would solve the printing problem, but
would introduce new issues for people who do not want to use browsers to
read the spec, and probably with tools that expect no markup in the spec.
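
As a rough illustration of the "one <pre> element per page" idea, the sketch below splits a paginated text draft on form feeds and wraps each page in its own <pre>. The function name pages_to_html and the CSS rule are assumptions made for this example, not part of any proposal in the thread.

```python
import html


def pages_to_html(text: str, title: str = "draft") -> str:
    """Wrap each form-feed-delimited page of a plain-text draft in a <pre> element."""
    pages = text.split("\f")
    # Escape &, < and > so the draft text cannot be mistaken for markup.
    body = "\n".join(
        "<pre>" + html.escape(page, quote=False) + "</pre>" for page in pages
    )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        "<title>" + html.escape(title) + "</title>\n"
        # Assumed styling: ask the browser to start a new sheet after each page.
        "<style>pre { page-break-after: always; }</style></head>\n"
        "<body>\n" + body + "\n</body></html>\n"
    )


if __name__ == "__main__":
    print(pages_to_html("Page one, a < b\n\fPage two\n"))
```

The result still reads as plain text inside each page, but tools that expect no markup at all would have to strip the wrapper first, which is the trade-off noted above.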
As the IETF itself requires all new work to allow non-ASCII characters,
and the UTF-8 spec is a full standard, we really should eat our own dog
food. Therefore, I'd like the UTF-8 proposal to move forward, with the
problems pointed out being fixed (FF currently disallowed), and
potentially requiring the UTF-8 BOM.