[rfc-i] New version: draft-hoffman-utf8-rfcs-04.txt

Julian Reschke julian.reschke at gmx.de
Tue Nov 4 07:49:27 PST 2008


Joe Touch wrote:
> From actually *reading* 2046 ;-)
> 
> 4.1.2.  Charset Parameter
> 
>    A critical parameter that may be specified in the Content-Type field
>    for "text/plain" data is the character set.  This is specified with a
>    "charset" parameter, as in:
> 
>      Content-type: text/plain; charset=iso-8859-1
> 
>    Unlike some other parameter values, the values of the charset
>    parameter are NOT case sensitive.  The default character set, which
>    must be assumed in the absence of a charset parameter, is US-ASCII.

text/plain is only ASCII in the absence of additional encoding
information, and even then only if the transport protocol supplies no
other information; see
<http://greenbytes.de/tech/webdav/rfc2616.html#missing.charset>.
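
As a rough illustration of the defaulting rule quoted above, the
following Python sketch reads the charset parameter from a Content-Type
header and falls back to a default only when the parameter is absent.
The function name effective_charset and its default argument are made
up for this example. The fallback is a parameter because the transport
may define its own default (RFC 2616, for instance, specifies
ISO-8859-1 for text types received over HTTP).

```python
from email import message_from_string


def effective_charset(raw_headers: str, default: str = "us-ascii") -> str:
    """Return the declared charset, or `default` if none is declared.

    `default` is an assumption supplied by the caller: RFC 2046 gives
    US-ASCII for text/plain, but a transport protocol may say otherwise.
    """
    msg = message_from_string(raw_headers)
    # get_content_charset() lower-cases the value, which matches the
    # rule that charset values are not case sensitive.
    return msg.get_content_charset(failobj=default)


labelled = effective_charset("Content-type: text/plain; charset=ISO-8859-1\n\n")
unlabelled = effective_charset("Content-type: text/plain\n\n")
over_http = effective_charset("Content-type: text/plain\n\n", default="iso-8859-1")
```

Here labelled comes from the header itself, while unlabelled and
over_http differ only in the default that the caller chose to assume.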

>>> UTF-8 creates the problem by deliberately overloading text/plain to
>>> also mean UTF-8.
>> First time I hear that funny theory. Whether it is plain text or not
>> is completely independent of the character set in use.
> 
> Yes. And the default charset for text/plain *is* US-ASCII.

There are circumstances where this is true (for instance, I assume, when 
transported as an email attachment).

How is this relevant, though?

When the documents are transported using a MIME type, *of course* the 
encoding should be specified as UTF-8, so it really doesn't matter what 
the default is.

The problem is that after saving to a local file, the MIME type
information (including the encoding information) is lost, so what
happens after that depends solely on the operating system's treatment of
text files (which, at least for WinXP, has nothing to do with what
RFC 2046 says about text/plain).
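
To make the failure mode concrete, here is a small Python sketch of
what happens once the charset label is gone: the same bytes are decoded
under two different assumptions. The sample string and the choice of
cp1252 as the local default are assumptions for illustration (cp1252
being the usual ANSI code page on Western European Windows
installations), not something taken from the draft.

```python
# Text that was sent over the wire labelled as charset=utf-8.
original = "Zürich"

# Only the bytes reach the disk; the Content-Type label is not stored.
saved_bytes = original.encode("utf-8")

# A reader that knows (or guesses) UTF-8 recovers the text.
read_as_utf8 = saved_bytes.decode("utf-8")

# A reader that applies a legacy local default (assumed here: cp1252)
# decodes each UTF-8 byte as a separate character.
read_as_cp1252 = saved_bytes.decode("cp1252")

print(read_as_utf8)
print(read_as_cp1252)
```

Nothing in the saved bytes says which of the two readings is the
intended one, which is why the declared default for text/plain stops
being relevant at that point.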

BR, Julian