[rfc-i] New version: draft-hoffman-utf8-rfcs-04.txt
Julian Reschke
julian.reschke at gmx.de
Tue Nov 4 07:49:27 PST 2008
Joe Touch wrote:
> - From actually *reading* 2046 ;-)
>
> 4.1.2. Charset Parameter
>
> A critical parameter that may be specified in the Content-Type field
> for "text/plain" data is the character set. This is specified with a
> "charset" parameter, as in:
>
> Content-type: text/plain; charset=iso-8859-1
>
> Unlike some other parameter values, the values of the charset
> parameter are NOT case sensitive. The default character set, which
> must be assumed in the absence of a charset parameter, is US-ASCII.
text/plain is only ASCII in the absence of additional encoding
information, and then only if no other information is available from
the transport protocol; see
<http://greenbytes.de/tech/webdav/rfc2616.html#missing.charset>.
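
To illustrate the fallback rule quoted above, here is a minimal sketch in
Python using the standard library's email package. The helper name
effective_charset is hypothetical, and the sketch only models the
RFC 2046 default (US-ASCII when no charset parameter is present); it does
not model any transport-specific defaults such as those discussed for
HTTP.

```python
from email.message import Message


def effective_charset(content_type: str, default: str = "us-ascii") -> str:
    """Return the charset named in a Content-Type value.

    Falls back to `default` when no charset parameter is present,
    mirroring the RFC 2046 rule for text/plain. The helper name and
    the `default` argument are illustrative assumptions.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value, which matches the
    # rule that charset values are not case sensitive.
    return msg.get_content_charset(failobj=default)


explicit = effective_charset("text/plain; charset=ISO-8859-1")
implicit = effective_charset("text/plain")
```

Here explicit holds the declared charset in lower case, while implicit
falls back to the default.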
>>> UTF-8 creates the problem by deliberately overloading text/plain to
>>> also mean UTF-8.
>> First time I hear that funny theory. Whether it is plain text or not
>> is completely independent of the character set in use.
>
> Yes. And the default charset for plain/text *is* US-ASCII.
There are circumstances where this is true (for instance, I assume, when
transported as an email attachment).
How is this relevant, though?
When the documents are transported using a MIME type, *of course* the
encoding should be specified as UTF-8, so it really doesn't matter what
the default is.
The problem is that after saving to a local file, the MIME type
information is lost (including the encoding information), so what
happens after that depends solely on the operating system's treatment
of text files (which, at least for WinXP, has nothing to do with what
RFC 2046 says about text/plain).
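
A small Python sketch of that loss: once only the bytes remain, the
result depends entirely on which encoding the reader guesses. The sample
string and the choice of cp1252 as the "local" code page are assumptions
made for illustration; the encoding an actual application picks may
differ.

```python
# Bytes as they would sit on disk after saving a UTF-8 document;
# no Content-Type header travels with them.
data = "Zürich".encode("utf-8")

# A reader that knows (or correctly guesses) UTF-8 recovers the text.
as_utf8 = data.decode("utf-8")

# A reader that assumes a legacy single-byte code page (cp1252 is
# assumed here) silently produces mojibake instead of an error.
as_cp1252 = data.decode("cp1252")
```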
BR, Julian