[rfc-i] RFCs vs US-ASCII
Роман Донченко
DXDragon at yandex.ru
Wed Oct 6 11:07:55 PDT 2010
Bjoern Hoehrmann <derhoermi at gmx.net> wrote in his message of Wed, 06 Oct 2010
19:40:00 +0400:
> * Julian Reschke wrote:
>> I was just made aware of
>>
>> http://www.rfc-editor.org/rfc/rfc2557.txt
>>
>> which has at least one instance of a non-ASCII character (É).
>>
>> Are there more?
>
> It would appear so,
>
> % grep -lrP "[\x80-\xff]" *
> rfc1305.txt
> rfc2166.txt
> rfc2302.txt
> rfc2497.txt
> rfc2557.txt
> rfc2708.txt
> rfc2875.txt
>
> For instance, the combination 0x9f and 0xf7 seems to be used for quote
> marks. A quick check with iconv does not suggest a particular encoding.
Here's what I was able to decipher:
rfc1305.txt: ± (U+00B1 PLUS-MINUS SIGN) and ↑ (U+2191 UPWARDS ARROW),
encoding unknown.
rfc2166.txt: “ (U+201C LEFT DOUBLE QUOTATION MARK) and ” (U+201D RIGHT
DOUBLE QUOTATION MARK), encoded with windows-1252.
rfc2302.txt: I didn't find any non-ASCII characters in this one.
rfc2497.txt: SHY (U+00AD SOFT HYPHEN), encoded with ISO-8859-1.
rfc2557.txt: É (U+00C9 LATIN CAPITAL LETTER E WITH ACUTE), encoded with
ISO-8859-1.
rfc2708.txt: Some kind of apostrophe (0xC6), encoding unknown. Used to be
’ (U+2019 RIGHT SINGLE QUOTATION MARK) in
draft-ietf-printmib-job-protomap-02.txt.
rfc2875.txt: Some kind of apostrophe (0xC6) and quotes (0xF4 and 0xF6),
encoding unknown (but the same as in rfc2708.txt). Used to be ordinary
ASCII apostrophe and quotation marks in draft-ietf-pkix-dhpop-02.txt.
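Bytes whose encoding is unknown, such as 0xC6, 0xF4 and 0xF6 above, can be
checked by decoding each one under a handful of candidate encodings and
comparing the result with the character the draft used. The following is a
minimal Python sketch of that approach. The function name describe_byte and
the list of candidate codecs are assumptions made for illustration; a
plausible-looking match is only a hint and still has to be confirmed against
the surrounding text.

```python
import unicodedata

# Candidate codecs to try; this list is an assumption, extend it as needed.
CANDIDATES = ("iso-8859-1", "windows-1252", "mac_roman", "cp437", "cp850")


def describe_byte(value, codecs=CANDIDATES):
    """Map each codec name to a description of what the byte decodes to.

    The description is "U+XXXX NAME", or None if the byte is undefined
    in that codec.
    """
    results = {}
    for codec in codecs:
        try:
            char = bytes([value]).decode(codec)
        except UnicodeDecodeError:
            results[codec] = None  # byte has no mapping in this codec
            continue
        # Control characters have no Unicode name, so supply a default.
        name = unicodedata.name(char, "<unnamed>")
        results[codec] = "U+%04X %s" % (ord(char), name)
    return results


if __name__ == "__main__":
    # The unidentified bytes from rfc2708.txt and rfc2875.txt.
    for value in (0xC6, 0xF4, 0xF6):
        print("0x%02X" % value)
        for codec, description in describe_byte(value).items():
            print("  %-14s %s" % (codec, description))
```

As a sanity check, describe_byte(0xC9) reports LATIN CAPITAL LETTER E WITH
ACUTE under iso-8859-1, which matches the rfc2557.txt finding.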
Other files, not in the list above:
rfc64.txt: µ (U+00B5 MICRO SIGN), encoded with ISO-8859-1.
rfc101.txt, rfc177.txt, rfc178.txt, rfc182.txt, rfc227.txt, rfc234.txt,
rfc235.txt, rfc243.txt, rfc270.txt, rfc282.txt, rfc288.txt, rfc290.txt,
rfc292.txt, rfc303.txt: é (U+00E9 LATIN SMALL LETTER E WITH ACUTE),
encoded with ISO-8859-1 (in the RFC Online attribution notice).
rfc237.txt, rfc306.txt, rfc307.txt, rfc310.txt, rfc313.txt, rfc315.txt,
rfc316.txt, rfc317.txt, rfc323.txt, rfc327.txt, rfc367.txt, rfc369.txt: é
(U+00E9 LATIN SMALL LETTER E WITH ACUTE) and è (U+00E8 LATIN SMALL LETTER
E WITH GRAVE), encoded with ISO-8859-1 (in the RFC Online attribution
notice).
rfc441.txt: what looks like NBSP (U+00A0 NO-BREAK SPACE), encoded with
ISO-8859-1, as well as é and è in the RFC Online attribution notice.
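A per-file list like the one above can be produced by counting the byte
values of 0x80 and higher in each file, which also shows which bytes need
identifying. The following is a rough Python sketch, assuming the RFC text
files sit in one directory and are named rfc*.txt; the function names
non_ascii_bytes and scan are hypothetical and chosen only for this example.

```python
import sys
from collections import Counter
from pathlib import Path


def non_ascii_bytes(data):
    """Count each byte value >= 0x80 that occurs in data."""
    return Counter(b for b in data if b >= 0x80)


def scan(directory, pattern="rfc*.txt"):
    """Yield (file name, Counter) for each file containing non-ASCII bytes."""
    for path in sorted(Path(directory).glob(pattern)):
        counts = non_ascii_bytes(path.read_bytes())
        if counts:
            yield path.name, counts


if __name__ == "__main__":
    # Directory holding the RFC text files; defaults to the current one.
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, counts in scan(directory):
        summary = ", ".join(
            "0x%02X x%d" % (value, n) for value, n in sorted(counts.items())
        )
        print("%s: %s" % (name, summary))
```

Reading the files as raw bytes avoids guessing an encoding up front, which
matters here because the files do not all use the same one.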
Hope this helps,
Roman.