[rfc-i] RFCs vs US-ASCII

Роман Донченко DXDragon at yandex.ru
Wed Oct 6 11:07:55 PDT 2010


Bjoern Hoehrmann <derhoermi at gmx.net> wrote in his message of Wed, 06 Oct 2010  
19:40:00 +0400:

> * Julian Reschke wrote:
>> I was just made aware of
>>
>>   http://www.rfc-editor.org/rfc/rfc2557.txt
>>
>> which has at least one instance of a non-ASCII character (É).
>>
>> Are there more?
>
> It would appear so,
>
>   % grep -lrP "[\x80-\xff]" *
>   rfc1305.txt
>   rfc2166.txt
>   rfc2302.txt
>   rfc2497.txt
>   rfc2557.txt
>   rfc2708.txt
>   rfc2875.txt
>
> For instance, the combination 0x9f and 0xf7 seems to be used for quote
> marks. A quick check with iconv does not suggest a particular encoding.

Here's what I was able to decipher:

rfc1305.txt: ± (U+00B1 PLUS-MINUS SIGN) and ↑ (U+2191 UPWARDS ARROW),  
encoding unknown.

rfc2166.txt: “ (U+201C LEFT DOUBLE QUOTATION MARK) and ” (U+201D RIGHT  
DOUBLE QUOTATION MARK), encoded with windows-1252.

rfc2302.txt: I didn't find any non-ASCII characters in this one.

rfc2497.txt: SHY (U+00AD SOFT HYPHEN), encoded with ISO-8859-1.

rfc2557.txt: É (U+00C9 LATIN CAPITAL LETTER E WITH ACUTE), encoded with  
ISO-8859-1.

rfc2708.txt: Some kind of apostrophe (0xC6), encoding unknown. Used to be  
’ (U+2019 RIGHT SINGLE QUOTATION MARK) in  
draft-ietf-printmib-job-protomap-02.txt.

rfc2875.txt: Some kind of apostrophe (0xC6) and quotes (0xF4 and 0xF6),  
encoding unknown (but the same as in rfc2708.txt). Used to be ordinary  
ASCII apostrophe and quotation marks in draft-ietf-pkix-dhpop-02.txt.
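
For anyone who wants to repeat the guesswork, here is a minimal Python sketch of the trial decoding: take one suspicious byte, decode it under a few candidate single-byte encodings, and see whether any result is plausible in context. The function name trial_decode and the candidate list are my own illustrative choices, not an exhaustive set, and a plausible-looking result is only a hint, not proof of the encoding.

```python
import unicodedata

# Candidate single-byte encodings to try; an assumed, non-exhaustive list.
CANDIDATES = ("iso-8859-1", "windows-1252", "mac_roman", "cp437")


def trial_decode(byte_value, encodings=CANDIDATES):
    """Decode one byte under each candidate encoding.

    Returns a dict mapping encoding name to a (code point, Unicode name)
    pair, or to None when the byte is undefined in that encoding.
    """
    results = {}
    for enc in encodings:
        try:
            ch = bytes([byte_value]).decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
            continue
        # Control characters have no Unicode name, hence the fallback.
        results[enc] = (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
    return results


if __name__ == "__main__":
    # 0xC9 is the byte found in rfc2557.txt.
    for enc, guess in trial_decode(0xC9).items():
        print(enc, guess)
```

If none of the candidates yields a character that makes sense where the byte appears (as with 0xC6 used as an apostrophe), the encoding stays unknown and the corresponding Internet-Draft is the better source.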

Others:

rfc64.txt: µ (U+00B5 MICRO SIGN), encoded with ISO-8859-1.

rfc101.txt, rfc177.txt, rfc178.txt, rfc182.txt, rfc227.txt, rfc234.txt,  
rfc235.txt, rfc243.txt, rfc270.txt, rfc282.txt, rfc288.txt, rfc290.txt,  
rfc292.txt, rfc303.txt: é (U+00E9 LATIN SMALL LETTER E WITH ACUTE),  
encoded with ISO-8859-1 (in the RFC Online attribution notice).

rfc237.txt, rfc306.txt, rfc307.txt, rfc310.txt, rfc313.txt, rfc315.txt,  
rfc316.txt, rfc317.txt, rfc323.txt, rfc327.txt, rfc367.txt, rfc369.txt: é  
(U+00E9 LATIN SMALL LETTER E WITH ACUTE) and è (U+00E8 LATIN SMALL LETTER  
E WITH GRAVE), encoded with ISO-8859-1 (in the RFC Online attribution  
notice).

rfc441.txt: what looks like NBSP (U+00A0 NO-BREAK SPACE), encoded with  
ISO-8859-1, as well as é and è in the RFC Online attribution notice.
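
The extra files above turn up once you look at every byte rather than rely on one grep run. A rough Python equivalent of the search is sketched below; the function names, the rfc*.txt pattern and the directory layout are assumptions for illustration, not the exact procedure used for this list.

```python
from pathlib import Path


def non_ascii_bytes(data):
    """Return (line number, column, byte value) for every byte >= 0x80.

    Line numbers and columns are 1-based; columns count bytes, not
    characters, since the encoding is not known in advance.
    """
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, value in enumerate(line, start=1):
            if value >= 0x80:
                hits.append((line_no, col, value))
    return hits


def files_with_non_ascii(directory, pattern="rfc*.txt"):
    """List the names of files under `directory` that contain non-ASCII bytes."""
    return sorted(
        path.name
        for path in Path(directory).glob(pattern)
        if non_ascii_bytes(path.read_bytes())
    )


if __name__ == "__main__":
    # Assumes the RFC text files sit in the current directory.
    for name in files_with_non_ascii("."):
        print(name)
```

Reading the files as raw bytes avoids any dependence on the locale, which is one way a text-mode search can miss or misreport matches.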

Hope this helps,
Roman.


