[rfc-i] Incorrect use of the word "ASCII" in section 3.2

Martin Rex mrex at sap.com
Mon Mar 11 10:21:41 PDT 2013


Paul Hoffman wrote:
>
> Greetings again. In the -03 version of the format requirements document,
> section 3.2, it says:
> 
>       *  The official language of the RFC Series is English.  ASCII is
>          required for all text that must be read to understand or
>          implement the technology described in the RFC.  Use of non-
>          ASCII characters, expressed in a standard Unicode Encoding Form
>          (such as UTF-8), must receive explicit approval from the
>          document stream manager and will be allowed after the rules for
>          the common use cases are defined in the Style Guide.
> 
> The terms "ASCII" and "non-ASCII" are incorrect, given that the second
> sentence talks about encoding forms such as UTF-8.
> 
> My best guess is that what is meant for "ASCII" is "the characters
> U+0021 through U+007E" and for "non-ASCII" it is "characters other than
> U+0021 through U+007E". If that guess is correct, ...

I'm sorry, but I believe you're blowing this out of proportion.

We do have an IETF RFC about ASCII:

  https://tools.ietf.org/html/rfc20

and UTF-8 itself is defined as a superset of US-ASCII, so what you're doing
above leads to an infinite recursion within the definition, besides being
unusual and difficult for many to comprehend.

If anything, a
  s/ASCII/US-ASCII/
should be perfectly sufficient to make the text consistent with the terminology
of RFC 3629 (although I'm not personally aware of anything else called ASCII).

   https://tools.ietf.org/html/rfc3629

              UTF-8, a transformation format of ISO 10646

Abstract

   ISO/IEC 10646-1 defines a large character set called the Universal
   Character Set (UCS) which encompasses most of the world's writing
   systems.  The originally proposed encodings of the UCS, however, were
   not compatible with many current applications and protocols, and this
   has led to the development of UTF-8, the object of this memo.  UTF-8
   has the characteristic of preserving the full US-ASCII range,
   providing compatibility with file systems, parsers and other software
   that rely on US-ASCII values but are transparent to other values.
   This memo obsoletes and replaces RFC 2279.


or in the Introduction (for example, the last paragraph at the bottom of page 1):

   https://tools.ietf.org/html/rfc3629#page-1

 o  Character numbers from U+0000 to U+007F (US-ASCII repertoire)
    correspond to octets 00 to 7F (7 bit US-ASCII values).  A direct
    consequence is that a plain ASCII string is also a valid UTF-8
    string.
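
A minimal sketch in Python of the property quoted above, for anyone who
wants to check it: octets in the range 00 to 7F decode to the same
characters whether they are treated as US-ASCII or as UTF-8. The helper
name is_us_ascii and the sample strings are illustrative assumptions, not
taken from either RFC.

```python
def is_us_ascii(data: bytes) -> bool:
    """Return True if every octet is in the 7-bit range 00..7F."""
    return all(octet <= 0x7F for octet in data)


# Hypothetical sample text; any string limited to U+0000..U+007F would do.
sample = "ASCII is required for all text".encode("ascii")

# A plain US-ASCII string is also a valid UTF-8 string with the same meaning.
assert is_us_ascii(sample)
assert sample.decode("ascii") == sample.decode("utf-8")

# A character outside U+0000..U+007F is encoded as several octets,
# none of which falls in the US-ASCII range.
encoded = "\u00e9".encode("utf-8")  # U+00E9, LATIN SMALL LETTER E WITH ACUTE
assert len(encoded) == 2
assert not any(octet <= 0x7F for octet in encoded)
```

The converse does not hold: the two octets produced for U+00E9 are valid
UTF-8 but cannot be decoded as US-ASCII.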


-Martin

