[rfc-i] ASCII isn't English (was: Re: Character sets, was Comments on draft-iab-rfcformat)

Nico Williams nico at cryptonector.com
Wed Dec 19 12:30:15 PST 2012


On Wed, Dec 19, 2012 at 3:44 AM, "Martin J. Dürst"
<duerst at it.aoyama.ac.jp> wrote:
> On 2012/12/19 14:38, Nico Williams wrote:
>> ALSO, it may well be easier to say that it must be possible to
>> machine-render an ASCII-only version of any RFC such that, for
>> example, non-ASCII characters get replaced with corresponding HTML
>> entities (e.g., &aacute;).  This may not be great for reading, but
>> given the *language rule* it should still suffice for those who cannot
>> display odd scripts or even non-ASCII at all.  And guess what: it is
>> absolutely possible to replace non-ASCII Unicode with, e.g., HTML
>> entities.
>
> ASCII isn't English. This will be relevant when we move to a higher
> typographic quality for RFCs. Given that some people will continue to author
> drafts in MS Word, it will come up almost automatically. In good English
> typography, quotes are not 'abc' or "abc", but ‘abc’ or “abc”. Except for
> places such as programming text, every decent English book uses these. There
> is other non-ASCII punctuation, such as m-dashes, ... where the same applies.
>
> I think we should discuss this rather than ignore it.

I have no problem with non-ASCII quotes, m-dashes, etcetera, where
these can always be trivially mapped onto ASCII equivalents.  I
*prefer* that we be consistent about these so that searching RFC text
can be easier, and since we've been using ASCII only, I prefer ASCII
quotes and such going forward as well.  I think it'd be quite fair to
say that the RFC-Editor, as well as xml2rfc and such tools, must
convert such odd characters to ASCII equivalents where possible.
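
A minimal sketch of what such a conversion might look like, in Python.
The mapping table and the function name are hypothetical, chosen only
for illustration; this is not what xml2rfc or any RFC-Editor tool
actually does, and a real table would need to cover more characters.

```python
# Hypothetical table of typographic punctuation and ASCII stand-ins.
# The choice of replacements (e.g. "--" for an em dash) is an assumption.
ASCII_EQUIVALENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(ASCII_EQUIVALENTS)


def to_ascii_punctuation(text: str) -> str:
    """Replace the punctuation listed above; leave everything else alone."""
    return text.translate(_TABLE)
```

Characters not in the table pass through unchanged, so this only
handles the "trivially mapped" cases; anything else still needs a
separate encoding step.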

Given English language text we can easily machine-format that text to
be ASCII-only or ASCII-mostly with non-ASCII encoded/displayed in
whatever manner is best for a given display format.

The whole ASCII conversation is starting to bore me.  It's not that
hard, really.  There is no excuse for not being able to render text in
quite a few scripts nowadays, even on ttys, so non-ASCII is not really
an issue.

IMO all we need is a rule saying normative text must be in English in
canonical RFCs, that translations are allowed but non-canonical, and
*maybe* (at most) that the RFC-Editor should strive to constrain RFCs
to characters/scripts that can reasonably be rendered by tools that
render or display RFCs.  If we pick HTML as a canonical *display*
format (but please, not as a canonical *source* format) then the vast
majority of non-ASCII characters that an author might want to include
in an RFC are easily displayed, and with XML as a canonical *source*
format we have the power to render into many alternate formats
including plain ASCII text with whatever encoding one might prefer for
non-ASCII chars (HTML entities, #codepoint, ...).
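
As a rough illustration of the plain-ASCII rendering with HTML
entities, here is a sketch in Python.  The function name is
hypothetical, and the fallback to a decimal character reference when
no named entity exists is an assumption about what a renderer might
reasonably do, not a description of any existing tool.

```python
from html.entities import codepoint2name  # standard-library entity table


def to_ascii_with_entities(text: str) -> str:
    """Return an ASCII-only rendering of text.

    ASCII characters are passed through unchanged (note that '&' is
    NOT escaped here).  Non-ASCII characters become a named HTML
    entity when one exists, otherwise a decimal character reference.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")
        else:
            out.append("&#" + str(cp) + ";")
    return "".join(out)
```

If only numeric references are wanted, Python's built-in
text.encode("ascii", "xmlcharrefreplace") produces the same kind of
output without the named-entity lookup.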

Nico
-- 

PS: I want normative RFC text to be in English because that's a) the
lingua franca of the Internet community, b) the lingua franca of our
modern world.  I say this as someone for whom English is a third
language.  I have to admit to liking English very much, but even so,
to pick any other language would be to disenfranchise at the very
least 50% of our community.

