[rfc-i] Pagination requirements

Martin Rex mrex at sap.com
Wed May 16 00:36:52 PDT 2012

"Martin J. Dürst" wrote:
> >
> > Could you point to a specific location in that document where you believe
> > that the lack of non-ASCII glyphs is a problem, I don't see one.
> Look for all the places where it uses "&#x" (including the first place, 
> where it explains why and how this is used).

The &#x notation is a perfectly reasonable example for using the escapes.
Using the real characters would be extremely counterproductive.

When I write code (I'm using C89), I always have to know the exact
code points.  What exotic glyphs look like is completely irrelevant,
because pretty much all of the software on all of the machines I'm using
will not be able to display them anyway, and the source code needs
to be ASCII as well, so that it compiles on all of the platforms.
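
As an illustration of working from code points alone, here is a minimal
C89-style sketch that keeps the Cherokee sequence quoted in this thread as
plain hexadecimal constants in ASCII-only source and encodes each one as
UTF-8.  The names utf8_encode and cherokee_example are made up for this
example; they are not taken from any of the documents under discussion.

```c
#include <stddef.h>

/*
 * Encode one Unicode code point as UTF-8.
 * 'out' must have room for at least 4 bytes.
 * Returns the number of bytes written, or 0 if 'cp' is a surrogate
 * or lies beyond U+10FFFF.
 */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80UL) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {
        out[0] = (unsigned char)(0xC0UL | (cp >> 6));
        out[1] = (unsigned char)(0x80UL | (cp & 0x3FUL));
        return 2;
    }
    if (cp < 0x10000UL) {
        if (cp >= 0xD800UL && cp <= 0xDFFFUL)
            return 0; /* surrogates are not valid scalar values */
        out[0] = (unsigned char)(0xE0UL | (cp >> 12));
        out[1] = (unsigned char)(0x80UL | ((cp >> 6) & 0x3FUL));
        out[2] = (unsigned char)(0x80UL | (cp & 0x3FUL));
        return 3;
    }
    if (cp <= 0x10FFFFUL) {
        out[0] = (unsigned char)(0xF0UL | (cp >> 18));
        out[1] = (unsigned char)(0x80UL | ((cp >> 12) & 0x3FUL));
        out[2] = (unsigned char)(0x80UL | ((cp >> 6) & 0x3FUL));
        out[3] = (unsigned char)(0x80UL | (cp & 0x3FUL));
        return 4;
    }
    return 0;
}

/* The Cherokee sequence from the quoted draft text, ASCII-only source. */
const unsigned long cherokee_example[7] = {
    0x13DAUL, 0x13A2UL, 0x13B5UL, 0x13ACUL, 0x13A2UL, 0x13ACUL, 0x13D2UL
};
```

Nothing in this source file depends on the editor, compiler, or terminal
being able to display a single Cherokee glyph.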

> For another document, please compare 
> http://tools.ietf.org/html/draft-duerst-eai-mailto-03 (the ASCII 
> version) with 
> http://www.sw.it.aoyama.ac.jp/2012/pub/draft-duerst-eai-mailto-03.html 
> (HTML with non-ASCII characters). Again, look for "&#x" in the former, 
> and for the corresponding text in the latter.
> >> Yes, more and more specifications in the Apps Area, RAI Area, and even
> >> Security Area need to say something about internationalization. They
> >> don't all need to include non-ASCII characters, but for those that do
> >> this issue needs to be addressed.
> >
> > I'm pretty sure that there are less than 1 in 200 RFCs where this
> > would make sense at all.
> The fact that author names also may contain non-ASCII characters would 
> significantly change this percentage.

That is a completely silly argument.

In order to ensure that author names on English-language standards
can be inserted, printed, read, and typed into English-language
references to that document, supplying the author name in ASCII letters
is imperative.  Using a representation that 90% of the world's population
would not be able to (a) recognize and (b) type on their keyboard when
handed a hardcopy printout is entirely useless (for the purpose
of a standard).

> >
> > Helpful?  How?  Your example comes out as garbled noise here:
> In that case, what about getting a newer mail user agent?

I completely fail to see the purpose of that.
Using the Cyrillic example (since none of my software displays
the Cherokee stuff), simply using the original Latin-1 letters instead
gives the EXACT same visual result on screen and on paper!  So it would be
pretty silly to NOT use the Latin letters, since they ensure that things
that were intended to look the same _will_ look the same when rendered.

> >>     However, the problem is made more serious by introducing the full
> >>     range of Unicode code points into protocol strings.  For example, the
> >>     characters U+13DA U+13A2 U+13B5 U+13AC U+13A2 U+13AC U+13D2 from the
> >>     Cherokee block, i.e., "á\217\232á\216¢á\216µá\217\213á\216¢á\217\213á\217\222", look similar to the US-ASCII
> >>     characters "STPETER" as they might look when presented in a
> >>     "creative" font.
> >
> > I absolutely do not see a need to "demonstrate" this,
> For almost everybody, there is a huge difference between being told 
> something as a theoretical fact, and having it right in front of their 
> eyes. Maybe that's different for you, but then you are definitely the 
> exception.

Completely irrelevant for programmatic processing of Unicode code points
in protocols and for interop.
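
To make the point concrete, a small hedged sketch in C89 style: a program
compares sequences of code points, so the ASCII string "STPETER" and the
Cherokee sequence from the quoted text are simply different, whatever they
look like in some font.  The names codepoints_equal, ascii_stpeter and
cherokee_lookalike are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Return 1 if both code point sequences are identical, 0 otherwise. */
int codepoints_equal(const unsigned long *a, size_t na,
                     const unsigned long *b, size_t nb)
{
    size_t i;

    if (na != nb)
        return 0;
    for (i = 0; i < na; i++) {
        if (a[i] != b[i])
            return 0;
    }
    return 1;
}

/* "STPETER" in US-ASCII. */
const unsigned long ascii_stpeter[7] = {
    0x53UL, 0x54UL, 0x50UL, 0x45UL, 0x54UL, 0x45UL, 0x52UL
};

/* The Cherokee sequence quoted in the thread. */
const unsigned long cherokee_lookalike[7] = {
    0x13DAUL, 0x13A2UL, 0x13B5UL, 0x13ACUL, 0x13A2UL, 0x13ACUL, 0x13D2UL
};
```

Calling codepoints_equal(ascii_stpeter, 7, cherokee_lookalike, 7) returns 0;
no glyph is consulted at any point.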

> > and in particular,
> > the majority of fonts does not even have glyphs for such codepoints and
> > a lot of software does not have capabilities to display them anyway.
> First, please note that the above just shows garbage, and isn't a font 
> problem, but a problem of character decoding/interpretation.

It is a _font_ problem.

> > When creating a html document with your example codepoints:
> >
> >      ASCII:     STPETER<p>
> >      Cherokee:&#x13DA;&#x13A2;&#x13B5;&#x13AC;&#x13A2;&#x13AC;&#x13D2;<p>
> >
> > MSIE displays empty square boxes, and FF 3.6 displays square boxes with
> > tiny hex digits inside.
> The newest version of FireFox is at least 12.0. Maybe you should upgrade 
> (if for nothing else, then for security reasons).

FF 12.0 shows THE EXACT SAME output: square outlines with hex digits
in them, which gives a _very_strong_hint_ that it is a font problem.
(All my machines are either WinXP or WinXP 64-bit.)  MSIE and Outlook
show empty square boxes instead.
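
For anyone who wants to reproduce such a test page, the following is a rough
sketch (C89 style, ASCII-only source) that writes a list of code points as
hexadecimal numeric character references of the "&#x...;" form used in the
HTML example.  The function name format_ncr and the macro NCR_MAX_LEN are
assumptions made for this illustration, not part of any existing library.

```c
#include <stddef.h>

/* The longest reference is "&#x10FFFF;", i.e. 10 characters. */
#define NCR_MAX_LEN 10

/*
 * Write cps[0..n-1] into 'out' as "&#xHHHH;" references, using at least
 * four hex digits per code point.  Returns the number of characters
 * written (excluding the terminating NUL), or -1 if the buffer is too
 * small or a value lies beyond U+10FFFF.
 */
int format_ncr(const unsigned long *cps, size_t n, char *out, size_t outsize)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t i, used = 0;
    int shift;

    if (outsize == 0)
        return -1;
    for (i = 0; i < n; i++) {
        if (cps[i] > 0x10FFFFUL ||
            outsize - used < (size_t)NCR_MAX_LEN + 1) {
            out[0] = '\0';
            return -1;
        }
        out[used++] = '&';
        out[used++] = '#';
        out[used++] = 'x';
        /* Four digits by default, five or six only when needed. */
        shift = cps[i] > 0xFFFFFUL ? 20 : (cps[i] > 0xFFFFUL ? 16 : 12);
        for (; shift >= 0; shift -= 4)
            out[used++] = hex[(cps[i] >> shift) & 0xFUL];
        out[used++] = ';';
    }
    out[used] = '\0';
    return (int)used;
}
```

The output is pure ASCII, so the page source survives any mail client or
editor; whether the browser then shows glyphs or boxes depends on what is
installed on the machine.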

> >      Cyrillic:&#x0405;&#x0422;&#x0420;&#x0415;&#x0422;&#x0415;R<p>
> >
> > Anyway, I consider it completely unnecessary to try (and fail) to demonstrate
> > that there exist different Unicode code points that have similar glyphs.
> > It is perfectly sufficient to state that this is the case and leave all
> > the rest to the Unicode SDO.

While I'm 100% positively certain that ASCII text will be easily
readable and comprehensible 2000 years from now, XML, HTML and Unicode
could easily require a Rosetta Stone.  Needless complexity.  Too much
concern about the exterior appearance, ignorance of internal values.
Plastic surgery without medical indication, Nip&Tuck syndrome.

