[rfc-i] Pagination requirements

"Martin J. Dürst" duerst at it.aoyama.ac.jp
Tue May 15 22:55:55 PDT 2012


On 2012/05/16 9:07, Martin Rex wrote:
> Peter Saint-Andre wrote:
>>
>> Julian Reschke wrote:
>>>
>>> It's simply extremely hard to explain in a specification how to deal
>>> with non-ASCII characters if you can't use them.
>>>
>>> Read RFC 3987, then please come back and explain why this is not a problem.
>
> Could you point to a specific location in that document where you believe
> that the lack of non-ASCII glyphs is a problem, I don't see one.

Look for all the places where it uses "&#x" (including the first place, 
where it explains why and how this is used).

For another document, please compare 
http://tools.ietf.org/html/draft-duerst-eai-mailto-03 (the ASCII 
version) with 
http://www.sw.it.aoyama.ac.jp/2012/pub/draft-duerst-eai-mailto-03.html 
(HTML with non-ASCII characters). Again, look for "&#x" in the former, 
and for the corresponding text in the latter.
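The `&#x` notation these ASCII-only documents fall back on reads like an XML/HTML hexadecimal numeric character reference. Assuming that reading, the minimal Python sketch below converts between the two spellings. It uses the standard-library `html.unescape`, and the helper name `to_ncr` is purely illustrative. The sample input is the Cherokee string from the HTML test page quoted later in this thread.

```python
import html

# ASCII-only spelling: every non-ASCII character written as &#xHHHH;
escaped = "&#x13DA;&#x13A2;&#x13B5;&#x13AC;&#x13A2;&#x13AC;&#x13D2;"

# html.unescape resolves the numeric character references to real code points.
decoded = html.unescape(escaped)

def to_ncr(text: str) -> str:
    """Hypothetical helper: write every non-ASCII character as &#xHHHH;."""
    return "".join(ch if ord(ch) < 0x80 else f"&#x{ord(ch):04X};" for ch in text)

# Print code points rather than the characters, so any terminal can show the result.
print([f"U+{ord(ch):04X}" for ch in decoded])
print(to_ncr(decoded) == escaped)  # True: the two spellings round-trip
```

The escaped form survives any ASCII-only channel, but a reader has to mentally decode each reference. That burden is the one being argued about here.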


>> Yes, more and more specifications in the Apps Area, RAI Area, and even
>> Security Area need to say something about internationalization. They
>> don't all need to include non-ASCII characters, but for those that do
>> this issue needs to be addressed.
>
> I'm pretty sure that there are less than 1 in 200 RFCs where this
> would make sense at all.

The fact that author names also may contain non-ASCII characters would 
significantly change this percentage.


>> P.S. Here's a relevant paragraph from a discussion about visually
>> similar characters in the security considerations of an I-D I'm working
>> on (draft-ietf-precis-framework)...
>>
>>     However, the problem is made more serious by introducing the full
>>     range of Unicode code points into protocol strings.  For example, the
>>     characters U+13DA U+13A2 U+13B5 U+13AC U+13A2 U+13AC U+13D2 from the
>>     Cherokee block look similar to the US-ASCII characters "STPETER" as
>>     they might look when presented in a "creative" font.
>>
>> It would be helpful to include the actual characters, not just the
>> Unicode codepoint numbers:
>
> Helpful?  How?  Your example comes out as garbled noise here:

In that case, what about getting a newer mail user agent?

>>     However, the problem is made more serious by introducing the full
>>     range of Unicode code points into protocol strings.  For example, the
>>     characters U+13DA U+13A2 U+13B5 U+13AC U+13A2 U+13AC U+13D2 from the
>>     Cherokee block, i.e., "á\217\232á\216¢á\216µá\217\213á\216¢á\217\213á\217\222", look similar to the US-ASCII
>>     characters "STPETER" as they might look when presented in a
>>     "creative" font.
>
> I absolutely do not see a need to "demonstrate" this,

For almost everybody, there is a huge difference between being told 
something as a theoretical fact, and having it right in front of their 
eyes. Maybe that's different for you, but then you are definitely the 
exception.

> and in particular,
> the majority of fonts does not even have glyphs for such codepoints and
> a lot of software does not have capabilities to display them anyway.

First, please note that what the above shows is garbage, and that is not 
a font problem but a problem of character decoding/interpretation.
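To illustrate the decoding point, the sketch below encodes the listed Cherokee code points as UTF-8 and then, as an assumption about what went wrong, decodes those bytes as ISO-8859-1 (Latin-1). Every byte maps to some Latin-1 character, so no error is raised. The result is the same kind of noise quoted above, where escapes such as `\217\232` are simply the bytes 0x8F 0x9A written in octal. No font is involved at any step.

```python
# Cherokee look-alike of "STPETER", built from the code points listed in the draft text.
cherokee = "".join(chr(cp) for cp in (0x13DA, 0x13A2, 0x13B5, 0x13AC, 0x13A2, 0x13AC, 0x13D2))

# Each of these BMP characters takes three bytes in UTF-8 (here all starting with 0xE1).
utf8_bytes = cherokee.encode("utf-8")

# Correct interpretation: decoding as UTF-8 gives back the original text.
assert utf8_bytes.decode("utf-8") == cherokee

# Wrong interpretation (assumed): the same bytes read as ISO-8859-1.
mojibake = utf8_bytes.decode("latin-1")
print(ascii(mojibake))  # starts '\xe1\x8f\x9a\xe1\x8e\xa2...

# The damage is reversible as long as no byte was dropped or replaced on the way.
repaired = mojibake.encode("latin-1").decode("utf-8")
print(repaired == cherokee)  # True
```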

For the fonts, you have a point. But decent software uses fallback fonts 
in such cases. And most software these days does indeed have the 
capability to display these characters (they are in the BMP, and don't 
use combining marks, reordering, or anything else that would need fancy 
rendering).
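The properties claimed for these characters can be checked against the Unicode Character Database that ships with Python's `unicodedata` module. This is a rough sketch, not a rendering guarantee. It treats "no combining marks" as "general category not M*" and "no reordering" as "bidi class L (left-to-right)", both of which are simplifying assumptions.

```python
import unicodedata

cherokee = "".join(chr(cp) for cp in (0x13DA, 0x13A2, 0x13B5, 0x13AC, 0x13A2, 0x13AC, 0x13D2))

for ch in cherokee:
    cp = ord(ch)
    print(
        f"U+{cp:04X}",
        unicodedata.name(ch),
        "BMP" if cp <= 0xFFFF else "supplementary",
        unicodedata.category(ch),       # an 'M*' category would indicate a combining mark
        unicodedata.bidirectional(ch),  # 'L' = plain left-to-right, no bidi reordering
    )

# The three properties the argument relies on, checked in one place.
simple = all(
    ord(ch) <= 0xFFFF
    and not unicodedata.category(ch).startswith("M")
    and unicodedata.bidirectional(ch) == "L"
    for ch in cherokee
)
print(simple)  # True
```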

> When creating a html document with your example codepoints:
>
>      ASCII:     STPETER<p>
>      Cherokee:&#x13DA;&#x13A2;&#x13B5;&#x13AC;&#x13A2;&#x13AC;&#x13D2;<p>
>
> I get displayed empty square boxes in MSIE and square boxes with tiny
> xdigits inside in FF 3.6.

The newest version of Firefox is at least 12.0. Maybe you should upgrade, 
if only for security reasons.

> Maybe you should have used glyphs from a less
> exotic codepage than Cherokee, e.g. Cyrillic (but it's lacking the R):

Well, you could use Я (&#x42F;).
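A minimal sketch of the Cyrillic variant: it uses U+0405 U+0422 U+0420 U+0415 U+0422 U+0415 (the Cyrillic "STPETE" from the example quoted next) and completes it with U+042F (Я) for the final letter. In most fonts Я looks like a mirrored R, so it is only a rough look-alike. The point the code makes is the one at issue in the thread: the strings look similar, but they compare unequal because every code point differs from its ASCII counterpart.

```python
import unicodedata

ascii_name = "STPETER"
# Cyrillic DZE, TE, ER, IE, TE, IE, followed by YA as a stand-in for "R".
cyrillic = "".join(chr(cp) for cp in (0x0405, 0x0422, 0x0420, 0x0415, 0x0422, 0x0415, 0x042F))

print(cyrillic == ascii_name)  # False: similar glyphs, different code points

# Show which Cyrillic letter stands in for each ASCII letter (ASCII-only output).
for a, c in zip(ascii_name, cyrillic):
    print(a, f"U+{ord(c):04X}", unicodedata.name(c))
```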


>      Cyrillic:&#x0405;&#x0422;&#x0420;&#x0415;&#x0422;&#x0415;R<p>
>
> Anyway, I consider it completely unnecessary trying to demonstrate (and fail)
> that there exist different unicode codepoints that have similar glyphs.
> It is perfectly sufficient to state that this is the case and leave all
> the rest to the Unicode SDO.

See above.

Regards,   Martin.

