[rfc-i] Comments on draft-flanagan-nonascii-03 (was: Re: Font selection for RFCs)

"Martin J. Dürst" duerst at it.aoyama.ac.jp
Wed Oct 29 00:19:15 PDT 2014


Hello Heather, others,


Some comments on draft-flanagan-nonascii-03.

Note: I may have made some of these comments on earlier versions, but I 
hope I can explain them better this time.

Looking at the examples in section 3.2, I see things like:
                                              P. Fältström (P. Faltstrom)
                                                            Tele2/Swipnet
and
    PROPOSED/NEW: The following people contributed significant text to
    early versions of this draft: Patrik Fältström (Patrik Faltstrom),
    陈智昌 (William Chan), and Fred Baker.

The repetition of almost the same text (Fältström and Faltstrom) is 
highly annoying. Those not used to accents, umlauts, and diacritics will 
mostly just see the same thing twice. Those used to the above will of 
course find the second instance totally superfluous. So in effect it 
serves nobody.

It is also not something I have seen in any serious publication. Both 
ACM and IEEE, and I'm sure many other serious publishers, would just 
use Fältström. The same goes for books and the like, and for 
publications from organizations such as the W3C.

At best, people will find it annoying. At worst, we will become a 
laughing stock, with everybody joking about how the IETF, despite a lot 
of effort, still cannot wean itself off its beloved ASCII :-).

The only reason I have heard for this is that the document should still 
be searchable with ASCII only. Search engines deal with accents 
without problems, so let's call this the 'grep on a plane' use case. 
What I propose is that for XML and HTML, we hide the ASCII somewhere 
where it's still searchable (e.g. in an attribute). For plain text, 
maybe we can create an appendix that contains only the ASCII forms, so 
that they get caught by grep. Human beings can ignore it.
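
To make the attribute idea a bit more concrete, here is a minimal 
sketch in Python. The element name "author" and the attribute names 
"fullname" and "ascii" are made up for illustration; they are not taken 
from the draft or from any existing vocabulary. The folding helper only 
handles characters that decompose into a base letter plus combining 
marks (such as ä and ö). It cannot produce "William Chan" from 陈智昌, 
so in general the ASCII form would have to be supplied by the author.

```python
import unicodedata
import xml.etree.ElementTree as ET


def ascii_fold(text: str) -> str:
    """Drop combining marks after canonical decomposition (e.g. ä -> a)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def author_element(fullname: str, ascii_name: str) -> str:
    # "author", "fullname" and "ascii" are hypothetical names for this sketch.
    elem = ET.Element("author", {"fullname": fullname, "ascii": ascii_name})
    return ET.tostring(elem, encoding="unicode")


xml_line = author_element("Patrik Fältström", ascii_fold("Patrik Fältström"))
print(xml_line)

# The 'grep on a plane' check: an ASCII-only search still hits the line.
print("Faltstrom" in xml_line)
```

The displayed text would use only the "fullname" value; the ASCII form 
stays in the source where grep can find it, but readers never see it.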


As a separate issue, I'd suggest that in the Acknowledgments example, 
instead of "陈智昌 (William Chan)", we use "William Chan (陈智昌)", 
because this way, the text flows better. This is also what the W3C or 
people such as Don Knuth use. Of course if somebody like William prefers 
"陈智昌 (William Chan)", we can allow that, but it shouldn't be what we 
suggest.


On 2014/10/29 06:28, Heather Flanagan (RFC Series Editor) wrote:

> On 10/28/14 4:43 PM, Nico Williams wrote:
>> On Mon, Oct 27, 2014 at 8:45 PM, "Martin J. Dürst"
>> <duerst at it.aoyama.ac.jp> wrote:

>>> Then there is the issue of different languages having different font
>>> preferences. The most widely known example is Chinese vs. Japanese.
>>> Although the characters are the same, and legible either way, the
>>> fonts that the Chinese are used to and the fonts that the Japanese
>>> are used to are quite different. Showing a Japanese text with a
>>> standard Chinese font feels weird, and the same the other way round.
>>> That's why XML and HTML have xml:lang/lang attributes, and we should
>>> make sure they are usable. [...]

>> This is probably a problem that I bet the RSE will have to deal with.
>> I'm glad you pointed it out.

There is a very small example in the nonascii draft:
(http://tools.ietf.org/html/draft-flanagan-nonascii-03#section-3.2)

    "Patrik Fältström (Patrik Faltstrom), 陈智昌 (William Chan), and
    Fred Baker."

The attached .gif shows how this appears in Firefox on Windows. It's 
very much the same on IE and Safari, but not on Chrome or Opera (old or 
new). It's also visible for me here when writing this mail (Thunderbird, 
so not surprising that it's similar to Firefox).

What's the problem? (Please check the .gif; your mileage may vary 
depending on your browser.) The first character in William's name (his 
family name) is shown in a different font (thinner and with serifs) 
than the last two characters (his given name, thicker and without 
serifs). It would be similar to having your family name written e.g. in 
Times Roman and your given name in a bold Helvetica.

Why does this happen? I'm on a Japanese OS, and that makes (some) 
browsers prefer Japanese fonts over Chinese ones, but the two have 
different coverage. William's family name isn't used in Japanese, and so 
isn't covered by most Japanese fonts. So the rendering falls back to a 
Chinese font, where the character is available. But these fonts differ 
in style.
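
The per-character fallback can be imitated with a small Python sketch. 
This is only a rough model, not how any browser actually selects fonts: 
the font names are placeholders, and I am assuming that the legacy 
encodings Shift_JIS and GB2312 are a fair stand-in for the coverage of 
a typical Japanese and a typical Simplified Chinese font.

```python
# Fallback order on a (hypothetical) Japanese system: Japanese font first.
# Each "font" is modelled by a legacy encoding standing in for its coverage.
FALLBACK_CHAIN = [
    ("japanese-font", "shift_jis"),
    ("chinese-font", "gb2312"),
]


def covers(codec: str, ch: str) -> bool:
    """True if the character can be encoded in the given legacy charset."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def pick_font(ch: str) -> str:
    """Return the first font in the chain whose coverage includes ch."""
    for font, codec in FALLBACK_CHAIN:
        if covers(codec, ch):
            return font
    return "last-resort-font"


for ch in "陈智昌":
    print(ch, pick_font(ch))
```

Because the choice is made character by character, a single name can 
end up split across two fonts whenever the preferred font lacks one of 
its characters.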

When I enclosed William's Chinese name in <span lang='zh'>  </span>
(or <span lang='zh-Hans'>  </span>; zh is the code for Chinese, zh-Hans 
is the code for Simplified Chinese), the browser used a Chinese font 
from the start and the problem went away. So I strongly suggest that we 
allow the xml:lang attribute on all relevant elements (we probably 
already do), and make sure that we recommend its use in particular for 
names (and other text) in Chinese and Japanese and other languages that 
might benefit from it for glyph selection.
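
For tools that generate the HTML, adding the language tag is cheap. A 
minimal sketch in Python, where tag_language is a hypothetical helper 
name and not part of any existing RFC tooling:

```python
from html import escape


def tag_language(text: str, lang: str) -> str:
    """Wrap text in a span carrying a BCP 47 language tag."""
    return '<span lang="{}">{}</span>'.format(
        escape(lang, quote=True), escape(text)
    )


print(tag_language("陈智昌", "zh-Hans"))
```

For the XML source, the equivalent would be an xml:lang attribute on 
the element that contains the name.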


Regards,   Martin.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: RFCnonASCII.gif
Type: image/gif
Size: 34204 bytes
Desc: not available
URL: <http://www.rfc-editor.org/pipermail/rfc-interest/attachments/20141029/86e6dd84/attachment-0001.gif>

