[rfc-i] Comments on draft-flanagan-nonascii-03

Brian E Carpenter brian.e.carpenter at gmail.com
Wed Oct 29 19:46:47 PDT 2014


I feel that *one* occurrence in a document of, say,
Patrik Fältström (ASCII: Patrik Faltstrom) would be helpful,
given that we plan to have a residual ASCII format but in
the Unicode formats we will use the correct version.
Patrik only has one name that is misspelt in ASCII.

Repetition is certainly not needed.

William Chan (陈智昌) is certainly a different case, since the
document will primarily be in Latin characters and the
CJK form acts as a gloss on his "Western" name. He really
does have two different names. I believe 陈 is the family name
and 智昌 is not pronounced "William".

Regards
   Brian

On 29/10/2014 20:19, Martin J. Dürst wrote:
> Hello Heather, others,
> 
> 
> Some comments on draft-flanagan-nonascii-03.
> 
> Note: I may have made some of these comments on earlier versions, but I
> hope I can explain them better this time.
> 
> Looking at the examples in section 3.2, I see things like:
>                                              P. Fältström (P. Faltstrom)
>                                                            Tele2/Swipnet
> and
>    PROPOSED/NEW: The following people contributed significant text to
>    early versions of this draft: Patrik Fältström (Patrik Faltstrom),
>    陈智昌 (William Chan), and Fred Baker.
> 
> The repetition of almost the same text (Fältström and Faltstrom) is
> highly annoying. Those not used to accents, umlauts, and diacritics will
> mostly just see the same thing twice. Those used to the above will of
> course find the second instance totally superfluous. So in effect it
> serves nobody.
> 
> It is also not something I have seen in any serious publication. Both
> ACM and IEEE, and I'm sure many other serious publications, would just
> use Fältström. Same for books and the like, and for publications from
> organizations such as W3C,...
> 
> At best, people will find it annoying. At worst, we will become a
> laughing stock, with everybody joking about how the IETF, despite a lot
> of effort, still cannot wean itself off its beloved ASCII :-).
> 
> The only reason I have heard for this was that the document should still
> be searchable with ASCII-only. Search engines can deal with accents
> without problems, so let's call this the 'grep on a plane' use case.
> What I propose is that for XML and HTML, we hide the ASCII somewhere
> where it's still searchable (e.g. in an attribute). For plain text,
> maybe we can create an Appendix that only contains ASCII texts so that
> they get caught with grep. Human beings can ignore it.
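
The ASCII fallback proposed above (an attribute value or an appendix entry that grep can find) could be derived mechanically for many Latin-script names. What follows is a minimal sketch, not anything specified in the draft: the function name `ascii_fold` is hypothetical, and it assumes that dropping combining marks after Unicode NFKD decomposition gives an acceptable ASCII spelling.

```python
import unicodedata


def ascii_fold(text: str) -> str:
    """Return an ASCII-only approximation of text (hypothetical helper).

    Decomposes accented letters (NFKD), drops the combining marks, and
    discards any character that still has no ASCII equivalent.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    for name in ["Patrik Fältström", "Martin J. Dürst", "陈智昌"]:
        print(repr(name), "->", repr(ascii_fold(name)))
```

Folding of this kind only works for letters that decompose into an ASCII base plus diacritics. A name such as 陈智昌 folds to an empty string, so its ASCII form ("William Chan") would still have to be supplied by the author.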
> 
> 
> As a separate issue, I'd suggest that in the Acknowledgments example,
> instead of "陈智昌 (William Chan)", we use "William Chan (陈智昌)",
> because this way, the text flows better. This is also what the W3C or
> people such as Don Knuth use. Of course if somebody like William prefers
> "陈智昌 (William Chan)", we can allow that, but it shouldn't be what we
> suggest.
> 
> 
> On 2014/10/29 06:28, Heather Flanagan (RFC Series Editor) wrote:
> 
>> On 10/28/14 4:43 PM, Nico Williams wrote:
>>> On Mon, Oct 27, 2014 at 8:45 PM, "Martin J. Dürst"
>>> <duerst at it.aoyama.ac.jp> wrote:
> 
>>>> Then there is the issue of different languages having different font
>>>> preferences. The most widely known example is Chinese vs. Japanese.
>>>> Although the characters are the same, and legible either way, the
>>>> fonts that the Chinese are used to and the fonts that the Japanese
>>>> are used to are quite different. Showing a Japanese text with a
>>>> standard Chinese font feels weird, and the same the other way round.
>>>> That's why XML and HTML have xml:lang/lang attributes, and we should
>>>> make sure they are usable. [...]
> 
>>> This is probably a problem that I bet the RSE will have to deal with.
>>> I'm glad you pointed it out.
> 
> There is a very small example in the nonascii draft:
> (http://tools.ietf.org/html/draft-flanagan-nonascii-03#section-3.2)
> 
>    "Patrik Fältström (Patrik Faltstrom), 陈智昌 (William Chan), and
>    Fred Baker."
> 
> The attached .gif shows how this appears in Firefox on Windows. It's
> very much the same on IE and Safari, but not on Chrome or Opera (old or
> new). It's also visible for me here when writing this mail (Thunderbird,
> so not surprising that it's similar to Firefox).
> 
> What's the problem? (Please check the .gif; the mileage in your browser
> may vary.) The first character in William's name (his family name) is
> shown in a different font (thinner and with serifs) than the last two
> characters (his given name, thicker and no serifs). It would be similar
> to having your family name written e.g. in Times Roman and your given
> name in a bold Helvetica.
> 
> Why does this happen? I'm on a Japanese OS, and that makes (some)
> browsers try to prefer Japanese fonts over Chinese ones, but they have
> different coverage. William's family name isn't used in Japanese, and so
> isn't covered by most Japanese fonts. So the rendering falls back to a
> Chinese font, where the character is available. But these fonts differ
> in style.
> 
> When I enclosed William's Chinese name in <span lang='zh'>  </span>
> (or <span lang='zh-Hans'>  </span>; zh is the code for Chinese, zh-Hans
> is the code for Simplified Chinese), the browser used a Chinese font
> from the start and the problem went away. So I strongly suggest that we
> allow the xml:lang attribute on all relevant elements (we probably
> already do), and make sure that we recommend its use in particular for
> names (and other text) in Chinese and Japanese and other languages that
> might benefit from it for glyph selection.
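
The markup described above can be sketched as a small HTML-generating helper. This is an illustration under stated assumptions, not part of the draft or of any RFC tooling: the name `tag_cjk` is hypothetical, only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is matched, and the language tag has to be supplied by the caller, because the characters alone do not say whether the text is Chinese or Japanese.

```python
import html
import re

# Basic CJK Unified Ideographs block only (an assumption of this sketch);
# extension blocks, kana, and hangul would need additional ranges.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def tag_cjk(text: str, lang: str) -> str:
    """Wrap each run of CJK ideographs in <span lang="...">.

    The rest of the text is HTML-escaped and passed through unchanged.
    """
    pieces = []
    pos = 0
    for match in CJK_RUN.finditer(text):
        pieces.append(html.escape(text[pos:match.start()]))
        pieces.append(
            '<span lang="{}">{}</span>'.format(
                html.escape(lang, quote=True), match.group()
            )
        )
        pos = match.end()
    pieces.append(html.escape(text[pos:]))
    return "".join(pieces)


if __name__ == "__main__":
    print(tag_cjk("William Chan (陈智昌)", "zh-Hans"))
```

With the language declared explicitly, a browser can choose a Chinese font for the whole name instead of falling back one character at a time.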
> 
> 
> Regards,   Martin.