[rfc-i] v3imp #4 Ruby text

Sean Leonard dev+ietf at seantek.com
Wed Jan 28 10:01:39 PST 2015


On 1/25/2015 12:09 AM, "Martin J. Dürst" wrote:
> [...]Sean to give an actual example of where he thinks ruby are needed 
> in IDs/RFCs.

For those not familiar, "ruby characters" are small, annotative glosses 
that can be placed above or to the right of other characters. These 
annotations are used as pronunciation guides for characters that are 
likely to be unfamiliar to the reader. 
<http://en.wikipedia.org/wiki/Ruby_character> The term "furigana" 
(振り仮名) is used in Japan. Since this construct is used for pronunciation 
guides, its use for IPA phonetic symbols or other characters has become 
increasingly common.

Pictorial examples are attached.

Unicode uses U+FFF9 U+FFFA U+FFFB to delineate interlinear annotation. 
Interestingly, ISO/IEC 6429 (ECMA-48) also defines the ANSI escape code 
PARALLEL TEXTS (PTX) {CSI \} along with six parameter values for the 
same purpose.
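
For illustration only, here is a minimal Python sketch of how the three Unicode characters bracket a base text and its annotation. The helper name annotate() is made up for this example and is not taken from any spec or library; the sketch does not cover the ECMA-48 PTX sequence.

```python
# Unicode interlinear annotation characters (Specials block).
ANCHOR = "\uFFF9"      # INTERLINEAR ANNOTATION ANCHOR: precedes the annotated (base) text
SEPARATOR = "\uFFFA"   # INTERLINEAR ANNOTATION SEPARATOR: precedes the annotating text
TERMINATOR = "\uFFFB"  # INTERLINEAR ANNOTATION TERMINATOR: closes the annotation


def annotate(base: str, ruby: str) -> str:
    """Wrap `base` and its gloss `ruby` in interlinear annotation characters.

    Hypothetical helper for illustration; it does no validation or nesting.
    """
    return f"{ANCHOR}{base}{SEPARATOR}{ruby}{TERMINATOR}"


# The base text comes first, then the separator, then the reading.
sample = annotate("振り仮名", "ふりがな")
```

Whether a renderer actually draws the annotation above or beside the base text is up to the display software; the characters only mark where the base and the gloss begin and end.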

Within the IETF (an international organization) and the RFC series (an 
international document series), the premise that non-US-ASCII codes 
should be included has been widely accepted, even though the prose 
specification text is still limited to (US-)English. Since this series 
attracts document authors 
from all over the world, educating others on appropriate pronunciation 
is a reasonable goal.

Let's look at draft-flanagan-nonascii-03 and match up:

Section 3.1.
"Where the use of non-ASCII characters is purely as part of an example 
and not otherwise required for correct protocol operation, escaping the 
Unicode character is not required."

If an example includes Unicode characters, such as {<artwork>SIP/2.0 200 
= 2**3 * 5**2 но сто девяносто девять - простое</artwork>}, one should 
expect interlinear annotation. Furthermore, if the example is provided 
in the spec-text, such as {<t>blah blah blah <tt>SIP/2.0 200 = 2**3 * 
5**2 но сто девяносто девять - простое</tt> is a sample response.</t>}, 
interlinear annotation should be anticipated as well.

Section 3.2. Authors, Contributors, and Acknowledgements

"Person names may appear in several places within an RFC.  In all cases, valid Unicode is required.  For names that include non-ASCII characters, an author-provided, ASCII-only identifier is required to assist in search and indexing of the document."


An example is given in what would be the <abstract><t> block. Overall, 
we should anticipate that authors with names with unusual pronunciations 
will want the affordance of stating how they want their names to be 
pronounced.

In the example in draft-flanagan-nonascii 陈智昌 (William Chan), the 
US-ASCII name is the author's chosen English name--it does not reflect 
his preferred pronunciation in Chinese or other languages (e.g., 
Japanese and Chinese names can use the same characters, but have 
markedly different pronunciations). To this day I know of several 
prominent security implementers who still refer to Russ Housley as 
"Russ HOOOS-lee" or "Russ HOSE-lee". I also know of a prominent physician in 
the Los Angeles area whose name is Phúc. Well guess what, if you don't 
speak Vietnamese and try to pronounce her name, you are in for a serious 
/faux pas/. (The right way to pronounce this name, by the way, is like 
FOOK.)

Section 3.3. Company Names

Same thing as Section 3.2. I am sure that a lot of employees of Huawei 
are quite sick of having to explain how to pronounce their corporate name.

Section 3.4. Body of the document
"When the mention of non-ASCII characters is required for correct 
protocol operation and understanding, the characters' Unicode character 
name or code point MUST be included in the text. [...] Use of the actual 
UTF-8 character (e.g., Δ) is encouraged so that a reader can more easily 
see what the character is, if their device can render the text."

Well obviously if the protocol operation makes use of the interlinear 
annotation characters, you are encouraged to put those in. The same goes 
for all general-purpose Unicode control characters.

Section 3.5. Tables

Tables are interesting because an alternate way to include ruby is to 
stuff the characters into adjacent cells--the ruby on the top cell and 
the main text on the bottom cell. This throws off the semantic intent of 
a table-based layout if you have data that you want to annotate.

Section 3.7. Bibliographic text

Bibliographic text can contain names, so all the name stuff above 
applies. Frequently there may be names without widely accepted US-ASCII 
equivalents, so picking any particular US-ASCII-based name will make it 
more difficult to search and index the reference. Titles of works and 
such can also have ruby, such as ONE PIECE(ワンピース)(to take an 
offhand non-sequitur example).

---

An example of an I-D where ruby would have been extremely useful is 
draft-duerst-ruby-01 <http://tools.ietf.org/id/draft-duerst-ruby-01.txt>.

Another example is found in the desired standardized pronunciation of 
certain acronyms, such as URL / URI / URN. To this day most people in 
the real world call it url, as in "duke of earl" 
<https://www.google.com/search?q=duke+of+url>. Furthermore uri ("ur-ee" 
or "yur-ee") and urn (as in, the container for human remains) are quite 
common. Why is it that URI must be spelled out, but most people 
pronounce PKIX as "PEE - kicks"?

Nothing in RFC 3986 or RFC 2141 precludes these pronunciations, yet 
certain individuals in this organization insist on correcting folks when 
they say uree instead of YOU - ARE - EYE. Look, people are going to say 
what they say. /təˈmeɪtoʊ/, /təˈmɑːtəʊ/. If this issue is so important 
that someone wants to make a normative or suggestive statement, put it 
in ruby at the first instance of the acronym:

     /juː ɑ(ɹ) aɪ/
The URI is a Uniform Resource Identifier…

Furthermore, the use of interlinear annotation markers /assists/ search 
and index engines, because it is possible to search through the RFC (or 
any other) series to extract such annotated terms of art. Having 
dedicated interlinear annotation markers also helps with UI affordances: 
for example, a term annotated with IPA could have a text-to-speech 
generator with a little speaker icon, to help users know how to 
pronounce the thing.
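
To make the search and UI points concrete, here is a rough Python sketch, under the assumption that a document marks its annotated terms with U+FFF9 / U+FFFA / U+FFFB and does not nest them. The function names extract_annotations() and to_html_ruby() are hypothetical; the only outside facts relied on are Python's standard re and html modules and the HTML <ruby>/<rt> elements.

```python
import re
from html import escape

# ANCHOR base SEPARATOR annotation TERMINATOR; nested annotations are not handled.
ANNOTATION = re.compile("\uFFF9([^\uFFF9-\uFFFB]*)\uFFFA([^\uFFF9-\uFFFB]*)\uFFFB")


def extract_annotations(text: str) -> list[tuple[str, str]]:
    """Return (base, annotation) pairs found in `text`, e.g. for an index."""
    return ANNOTATION.findall(text)


def to_html_ruby(text: str) -> str:
    """Render annotated spans as HTML <ruby> markup, escaping everything else."""
    parts = []
    pos = 0
    for match in ANNOTATION.finditer(text):
        parts.append(escape(text[pos:match.start()]))
        parts.append(
            f"<ruby>{escape(match[1])}<rt>{escape(match[2])}</rt></ruby>"
        )
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "".join(parts)


text = "The \uFFF9URI\uFFFA/juː ɑ(ɹ) aɪ/\uFFFB is a Uniform Resource Identifier"
pairs = extract_annotations(text)
html_text = to_html_ruby(text)
```

An indexer could collect the pairs across the whole series, and a viewer could hang a text-to-speech control off each <rt>; neither is implemented here.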

Sean
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tokyo.gif
Type: image/gif
Size: 1836 bytes
Desc: not available
URL: <http://www.rfc-editor.org/pipermail/rfc-interest/attachments/20150128/6504ede0/attachment-0003.gif>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: beijing.gif
Type: image/gif
Size: 1871 bytes
Desc: not available
URL: <http://www.rfc-editor.org/pipermail/rfc-interest/attachments/20150128/6504ede0/attachment-0004.gif>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vertical-layout.gif
Type: image/gif
Size: 2033 bytes
Desc: not available
URL: <http://www.rfc-editor.org/pipermail/rfc-interest/attachments/20150128/6504ede0/attachment-0005.gif>