[rfc-i] v3imp #4 Ruby text

"Martin J. Dürst" duerst at it.aoyama.ac.jp
Sun Jan 25 00:09:57 PST 2015

On 2015/01/24 01:39, Tony Hansen wrote:

> On 1/23/15 4:04 AM, Sean Leonard wrote:

>> Support can be in markup form {<ruby> <rt> <rp> -- see HTML5} or by
>> supporting the raw Unicode code points {U+FFF9 U+FFFA U+FFFB}.
>> Personally I think the Unicode code points are sufficient for the
>> canonical format; a formatter can convert these codes into appropriate
>> markup (e.g., HTML5 <ruby>). However as our own Martin J. Dürst is the
>> co-author of UTR-20
>> <http://www.unicode.org/reports/tr20/#Interlinear>, the markup
>> position may win out.

> Since V3 is Unicode-based, and Unicode is allowed in all the places Sean
> seems to be concerned about, I think this is covered.

Are you trying to say that because somebody can use U+FFF9 through 
U+FFFB in their ID or RFC, these will magically appear as Ruby in the 
output? Such an assumption would be wrong. These characters are intended 
mostly for internal use (they got introduced because MS Word used these 
codepoints internally for their ruby implementation and later MS 
realized that they would be in trouble if the codepoints got officially 
allocated to something else). Please read the relevant text (p. 823,... 
of the Unicode Standard, Version 7.0.0, 
http://www.unicode.org/versions/Unicode7.0.0/ch23.pdf). I have included 
the most relevant pieces for this discussion below in the P.S.

> The only thing
> I'm not sure about is the use of ruby in artwork -- would that just be
> ruby annotations on words found within the artwork, or is Sean thinking
> it would be used in another fashion? If the former, I think it's okay.

If we are using SVG in artwork, and we need something that looks like 
ruby, it would probably be okay to place the ruby as a separate piece of 
text above the base text. Same with ASCII art.

> The only real question I have is whether there needs to be an explicit
> statement about ruby being supported or not.

Maybe a good idea. I'm still looking for Sean to give an actual example 
of where he thinks ruby are needed in IDs/RFCs.

Regards,   Martin.

Excerpts from the Unicode Standard:

Annotation Characters: U+FFF9–U+FFFB

An interlinear annotation consists of annotating text that is related to
a sequence of annotated characters. For all regular editing and
text-processing algorithms, the annotated characters are treated as part
of the text stream. The annotating text is also part of the content, but
for all or some text processing, it does not form part of the main text
stream. However, within the annotating text, characters are accessible
to the same kind of layout, text-processing, and editing algorithms as
the base text. The annotation characters delimit the annotating and the
annotated text, and identify them as part of an annotation. See
Figure 23-4.
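
As an illustration only (not part of the Unicode Standard text): the
delimiting works as U+FFF9 (anchor), annotated text, U+FFFA (separator),
annotating text, U+FFFB (terminator). A formatter that wanted ruby in
its output would have to convert this explicitly; nothing happens
automatically. A minimal Python sketch, assuming non-nested annotations
with a single annotating text, and using the hypothetical helper name
annotations_to_ruby:

```python
import re
from html import escape

# The three interlinear annotation characters.
ANCHOR, SEPARATOR, TERMINATOR = "\uFFF9", "\uFFFA", "\uFFFB"

# Assumption: annotations are not nested and carry exactly one
# annotating text (anchor, base, separator, annotation, terminator).
_DELIMS = f"{ANCHOR}{SEPARATOR}{TERMINATOR}"
_ANNOTATION = re.compile(
    f"{ANCHOR}([^{_DELIMS}]*){SEPARATOR}([^{_DELIMS}]*){TERMINATOR}"
)

def annotations_to_ruby(text):
    """Hypothetical helper: turn annotation sequences into HTML5 ruby."""
    # Escape HTML metacharacters first; the delimiters are left untouched.
    escaped = escape(text)
    # Group 1 is the annotated (base) text, group 2 the annotating text.
    return _ANNOTATION.sub(
        r"<ruby>\1<rp>(</rp><rt>\2</rt><rp>)</rp></ruby>", escaped
    )
```

The <rp> elements give a parenthesized fallback for renderers without
ruby support. Malformed or nested sequences are passed through
unchanged by this sketch.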

The annotation characters are used in internal processing when
out-of-band information is associated with a character stream, very
similarly to the usage of U+FFFC object replacement character. However,
unlike the opaque objects hidden by the latter character, the annotation
itself is textual.


Use in Plain Text. Usage of the annotation characters in plain text
interchange is strongly discouraged without prior agreement between the
sender and the receiver, because the content may be misinterpreted
otherwise. Simply filtering out the annotation characters on input will
produce an unreadable result or, even worse, an opposite meaning. On
input, a plain text receiver should either preserve all characters or
remove the interlinear annotation characters as well as the annotating
text included between the interlinear annotation separator and the
interlinear annotation terminator.
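
As an illustration only (not part of the Unicode Standard text), the
difference between "simply filtering out the annotation characters" and
the recommended removal can be sketched in Python. The function names
naive_filter and strip_annotations are hypothetical, and the sketch
assumes well-formed, non-nested annotations:

```python
import re

# The three interlinear annotation characters.
ANCHOR, SEPARATOR, TERMINATOR = "\uFFF9", "\uFFFA", "\uFFFB"

def naive_filter(text):
    """Discouraged: drops only the delimiters, so the annotating text
    is left fused onto the annotated text."""
    return re.sub(f"[{ANCHOR}{SEPARATOR}{TERMINATOR}]", "", text)

def strip_annotations(text):
    """Recommended removal: drop the annotating text together with the
    separator and terminator, then drop the remaining anchors."""
    # Everything from the first separator through the terminator goes,
    # which also covers annotations with more than one annotating text.
    without_notes = re.sub(
        f"{SEPARATOR}[^{TERMINATOR}]*{TERMINATOR}", "", text
    )
    return without_notes.replace(ANCHOR, "")
```

For an input consisting of an anchor, a base word, a separator, its
reading, and a terminator, naive_filter returns the base word
immediately followed by the reading, while strip_annotations returns
only the base word.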

When an output for plain text usage is desired but the receiver is
unknown to the sender, these interlinear annotation characters should be
removed as well as the annotating text included between the interlinear
annotation separator and the interlinear annotation terminator.

This restriction does not preclude the use of annotation characters in
plain text interchange, but it requires a prior agreement between the
sender and the receiver for correct interpretation of the annotations.
