[rfc-i] Unicode names in RFCs and xml2rfc

Henrik Levkowetz henrik at levkowetz.com
Wed Dec 4 01:54:50 PST 2019

Hi Martin,

I agree with what you say, and a mechanism other than <u> is called for.

I'd love to make it possible to use a <contributor> element with the same
child elements as <author> in text, to achieve rendering of contributors
in the same format as authors, with the same provisions for both ASCII and
non-ASCII content.  However, getting any schema change accepted that the
design team didn't envision has been hard; so hard, in fact, that I've
had to step back and mostly stop engaging on the xml2rfc-dev list.

On 2019-12-04 03:31, Martin Thomson wrote:
> I'm reading the code in xml2rfc to work out how it is intended to work and finding it extraordinarily difficult to achieve a relatively modest goal: putting a person's name into the document.
> My requirements are simple: acknowledge contributions using a person's preferred name.  More concretely, I see no value in expanding ø or ü, but I would like to provide ASCII analogues of the Japanese names in the list.  This goal seems consistent with the text in RFC 7997:
>    Person names may appear in several places within an RFC (e.g., the
>    header, Acknowledgements, and References).  When a script outside the
>    Unicode Latin blocks [UNICODE-CHART] is used for an individual name,
>    an author-provided, ASCII-only identifier will appear immediately
>    after the non-Latin characters, surrounded by parentheses.  This will
>    improve general readability of the text.
> I'm talking about acknowledgments, so the list appears in a <t> element.  The intent is to render the list of names in an ordinary paragraph, with commas separating each.
> None of the elements that permit Unicode text fit in this context.  I realize that I could use <artwork> for this, but that's clearly an abuse of that element; more so because it renders very differently depending on context (I could probably do something with SVG, now that I think of it...).
> <u> is singularly unsuitable for this purpose.  It insists on, at a minimum, including the U+NNNN notation for every character.  If I could use format="char" or format="char-ascii", it might be acceptable, assuming that I have properly understood the code.  The <u> element is not documented in RFC 7991.
> I appreciate the value in having a clear signal from the author that a block of text is intended to include Unicode.  Unicode tends to lead to all sorts of inconvenient inconsistencies, like multiple different dash and hyphen styles, quoting variations, and other such oddities.  I can (grudgingly) accept that some sort of indication is appropriate so that what should be a relatively uncommon usage can be given additional scrutiny.
> It shouldn't be this difficult to acknowledge someone using their name.
