[rfc-i] UTF-8 and Unicode examples
Henning Schulzrinne
hgs at cs.columbia.edu
Thu May 6 05:54:09 PDT 2004
The proposal I made earlier can also address these cases. It's not as
elegant as having 'real' UTF-8 or Unicode in the document, but a simple
translation application can easily convert characters into the real
thing without breaking the ASCII-only restrictions in TXT documents.
I agree that the xml2rfc folks should think about how to capture non-ASCII
characters more intelligently. It is easy to translate them into an ASCII
rendition, and capturing them preserves them for some future time when
adding a non-ASCII representation is reasonable, at least for some output
formats. Is the XML charset mechanism sufficient here?
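A minimal Python sketch of the capture-then-translate idea follows. It assumes a hypothetical author record that stores both the real name and a mandatory ASCII rendition. The names Author, ascii_fallback and to_char_refs are made up for illustration and are not part of xml2rfc. On the XML side, numeric character references such as &#246; let an ASCII-only source file carry the real characters regardless of the declared charset.

```python
import unicodedata
from dataclasses import dataclass
from xml.sax.saxutils import escape


def ascii_fallback(text: str) -> str:
    """Crude best-effort ASCII rendition (hypothetical helper).

    Decomposes to NFKD and drops combining marks; characters with no
    ASCII decomposition (e.g. U+00DF) become '?'."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.encode("ascii", "replace").decode("ascii")


def to_char_refs(text: str) -> str:
    """Escape &, <, > and write every non-ASCII code point as a decimal
    XML character reference, so the output is pure ASCII."""
    return escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")


@dataclass
class Author:
    name: str        # name as the author writes it; may contain non-ASCII
    ascii_name: str  # mandatory ASCII rendition, used for TXT output

    def __post_init__(self) -> None:
        if not self.ascii_name.isascii():
            raise ValueError("ascii_name must be ASCII-only")

    def render(self, fmt: str) -> str:
        # TXT stays ASCII-only; other formats get the real characters,
        # written as character references in the XML/HTML source.
        if fmt == "txt":
            return self.ascii_name
        return to_char_refs(self.name)


# Usage with a made-up name:
author = Author("J\u00f6rg M\u00fcller", "Joerg Mueller")
print(author.render("txt"))         # Joerg Mueller
print(author.render("html"))        # J&#246;rg M&#252;ller
print(ascii_fallback(author.name))  # Jorg Muller
```

The automatic fallback only strips accents, so it produces "Jorg Muller" where a German speaker would write "Joerg Mueller". That is one argument for having authors supply the ASCII field themselves instead of deriving it.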
Julian Reschke wrote:
> Alex Rousskov wrote:
>
>> What I meant is that in my experience, HTML output quality already
>> does NOT imply comparable TXT output quality. Great HTML-looking RFCs
>> already often do not look good in TXT. The genie of output "interface"
>> quality is out of the bottle as far as xml2rfc is concerned.
>
>
> Well, thanks for saying that clearly; maybe the discussion should be
> about HTML/XML vs. TXT and not about UTF-8 in TXT.
>
> Anyway, I'm aware of at least two use cases for non-ASCII characters in
> documents:
>
> 1) Contact Info
>
> Although the document is written in English, it will contain contact and
> related information for people in all kinds of countries, and these
> people will frequently have *names* or *addresses* containing these
> characters. Forcing them to translate to plain ASCII without giving them
> a chance to *also* supply the correct name seems rude. So it would be a
> good thing if xml2rfc accepted non-ASCII characters inside author
> information, as long as there is a mandatory additional field that
> contains the "best effort" ASCII representation.
>
> But of course this is just cosmetic.
>
>
> 2) Protocol Information
>
> Protocols already have to deal with non-ASCII characters, but not
> allowing them inside the spec makes it hard to discuss these issues
> (for example: if a file name contains the character "Ä", how would I
> create a file: URL for it?). It's possible to work around these issues,
> but specs would be much more readable if they explicitly allowed *a few*
> non-ASCII characters for use in protocol examples. Actually, a single
> non-ASCII character (specially selected for these cases) may be
> enough.
>
>
> Any other use cases?
>
>
> Best regards, Julian
>
>
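For the file-name question, one common answer is to percent-encode the UTF-8 bytes of the name. The following is a rough Python sketch of that approach, not a statement of what the file: scheme required at the time. It assumes UTF-8 is the byte encoding, which the scheme did not itself guarantee, and that the path is an absolute POSIX path. The file_url helper and its form parameter are hypothetical. The result also depends on Unicode normalization: a precomposed "Ä" and "A" plus a combining diaeresis yield different URLs. That is exactly the kind of detail that is hard to discuss in a spec without a real non-ASCII character.

```python
import unicodedata
from urllib.parse import quote


def file_url(path: str, form: str = "NFC") -> str:
    """Build a file: URL for an absolute POSIX path (hypothetical helper).

    The path is normalized to the given Unicode form, then quote()
    percent-encodes its UTF-8 bytes, leaving '/' unescaped."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    normalized = unicodedata.normalize(form, path)
    return "file://" + quote(normalized, safe="/")


name = "/tmp/\u00c4.txt"         # "Ä" as a single precomposed code point
print(file_url(name))            # file:///tmp/%C3%84.txt
print(file_url(name, "NFD"))     # file:///tmp/A%CC%88.txt
```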