[rfc-i] UTF-8 and Unicode examples

Henning Schulzrinne hgs at cs.columbia.edu
Thu May 6 05:54:09 PDT 2004


The proposal I made earlier can also address these cases. It's not as 
elegant as having 'real' UTF-8 or Unicode in the document, but a simple 
translation application can easily convert characters into the real 
thing without breaking the ASCII-only restrictions in TXT documents.
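
As a rough illustration of such a translation step: the earlier proposal is
not reproduced in this message, so the escape notation below ("U+" followed
by 4 to 6 hex digits) and the names ESCAPE and expand_escapes are purely
assumptions made for this sketch, not the proposed syntax.

```python
import re

# Hypothetical ASCII-only escape: "U+" followed by 4 to 6 hex digits.
# This notation is an assumption for illustration only.
ESCAPE = re.compile(r"U\+([0-9A-Fa-f]{4,6})")

def expand_escapes(text: str) -> str:
    """Replace each U+XXXX token with the character it names."""
    def repl(match: re.Match) -> str:
        code_point = int(match.group(1), 16)
        # Leave surrogates and out-of-range values untouched.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)
    return ESCAPE.sub(repl, text)

ascii_source = "file name: U+00C4.txt"   # what the TXT document would carry
expanded = expand_escapes(ascii_source)  # what a converter could display
```

The TXT form stays pure ASCII; only the converted copy contains the real
character. A real notation would also need a clear end delimiter, since this
pattern greedily consumes up to six hex digits.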

I agree that the xml2rfc folks should think about how to more 
intelligently capture non-ASCII characters. It is easy to translate them 
into an ASCII rendition, but capturing them preserves them for some future 
time when adding a non-ASCII representation is reasonable, at least for 
some output formats. Is the XML charset mechanism sufficient here?
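
A minimal sketch of "capture the original, emit an ASCII rendition", assuming
Unicode decomposition is an acceptable best-effort fallback. The function
names and the record layout are hypothetical; this is not how xml2rfc works.

```python
import unicodedata

def ascii_rendition(text: str) -> str:
    """Best-effort ASCII: decompose, drop combining marks, replace the rest."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII (e.g. a letter with no decomposition)
    # becomes "?" rather than being silently dropped.
    return stripped.encode("ascii", "replace").decode("ascii")

def capture(text: str) -> dict:
    """Keep the original string next to its ASCII fallback."""
    return {"original": text, "ascii": ascii_rendition(text)}

record = capture("J\u00fcrgen")  # sample string, "Jürgen"
```

The TXT output would use record["ascii"], while an output format that can
carry non-ASCII text could use record["original"]. An automatic fallback like
this is lossy, so a hand-supplied ASCII form would usually be preferable.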

Julian Reschke wrote:

> Alex Rousskov wrote:
> 
>> What I meant is that in my experience, HTML output quality already
>> does NOT imply comparable TXT output quality. Great HTML-looking RFCs
>> already often do not look good in TXT. The genie of output "interface"
>> quality is out of the bottle as far as xml2rfc is concerned.
> 
> 
> Well, thanks for saying that clearly; maybe the discussion should be 
> about HTML/XML vs TXT and not about UTF8 in TXT.
> 
> Anyway, I'm aware of at least two use cases for non-ASCII characters in 
> documents:
> 
> 1) Contact Info
> 
> Although the document is written in English, it will contain contact and 
> related information for people in all kinds of countries; and these 
> people will frequently have *names* or *addresses* containing these 
> characters. Forcing them to translate to plain ASCII without giving them 
> a chance to at least *also* supply the correct name seems to be rude. So 
> it would be a good thing if xml2rfc would accept non-ASCII characters 
> inside author information, as long as there'd be a mandatory additional 
> field that contains the "best effort" ASCII representation.
> 
> But of course this is just cosmetic.
> 
> 
> 2) Protocol Information
> 
> Protocols already have to deal with non-ASCII characters; but not 
> allowing them inside the spec makes it hard to discuss these issues 
> (such as: if I have character "Ä" inside a file name, how would I create 
> a file: URL for that). It's possible to work around these issues, but it 
> would make specs much more readable by explicitly allowing *a few* 
> non-ASCII characters for usage in protocol examples. Actually, one 
> single non-ASCII character (specially selected for these cases) may be 
> enough.
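
For the file name example in use case 2, here is one possible answer as a
sketch. It assumes the URL carries the UTF-8 octets of the name,
percent-encoded; whether that is the right encoding for file URLs is exactly
the kind of question such a spec example would need to discuss.

```python
from urllib.parse import quote, unquote

filename = "\u00c4.txt"  # "Ä.txt", written with an escape to keep this ASCII

# quote() encodes the string as UTF-8 by default and percent-encodes every
# octet outside the unreserved set; "Ä" is the two octets C3 84 in UTF-8.
encoded = quote(filename, safe="")
url = "file:///" + encoded

# The original name is recoverable from the ASCII-only form.
decoded = unquote(encoded)
```

The resulting URL is plain ASCII, so it can appear in a TXT document, but the
reader cannot see which character it stands for without decoding it.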
> 
> 
> Any other use cases?
> 
> 
> Best regards, Julian
> 
> 


