[rfc-i] Draft: Representation of Unicode and UTF-8 characters

Henning Schulzrinne hgs at cs.columbia.edu
Sat May 15 11:23:06 PDT 2004

> FWIW, I would suggest to make the above more generic to avoid an
> implication that "names in acknowledgments and protocol examples" are
> the only use cases. I am also not sure what "abstract representation"
> is. How about something along these lines:
> 	IETF documents may need to represent Unicode characters while
> 	obeying the US-ASCII encoding rules. Unicode use cases include
> 	protocol examples and human names in acknowledgments. To
> 	represent Unicode characters in IETF documents, authors should
> 	use the following conventions:


>>- UTF-8 strings enumerate the bytes as uppercase hexadecimal digits in
>>angled brackets, e.g., <C2 A9> for the <U+00A9> (copyright) character.
> Is space the only correct delimiter here? Why is it inconsistent with
> comma delimiter used above?

Both should probably use spaces - saves space. I misread the Unicode 

> To reduce the number of violations, should conflicts be restricted to
> cases where the whole <> sequence matches the pattern and not just the
> prefix?


>>Documents may choose a different convention, but then need to
>>explain the notation.
> To make this work reliably, especially with automated tools, we need a
> blob of text that authors can include to indicate they _are_ following
> the above convention. RFCs without the blob would be assumed not to
> follow the convention (by default). Otherwise, it might not be clear
> whether an RFC is unaware of the above convention or knowingly uses
> it!
> An alternative is mandatory usage enforced by RFC Editor, but I am
> guessing we do not want to go that far.

I suspect the more common case is that these will be xml2rfc drafts, 
where the author can turn on the relevant flag for appropriate HTML 
rendering. I don't think that translating plain-text RFCs into ASCII 
with a sprinkling of UTF-8/Unicode is all that helpful. For example, for 
protocol examples, I will likely want to see the underlying codes, to 
simplify generating test examples from that.
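
As a rough illustration of why the visible codes help with test
generation, here is a minimal Python sketch. It assumes the
space-delimited form discussed above (<C2 A9> for UTF-8 bytes,
<U+00A9> for code points); the function names are hypothetical and
not part of any draft or tool.

```python
def utf8_notation(text: str) -> str:
    """Render the UTF-8 bytes of text as uppercase hex in angle brackets."""
    return "<" + " ".join(f"{b:02X}" for b in text.encode("utf-8")) + ">"


def unicode_notation(text: str) -> str:
    """Render each code point of text as <U+XXXX> (at least four hex digits)."""
    return " ".join(f"<U+{ord(ch):04X}>" for ch in text)


def parse_utf8_notation(notation: str) -> str:
    """Turn a string such as '<C2 A9>' back into the character(s) it encodes.

    Assumes the input is exactly one bracketed, space-delimited byte list.
    """
    inner = notation.strip()[1:-1]          # drop the angle brackets
    return bytes.fromhex(inner).decode("utf-8")  # fromhex ignores the spaces


if __name__ == "__main__":
    copyright_sign = "\u00a9"
    print(utf8_notation(copyright_sign))
    print(unicode_notation(copyright_sign))
    print(parse_utf8_notation("<C2 A9>") == copyright_sign)
```

With the codes spelled out in the text, a test harness can recover the
exact bytes from the document itself, which is not possible once the
example has been rendered as the literal character.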

I think there is a danger of over-engineering here...

> $0.02,
> Alex.
