[rfc-i] Draft: Representation of Unicode and UTF-8 characters
hgs at cs.columbia.edu
Sat May 15 11:23:06 PDT 2004
> FWIW, I would suggest to make the above more generic to avoid an
> implication that "names in acknowledgments and protocol examples" are
> the only use cases. I am also not sure what "abstract representation"
> is. How about something along these lines:
> IETF documents may need to represent Unicode characters while
> obeying the US-ASCII encoding rules. Unicode use cases include
> protocol examples and human names in acknowledgments. To
> represent Unicode characters in IETF documents, authors should
> use the following conventions:
>>- UTF-8 strings enumerate the bytes as uppercase hexadecimal digits in
>>angled brackets, e.g., <C2 A9> for the <U+00A9> (copyright) character.
> Is space the only correct delimiter here? Why is it inconsistent with
> comma delimiter used above?
Both should probably use spaces - saves space. I misread the Unicode
> To reduce the number of violations, should conflicts be restricted to
> cases where the whole <> sequence matches the pattern and not just the
>>Documents may choose a different convention, but then need to
>>explain the notation.
> To make this work reliably, especially with automated tools, we need a
> blob of text that authors can include to indicate they _are_ following
> the above convention. RFCs without the blob would be assumed not to
> follow the convention (by default). Otherwise, it might not be clear
> whether an RFC is unaware of the above convention or knowingly uses
> An alternative is mandatory usage enforced by RFC Editor, but I am
> guessing we do not want to go that far.
I suspect the more common case is that these will be xml2rfc drafts,
where the author can turn on the relevant flag for appropriate HTML
rendering. I don't think that translating plain-text RFCs into ASCII
with a sprinkling of UTF-8/Unicode is all that helpful. For example, for
protocol examples, I will likely want to see the underlying codes, to
simplify generating test examples from them.
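
As a rough illustration of why the underlying codes are convenient, here is a minimal sketch of how a test tool might convert between a string and the <C2 A9> byte notation quoted above. It assumes the space-delimited, uppercase-hex form; the names UTF8_NOTATION, to_notation and from_notation are made up for this example and are not part of the draft.

```python
import re

# Assumed form of the notation: uppercase hex byte pairs separated by
# single spaces inside angle brackets, e.g. <C2 A9>. (Hypothetical name.)
UTF8_NOTATION = re.compile(r"<([0-9A-F]{2}(?: [0-9A-F]{2})*)>")


def to_notation(text: str) -> str:
    """Render a string as its UTF-8 bytes in the bracketed hex notation."""
    return "<" + " ".join(f"{b:02X}" for b in text.encode("utf-8")) + ">"


def from_notation(notation: str) -> str:
    """Decode a bracketed hex byte sequence back into a string."""
    match = UTF8_NOTATION.fullmatch(notation)
    if match is None:
        raise ValueError(f"not a UTF-8 byte notation: {notation!r}")
    # bytes.fromhex ignores the spaces between byte pairs.
    return bytes.fromhex(match.group(1)).decode("utf-8")


if __name__ == "__main__":
    print(to_notation("\u00a9"))
    print(from_notation("<C2 A9>") == "\u00a9")
```

Matching the whole bracketed sequence, rather than any text between angle brackets, keeps ordinary uses of <> in a document from being misread as byte strings.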
I think there is a danger of over-engineering here...