[rfc-i] How lack of Unicode support in IDs is detrimental to design

Martin Rex mrex at sap.com
Fri Jul 27 14:04:13 PDT 2012

Andrew Sullivan wrote:
> But Phill was talking about a case where he is planning to put UTF-8
> _in the protocol_, and make that UTF-8 significant in the protocol;

UTF-8 is a sequence of octets, and the most appropriate way to put
that in a spec is a hex dump, sometimes accompanied by a
textual description.
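
As a rough sketch of what such a hex dump could look like: the sample
value "Zürich" and the helper name hex_dump are illustrative assumptions,
not taken from any spec under discussion. The source stays pure ASCII
because the non-ASCII character is written as an escape.

```python
# Illustrative only: the sample value and the helper name are assumptions.
def hex_dump(octets: bytes) -> str:
    """Return the octets as space-separated, upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in octets)


# U+00FC (u with diaeresis) is written as an escape, so this file is ASCII-only.
sample = "Z\u00fcrich"
print(hex_dump(sample.encode("utf-8")))  # 5A C3 BC 72 69 63 68
```

A spec could then show the octets "5A C3 BC 72 69 63 68" next to a textual
note such as "U+00FC encoded as C3 BC".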

> and he was quite correctly pointing out that providing zero examples
> to show how this could happen is an excellent way to ensure that some
> underpaid contract programmer with half an attention span will
> implement the protocol in the future without handling the UTF-8
> encoded protocol fields, and then there'll be an interoperability
> problem.

That is a complete misunderstanding of technology.

100% of ALL programmers (and that includes the professional ones)
will be completely unable to enter on their keyboards 90% of the
Unicode glyphs that Phill might be tempted to use.  And of those
few lucky ones who are able to enter them, >50% might find themselves
unable to put them into their implementation, e.g. because across the
zoo of platforms that C89 source code is compiled on, only the ASCII
subset of characters can be used inside code statements.
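
A minimal sketch of how an implementation might carry a UTF-8 protocol
field while keeping its source ASCII-only. The field value (U+20AC) and
the names EURO_FIELD and field_matches are hypothetical. Python is used
here for brevity; in C89 the analogous form would be hex escapes in a
string literal, e.g. "\xE2\x82\xAC".

```python
# Hypothetical protocol field value, spelled as octets:
# U+20AC EURO SIGN encoded in UTF-8 is E2 82 AC.
EURO_FIELD = b"\xe2\x82\xac"


def field_matches(received: bytes) -> bool:
    """Compare a received field with the expected value, octet by octet."""
    return received == EURO_FIELD


# The same value reached via its code point, still without any
# non-ASCII character appearing in the source text.
assert "\u20ac".encode("utf-8") == EURO_FIELD
```

The comparison operates on octets only, so the programmer never has to
type or display the glyph itself.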

Really, Phill's example may apply to the very rare
occasions when someone wants to submit a spec for visual UI design
for publication as an I-D.  Frankly, I don't expect there to
be a lot of documents that need this, and for the more exotic
examples that Phill might be dreaming of, he probably will not be
able to type the glyphs on his own keyboard -- but
will instead have to enter the Unicode code point.

Having to type &lt; instead of the vanilla ASCII char "<", on the
other hand, looks like a significant step backwards, and may
affect a much larger number of documents and occurrences.

