[rfc-i] How lack of Unicode support in IDs is detrimental to design
"Martin J. Dürst"
duerst at it.aoyama.ac.jp
Sun Jul 29 23:22:11 PDT 2012
On 2012/07/28 4:39, Martin Rex wrote:
> Phillip Hallam-Baker wrote:
>> I am just writing a draft that describes how to implement a PIN based
>> challenge response.
>> To establish an initial connection to the server, the user presents a
>> PIN value such as CS0F40-30LV09-K000.
>> Now the specification states that the PIN code is in UTF8. It probably
>> does not make any good sense for a French implementation to use accent
>> characters but I would hope that a implementer would have the sense to
>> use the Greek alphabet for a Greek language deployment, Cyrillic for
>> Russian and so on.
> For most developers at the ~ 1 dozen software layers at and on top
> of the network layer (below the UI), this information will be a
> simple series of octets. So for the vast amount of purposes,
> the use of a hex dump is the most appropriate form to provide
> the example in your specification.
When you wrote a spec, did you make sure that all the US-ASCII stuff was
also presented in hex? If not, why not?
It's my experience that most programmers, most of the time, prefer
characters over hex. Programmers can get used to hex, and they use it
when nothing else works, but it's not that they are really keen on it.
With a tiny bit of effort, programmers can also get used to a few
examples of non-ASCII.
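To make the comparison concrete, here is a minimal sketch that prints a PIN both as characters and as a hex dump of its UTF-8 octets. The Greek PIN and the helper name hex_dump are made up for illustration; they are not taken from any draft.

```python
def hex_dump(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated hex octets."""
    return " ".join(f"{b:02x}" for b in text.encode("utf-8"))

ascii_pin = "CS0F40-30LV09-K000"  # the PIN quoted from the draft above
greek_pin = "ΑΒΓ123-ΔΕΖ456"       # hypothetical Greek PIN, illustration only

for pin in (ascii_pin, greek_pin):
    print(pin)                    # what a programmer would type or read
    print("  " + hex_dump(pin))   # what goes over the wire
```

Both forms carry the same information, but only the first one shows at a glance why UTF-8 is being used at all.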
>> Not being able to express these ideas in drafts means not being able
>> to communicate them effectively.
> Nope, it means helping the vast amount of developers at the network
> layer and the dozen software layers below the UI being able to
> easily understand your document.
When using UTF-8, there's usually a good reason to do so. To make this
clear, it is very appropriate to use actual characters in examples.
Actual characters should work on command lines and in similar places,
and while they are in some sense "UI", they are really more for
programmers than for the average user.
> It would be a real nightmare (for the document authors and the document
> consumers) if every single document that uses DNS had to deal with
> Unicode to A-label conversion over and over and over again.
> The more reasonable approach is to put all that crap into a very
> small amount of documents, and keep the world straight and simple
> for all the rest.
A more reasonable approach would be to not use "crap" and similar words
in email. For the record, I don't like U-label <-> A-label conversion
either; it would be much easier if the DNS were just UTF-8 throughout,
but that's not where we are, unfortunately.
[It seems that you are implying that "straight and simple" means "just
use A-labels". In my view, it would be "just use U-labels", however.]
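For readers who have not seen the U-label <-> A-label conversion in practice, here is a rough sketch using Python's built-in "idna" codec. Note the assumptions: that codec implements the older IDNA 2003 rules rather than IDNA 2008, and the domain name is an invented example. For a simple label like this one, both versions should give the same A-label.

```python
# U-label form: what a person reads and types (example name, not a real site).
u_label = "bücher.example"

# Convert to the A-label form that is actually stored in the DNS.
# Python's built-in "idna" codec follows IDNA 2003 (an assumption to keep in mind).
a_label = u_label.encode("idna").decode("ascii")
print(a_label)

# Convert back to the U-label form for display.
round_trip = a_label.encode("ascii").decode("idna")
print(round_trip)
```

The conversion is mechanical, which is why it can live in one place while examples elsewhere simply show the U-label.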
> draft-ietf-dane-protocol-23.txt does not contain any unicode glyphs.
> Adding such examples would make the document worse, because the support
> of internationalized domain names is completely orthogonal to most
> uses of the DNS protocol. Adding examples with real unicode glyphs
> to that document would only create confusion, complexity and problems
> for rendering the document.