[rfc-i] How lack of Unicode support in IDs is detrimental to design

"Martin J. Dürst" duerst at it.aoyama.ac.jp
Sun Jul 29 23:53:48 PDT 2012


Hello Phil,

On 2012/07/28 6:01, Phillip Hallam-Baker wrote:
> +1
>
> And in point of fact, the specification does specify both the PIN as
> it would be presented to the human user and the UTF8 byte code.
>
> But it is rather difficult to write a spec that says the bytecode
> value of "АБВГ	Д-ЕЖЅZ-З" is [21 01 21 03 ...] without being able to
> give the PIN code in the form that it is intended to be presented to
> the human user. Just giving the bytecode says absolutely nothing of
> value as the spec already says that the bytes are fed into the MAC
> function.

I of course very much agree with you that it's very good to have 
non-ASCII examples if that's part of the functionality of the spec.

However, the example above, in my eyes, has some problems. I just copied 
the string and threw it into http://rishida.net/tools/conversion/
(which I highly recommend to hex aficionados such as Martin Rex).

First, there is a tab character rather than a space between "Г" and "Д".

Second, while АБВГДЕЖЗ are the first eight letters of the (Russian) 
Cyrillic alphabet, "Ѕ" and "Z" look strange. It turns out that "Ѕ" is 
Cyrillic, but only used in very few languages these days (see 
http://en.wikipedia.org/wiki/Dze).

The "Z", however, doesn't occur anywhere in the Cyrillic Unicode block 
(http://www.unicode.org/charts/PDF/U0400.pdf), nor in Cyrillic 
Extended-A (http://www.unicode.org/charts/PDF/U2DE0.pdf) or Cyrillic 
Extended-B (http://www.unicode.org/charts/PDF/UA640.pdf). It is quite 
simply an ASCII "Z".
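
For anyone who prefers a script to the web tool, the checks above can be 
sketched in Python roughly as follows. This is only an illustration, not 
anything from the spec: the names PIN, describe and suspicious are made 
up here, the string is written with escapes on the assumption that it 
matches the quoted example, and the rule "anything that is neither a 
hyphen nor a Cyrillic letter is suspicious" is my own simplification.

```python
import unicodedata

# The PIN example as quoted in this thread (assumed transcription), written
# with escapes so the tab and the look-alike letters survive copy and paste.
PIN = "\u0410\u0411\u0412\u0413\t\u0414-\u0415\u0416\u0405Z-\u0417"


def describe(text):
    """Return (character, code point, Unicode name) for each character."""
    rows = []
    for ch in text:
        # Control characters such as TAB have no name in unicodedata,
        # so supply a placeholder instead of letting ValueError propagate.
        name = unicodedata.name(ch, "<control>")
        rows.append((ch, "U+%04X" % ord(ch), name))
    return rows


def suspicious(text):
    """Hypothetical check: flag everything except '-' and Cyrillic letters."""
    return [
        (cp, name)
        for ch, cp, name in describe(text)
        if ch != "-" and not name.startswith("CYRILLIC")
    ]


if __name__ == "__main__":
    for ch, cp, name in describe(PIN):
        print(cp, name)
    print("suspicious:", suspicious(PIN))
```

With the string as transcribed here, suspicious(PIN) reports exactly two 
entries, U+0009 (the tab) and U+005A (LATIN CAPITAL LETTER Z), while "Ѕ" 
shows up as U+0405 CYRILLIC CAPITAL LETTER DZE and is therefore not 
flagged.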

So unless there's some very specific reason for the above irregularities 
(in which case they should be documented), I suggest that the example be 
fixed.

Regards,   Martin.

