[rfc-i] How lack of Unicode support in IDs is detrimental to design

Phillip Hallam-Baker hallam at gmail.com
Mon Jul 30 09:50:15 PDT 2012


Just to be clear, the ASCII PIN was generated from running code; the
Cyrillic one was not, as I have not yet got round to writing that. In
fact, having review catch the type of incompatibility issue that was
thrown up is precisely the point I am making.


My point was that if people implement the spec by implementing the
examples, and the examples only use the ASCII subset of UTF-8, an
implementation that assumes everything is ASCII will still pass
that test. That in turn will end up creating an interoperability
issue.
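
To illustrate the failure mode, here is a minimal sketch in Python. The
key, the PIN values, and the choice of HMAC-SHA256 are all assumptions
made for illustration and are not taken from the spec; the only thing
borrowed from the spec is that the UTF-8 bytes of the PIN are fed into
a MAC function.

```python
import hashlib
import hmac

# Hypothetical key and PINs, for illustration only (not from the spec).
KEY = b"hypothetical-key"
ASCII_PIN = "ABCD-EFGH"
CYRILLIC_PIN = "\u0410\u0411\u0412\u0413-\u0414\u0415\u0416\u0417"  # АБВГ-ДЕЖЗ


def mac_utf8(pin: str) -> bytes:
    """Reference behaviour: MAC over the UTF-8 bytes of the PIN."""
    return hmac.new(KEY, pin.encode("utf-8"), hashlib.sha256).digest()


def mac_ascii_only(pin: str) -> bytes:
    """Buggy behaviour: assumes ASCII, so non-ASCII characters become '?'."""
    data = pin.encode("ascii", errors="replace")
    return hmac.new(KEY, data, hashlib.sha256).digest()


# The ASCII-only test vector cannot tell the two implementations apart.
ascii_vector_passes = mac_ascii_only(ASCII_PIN) == mac_utf8(ASCII_PIN)

# A non-ASCII test vector exposes the bug.
cyrillic_vector_passes = mac_ascii_only(CYRILLIC_PIN) == mac_utf8(CYRILLIC_PIN)

print(ascii_vector_passes, cyrillic_vector_passes)
```

With only the ASCII vector in the document, the buggy implementation
looks conformant; the mismatch shows up only once a non-ASCII example
is included.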

My test of the quality of a document is not whether it will work on
obsolete equipment. My test of a document is how effective it is at
communicating the ideas embodied in it.


90% of real world implementers work from the examples. 95% of managers
check the code by looking at the examples. Not being able to express
the necessary test vectors for the UI is a defect.

Octet streams do not help because users never input octet streams.
They input characters in a printable alphabet.



On Sun, Jul 29, 2012 at 11:53 PM, "Martin J. Dürst"
<duerst at it.aoyama.ac.jp> wrote:
> Hello Phil,
>
>
> On 2012/07/28 6:01, Phillip Hallam-Baker wrote:
>>
>> +1
>>
>> And in point of fact, the specification does specify both the PIN as
>> it would be presented to the human user and the UTF8 byte code.
>>
>> But it is rather difficult to write a spec that says the bytecode
>> value of "АБВГ  Д-ЕЖЅZ-З" is [21 01 21 03 ...] without being able to
>> give the PIN code in the form that it is intended to be presented to
>> the human user. Just giving the bytecode says absolutely nothing of
>> value as the spec already says that the bytes are fed into the MAC
>> function.
>
>
> I of course very much agree with you that it's very good to have non-ASCII
> examples if that's part of the functionality of the spec.
>
> However, the example above, in my eyes, has some problems. I just copied the
> string and threw it into http://rishida.net/tools/conversion/
> (which I highly recommend to hex aficionados such as Martin Rex).
>
> First, there is a tab character rather than a space.
>
> Second, while АБВГДЕЖЗ are the first eight letters of the (Russian) Cyrillic
> alphabet, "Ѕ" and "Z" look strange. It turns out that "Ѕ" is Cyrillic, but
> only used in very few languages these days (see
> http://en.wikipedia.org/wiki/Dze).
>
> The "Z", however, doesn't occur anywhere in the Cyrillic Unicode block
> (http://www.unicode.org/charts/PDF/U0400.pdf), nor in Cyrillic Extended A
> (http://www.unicode.org/charts/PDF/U2DE0.pdf) or Cyrillic Extended B
> (http://www.unicode.org/charts/PDF/UA640.pdf). It is, plainly and simply,
> an ASCII "Z".
>
> So unless there's some very specific reason for the above irregularities (in
> which case they should be documented), I suggest that the example be fixed.
>
> Regards,   Martin.
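
The character-by-character check Martin describes can also be done
locally. The following is a rough sketch in Python using the standard
unicodedata module; the string is written with escapes so that the tab
and the Latin "Z" he reports are explicit, and the helper name
describe() is made up for this example.

```python
import unicodedata

# The example PIN as Martin reports it: a tab after the first group,
# U+0405 (Dze) and a Latin "Z" in the third group.
# Rendered: АБВГ<tab>Д-ЕЖЅZ-З
pin = "\u0410\u0411\u0412\u0413\t\u0414-\u0415\u0416\u0405Z-\u0417"


def describe(text):
    """Return (code point, Unicode name) for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed control>"))
        for ch in text
    ]


# Letters whose Unicode name does not mark them as Cyrillic.
suspicious = [
    ch for ch in pin
    if ch.isalpha() and not unicodedata.name(ch).startswith("CYRILLIC")
]

for code_point, name in describe(pin):
    print(code_point, name)
print("non-Cyrillic letters:", suspicious)
```

A check of this kind, run over the examples before publication, would
flag both the stray control character and the mixed-script letter.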



-- 
Website: http://hallambaker.com/

