[rfc-i] Data point [Re: Fwd:I-D ACTION:draft-hoffman-utf8-rfcs-03.txt]

Joe Touch touch at ISI.EDU
Mon Oct 6 22:49:27 PDT 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This conversation convinces me that - as with the US election debates -
people see what they already believe.

Dave and I see that tools we expect to work and use daily fail on UTF-8.

You see tools you expect to work and use daily succeed on UTF-8.

At some point, if a change is made, everyone is going to have to go
around debugging everything, which is admittedly what we continue to do
for things we need to work (e.g., formfeed).

I can see some workarounds for various things, and some things won't
work anymore, period. Most of this we can eventually adapt to.

The key missing component appears to be formfeed, though. We have extant
examples of ways to print RFCs that understand FFs. Does anyone have one
that understands Unicode FFs? And if we don't, is there a path forward?

(yes, I can generate them in WordPad, but it doesn't interpret them
correctly the way it does ASCII FFs - which can be seen in print preview)
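
For reference, here is a minimal Python sketch of the byte-level picture. The helper names are hypothetical, and it assumes WordPad's "Unicode" save format is UTF-16LE, which is how Windows tools commonly use that label. It shows that U+000C is the same single byte 0x0C in ASCII and in UTF-8, so a tool that splits pages on that byte still paginates a UTF-8 file. In UTF-16 the encoding differs. Whether a given editor or printer driver actually honours the form feed is a separate, tool-specific question.

```python
# Sketch: how U+000C FORM FEED is encoded, and whether a byte-level
# page splitter still finds page breaks in a UTF-8 file.
# Helper names below are hypothetical, for illustration only.
FF = "\f"  # U+000C FORM FEED


def ff_bytes(encoding: str) -> bytes:
    """Bytes used to encode a single form feed in the given encoding."""
    return FF.encode(encoding)


def split_pages_bytes(raw: bytes) -> list[bytes]:
    """Split on the raw byte 0x0C, as an ASCII-only pager would.

    Safe for UTF-8: every byte of a multi-byte UTF-8 sequence is >= 0x80,
    so 0x0C can only ever mean FORM FEED.
    NOT safe for UTF-16, where 0x0C can be half of another code unit
    (e.g. U+010C is 0C 01 in UTF-16LE).
    """
    return raw.split(b"\x0c")


if __name__ == "__main__":
    # Assumption: "utf-16-le" stands in for WordPad's "Unicode" format.
    for enc in ("ascii", "utf-8", "utf-16-le"):
        print(f"{enc:10} {ff_bytes(enc).hex(' ')}")

    # A hypothetical two-page document containing non-ASCII text.
    doc = "Page one: Dürst\n" + FF + "Page two: 東京\n"
    pages = [p.decode("utf-8") for p in split_pages_bytes(doc.encode("utf-8"))]
    print(pages)
```

Run as a script, it prints the form-feed bytes for each encoding (0c, 0c, and 0c 00) followed by the two decoded pages.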

Joe

Martin Duerst wrote:
> Hello Dave,
> 
> At 09:01 08/10/07, Dave CROCKER wrote:
>>
>> Brian E Carpenter wrote:
>>> Thanks. My results on Windows XP:
>>
>> Brian,
>>
>> Thanks for conducting such an extensive and pragmatic test sequence.  It is 
>> exactly these sorts of combinatorial toss-and-eat activities for which the core 
>> representation of RFCs has been famously robust.  I've understood that 
>> robustness as being a continuing requirement.
> 
> Well, yes, but as Brian showed, that's just too good to be true.
> 
> The main issues are line endings and the infamous form feed.
> As Brian showed, the traditional US-ASCII stuff is
> essentially unreadable in Notepad.
> 
> Also, there are quite a number of cases where the FF confuses
> some printers and some software.
> 
> So the conclusion is probably that current RFCs are famously robust
> with the tools we have always used for them, and we kept using those
> tools because they worked. Much the same applies to the new proposal!
> 
> Also, I think there are various failure modes. For ASCII-only,
> we had:
> 
> a) Something went wrong, and it's obvious that something went wrong.
> 
> b) Something went wrong, but we don't notice until it's too late.
> 
> With the new proposal, we potentially add another:
> 
> c) Something went wrong, and those who care will notice.
> 
> Clearly, most of the failures for current ASCII are in the a)
> category; not having much in b) is a kind of low-level but
> important robustness. The new addition of c) applies to
> mangled characters and the like. Essentially, correct rendering
> of e.g. Chinese is important for people who can read Chinese.
> And those people will notice when something has gone wrong.
> 
> Regards,    Martin.
> 
>> By my reading of your results, your test demonstrates that raw UTF-8 produces 
>> unpredictable and/or undesirable outcomes with common tools.
>>
>> Hence it fails the requirement.
>>
>> d/
>> -- 
>>
>>   Dave Crocker
>>   Brandenburg InternetWorking
>>   bbiw.net
>> _______________________________________________
>> rfc-interest mailing list
>> rfc-interest at rfc-editor.org
>> http://mailman.rfc-editor.org/mailman/listinfo/rfc-interest
> 
> 
> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> #-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkjq+GYACgkQE5f5cImnZruXlgCgiG6Nv3QKcbs+ZBdgeAqn7gwK
3AUAoNYWaOp7D0tdm912yTSilZ375i+/
=k87v
-----END PGP SIGNATURE-----

