Joe Hildebrand (jhildebr)
jhildebr at cisco.com
Mon Jul 23 08:44:13 PDT 2012
On 7/21/12 9:03 PM, "Martin J. Dürst" <duerst at it.aoyama.ac.jp> wrote:
>"at the time of publication" doesn't make sense at all. If that's
>removed, then as a policy, it makes ample sense. It also makes sense to
>have some place in the infrastructure check it, just to be sure. But I
>don't think there's a need to check it every time an edit is made.
That makes sense. How about:
"The RFC Editor will make policy as to what codepoints are allowed in
documents published into the RFC stream."
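To illustrate the "some place in the infrastructure check it" part, here is a minimal sketch of such a check. It assumes, purely for illustration, that the policy is "no unassigned codepoints"; the actual policy would be whatever the RFC Editor decides. The function name is hypothetical, and the result depends on the Unicode database version that ships with the Python in use.

```python
import unicodedata


def unassigned_codepoints(text):
    """Return (offset, 'U+XXXX') for every unassigned codepoint in text.

    Assumption: the stream policy forbids general category 'Cn'
    (unassigned). A real policy could be stricter or looser.
    """
    return [
        (offset, f"U+{ord(ch):04X}")
        for offset, ch in enumerate(text)
        if unicodedata.category(ch) == "Cn"
    ]
```

Something like this could run once at publication time rather than on every edit.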
>Btw, the Unicode Consortium (not Forum) doesn't publish anything that
Sorry, sloppy writing on my part.
>uses unassigned codepoints. They use images to talk about potential new
>character. Experimental implementations may occasionally use unassigned
>codepoints, but that's a separate matter.
If we make it a policy choice of the stream owner, then problem solved.
>>Which reminds me: are we ok with non-ASCII characters being represented
>> by their UTF-8 encoding? For those stuck in the previous millennium we
>> could simply require ASCII encoding, and use character references for
>> everything non-ASCII.
>This is not a question of previous millennium or not. I think we should
>not disallow character references, because there are some characters for
>which it's really helpful if they are explicitly visible, think e.g.
>&nbsp; (non-breaking space). But in general, it's much better if the
>characters are visible directly. It's also way, way closer to what you
>get in the final product.
Of course, I misunderstood Julian's point, which upon re-reading was
clear. The tooling that is used to pretty-print needs to know which
character references to emit syntactically. I think &nbsp; is probably
nice, since some other processing tools silently switch U+00A0 to U+0020.
&lt;, &gt;, &amp;, &quot;, and &apos; are probably important. Everything
else, I'd recommend against, but am open to suggestion.
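As a rough sketch of what I mean for the pretty-printer: only the five XML predefined entities plus a reference for U+00A0 get emitted as references, and every other character passes through as itself. The table and function names below are made up for illustration, not taken from any existing tool. Note that the named reference for U+00A0 only resolves if the document's DTD declares it; the numeric form &#160; would be the alternative.

```python
# Hypothetical policy table: the only characters emitted as references.
REFERENCES = {
    "\u00a0": "&nbsp;",  # assumes the DTD declares nbsp
    "<": "&lt;",
    ">": "&gt;",
    "&": "&amp;",
    '"': "&quot;",
    "'": "&apos;",
}


def emit(text):
    """Replace characters listed in REFERENCES; leave all others as-is.

    Working one character at a time means an '&' produced by an
    earlier replacement is never escaped a second time.
    """
    return "".join(REFERENCES.get(ch, ch) for ch in text)
```

With this, emit("Dürst") comes back unchanged, while a U+00A0 in the input stays visible in the source as a reference.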