"Martin J. Dürst"
duerst at it.aoyama.ac.jp
Sat Jul 21 20:03:39 PDT 2012
On 2012/07/22 5:03, Julian Reschke wrote:
> On 2012-07-21 21:02, Joe Hildebrand (jhildebr) wrote:
>>> - "Unicode codepoints that are unassigned at the time of publication
>>> MUST not be used." - not sure why. Early editions of XML 1.0 tried this;
>>> in the end they had to give up and just delegate to the Unicode specs.
>> I saw that more as a policy thing to enhance the readability of the doc.
>> If we think we may need to use a codepoint before it is formally adopted
>> (for example, if the Unicode Forum wanted to use this format), then it
>> would make sense to remove the requirement.
"at the time of publication" doesn't make sense at all. If that's
removed, then as a policy, it makes ample sense. It also makes sense to
have some place in the infrastructure check it, just to be sure. But I
don't think there's a need to check it every time an edit is made.
Btw, the Unicode Consortium (not Forum) doesn't publish anything that
uses unassigned codepoints. They use images to talk about potential new
characters. Experimental implementations may occasionally use unassigned
codepoints, but that's a separate matter.
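As a rough illustration of what such an infrastructure check could look
like, here is a minimal sketch in Python. The function name
unassigned_codepoints is made up for this example. The check relies on the
Unicode database bundled with the Python interpreter, so the result depends
on that Unicode version, and general category Cn also covers
noncharacters such as U+FFFF, not only unassigned codepoints.

```python
import unicodedata

def unassigned_codepoints(text):
    """Return the codepoints in text with general category Cn.

    Hypothetical helper: Cn means "unassigned" (including noncharacters)
    in the Unicode version bundled with this Python interpreter.
    """
    found = {ord(ch) for ch in text if unicodedata.category(ch) == "Cn"}
    # Sort numerically, then format in the usual U+XXXX notation.
    return [f"U+{cp:04X}" for cp in sorted(found)]

# The Unicode version the check is based on, e.g. for a log message.
print(unicodedata.unidata_version)
print(unassigned_codepoints("Martin J. D\u00FCrst"))
```

Running this once at submission or publication time, rather than on every
edit, would match the policy described above.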
> Ack. Just want to make sure that we don't hardwire something into a
> hard-to-change document if we don't have to.
Yes definitely please don't!
> Which reminds me: are we ok with non-ASCII characters being represented
> by their UTF-8 encoding? For those stuck in the previous millennium we
> could simply require ASCII encoding, and use character references for
> everything non-ASCII.
This is not a question of previous millennium or not. I think we should
not disallow character references, because there are some characters for
which it's really helpful if they are explicitly visible, think e.g. of
the non-breaking space. But in general, it's much better if the
characters are visible directly. It's also way, way closer to what you
get in the final product.
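To make the two options concrete, here is a small sketch in Python. Both
function names are made up for this example, and the choice of which
characters count as "hard to see" (space separators other than U+0020,
plus format characters) is only an assumption, not a proposal for the
actual rule.

```python
import unicodedata

def escape_invisible(text):
    """Keep characters as-is, but write hard-to-see ones as references.

    Assumption for this sketch: "hard to see" means category Zs (other
    than the plain space) or category Cf (format characters).
    """
    out = []
    for ch in text:
        cat = unicodedata.category(ch)
        if (cat == "Zs" and ch != " ") or cat == "Cf":
            out.append(f"&#x{ord(ch):X};")  # hexadecimal character reference
        else:
            out.append(ch)
    return "".join(out)

def ascii_only(text):
    """The ASCII-only alternative: every non-ASCII character is escaped."""
    # The xmlcharrefreplace error handler emits decimal references.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(escape_invisible("D\u00FCrst\u00A0(Aoyama)"))
print(ascii_only("D\u00FCrst\u00A0(Aoyama)"))
```

The first function leaves "ü" directly visible and only escapes the
non-breaking space; the second turns both into numeric references.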