[rfc-i] [IAB Trac] #269: Discussion of UTF-8 in RFCs (Section 3.3)

Nico Williams nico at cryptonector.com
Wed Feb 27 14:45:06 PST 2013

On Wed, Feb 27, 2013 at 4:28 PM, Dave Thaler <dthaler at microsoft.com> wrote:
> Paul Kyzivat wrote:
>> On 2/28/13 1:39 AM, Heather Flanagan (RFC Series Editor) wrote:
>> > On 2/26/13 11:52 PM, Brian E Carpenter wrote:
>> >> A test case if I may. Is this normative or informative use of UTF-8?
>> >>
>> >> "UTF-8 strings MUST be allowed (for example, 'smörgåsbord')."
>> >>
>> >> The first half is clearly normative but the second half, IMHO, isn't.
>> > Correct.
>> >>
>> >> I'm not trying to be clever here; I am genuinely unsure what your
>> >> text means to me as an author or reviewer.
>> > This is the same type of judgement call made on a regular basis when
>> > it comes to deciding things like references.  I expect most cases will
>> > be obvious, and edge cases will need to be discussed, exactly as they
>> > are today.
>> While I think we need to allow this, the subjective nature of the decision means
>> that idnits can't be authoritative about it. It will probably need to flag most
>> instances of UTF-8 as warnings, and in some documents there could be
>> *many* of those. The only clean solution I see to that is to permit UTF-8
>> everywhere.
>>       Thanks,
>>       Paul
> Some portions, like the page header, abstract, and references section could
> be omitted from idnits checking for UTF-8.
> For other sections which could have both normative and non-normative
> content, I agree it would be good for idnits to generate warnings
> (much as it does for certain classes of IP addresses whenever they're
> used in a doc).

If idnits worked at the XML level (and that is how it should be for
XML inputs) then we could have markup for normative and informative.
The xml2rfc schema doesn't have such markup now, but it could be
added.  This would allow for even better precision in what we mean
than ever before, and this could even be rendered in a way that
preserves the normative/informative distinction in HTML, PDF and other
output formats.
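To make the markup idea concrete, here is a rough Python sketch. It assumes a hypothetical `role="normative"|"informative"` attribute (xml2rfc has nothing like it today). Each run of text takes the role of its nearest marked ancestor and defaults to normative. The `<span>` element in the usage example is also only illustrative. Given that markup, a checker could complain about non-ASCII only in normative runs, and a renderer could carry the same role through to HTML or PDF.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute name; not part of the current xml2rfc schema.
ROLE_ATTR = "role"


def text_by_role(elem, inherited="normative"):
    """Yield (role, text) for each text run under elem.

    The role comes from the nearest ancestor-or-self element carrying
    ROLE_ATTR; anything unmarked is treated as normative.
    """
    role = elem.get(ROLE_ATTR, inherited)
    if elem.text:
        yield role, elem.text
    for child in elem:
        yield from text_by_role(child, role)
        # A child's tail text belongs to the parent, so it gets the parent's role.
        if child.tail:
            yield role, child.tail


def normative_non_ascii(elem):
    """Return the normative text runs that contain non-ASCII characters."""
    return [text for role, text in text_by_role(elem)
            if role == "normative" and any(ord(c) > 127 for c in text)]


# Brian's test case, with the example marked up as informative.
t = ET.fromstring('<t>UTF-8 strings MUST be allowed '
                  '<span role="informative">(for example, \'smörgåsbord\')</span>.</t>')
for role, text in text_by_role(t):
    print(role, text)
```

The normative/informative split survives the parse here, so the "MUST" half gets checked and the 'smörgåsbord' half does not.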

Of course, that'd be much harder to achieve for non-XML input formats,
but I don't care about those.

Indeed, idnits is probably much simpler to implement on XML input:
first validate the XML (that it's well-formed and adheres to the
schema), then perform a very few validation steps, like presence of
author metadata, form of the draft handle, ..., and that non-ASCII
appears only where permitted.
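Roughly, those steps might look like the Python sketch below, which uses only the standard library. Several parts are assumptions and not current idnits behavior. The draft-name pattern is simplified. Following Dave's suggestion, non-ASCII is allowed anywhere under `<front>` (header, abstract) and `<references>`. Schema validation is left as a comment because ElementTree does not do RELAX NG. Attribute values are not checked.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, assumed form of a draft name, e.g. "draft-ietf-foo-bar-03".
DRAFT_NAME = re.compile(r"^draft-[a-z0-9]+(?:-[a-z0-9]+)+-\d{2}$")

# Assumption: non-ASCII text is tolerated anywhere under these elements.
PERMITTED = {"front", "references"}


def has_non_ascii(s):
    return s is not None and any(ord(c) > 127 for c in s)


def check_draft(xml_text):
    """Return a list of problems; an empty list means the draft passed."""
    # Step 1: well-formedness. Validation against the xml2rfc schema would
    # also go here, using an external RELAX NG validator.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as e:
        return [f"not well-formed: {e}"]

    problems = []
    # Step 2: author metadata must be present in the front matter.
    if root.find("front/author") is None:
        problems.append("no <author> in <front>")

    # Step 3: the draft handle must have the expected form.
    name = root.get("docName", "")
    if not DRAFT_NAME.match(name):
        problems.append(f"bad draft name: {name!r}")

    # Step 4: non-ASCII text only inside permitted elements (text nodes only).
    def walk(elem, permitted):
        permitted = permitted or elem.tag in PERMITTED
        if not permitted and has_non_ascii(elem.text):
            problems.append(f"non-ASCII text in <{elem.tag}>")
        for child in elem:
            walk(child, permitted)
            # Tail text is content of elem, so elem's status applies.
            if not permitted and has_non_ascii(child.tail):
                problems.append(f"non-ASCII text in <{elem.tag}>")

    walk(root, False)
    return problems


doc = """<rfc docName="draft-example-utf8-00">
<front><title>Example</title><author fullname="Patrik Fältström"/>
<abstract><t>Names like Fältström are fine here.</t></abstract></front>
<middle><section title="Intro"><t>Try 'smörgåsbord'.</t></section></middle>
</rfc>"""
print(check_draft(doc))
```

Here the abstract passes and the body paragraph is flagged. Whether that paragraph is really a problem is the same normative/informative judgement call discussed above.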

> I disagree that permitting UTF-8 everywhere is "clean" :)

I don't mind allowing UTF-8 everywhere, and I don't see why I
shouldn't get to have non-ASCII in examples in I18N-related RFCs.

