[rfc-i] [IAB Trac] #269: Discussion of UTF-8 in RFCs (Section 3.3)

Paul Kyzivat pkyzivat at alum.mit.edu
Wed Feb 27 14:55:13 PST 2013

My basic concern is that we provide sufficient tools to ensure that 
drafts conform to our formatting rules. If the decision about validity 
of UTF-8 usage is specified as being subjective, then we must identify 
who is expected to apply the subjective judgement, and give them 
sufficient tools to do the job. idnits could serve as such a tool by 
issuing warnings, but a high false positive rate would increase reviewer 
burden and make it more likely that real errors are missed.

So anything that can be done to automate more of this checking and 
reduce the false positive rate will be important.

One possibility would be to tag sections in the markup version of the 
draft with metadata describing whether they are normative or not. But 
then idnits would need to use the markup version rather than the display 
version. And tagging sections as normative or not could be burdensome.
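
To make the tagging idea concrete, here is a minimal sketch of how a checker
might use such metadata. It is not how idnits works today. The "normative"
attribute, the element names, and the function name are all assumptions made
up for this illustration. Sections tagged as non-normative are skipped, and
untagged sections are checked by default.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the "normative" attribute is an assumption made for
# this sketch; it is not part of any existing draft markup vocabulary.
SAMPLE = """<draft>
  <section title="Requirements" normative="true">
    <t>UTF-8 strings MUST be allowed.</t>
    <t>Names such as smörgåsbord MUST be accepted.</t>
  </section>
  <section title="Examples" normative="false">
    <t>For example, 'smörgåsbord'.</t>
  </section>
</draft>"""


def non_ascii_warnings(xml_text):
    """Return (section title, paragraph text) pairs that need human review."""
    warnings = []
    root = ET.fromstring(xml_text)
    for section in root.iter("section"):
        # Untagged sections are treated as normative, so they are still checked.
        if section.get("normative", "true") != "true":
            continue
        for para in section.iter("t"):
            text = para.text or ""
            if not text.isascii():
                warnings.append((section.get("title"), text))
    return warnings


for title, text in non_ascii_warnings(SAMPLE):
    print(f"warning: non-ASCII text in normative section {title!r}: {text}")
```

Tagging at section granularity would still not separate a sentence like
Brian's test case, where the normative and informative halves share one
paragraph, so a reviewer would still have to judge those by hand.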


On 2/28/13 6:28 AM, Dave Thaler wrote:
> Paul Kyzivat wrote:
>> On 2/28/13 1:39 AM, Heather Flanagan (RFC Series Editor) wrote:
>>> On 2/26/13 11:52 PM, Brian E Carpenter wrote:
>>>> A test case if I may. Is this normative or informative use of UTF-8?
>>>> "UTF-8 strings MUST be allowed (for example, 'smörgåsbord')."
>>>> The first half is clearly normative but the second half, IMHO, isn't.
>>> Correct.
>>>> I'm not trying to be clever here; I am genuinely unsure what your
>>>> text means to me as an author or reviewer.
>>> This is the same type of judgement call made on a regular basis when
>>> it comes to deciding things like references.  I expect most cases will
>>> be obvious, and edge cases will need to be discussed, exactly as they
>>> are today.
>> While I think we need to allow this, the subjective nature of the decision means
>> that idnits can't be authoritative about it. It will probably need to flag most
>> instances of UTF-8 as warnings, and in some documents there could be
>> *many* of those. The only clean solution I see to that is to permit UTF-8
>> everywhere.
>> 	Thanks,
>> 	Paul
> Some portions, like the page header, abstract, and references section could
> be omitted from idnits checking for UTF-8.
> For other sections which could have both normative and non-normative
> content, I agree it would be good for idnits to generate warnings
> (much as it does for certain classes of IP addresses whenever they're
> used in a doc).
> I disagree that permitting UTF-8 everywhere is "clean" :)
> -Dave
