RFC Errata



RFC 5890, "Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework", August 2010

Source of RFC: idnabis (app)

Errata ID: 4824
Status: Held for Document Update
Type: Technical
Publication Format(s) : TEXT

Reported By: Juan Altmayer Pizzorno
Date Reported: 2016-05-17
Held for Document Update by: Alexey Melnikov
Date Held: 2016-10-07

Section 4.2 says:

Because A-labels (the form actually used in the
DNS) are potentially much more compressed than UTF-8 (and UTF-8 is,
in general, more compressed that UTF-16 or UTF-32), U-labels that
obey all of the relevant symmetry (and other) constraints of these
documents may be quite a bit longer, potentially up to 252 characters
(Unicode code points).

It should say:

Because A-labels (the form actually used in the
DNS) are potentially much more compressed than UTF-8 (and UTF-8 is,
in general, more compressed that UTF-16 or UTF-32), U-labels that
obey all of the relevant symmetry (and other) constraints of these
documents may be quite a bit longer, potentially up to 59 Unicode
code points, or up to 236 octets.

Notes:

(The same rationale as my report for 2.3.2.1 applies:)

The contents of a U-label are encoded in the up to 59 ASCII characters (see 2.3.2.1)
that the Punycode algorithm outputs for the corresponding A-label. The Punycode
decoder (https://tools.ietf.org/html/rfc3492#section-6.2) consumes at least one
of those ASCII characters for each code point inserted into the U-label. A U-label
can thus contain at most 59 Unicode code points.
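
As a rough illustration of this counting argument, the following Python sketch uses the
standard library's "punycode" codec to compare the number of code points in a label with
the number of ASCII characters in its Punycode output. The helper name punycode_lengths
and the sample labels are assumptions made for illustration only. The codec performs no
IDNA validation or mapping, so this is a sanity check, not a proof of the bound.

```python
ACE_PREFIX = "xn--"
MAX_A_LABEL_LENGTH = 63  # maximum length of a DNS label, in octets


def punycode_lengths(u_label: str) -> tuple[int, int]:
    """Return (code points in the label, ASCII characters of Punycode output).

    The Punycode output excludes the "xn--" prefix. No IDNA checks are applied.
    """
    encoded = u_label.encode("punycode")  # standard-library codec, RFC 3492
    return len(u_label), len(encoded)


# Hypothetical sample labels, chosen only to exercise the comparison.
for label in ("bücher", "münchen", "例え"):
    code_points, ascii_chars = punycode_lengths(label)
    a_label_length = len(ACE_PREFIX) + ascii_chars
    # Each code point costs at least one ASCII character of Punycode output.
    assert code_points <= ascii_chars
    assert a_label_length <= MAX_A_LABEL_LENGTH
    print(label, code_points, ascii_chars, a_label_length)
```

For "bücher" the codec yields "bcher-kva", so 6 code points map to 9 ASCII characters
and the A-label "xn--bcher-kva" is 13 characters long.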

Since U-labels are defined (in 2.3.2.1) to be expressed in a standard Unicode Encoding
Form, and UTF-32, UTF-16 and UTF-8 (as revised by RFC3629) all can encode a code
point in at most 4 octets, 236 octets (59 x 4) is an upper bound for a U-label's length.

I think it should be possible to derive a tighter bound, but its rationale would likely be
less straightforward.

I imagine the number 252 was originally derived by multiplying 63, the maximum
length of an A-label (including the "xn--" prefix), by 4, the maximum number of
octets needed to represent a code point.
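
The arithmetic behind the two figures can be written out as a short Python sketch. The
constant and variable names are assumptions chosen for illustration, and the derivation
of 252 reflects the reporter's guess about the original number, not a confirmed
account.

```python
MAX_A_LABEL_LENGTH = 63          # maximum A-label length, including "xn--"
ACE_PREFIX_LENGTH = len("xn--")  # 4 ASCII characters
MAX_OCTETS_PER_CODE_POINT = 4    # worst case in UTF-8, UTF-16, or UTF-32

# Presumed origin of the figure in RFC 5890, Section 4.2.
presumed_original_bound = MAX_A_LABEL_LENGTH * MAX_OCTETS_PER_CODE_POINT

# Bound proposed in this report: the prefix carries no U-label content, and
# every code point costs at least one of the remaining ASCII characters.
max_punycode_chars = MAX_A_LABEL_LENGTH - ACE_PREFIX_LENGTH
max_code_points = max_punycode_chars
max_octets = max_code_points * MAX_OCTETS_PER_CODE_POINT

print(presumed_original_bound)  # 252
print(max_code_points)          # 59
print(max_octets)               # 236
```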
