design:utf-8

If RFCs-to-be come in with internationalized characters (e.g., in a name, postal address, or example), the expectation is to ask the i18n directorate for review assistance. If that is not possible, because the review team has closed or is unable to complete a review, the first step is to reach out to the Apps ADs for assistance. If they have no guidance or are unresponsive, send a request for assistance to the rfc-interest mailing list.

Some discussion of requirements, goals, and desires around non-ASCII characters is at https://www.rfc-editor.org/rse/wiki/doku.php?id=formatreq. The table below summarizes a taxonomy of cases where (non-ASCII) UTF-8 might or might not be allowed, along with some thoughts. The intent is that each row represents a separate policy decision.

We expect that any case we mark “Yes” for Consensus will also require providing an ASCII-only transliteration unless we explicitly note otherwise.

| Case | Section | Use | Consensus | Comments |
| 1a | (title page) | Author name | Yes | Answer should match (6a) |
| 1b | (title page) | Author affiliation | Yes | Answer should match (6b) |
| 1c | (title page) | Document title | No? | |
| 2 | Abstract | Prose | No? | Could be same as (3e), but Abstracts may also be separately compiled into other indices so could have a different answer |
| 3a | Body or Appendix | Example string | Yes | E.g. fictional person name, IRI, EAI, domain name, etc. Currently there's no XML markup to denote example strings, so hard to distinguish from (3c) |
| 3b | Body or Appendix | Code snippet | No? | “Code” does not mean examples here, but actual grammar like ABNF, C, etc. |
| 3c | Body or Appendix | Literal protocol element | Yes? | Required transliteration should use U+xxxx syntax |
| 3d | Body or Appendix | Document title of a cited document | Yes | Answer should match (4c) |
| 3e | Body or Appendix | Prose | No | e.g. use of “naïve” in http://tools.ietf.org/html/rfc4690#section-1.5.5 |
| 3f | Body or Appendix | Section title | No? | |
| 4a | References | Author name | Yes | Answer should match (1a) |
| 4b | References | Author affiliation | Yes | Answer should match (1b) |
| 4c | References | Document title | Yes | Not necessarily an RFC that's being referenced |
| 4d | References | Document IRI | No | |
| 5a | Acknowledgements | Person name | Yes | |
| 5b | Acknowledgements | Organization name | Yes | |
| 6a | Authors Addresses | Author name | Yes | |
| 6b | Authors Addresses | Author affiliation | Yes | |
| 6c | Authors Addresses | Author email address (EAI) | No? | |
| 6d | Authors Addresses | Author IRI | No | |
| 6e | Authors Addresses | Author postal address | Yes | |
| 7a | (page footer) | Author surname | Yes? | |
| 7b | (page header) | Abbreviated document name | No? | |
| 8a | (metadata) | Keywords | Yes | |
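
Case 3c asks for a transliteration in U+xxxx syntax. As a rough illustration only, the Python sketch below spells out each code point of a string in that notation. The function names and the "original (U+...)" layout are assumptions made for this example, not an agreed RFC Editor format.

```python
def to_u_plus(text: str) -> str:
    """Spell out every code point of `text` in U+XXXX notation."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def with_transliteration(element: str) -> str:
    """Return `element` unchanged if it is pure ASCII; otherwise append
    its code points in U+XXXX notation (hypothetical layout)."""
    if element.isascii():
        return element
    return f"{element} ({to_u_plus(element)})"


if __name__ == "__main__":
    # "na\u00efve" is the word "naive" with U+00EF in place of "i".
    print(with_transliteration("na\u00efve"))
    print(with_transliteration("example"))
```

Spelling out every code point, rather than only the non-ASCII ones, keeps the transliteration unambiguous even when the original characters fail to render.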

Other open questions:

  1. Where UTF-8 is allowed, what normalization form(s) are acceptable? (NFC, NFD, NFKC, or NFKD)
    1. Paul thinks: irrelevant. Characters are characters.
  2. Can you reference an external document that contains a non-ASCII title, author name, etc.? If you need a transliteration, where do you get it from?
    1. Paul thinks: Yes, definitely. This is important for non-ASCII author names. Transliteration can be guessed at by the RFC author.
    2. Heather: I would rather the author provide the transliteration. The RFC Editor shouldn't be guessing on behalf of the authors.
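
To make the first question concrete, the sketch below uses Python's standard `unicodedata` module to show how one visible string maps to different code point sequences under each of the four forms. The sample string and the helper name `code_points` are chosen only for illustration.

```python
import unicodedata


def code_points(form: str, text: str) -> list:
    """Normalize `text` to `form` and list the resulting code points."""
    normalized = unicodedata.normalize(form, text)
    return [f"U+{ord(ch):04X}" for ch in normalized]


if __name__ == "__main__":
    # Precomposed e-acute (U+00E9) followed by the "fi" ligature (U+FB01).
    sample = "\u00e9\ufb01"
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        print(form, code_points(form, sample))
```

The canonical forms (NFC, NFD) differ only in whether the accent is composed or decomposed, while the compatibility forms (NFKC, NFKD) also replace the ligature with plain "f" and "i", which changes the text itself.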

Strawman requirements (beyond those listed at https://www.rfc-editor.org/rse/wiki/doku.php?id=design:start):

  1. An implementer must be able to implement the specification without any confusion or ambiguity introduced by the use of UTF-8 rather than ASCII.
  2. Must be able to reference (cite) the document from elsewhere in a standard way, including from documents that only support ASCII.
  3. Must be able to reference (cite) other documents in an unambiguous way.
  4. Cross-references (including references to other documents) must be unambiguous even from a printed document.
  5. Must be able to index the document in various ways, so searching by keyword, author name, etc. can work.
  6. All documents will be UTF-8 encoded and MUST apply Normalization Form C to all metadata fields such as document name, authors, and references unless a specific exception is granted by the RSE. The body of the document MAY contain other normalization forms as declared necessary by the authors. Non-ASCII characters are only allowed in Author names and contact information, examples, and references. Author names will also require an ASCII representation to encourage broader indexing.
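
Strawman requirement 6 could be checked mechanically. The sketch below is one possible reading of it, not an RFC Editor tool: the field names, function names, and message strings are all hypothetical. It flags metadata values that are not in Normalization Form C and author names that lack an ASCII representation.

```python
import unicodedata
from typing import Dict, List, Optional


def check_metadata(fields: Dict[str, str]) -> List[str]:
    """Report metadata fields whose value is not already in NFC."""
    problems = []
    for name, value in fields.items():
        if unicodedata.normalize("NFC", value) != value:
            problems.append(f"{name}: not in NFC")
    return problems


def check_author(name: str, ascii_name: Optional[str]) -> List[str]:
    """Report a non-ASCII author name that has no usable ASCII form."""
    if name.isascii():
        return []
    if not ascii_name:
        return ["author: missing ASCII representation"]
    if not ascii_name.isascii():
        return ["author: ASCII representation contains non-ASCII"]
    return []


if __name__ == "__main__":
    # "Cafe" + combining acute accent (U+0301) is NFD, so it is flagged.
    print(check_metadata({"title": "Cafe\u0301", "keyword": "Caf\u00e9"}))
    print(check_author("Jos\u00e9", None))
```

The sketch does not model exceptions granted by the RSE, and it leaves the document body alone, since the requirement allows other normalization forms there.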

Strawman principles (similar to RFC 6912 approach):

  1. If something could affect interoperability or would block an implementer from being able to implement, any use of UTF-8 must be accompanied by an ASCII transliteration. The transliteration must be called out in a way that makes clear it is a transliteration (instead of just a part of the original item).
  2. Do not assume that any non-ASCII character will necessarily be rendered correctly (or at all).
design/utf-8.txt · Last modified: 2019/10/07 19:10 by rsewikiadmin