design:utf-8

Some discussion of requirements, goals, and desires around non-ASCII characters is at https://www.rfc-editor.org/rse/wiki/doku.php?id=formatreq. The table below summarizes a taxonomy of cases where (non-ASCII) UTF-8 might or might not be allowed, along with some thoughts. The intent is that each row represents a separate policy decision.

We expect that any case we mark “Yes” for Consensus will also require providing an ASCII-only transliteration unless we explicitly note otherwise.

| Case | Section | Use | Consensus | Comments |
|------|---------|-----|-----------|----------|
| 1a | (title page) | Author name | Yes? | Answer should match (6a) |
| 1b | (title page) | Author affiliation | | Answer should match (6b) |
| 1c | (title page) | Document title | | |
| 2 | Abstract | Prose | | Could be same as (3e), but Abstracts may also be separately compiled into other indices so could have a different answer |
| 3a | Body or Appendix | Example string | | E.g. fictional person name, IRI, EAI, domain name, etc. Currently there's no XML markup to denote example strings, so hard to distinguish from (3c) |
| 3b | Body or Appendix | Code snippet | | |
| 3c | Body or Appendix | Literal protocol element | | Required transliteration should use U+xxxx syntax |
| 3d | Body or Appendix | Document title of a cited document | | Answer should match (4c) |
| 3e | Body or Appendix | Prose | No? | e.g. use of “naïve” in http://tools.ietf.org/html/rfc4690#section-1.5.5 |
| 3f | Body or Appendix | Section title | | |
| 4a | References | Author name | Yes? | Answer should match (1a) |
| 4b | References | Author affiliation | | Answer should match (1b) |
| 4c | References | Document title | | Not necessarily an RFC that's being referenced |
| 4d | References | Document IRI | No? | |
| 5a | Acknowledgements | Person name | Yes? | |
| 5b | Acknowledgements | Organization name | | |
| 6a | Authors Addresses | Author name | Yes? | |
| 6b | Authors Addresses | Author affiliation | | |
| 6c | Authors Addresses | Author email address (EAI) | | |
| 6d | Authors Addresses | Author IRI | No? | |
| 6e | Authors Addresses | Author postal address | | |
| 7a | (page footer) | Author surname | | |
| 7b | (page header) | Abbreviated document name | | |
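
Case 3c asks that the transliteration of a literal protocol element use U+xxxx syntax. The following is a minimal Python sketch of one way to produce that notation. The function name `to_u_plus` and the angle-bracket delimiters are assumptions made for illustration (the delimiters only keep a code point from running into a following hex letter); this page does not specify either.

```python
def to_u_plus(text: str) -> str:
    """Replace each non-ASCII character with <U+XXXX>; leave ASCII untouched.

    The angle brackets are an illustrative assumption, not part of any
    agreed syntax.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # At least four upper-case hex digits; more for code points
            # above U+FFFF.
            out.append(f"<U+{ord(ch):04X}>")
    return "".join(out)


print(to_u_plus("na\u00efve"))  # na<U+00EF>ve
```

Pure-ASCII input passes through unchanged, so the same function could be run over every literal protocol element regardless of its content.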

A separate question: where UTF-8 is allowed, which Unicode normalization forms (NFC, NFD, NFKC, or NFKD) are acceptable?
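
To make the normalization question concrete, the sketch below uses Python's standard `unicodedata` module to show how the four forms treat the same input. The helper names `code_points` and `normalized_forms` and the two sample strings are illustrative choices, not taken from this page.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def code_points(text):
    """List the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def normalized_forms(text):
    """Map each normalization form name to the normalized string."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


# "naïve" with a precomposed ï, and the single-character "fi" ligature.
for sample in ("na\u00efve", "\ufb01"):
    for form, value in normalized_forms(sample).items():
        print(form, code_points(value))
```

For "naïve", NFC keeps the precomposed U+00EF while NFD splits it into U+0069 followed by the combining diaeresis U+0308, so two visually identical strings can differ in length and byte content. The compatibility forms (NFKC, NFKD) go further and replace the ligature U+FB01 with the two letters "fi", which changes the text itself. This matters for comparing or indexing names.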

Strawman requirements (beyond those listed at https://www.rfc-editor.org/rse/wiki/doku.php?id=design:start):

  1. An implementer must be able to implement the specification without any confusion or ambiguity introduced by the use of UTF-8 rather than ASCII.
  2. Must be able to reference (cite) the document from elsewhere in a standard way, including from documents that only support ASCII.
  3. Must be able to reference (cite) other documents in an unambiguous way.
  4. Cross-references (including references to other documents) must be unambiguous even from a printed document.
  5. Must be able to index the document in various ways, so searching by keyword, author name, etc. can work.

Strawman principles:

  1. If something could affect interoperability or would block an implementer from being able to implement, any use of UTF-8 must be accompanied by an ASCII transliteration.
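
As a rough illustration of how this principle might be checked, the sketch below lists every non-ASCII character in a piece of text with its position and Unicode name, so an editor could confirm that each one has an accompanying ASCII transliteration. The function name `non_ascii_report` and the tuple layout are hypothetical. Whether a transliteration is actually present still has to be judged by a person.

```python
import unicodedata


def non_ascii_report(text):
    """Return (line, column, code point, name) for each non-ASCII character.

    Line and column numbers are 1-based. Lines are split on "\n" only, so
    non-ASCII line separators such as U+2028 are reported, not swallowed.
    """
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                # Fall back to a placeholder for code points with no name.
                name = unicodedata.name(ch, "UNNAMED")
                hits.append((lineno, col, f"U+{ord(ch):04X}", name))
    return hits


for hit in non_ascii_report("plain text\nna\u00efve"):
    print(hit)
```

An empty result means the text is already pure ASCII and the principle is trivially satisfied.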
design/utf-8.1380136920.txt.gz · Last modified: 2013/09/25 12:22 by dthaler