If RFCs-to-be come in with internationalized characters (e.g., in a name, postal address, or example), the expectation is to ask the i18n directorate for review assistance. If that is not possible, because the review team has closed or is unable to complete a review, the first step will be to reach out to the Apps ADs for assistance. If they have no guidance or are unresponsive, send a request for assistance to the rfc-interest mailing list.

Some discussion of requirements, goals, and desires around non-ASCII characters is at https://www.rfc-editor.org/rse/wiki/doku.php?id=formatreq.

The table below summarizes a taxonomy of cases where (non-ASCII) UTF-8 might or might not be allowed, along with some thoughts. The intent is that each row represents a separate policy decision. We expect that any case we mark "Yes" for Consensus will also require providing an ASCII-only transliteration unless we explicitly note otherwise.

^ Case ^ Section ^ Use ^ Consensus ^ Comments ^
| 1a | (title page) | Author name | Yes | Answer should match (6a) |
| 1b | (title page) | Author affiliation | Yes | Answer should match (6b) |
| 1c | (title page) | Document title | No? | |
| 2 | Abstract | Prose | No? | Could be the same as (3e), but Abstracts may also be separately compiled into other indices, so could have a different answer |
| 3a | Body or Appendix | Example string | Yes | E.g., fictional person name, IRI, EAI, domain name, etc. Currently there is no XML markup to denote example strings, so they are hard to distinguish from (3c) |
| 3b | Body or Appendix | Code snippet | No? | "Code" does not mean examples here, but actual grammar like ABNF, C, etc. |
| 3c | Body or Appendix | Literal protocol element | Yes? | Required transliteration should use U+xxxx syntax |
| 3d | Body or Appendix | Document title of a cited document | Yes | Answer should match (4c) |
| 3e | Body or Appendix | Prose | No | E.g., use of "naïve" in http://tools.ietf.org/html/rfc4690#section-1.5.5 |
| 3f | Body or Appendix | Section title | No? | |
| 4a | References | Author name | Yes | Answer should match (1a) |
| 4b | References | Author affiliation | Yes | Answer should match (1b) |
| 4c | References | Document title | Yes | Not necessarily an RFC that's being referenced |
| 4d | References | Document IRI | No | |
| 5a | Acknowledgements | Person name | Yes | |
| 5b | Acknowledgements | Organization name | Yes | |
| 6a | Authors' Addresses | Author name | Yes | |
| 6b | Authors' Addresses | Author affiliation | Yes | |
| 6c | Authors' Addresses | Author email address (EAI) | No? | |
| 6d | Authors' Addresses | Author IRI | No | |
| 6e | Authors' Addresses | Author postal address | Yes | |
| 7a | (page footer) | Author surname | Yes? | |
| 7b | (page header) | Abbreviated document name | No? | |
| 8a | (metadata) | Keywords | Yes | |

Other open questions:

  - Where UTF-8 is allowed, what normalization form(s) are OK? (NFC, NFD, NFKC, or NFKD)
    - //Paul thinks//: Irrelevant. Characters are characters.
  - Can you reference an external document that contains a non-ASCII title, author, etc.? If you need a transliteration, where do you get it from?
    - //Paul thinks//: Yes, definitely. This is important for non-ASCII author names. Transliteration can be guessed at by the RFC author.
    - //Heather//: I would rather the author provide the transliteration. The RFC Editor shouldn't be guessing on behalf of the authors.
  - ...

Strawman requirements (beyond those listed at https://www.rfc-editor.org/rse/wiki/doku.php?id=design:start):

  - An implementer must be able to implement the specification without any confusion or ambiguity introduced by the use of UTF-8 rather than ASCII.
  - Must be able to reference (cite) the document from elsewhere in a standard way, including from documents that only support ASCII.
  - Must be able to reference (cite) other documents in an unambiguous way.
  - Cross-references (including references to other documents) must be unambiguous even from a printed document.
  - Must be able to index the document in various ways, so searching by keyword, author name, etc. can work.
  - All documents will be UTF-8 encoded and MUST apply Normalization Form C to all metadata fields such as document name, authors, and references unless a specific exception is granted by the RSE. The body of the document MAY contain other normalization forms as declared necessary by the authors. Non-ASCII characters are only allowed in Author names and contact information, examples, and references. Author names will also require an ASCII representation to encourage broader indexing.

Strawman principles (similar to RFC 6912 approach):

  - If something could affect interoperability or would block an implementer from being able to implement, any use of UTF-8 must be accompanied by an ASCII transliteration. The transliteration must be called out in a way to make it clear it is a transliteration (instead of just a part of the original item).
  - Do not assume that any non-ASCII character will necessarily be rendered correctly (or at all).
  - ...
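As a rough illustration of two of the strawman items above (Normalization Form C for metadata fields, and the U+xxxx syntax suggested for case 3c), the following Python sketch uses only the standard ''unicodedata'' module. The function names, the sample field names, and the choice to separate code points with spaces are assumptions made for this example; none of them is specified on this page or by any RFC Editor tool.

```python
import unicodedata
from typing import Dict, List


def is_nfc(text: str) -> bool:
    """Return True if the string is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


def to_nfc(text: str) -> str:
    """Return the string converted to Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def non_nfc_fields(metadata: Dict[str, str]) -> List[str]:
    """List the names of metadata fields whose values are not in NFC.

    The field names are whatever the caller supplies; this page does not
    define a metadata schema.
    """
    return [name for name, value in metadata.items() if not is_nfc(value)]


def u_plus_notation(text: str) -> str:
    """Spell out every code point in the string as U+XXXX.

    Code points are separated by spaces (an assumption of this sketch) so
    that adjacent values cannot run together.
    """
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    composed = "na\u00efve"      # i-diaeresis as one precomposed code point
    decomposed = "nai\u0308ve"   # "i" followed by a combining diaeresis

    print(is_nfc(composed), is_nfc(decomposed))
    print(to_nfc(decomposed) == composed)
    print(u_plus_notation("\u00ef"))
    print(non_nfc_fields({"title": composed, "author": decomposed}))
```

Whether a check like this should reject non-NFC input or silently normalize it is a policy question this page leaves open; the sketch only reports which fields differ.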