design:utf-8

Differences

This shows you the differences between two versions of the page.

design:utf-8 [2013/09/04 19:32]
dthaler
design:utf-8 [2019/10/07 12:10] (current)
rsewikiadmin
Line 1: Line 1:
-Some discussion of requirements, goals, and desires around non-ASCII characters are on the [[formatreq]] page. +If RFCs-to-be come in with internationalized characters (e.g., in a name, postal address, or example), the expectation is to ask the i18n directorate for review assistance. If that is not possible, either because the review team has closed or because it is unable to complete a review, the first step will be to reach out to the Apps ADs for assistance, and if they have no guidance or are unresponsive, to send a request for assistance to the rfc-interest mailing list.
 + 
 + 
 +Some discussion of requirements, goals, and desires around non-ASCII characters is at https://www.rfc-editor.org/rse/wiki/doku.php?id=formatreq.
 The table below summarizes a taxonomy of cases where (non-ASCII) UTF-8 might or might not be allowed, along with some thoughts. The intent is that each row represents a separate policy decision.
  
Line 5: Line 8:
  
 ^ Case ^ Section ^ Use ^ Consensus ^ Comments ^
-| 1a | (title page) | Author name | Yes| Answer should match (6a) | +| 1a | (title page) | Author name | Yes | Answer should match (6a) | 
-| 1b | (title page) | Author affiliation | | Answer should match (6b) | +| 1b | (title page) | Author affiliation | Yes | Answer should match (6b) | 
-| 1c | (title page) | Document title | | | +| 1c | (title page) | Document title | No? | | 
-| 2 | Abstract | | | | +| 2 | Abstract | Prose | No? | Could be same as (3e), but Abstracts may also be separately compiled into other indices so could have a different answer |
-| 3a | Body or Appendix | Example string | | E.g. fictional person name, IRI, EAI, domain name, etc. Currently there's no XML markup to denote example strings, so hard to distinguish from (3c) | +| 3a | Body or Appendix | Example string | Yes | E.g. fictional person name, IRI, EAI, domain name, etc. Currently there's no XML markup to denote example strings, so hard to distinguish from (3c) | 
-| 3b | Body or Appendix | Code snippet | | | +| 3b | Body or Appendix | Code snippet | No? | "Code" does not mean examples here, but actual grammar like ABNF, C, etc. |
-| 3c | Body or Appendix | Literal protocol element | | Transliteration should use U+xxxx syntax | +| 3c | Body or Appendix | Literal protocol element | Yes? | Required transliteration should use U+xxxx syntax |
-| 3d | Body or Appendix | Document title of a cited document | | Answer should match (4c) | +| 3d | Body or Appendix | Document title of a cited document | Yes | Answer should match (4c) | 
-| 3e | Body or Appendix | Prose | No| e.g. use of "naïve" in http://tools.ietf.org/html/rfc4690#section-1.5.5 | +| 3e | Body or Appendix | Prose | No | e.g. use of "naïve" in http://tools.ietf.org/html/rfc4690#section-1.5.5 |
-| 4a | References | Author name | Yes| Answer should match (1a) | +| 3f | Body or Appendix | Section title | No? | |
-| 4b | References | Author affiliation | | Answer should match (1b) | +| 4a | References | Author name | Yes | Answer should match (1a) | 
-| 4c | References | Document title | | Not necessarily an RFC that's being referenced | +| 4b | References | Author affiliation | Yes | Answer should match (1b) | 
-| 4d | References | Document IRI | No| | +| 4c | References | Document title | Yes | Not necessarily an RFC that's being referenced | 
-| 5a | Acknowledgements | Person name | Yes| | +| 4d | References | Document IRI | No | | 
-| 5b | Acknowledgements | Organization name | | | +| 5a | Acknowledgements | Person name | Yes | | 
-| 6a | Authors Addresses | Author name | Yes| |  +| 5b | Acknowledgements | Organization name | Yes | | 
-| 6b | Authors Addresses | Author affiliation | | |      | +| 6a | Authors Addresses | Author name | Yes | |  
-| 6c | Authors Addresses | Author email address (EAI) | | | +| 6b | Authors Addresses | Author affiliation | Yes | |
-6c | Authors Addresses | Author IRI | No| | +| 6c | Authors Addresses | Author email address (EAI) | No? | | 
-| 7a | (page footer) | Author surname | | | +| 6d | Authors Addresses | Author IRI | No | |
-| 7b | (page footer) | Abbreviated document name | | |+| 6e | Authors Addresses | Author postal address | Yes | | 
 +| 7a | (page footer) | Author surname | Yes? | | 
 +| 7b | (page header) | Abbreviated document name | No? | | 
 +| 8a | (metadata) | Keywords | Yes | |
 | |
 +
 +Other open questions:
 +  - Where UTF-8 is allowed, what normalization form(s) are ok? (NFC, NFD, NFKC, or NFKD)
 +    - //Paul thinks//: irrelevant. Characters are characters.
 +  - Can you reference an external document that contains a non-ASCII title, author name, etc.? If you need a transliteration, where do you get it from?
 +    - //Paul thinks//: Yes, definitely. This is important for non-ASCII author names. Transliteration can be guessed at by the RFC author.
 +    - //Heather//: I would rather the author provide the transliteration. The RFC Editor shouldn't be guessing on behalf of the authors.
 +  - ...
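
To make the normalization question above concrete, here is a minimal Python sketch using the standard `unicodedata` module. It does not take a side in the discussion; it only shows that the same visible text can be encoded as different code point sequences, which matters wherever strings are compared or indexed. The sample strings and the helper names `forms` and `same_under` are illustrative assumptions, not part of any RFC Editor tooling.

```python
import unicodedata

# The same visible word written two ways (sample data, chosen for illustration):
composed = "caf\u00e9"      # "é" as one precomposed code point, U+00E9
decomposed = "cafe\u0301"   # "e" followed by combining acute accent, U+0301

def forms(text):
    """Return the text under each of the four Unicode normalization forms."""
    return {name: unicodedata.normalize(name, text)
            for name in ("NFC", "NFD", "NFKC", "NFKD")}

def same_under(form, a, b):
    """Compare two strings after normalizing both to the given form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# A naive comparison treats the two spellings as different strings,
# while comparing after normalization treats them as equal.
naive_equal = composed == decomposed
nfc_equal = same_under("NFC", composed, decomposed)
```

The compatibility forms (NFKC, NFKD) go further than the canonical ones: for example, they fold the ligature U+FB01 into the two letters "fi", which NFC leaves untouched. That difference is one reason the choice of form is a policy decision rather than a detail.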
 +
 +Strawman requirements (beyond those listed at https://www.rfc-editor.org/rse/wiki/doku.php?id=design:start):
 +  - An implementer must be able to implement the specification without any confusion or ambiguity introduced by the use of UTF-8 rather than ASCII.
 +  - Must be able to reference (cite) the document from elsewhere in a standard way, including from documents that only support ASCII.
 +  - Must be able to reference (cite) other documents in an unambiguous way.
 +  - Cross-references (including references to other documents) must be unambiguous even from a printed document.
 +  - Must be able to index the document in various ways, so searching by keyword, author name, etc. can work.
 +  - All documents will be UTF-8 encoded and MUST apply Normalization Form C to all metadata fields such as document name, authors, and references unless a specific exception is granted by the RSE. The body of the document MAY contain other normalization forms as declared necessary by the authors. Non-ASCII characters are only allowed in Author names and contact information, examples, and references. Author names will also require an ASCII representation to encourage broader indexing.
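
The last strawman requirement can be sketched as a simple check. The following Python is a hedged illustration only: the `Author` record, its field names, and the wording of the problem messages are assumptions made for this example, and it covers just two parts of the requirement (Normalization Form C for the author name, plus a non-empty ASCII representation).

```python
import unicodedata
from dataclasses import dataclass

@dataclass
class Author:
    # Hypothetical record layout, invented for this sketch.
    name: str        # may contain non-ASCII characters
    ascii_name: str  # ASCII representation used for indexing

def is_nfc(text: str) -> bool:
    """True if the text is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text

def check_author(author: Author) -> list:
    """Return a list of problems found; an empty list means the record passes."""
    problems = []
    if not is_nfc(author.name):
        problems.append("name is not in Normalization Form C")
    if not author.ascii_name:
        problems.append("ASCII representation is missing")
    elif not author.ascii_name.isascii():
        problems.append("ASCII representation contains non-ASCII characters")
    return problems
```

For example, `check_author(Author("Jos\u00e9", "Jose"))` returns an empty list, while the same name spelled with a combining accent (`"Jose\u0301"`) is reported as not being in NFC. A real checker would also need to handle the RSE-granted exceptions mentioned in the requirement, which this sketch ignores.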
 +
 +Strawman principles (similar to RFC 6912 approach):
 +  - If something could affect interoperability or would block an implementer from being able to implement, any use of UTF-8 must be accompanied by an ASCII transliteration. The transliteration must be called out in a way that makes clear it is a transliteration (instead of just a part of the original item).
 +  - Do not assume that any non-ASCII character will necessarily be rendered correctly (or at all).
 +  - ...
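
Row 3c of the table asks for transliteration in U+xxxx syntax, and the first principle above asks that a transliteration be clearly marked as such. A minimal Python sketch of one way to produce that notation follows. The function names and the choice to put the code points in parentheses after the original string are assumptions for illustration; the page does not specify a presentation format.

```python
def to_u_notation(text: str) -> str:
    """Spell out each code point of the text as U+XXXX, separated by spaces."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

def annotate(text: str) -> str:
    """Append the U+XXXX spelling only when the text contains non-ASCII.

    The parenthesized layout is an assumption of this sketch, not a rule
    taken from the page.
    """
    if text.isascii():
        return text
    return f"{text} ({to_u_notation(text)})"
```

With this sketch, `to_u_notation("é")` gives `U+00E9`, and pure ASCII input passes through `annotate` unchanged. Because the output is ASCII only, a reader can still recover the exact code points even if the original characters are not rendered, which is the concern behind the second principle.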
design/utf-8.1378348332.txt.gz · Last modified: 2013/09/04 19:32 by dthaler