design:utf-8

Differences

This shows you the differences between two versions of the page.

design:utf-8 [2013/09/04 19:26]
dthaler created
design:utf-8 [2013/10/15 15:54]
dthaler
Line 1: Line 1:
-Some discussion of requirements, goals, and desires around non-ASCII characters are on the [[formatreq]] page.+Some discussion of requirements, goals, and desires around non-ASCII characters are at https://www.rfc-editor.org/rse/wiki/doku.php?id=formatreq.
 The table below summarizes a taxonomy of cases where (non-ASCII) UTF-8 might or might not be allowed, along with some thoughts. The intent is that each row represents a separate policy decision.
  
Line 5: Line 5:
  
 ^ Case ^ Section ^ Use ^ Consensus ^ Comments ^
-| 1a | (title page) | Author name | Yes| Answer should match (6a) | +| 1a | (title page) | Author name | Yes | Answer should match (6a) | 
-| 1b | (title page) | Author affiliation | | Answer should match (6b) | +| 1b | (title page) | Author affiliation | Yes | Answer should match (6b) | 
-| 1c | (title page) | Document title | | | +| 1c | (title page) | Document title | No? | | 
-| 2 | Abstract | | | | +| 2 | Abstract | Prose | No? | Could be same as (3e), but Abstracts may also be separately compiled into other indices so could have a different answer | 
-| 3a | Body or Appendix | Example string | | E.g. fictional person name, IRI, EAI, domain name, etc. Currently there's no XML markup to denote example strings, so hard to distinguish from (3c) | +| 3a | Body or Appendix | Example string | Yes | E.g. fictional person name, IRI, EAI, domain name, etc. Currently there's no XML markup to denote example strings, so hard to distinguish from (3c) | 
-| 3b | Body or Appendix | Code snippet | | | +| 3b | Body or Appendix | Code snippet | No? | "Code" does not mean examples here, but actual grammar like ABNF, C, etc. | 
-| 3c | Body or Appendix | Literal protocol element | | Transliteration should use U+xxxx syntax | +| 3c | Body or Appendix | Literal protocol element | Yes? | Required transliteration should use U+xxxx syntax | 
-| 3d | Body or Appendix | Document title of a cited document | | Answer should match (4c) | +| 3d | Body or Appendix | Document title of a cited document | Yes | Answer should match (4c) | 
-| 3e | Body or Appendix | Prose | No| e.g. use of "naïve" in http://tools.ietf.org/html/rfc4690#section-1.5.5 | +| 3e | Body or Appendix | Prose | No | e.g. use of "naïve" in http://tools.ietf.org/html/rfc4690#section-1.5.5 | 
-| 4a | References | Author name | Yes| Answer should match (1a) | +| 3f | Body or Appendix | Section title | No? | | 
-| 4b | References | Author affiliation | | Answer should match (1b) | +| 4a | References | Author name | Yes | Answer should match (1a) | 
-| 4c | References | Document title | | Not necessarily an RFC that's being referenced | +| 4b | References | Author affiliation | Yes | Answer should match (1b) | 
-| 4d | References | Document IRI | No| | +| 4c | References | Document title | Yes | Not necessarily an RFC that's being referenced | 
-| 5a | Acknowledgements | Person name | Yes| | +| 4d | References | Document IRI | No | | 
-| 5b | Acknowledgements | Organization name | | | +| 5a | Acknowledgements | Person name | Yes | | 
-| 6a | Authors Addresses | Author name | Yes| |  +| 5b | Acknowledgements | Organization name | Yes | | 
-| 6b | Authors Addresses | Author affiliation | | |      | +| 6a | Authors Addresses | Author name | Yes | |  
-| 6c | Authors Addresses | Author email address (EAI) | | | +| 6b | Authors Addresses | Author affiliation | Yes | |      | 
-| 6c | Authors Addresses | Author IRI | No? | | +| 6c | Authors Addresses | Author email address (EAI) | No? | | 
 +| 6d | Authors Addresses | Author IRI | No | | 
 +| 6e | Authors Addresses | Author postal address | Yes | | 
 +| 7a | (page footer) | Author surname | Yes | | 
 +| 7b | (page header) | Abbreviated document name | No? | | 
 +| 8a | (metadata) | Keywords | Yes | |
 +
 +Other open questions:
 +  - Where UTF-8 is allowed, what normalization form(s) are ok? (NFC, NFD, NFKC, or NFKD)
 +    - //Paul thinks//: irrelevant. Characters are characters.
 +  - Can you reference an external document that contains a non-ASCII title/author/etc.? If you need a transliteration, where do you get it from?
 +    - //Paul thinks//: Yes, definitely. This is important for non-ASCII author names. Transliteration can be guessed at by the RFC author.
 +    - //Heather//: I would rather the author provide the transliteration. The RFC Editor shouldn't be guessing on behalf of the authors.
 +  - ...
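
To make the normalization question above concrete, the following is a minimal sketch (not part of the wiki text) that uses Python's standard `unicodedata` module to show how the same word is stored under each of the four forms. The helper names `normalized_variants` and `code_points` are hypothetical, chosen only for illustration.

```python
import unicodedata

# The four normalization forms named in the open question.
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def normalized_variants(text):
    """Return the text normalized under each form (illustrative helper)."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


# "na\u00efve" is "naive" with a precomposed i-diaeresis (U+00EF).
variants = normalized_variants("na\u00efve")
for form, value in variants.items():
    print(form, len(value), code_points(value))
```

For this word the composed forms use one code point for the accented letter, while the decomposed forms use a base letter followed by a combining mark (U+0308). The strings render identically but do not compare equal byte for byte, which is one reason the choice of form can matter for indexing and searching.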
 +
 +Strawman requirements (beyond those listed at https://www.rfc-editor.org/rse/wiki/doku.php?id=design:start):
 +  - An implementer must be able to implement the specification without any confusion or ambiguity introduced by the use of UTF-8 rather than ASCII.
 +  - Must be able to reference (cite) the document from elsewhere in a standard way, including from documents that only support ASCII.
 +  - Must be able to reference (cite) other documents in an unambiguous way.
 +  - Cross-references (including references to other documents) must be unambiguous even from a printed document.
 +  - Must be able to index the document in various ways, so searching by keyword, author name, etc. can work.
 +  - All documents will be UTF-8 encoded and MUST apply Normalization Form C to all metadata fields such as document name, authors, and references unless a specific exception is granted by the RSE.  The body of the document MAY contain other normalization forms as declared necessary by the authors.  Non-ASCII characters are only allowed in Author names and contact information, examples, and references.  Author names will also require an ASCII representation to encourage broader indexing.
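
As a rough illustration of the last strawman requirement, the sketch below applies Normalization Form C to a dictionary of metadata fields and checks whether a non-ASCII author name comes with an ASCII representation. The function names and the dictionary layout are assumptions made for this example; the wiki does not define any data model or tooling.

```python
import unicodedata


def normalize_metadata(fields):
    """Return a copy of the metadata with every string value in NFC.

    `fields` is a hypothetical dict such as {"title": ..., "author": ...}.
    """
    return {key: unicodedata.normalize("NFC", value) for key, value in fields.items()}


def missing_ascii_form(author_name, ascii_name):
    """Return True when a non-ASCII author name lacks a usable ASCII form."""
    if author_name.isascii():
        return False  # nothing extra is needed for a pure-ASCII name
    return not (ascii_name and ascii_name.isascii())


# "Jose\u0301" is "Jose" followed by a combining acute accent (decomposed).
metadata = normalize_metadata({"title": "Example", "author": "Jose\u0301"})
print(metadata["author"] == "Jos\u00e9")                 # composed form after NFC
print(missing_ascii_form(metadata["author"], None))      # no ASCII form supplied
print(missing_ascii_form(metadata["author"], "Jose"))    # ASCII form supplied
```

The check deliberately does not try to derive the ASCII form itself, in line with the comment above that the author, not the RFC Editor, should supply the transliteration.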
 +
 +Strawman principles (similar to RFC 6912 approach):
 +  - If something could affect interoperability or would block an implementer from being able to implement, any use of UTF-8 must be accompanied by an ASCII transliteration. The transliteration must be called out in a way to make it clear it is a transliteration (instead of just a part of the original item).
 +  - Do not assume that any non-ASCII character will necessarily be rendered correctly (or at all)
 +  - ...
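
One possible reading of the first principle, combined with the U+xxxx syntax mentioned for case 3c, is sketched below. The angle-bracket delimiters and the "(transliteration: ...)" label are assumptions made here so that the transliteration is visibly set apart from the original item; the wiki does not prescribe an exact notation.

```python
def to_u_plus(text):
    """Spell out each non-ASCII character as <U+XXXX>, leaving ASCII untouched.

    The angle brackets are an assumed delimiter, not something the wiki specifies.
    """
    return "".join(ch if ord(ch) < 128 else f"<U+{ord(ch):04X}>" for ch in text)


def with_transliteration(text):
    """Append a clearly labeled ASCII transliteration when the text needs one."""
    if text.isascii():
        return text
    return f"{text} (transliteration: {to_u_plus(text)})"


print(with_transliteration("na\u00efve"))
print(with_transliteration("plain"))
```

Because the transliterated form is pure ASCII, it stays readable even where the non-ASCII character is not rendered correctly, which is the situation the second principle warns about.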
design/utf-8.txt · Last modified: 2019/10/07 12:10 by rsewikiadmin