Differences

This shows you the differences between two versions of the page.

--- design:utf-8 [2013/09/04 19:26]
dthaler created
+++ design:utf-8 [2013/10/15 15:54]
dthaler
@@ Line 1: / Line 1: @@
-Some discussion of requirements, goals, and desires around non-ASCII characters is on the [[formatreq]] page.
+Some discussion of requirements, goals, and desires around non-ASCII characters is at https://www.rfc-editor.org/rse/wiki/doku.php?id=formatreq.
 The table below summarizes a taxonomy of cases where (non-ASCII) UTF-8 might or might not be allowed, along with some thoughts. The intent is that each row represents a separate policy decision.
@@ Line 5: / Line 5: @@
 ^ Case ^ Section ^ Use ^ Consensus ^ Comments ^
-| 1a | (title page) | Author name | Yes? | Answer should match (6a) |
+| 1a | (title page) | Author name | Yes | Answer should match (6a) |
-| 1b | (title page) | Author affiliation | | Answer should match (6b) |
+| 1b | (title page) | Author affiliation | Yes | Answer should match (6b) |
-| 1c | (title page) | Document title | | |
+| 1c | (title page) | Document title | No? | |
-| 2 | Abstract | | | |
+| 2 | Abstract | Prose | No? | Could be same as (3e), but Abstracts may also be separately compiled into other indices so could have a different answer |
-| 3a | Body or Appendix | Example string | | E.g. fictional person name, IRI, EAI, domain name, etc. Currently there's no XML markup to denote example strings, so hard to distinguish from (3c) |
+| 3a | Body or Appendix | Example string | Yes | E.g. fictional person name, IRI, EAI, domain name, etc. Currently there's no XML markup to denote example strings, so hard to distinguish from (3c) |
-| 3b | Body or Appendix | Code snippet | | |
+| 3b | Body or Appendix | Code snippet | No? | "Code" does not mean examples here, but actual grammar like ABNF, C, etc. |
-| 3c | Body or Appendix | Literal protocol element | | Transliteration should use U+xxxx syntax |
+| 3c | Body or Appendix | Literal protocol element | Yes? | Required transliteration should use U+xxxx syntax |
-| 3d | Body or Appendix | Document title of a cited document | | Answer should match (4c) |
+| 3d | Body or Appendix | Document title of a cited document | Yes | Answer should match (4c) |
-| 3e | Body or Appendix | Prose | No? | e.g. use of "naïve" in http://tools.ietf.org/html/rfc4690#section-1.5.5 |
+| 3e | Body or Appendix | Prose | No | e.g. use of "naïve" in http://tools.ietf.org/html/rfc4690#section-1.5.5 |
-| 4a | References | Author name | Yes? | Answer should match (1a) |
+| 3f | Body or Appendix | Section title | No? | |
-| 4b | References | Author affiliation | | Answer should match (1b) |
+| 4a | References | Author name | Yes | Answer should match (1a) |
-| 4c | References | Document title | | Not necessarily an RFC that's being referenced |
+| 4b | References | Author affiliation | Yes | Answer should match (1b) |
-| 4d | References | Document IRI | No? | |
+| 4c | References | Document title | Yes | Not necessarily an RFC that's being referenced |
-| 5a | Acknowledgements | Person name | Yes? | |
+| 4d | References | Document IRI | No | |
-| 5b | Acknowledgements | Organization name | | |
+| 5a | Acknowledgements | Person name | Yes | |
-| 6a | Authors Addresses | Author name | Yes? | |
+| 5b | Acknowledgements | Organization name | Yes | |
-| 6b | Authors Addresses | Author affiliation | | |
+| 6a | Authors Addresses | Author name | Yes | |
-| 6c | Authors Addresses | Author email address (EAI) | | |
+| 6b | Authors Addresses | Author affiliation | Yes | |
-| 6c | Authors Addresses | Author IRI | No? | |
+| 6c | Authors Addresses | Author email address (EAI) | No? | |
+| 6d | Authors Addresses | Author IRI | No | |
+| 6e | Authors Addresses | Author postal address | Yes | |
+| 7a | (page footer) | Author surname | Yes? | |
+| 7b | (page header) | Abbreviated document name | No? | |
+| 8a | (metadata) | Keywords | Yes | |
+Other open questions:
+  - Where UTF-8 is allowed, which normalization form(s) are acceptable (NFC, NFD, NFKC, or NFKD)?
+    - //Paul thinks//: irrelevant. Characters are characters.
+  - Can you reference an external document that contains a non-ASCII title, author name, etc.? If you need a transliteration, where do you get it from?
+    - //Paul thinks//: Yes, definitely. This is important for non-ASCII author names. Transliteration can be guessed at by the RFC author.
+    - //Heather//: I would rather the author provide the transliteration. The RFC Editor shouldn't be guessing on behalf of the authors.
+  - ...
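To make the normalization question above concrete, here is a minimal sketch using Python's standard `unicodedata` module. It takes the word "naïve" (the example cited in the table) and shows how the four forms differ at the code point level. The helper name `code_points_by_form` is an assumption made for illustration and is not part of the wiki page.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def code_points_by_form(text):
    """Return {form: [code points in U+XXXX notation]} for each normalization form."""
    result = {}
    for form in FORMS:
        normalized = unicodedata.normalize(form, text)
        result[form] = [f"U+{ord(ch):04X}" for ch in normalized]
    return result


if __name__ == "__main__":
    # "naïve" written with the precomposed character U+00EF.
    for form, points in code_points_by_form("na\u00efve").items():
        print(form, len(points), " ".join(points))
```

In the composed forms (NFC, NFKC) the word has five code points; in the decomposed forms (NFD, NFKD) the "ï" becomes "i" followed by the combining diaeresis U+0308, giving six. The strings render identically but do not compare equal byte for byte, which matters for indexing and searching.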
+Strawman requirements (beyond those listed at https://www.rfc-editor.org/rse/wiki/doku.php?id=design:start):
+  - An implementer must be able to implement the specification without any confusion or ambiguity introduced by the use of UTF-8 rather than ASCII.
+  - Must be able to reference (cite) the document from elsewhere in a standard way, including from documents that only support ASCII.
+  - Must be able to reference (cite) other documents in an unambiguous way.
+  - Cross-references (including references to other documents) must be unambiguous even from a printed document.
+  - Must be able to index the document in various ways, so searching by keyword, author name, etc. can work.
+  - All documents will be UTF-8 encoded and MUST apply Normalization Form C to all metadata fields such as document name, authors, and references unless a specific exception is granted by the RSE. The body of the document MAY contain other normalization forms as declared necessary by the authors. Non-ASCII characters are only allowed in author names and contact information, examples, and references. Author names will also require an ASCII representation to encourage broader indexing.
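The last strawman requirement above could be checked mechanically. The following is a rough sketch, not a proposed tool: the field-kind names in `NON_ASCII_ALLOWED` and the function `check_field` are hypothetical, chosen only to mirror the wording of the requirement (author names and contact information, examples, references). It applies the NFC check to whatever value it is given, so it is meant for metadata fields rather than body text, where other forms may be permitted.

```python
import unicodedata

# Hypothetical field kinds in which the strawman permits non-ASCII characters.
NON_ASCII_ALLOWED = {"author_name", "author_contact", "example", "reference"}


def check_field(kind, value):
    """Return a list of problems found in one metadata field (empty if none)."""
    problems = []
    # A string is in NFC exactly when normalizing it to NFC changes nothing.
    if unicodedata.normalize("NFC", value) != value:
        problems.append("not in Normalization Form C")
    if not value.isascii() and kind not in NON_ASCII_ALLOWED:
        problems.append("non-ASCII characters not allowed in this field")
    return problems


if __name__ == "__main__":
    print(check_field("author_name", "Jos\u00e9"))   # precomposed, allowed field
    print(check_field("author_name", "Jose\u0301"))  # decomposed form of the same name
    print(check_field("title", "na\u00efve"))        # non-ASCII outside the allowed fields
```

A fuller check would also confirm that each non-ASCII author name is accompanied by an ASCII representation, as the requirement states.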
+Strawman principles (similar to RFC 6912 approach):
+  - If something could affect interoperability or would block an implementer from being able to implement, any use of UTF-8 must be accompanied by an ASCII transliteration. The transliteration must be called out in a way to make it clear it is a transliteration (instead of just a part of the original item).
+  - Do not assume that any non-ASCII character will necessarily be rendered correctly (or at all).
+  - ...
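As an illustration of the first principle above and of the U+xxxx syntax mentioned for case 3c, here is a small sketch that spells out each non-ASCII character as a code point and labels the result. The function names and the exact label wording are assumptions. Note that this is code point notation, not a linguistic transliteration such as an author would supply for a name.

```python
def to_u_plus(text):
    """Replace every non-ASCII character with its U+XXXX code point notation."""
    return "".join(ch if ch.isascii() else f"U+{ord(ch):04X}" for ch in text)


def with_labeled_transliteration(text):
    """Pair the original string with a clearly labeled ASCII-only rendering."""
    if text.isascii():
        return text
    return f"{text} (ASCII transliteration: {to_u_plus(text)})"


if __name__ == "__main__":
    print(with_labeled_transliteration("na\u00efve"))
    # prints: naïve (ASCII transliteration: naU+00EFve)
```

Keeping the label outside the original string follows the principle that the transliteration must not be mistaken for part of the original item, and the ASCII rendering remains readable even if the non-ASCII character is not displayed.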

RSE Wiki Archive
