This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
design:utf8-requirements [2013/10/15 14:23] rsewikiadmin created |
design:utf8-requirements [2013/11/06 18:40] rsewikiadmin |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | All documents will be UTF-8 encoded and MUST apply Normalization Form C to all metadata fields such as document name, authors, and references unless a specific exception is granted by the RSE. The body of the document MAY contain other normalization forms as declared necessary by the authors. Non-ASCII characters are only allowed in Author names, contact information, examples, and References. Author names will also require an ASCII representation to encourage broader indexing. | + | Author names: Valid Unicode is required, and for non-ASCII names, an ASCII-only identifier is required. |
+ | |||
+ | Bibliographic text: The reference entry must be in English; whatever subfields are present MUST be available in ASCII. As long as good sense is used, they MAY also include non-ASCII characters at author discretion. This applies to both normative and informative references. | ||
+ | |||
+ | Keywords: US-ASCII only | ||
+ | |||
+ | Body: When non-ASCII characters are mentioned, their Unicode code points are required, the characters themselves are encouraged, and Unicode character names are allowed. General use does not require any clarifying identifiers or Unicode names. (Note the distinction between use and mention.) | ||
+ | |||
+ | The code point requirement would NOT apply in the use case and WOULD apply in the mention case. So, | ||
+ | CATEGORY NUMBER | ||
+ | naïve 300 | ||
+ | but | ||
+ | CATEGORY EXAMPLES | ||
+ | Latin naïve (U+006E U+0061 U+00EF U+0076 U+0065) | ||
+ | |||
+ | Tables: Tables follow the same rules for identifiers and characters as the body. If it is sensible (i.e., more understandable for a reader) for a given document to have two tables, one including the identifiers and characters and one with just the characters, that will be allowed on a case-by-case basis. | ||
+ | |||
+ | U+ notation must be used, except within a code component, where the rules of the programming language in which the code is written must be followed. | ||
+ | |||
+ | Normalization forms: If the normalization matters to the content, the authors must submit that content in a normalization-resistant form. Do not expect normalization forms to be preserved. | ||
+ | |||
+ | Code point numbers ("U+0394") and Unicode character names ("GREEK CAPITAL LETTER DELTA") are normalization-resistant forms. The characters themselves may not be. | ||
+ | |||
+ | |||
+ | All documents should identify themselves as being UTF-8. Both the canonical XML format and the non-canonical HTML format must contain metadata that specifies that the encoding is UTF-8. The non-canonical text-only format must begin with a UTF-8 BOM. | ||
| | ||
- | An implementer must be able to implement the specification without any confusion or ambiguity introduced by the use of UTF-8 rather than ASCII. | + | An implementer must be able to implement the specification without any confusion or ambiguity introduced by the use of UTF-8 rather than ASCII. |
| | ||
- | Must be able to reference (cite) the document from elsewhere in a standard way, including from documents that only support ASCII. | + | People must be able to reference (cite) the RFC from elsewhere in a standard way, including from documents that only support ASCII. |
| | ||
- | Must be able to reference (cite) other documents in an unambiguous way. | + | The RFC must be able to reference (cite) other documents in an unambiguous way. |
| | ||
- | Cross-references (including references to other documents) must be unambiguous even from a printed document. | + | Cross-references (including references to other documents) must be unambiguous even from a printed document. |
| | ||
- | Must be able to index the document in various ways, so searching by keyword, author name, etc. can work. | + | Tools must be able to index the RFC in various ways, so searching for keywords, author names, and so on can work. |
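
The mention, normalization, and encoding rules in the new revision can be illustrated with a short sketch. This is not part of the requirements; it assumes Python 3.9 or later, and the helper names (code_points, mention, character_names, with_bom) are hypothetical, chosen only for this example. It shows how a mention pairs the characters with their U+ code points, how normalization changes the characters while the U+ notation of the original stays stable, and how a text-only rendering could be prefixed with a UTF-8 BOM.

```python
import codecs
import unicodedata


def code_points(text: str) -> str:
    """Return the U+ notation for each character, separated by spaces."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def mention(text: str) -> str:
    """Format a mention: the characters followed by their code points."""
    return f"{text} ({code_points(text)})"


def character_names(text: str) -> list[str]:
    """Return the Unicode character name of each character in text."""
    return [unicodedata.name(ch) for ch in text]


def with_bom(text: str) -> bytes:
    """Encode text as UTF-8 with a leading BOM (hypothetical helper for
    the text-only output format)."""
    return codecs.BOM_UTF8 + text.encode("utf-8")


if __name__ == "__main__":
    # Inside code, the language's own escape syntax is used, not U+ notation.
    word = "na\u00EFve"

    # Mention case: characters plus code points.
    print(mention(word))  # naïve (U+006E U+0061 U+00EF U+0076 U+0065)

    # Normalization can change the underlying characters: NFD splits
    # U+00EF into U+0069 followed by the combining diaeresis U+0308.
    decomposed = unicodedata.normalize("NFD", word)
    print(code_points(decomposed))  # U+006E U+0061 U+0069 U+0308 U+0076 U+0065

    # Character names are also stable identifiers.
    print(character_names("\u0394"))  # ['GREEK CAPITAL LETTER DELTA']

    # Text-only output that begins with the UTF-8 BOM (EF BB BF).
    print(with_bom("RFC text")[:3])  # b'\xef\xbb\xbf'
```

The code point string and the character name are plain ASCII, so they survive any normalization step applied to the document, whereas the characters in `word` may be silently rewritten. That is why the sketch prints the notation next to the characters instead of relying on the characters alone.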