design:utf8-requirements [2013/10/16 12:22] rsewikiadmin |
design:utf8-requirements [2013/11/06 13:53] rsewikiadmin |
| |
All documents will be UTF-8 encoded and MUST apply Normalization Form C to all metadata fields such as document name, authors, and references unless a specific exception is granted by the RSE. The body of the document MAY contain other normalization forms as declared necessary by the authors. Non-ASCII characters are only allowed in author names, contact information, examples, and References. Author names will also require an ASCII representation to encourage broader indexing. | Author names: Valid Unicode is required, and for non-ASCII names, an ASCII-only identifier is required. |
| |
| Bibliographic text: The reference must point to something that has been translated to English. Whatever subfields are present MUST be available in ASCII (translated to English when appropriate). As long as good sense is used, they MAY also appear in non-ASCII characters at the authors' discretion. This applies to both normative and informative references.
| |
| Keywords: US-ASCII only |
| |
| Body: When non-ASCII characters are mentioned, identifiers are required, the characters themselves are encouraged, and Unicode names are allowed. General use of non-ASCII characters does not require any clarifying identifiers or Unicode names. (Note: this is the use versus mention distinction.)
| |
| We would NOT apply identifiers in the use case and we WOULD apply them in the mention case. So,
| CATEGORY    NUMBER
| naïve       300
| but
| CATEGORY    EXAMPLES
| Latin       naïve (U+006E U+0061 U+00EF U+0076 U+0065)
| |
| Tables: Tables follow the same rules for identifiers and characters as the body. If it is sensible (i.e., more understandable for a reader) for a given document to have two tables, one including the identifiers and characters and one with just the characters, that will be allowed on a case-by-case basis.
| |
| U+ notation must be used, except within a code component, where the rules of the programming language in which the code is written must be followed.
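
As an illustration of the mention case, the following Python sketch derives the U+ identifiers for a string and appends them in parentheses after the characters. The function names `u_plus` and `mention` are hypothetical, and the parenthesized layout is only an assumption based on the example row above, not a format mandated by this page.

```python
def u_plus(text):
    """Return the code points of text in U+ notation, space separated."""
    # At least four uppercase hex digits per code point (more for code points above U+FFFF).
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def mention(text):
    """Format text for the mention case: the characters, then their identifiers."""
    return f"{text} ({u_plus(text)})"


if __name__ == "__main__":
    # "\u00ef" is LATIN SMALL LETTER I WITH DIAERESIS.
    print(mention("na\u00efve"))
    # naïve (U+006E U+0061 U+00EF U+0076 U+0065)
```

Note that the identifiers depend on how the string is encoded: a decomposed "ï" (U+0069 followed by U+0308) would yield six identifiers rather than five.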
| |
| Normalization forms: If normalization matters to the content, the authors must submit the content in a normalization-resistant form. Do not expect normalization forms to be preserved.
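
One possible reading of "normalization-resistant" is text that comes out unchanged whichever normalization form is applied to it. The Python sketch below, using the standard `unicodedata` module, reports which of the four Unicode normalization forms would alter a given string. The helper name `forms_that_change` is hypothetical, and this interpretation is an assumption rather than a definition given on this page.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def forms_that_change(text):
    """List the normalization forms under which text would not survive unchanged."""
    return [form for form in FORMS if unicodedata.normalize(form, text) != text]


if __name__ == "__main__":
    composed = "na\u00efve"      # precomposed i with diaeresis
    decomposed = "nai\u0308ve"   # i followed by COMBINING DIAERESIS
    print(forms_that_change(composed))    # ['NFD', 'NFKD']
    print(forms_that_change(decomposed))  # ['NFC', 'NFKC']
    print(forms_that_change("U+00EF"))    # [] (ASCII-only U+ notation is stable)
```

Under this reading, spelling out the code points in U+ notation is one way to keep the distinction between the composed and decomposed forms intact.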
| |
All documents should identify themselves as being UTF-8. Both the canonical XML format and the non-canonical HTML format must contain metadata that specifies that the encoding is UTF-8. The non-canonical text-only format must begin with a UTF-8 BOM. | All documents should identify themselves as being UTF-8. Both the canonical XML format and the non-canonical HTML format must contain metadata that specifies that the encoding is UTF-8. The non-canonical text-only format must begin with a UTF-8 BOM. |
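
As a rough sketch of the text-only requirement, the Python snippet below prepends a UTF-8 BOM when encoding and checks for one when reading. It relies on the standard `utf-8-sig` codec and `codecs.BOM_UTF8`; the helper names `with_utf8_bom` and `has_utf8_bom` are hypothetical and not part of any RFC tooling.

```python
import codecs


def with_utf8_bom(text):
    """Encode text as UTF-8 with a leading BOM (EF BB BF)."""
    # The "utf-8-sig" codec writes the BOM before the encoded text.
    return text.encode("utf-8-sig")


def has_utf8_bom(data):
    """Return True if the byte string begins with the UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)


if __name__ == "__main__":
    data = with_utf8_bom("na\u00efve")
    print(has_utf8_bom(data))               # True
    print(has_utf8_bom(b"plain ascii"))     # False
    # Decoding with "utf-8-sig" strips the BOM again.
    print(data.decode("utf-8-sig") == "na\u00efve")  # True
```

For the XML and HTML formats the equivalent signal would be the encoding declaration in the document metadata rather than a BOM.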