design:utf8-requirements

Differences

This shows you the differences between two versions of the page.

design:utf8-requirements [2013/10/16 12:19]
rsewikiadmin
design:utf8-requirements [2013/11/06 18:40] (current)
rsewikiadmin
Line 1: Line 1:
  
 -All documents will be UTF-8 encoded and MUST apply Normalization Form C to all metadata fields such as document name, authors, and references unless a specific exception is granted by the RSE. The body of the document MAY contain other normalization forms as declared necessary by the authors.  Non-ASCII characters are only allowed in author names, contact information, examples, and References.  Author names will also require an ASCII representation to encourage broader indexing.
 +Author names: Valid Unicode is required, and for non-ASCII names, an ASCII-only identifier is required.
  
 -All documents should identify themselves as being UTF-8.  For formats that are not inherently defined as UTF-only, the document must identify itself as being in UTF-8.  E.g. encoding="UTF-8" for XML, a BOM for plain text, etc.
 +Bibliographic text: The reference entry must be in English; whatever subfields are present MUST be available in ASCII.  As long as good sense is used, they MAY also include non-ASCII characters at author discretion. This applies to both normative and informative references.
 + 
 +Keywords: US-ASCII only.
 + 
 +Body: The mention of non-ASCII characters requires Unicode code points; including the characters themselves is encouraged, and Unicode character names are allowed.  General use does not require any clarifying identifiers or Unicode names. (Note: use versus mention distinction)
 + 
 +  The code point requirement would NOT apply in the use case and WOULD apply in the mention case.  So,
 +    CATEGORY        NUMBER
 +    naïve           300
 +  but
 +    CATEGORY        EXAMPLES
 +    Latin           naïve (U+006E U+0061 U+00EF U+0076 U+0065)
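
The mention format above can be produced mechanically. The following is a minimal sketch in Python, not part of the requirements themselves; the function names `to_u_notation` and `mention` are hypothetical and chosen only for illustration.

```python
def to_u_notation(text: str) -> str:
    """Return the code points of `text` in U+XXXX notation, space-separated."""
    # :04X pads to at least four hex digits; code points above U+FFFF
    # simply use five or six digits.
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def mention(text: str) -> str:
    """Format a mentioned string as 'characters (code points)' (hypothetical helper)."""
    return f"{text} ({to_u_notation(text)})"


# Assumes the input uses the precomposed ï (U+00EF).
print(mention("na\u00efve"))  # naïve (U+006E U+0061 U+00EF U+0076 U+0065)
```

Note that the listed code points depend on how the input string is composed: a decomposed "ï" would yield U+0069 U+0308 instead of U+00EF.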
 + 
 +Tables: Tables follow the same rules for identifiers and characters as the body.  If it is sensible (i.e., more understandable for a reader) for a given document to have two tables, one including the identifiers and characters, one with just the characters, that will be allowed on a case-by-case basis.
 + 
 +U+ notation must be used except within a code component, where you must follow the rules of the programming language in which you are writing the code.
 + 
 +Normalization forms: If the normalization matters to the content, the authors must submit it in a normalization-resistant form.  Do not expect normalization forms to be preserved.
 + 
 +  Codepoint numbers ("U+0394") and Unicode character names ("Greek Capital Letter
 +  Delta") are normalization-resistant forms.  The characters themselves may not be.
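
To illustrate why code point numbers and character names survive normalization while the characters themselves may not, here is a small Python sketch using the standard `unicodedata` module. The helper name `is_normalization_resistant` is an assumption made for this example, not a term defined by the requirements.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def is_normalization_resistant(text: str) -> bool:
    """True if `text` is unchanged by all four Unicode normalization forms."""
    return all(unicodedata.normalize(form, text) == text for form in FORMS)


composed = "na\u00efve"  # precomposed ï (U+00EF)
decomposed = unicodedata.normalize("NFD", composed)  # i followed by U+0308

# The character itself changes under NFD: 5 code points become 6.
print(len(composed), len(decomposed))

# ASCII-only representations are unaffected by any normalization form.
print(is_normalization_resistant("U+0394"))
print(is_normalization_resistant(unicodedata.name("\u0394")))
print(is_normalization_resistant(composed))
```

The first two checks hold because ASCII characters are stable under every normalization form; the last fails because NFD rewrites the precomposed character.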
 + 
 + 
 +All documents should identify themselves as being UTF-8.  Both the canonical XML format and the non-canonical HTML format must contain metadata that specifies that the encoding is UTF-8. The non-canonical text-only format must begin with a UTF-8 BOM.
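
As a rough illustration of the BOM requirement for the text-only format, the sketch below uses Python's standard `codecs` module and the built-in `utf-8-sig` codec. The function names are hypothetical, and this is only one possible way a tool might check or emit the BOM.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the byte string begins with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def encode_text_format(text: str) -> bytes:
    """Encode text as UTF-8 with a leading BOM (hypothetical helper)."""
    # The 'utf-8-sig' codec prepends the BOM on encode and strips it on decode.
    return text.encode("utf-8-sig")


payload = encode_text_format("na\u00efve")
print(has_utf8_bom(payload))          # True
print(payload.decode("utf-8-sig"))    # naïve
```

For the XML and HTML formats, the equivalent signal would be the encoding declaration or metadata rather than a BOM.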
     ​     ​
 An implementer must be able to implement the specification without any confusion or ambiguity introduced by the use of UTF-8 rather than ASCII.
design/utf8-requirements.1381951156.txt.gz · Last modified: 2013/10/16 12:19 by rsewikiadmin