design:formats

design:formats [2013/10/15 14:45]
rsewikiadmin
design:formats [2013/10/15 19:07] (current)
rsewikiadmin
Line 38:

 ===== Text =====
 +=== ASCII vs. UTF-8 for Text Output ===
 +
 +As of 2013-10-09, it is not clear whether the text output will be ASCII or UTF-8. The following assumes ASCII; if the format turns out to be UTF-8, the following does not apply.
 +
 +The text-only format must have the same character-set limitations as the current RFC format. In new RFCs that contain non-ASCII characters, each such character must be represented as //[*U+xxxx*]//, where //xxxx// is a 4- or 6-digit hex value. The use case is that it must be possible to convert every encoded non-ASCII character in the text-only document back to exactly the correct character in the canonical document. The //[*U+xxxx*]// form was chosen because that sequence is extremely unlikely to appear in a normal RFC, even one that discusses Unicode code points by their hex values. For example, an author's name represented in the canonical format as "Martin Dürst" would be represented in the text-only format as "Martin D[*U+00FC*]rst". Because each escape is much longer than the character it replaces, lines in the text-only format that contain non-ASCII characters must be allowed to exceed 80 columns.
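
The escaping rule above can be sketched in Python as follows. This is only an illustration under stated assumptions, not part of the proposal: the function names ''to_ascii'' and ''from_ascii'' are hypothetical, and the choices of uppercase hex, 4 digits for code points up to U+FFFF, and 6 digits above that are assumptions (the text only says "4- or 6-digit hex value").

```python
import re

# Escaped form: [*U+xxxx*] with exactly 4 or 6 hex digits.
# The 6-digit alternative is tried first, then the 4-digit one.
ESCAPE_RE = re.compile(r"\[\*U\+([0-9A-Fa-f]{6}|[0-9A-Fa-f]{4})\*\]")


def to_ascii(text: str) -> str:
    """Replace each non-ASCII character with a [*U+xxxx*] escape.

    Assumption: 4 uppercase hex digits up to U+FFFF, 6 digits above that.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                 # plain ASCII passes through
        elif cp <= 0xFFFF:
            out.append(f"[*U+{cp:04X}*]")  # 4-digit form
        else:
            out.append(f"[*U+{cp:06X}*]")  # 6-digit form
    return "".join(out)


def _decode(match: re.Match) -> str:
    cp = int(match.group(1), 16)
    # Leave the escape untouched if it is not a valid Unicode code point.
    return chr(cp) if cp <= 0x10FFFF else match.group(0)


def from_ascii(text: str) -> str:
    """Convert every [*U+xxxx*] escape back to the character it encodes."""
    return ESCAPE_RE.sub(_decode, text)


if __name__ == "__main__":
    escaped = to_ascii("Martin D\u00fcrst")
    print(escaped)                                     # Martin D[*U+00FC*]rst
    print(from_ascii(escaped) == "Martin D\u00fcrst")  # True
```

The round trip is exact only if the canonical text never contains a literal //[*U+xxxx*]// sequence of its own, which is the "extremely unlikely" case the paragraph above relies on.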
 +
 +//Dave thinks: I disagree with the above paragraph. I'm leaning towards saying there should be a separate UTF-8 text version (e.g. .utf8). For either version, I don't think any U+ sequence should appear in a person's name.//
 +
 +//Paul thinks: if there are two versions, the .txt should be UTF-8 and the ASCII version should be .asc. If there is an all-ASCII version, we need to ask the authors how they want their names (mis)spelled in ASCII.//
  
 Initial proposal: There should be multiple text outputs: ASCII-only with page breaks, ASCII-only without page breaks, UTF-8 with page breaks, UTF-8 without page breaks.
design/formats.txt · Last modified: 2013/10/15 19:07 by rsewikiadmin