design:formats

Differences

This shows you the differences between two versions of the page.

design:formats [2013/10/15 14:45]
rsewikiadmin
design:formats [2013/10/15 19:07]
rsewikiadmin
Line 38:

 ===== Text =====
 +=== ASCII vs. UTF-8 for Text Output ===
 +
 +As of 2013-10-09, it is not clear whether the text output will be ASCII or UTF-8. The following assumes ASCII. If the format is UTF-8, then the following is wrong.
 +
 +The text-only format must have the same character-set limitations as the current RFC format. In new RFCs that contain non-ASCII characters, each such character must be represented as //[*U+xxxx*]//, where //xxxx// is a 4- or 6-character hex value. The use case is that it must be possible to convert every encoded non-ASCII character in the text-only document back to exactly the correct character in the canonical document. The form //[*U+xxxx*]// was chosen because that sequence is extremely unlikely to appear in a normal RFC, even one that discusses Unicode code points by their hex values. For example, an author's name that is represented in the canonical format as "Martin Dürst" would be represented in the text-only format as "Martin D[*U+00FC*]rst". Because each escape is longer than the character it replaces, lines in the text-only format that contain non-ASCII characters must be allowed to exceed 80 columns.
 +
 +//Dave thinks: I disagree with the above paragraph. I'm leaning towards saying there should be a separate UTF-8 (e.g. .utf8) text version. And for either version, I don't think any U+ sequence should appear in a person's name.//
 +
 +//Paul thinks: if there are two versions, the .txt should be UTF-8 and the ASCII version should be .asc. If there is an all-ASCII version, we need to ask the authors how they want their names (mis)spelled in ASCII.//
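
The //[*U+xxxx*]// escaping described in the ASCII proposal above can be illustrated with a short round-trip sketch. This is only an illustration under stated assumptions, not part of the proposal: the function names ''to_ascii_text'' and ''from_ascii_text'' are hypothetical, the choice of 4 hex digits for code points up to U+FFFF and 6 digits above that is one reading of "4- or 6-character hex value", and uppercase hex is assumed because the example in the text uses it.

```python
import re

# Matches the proposed escape form [*U+xxxx*], where xxxx is a
# 4- or 6-digit hex value. Six digits are tried first so that a
# 6-digit escape is never split.
_ESCAPE = re.compile(r"\[\*U\+([0-9A-Fa-f]{6}|[0-9A-Fa-f]{4})\*\]")


def to_ascii_text(text: str) -> str:
    """Replace every non-ASCII character with a [*U+xxxx*] escape.

    Assumption: 4 hex digits for code points up to U+FFFF, 6 digits
    for anything above. The source only says "4- or 6-character".
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)  # plain ASCII passes through unchanged
        else:
            width = 4 if cp <= 0xFFFF else 6
            out.append(f"[*U+{cp:0{width}X}*]")
    return "".join(out)


def _decode_match(match: re.Match) -> str:
    cp = int(match.group(1), 16)
    # Values above U+10FFFF are not Unicode code points; leave them as-is.
    if cp > 0x10FFFF:
        return match.group(0)
    return chr(cp)


def from_ascii_text(text: str) -> str:
    """Convert [*U+xxxx*] escapes back to the characters they encode."""
    return _ESCAPE.sub(_decode_match, text)
```

With the precomposed ü (U+00FC), ''to_ascii_text("Martin Dürst")'' yields ''Martin D[*U+00FC*]rst'', and ''from_ascii_text'' restores the original string. The round trip is exact only if the canonical text never contains a literal //[*U+xxxx*]// sequence, which is the "extremely unlikely" case the paragraph relies on.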
  
 Initial proposal: There should be multiple text outputs: ASCII-only with page breaks, ASCII-only without page breaks, UTF-8 with page breaks, UTF-8 without page breaks.
design/formats.txt · Last modified: 2013/10/15 19:07 by rsewikiadmin