[rfc-i] v3imp #1 Control over spacing and line-breaking

Sean Leonard dev+ietf at seantek.com
Fri Jan 23 01:01:58 PST 2015


[this is to rfc-interest@, as well as to media-types@ and urn@, due to 
the need for <br> in registration templates in the xml2rfc v3 vocabulary]

Improvement Need
#1 Control over spacing and line-breaking

This improvement calls for greater control over spacing and 
line-breaking in the spec-text. This includes being able to preserve 
exact spacing {@xml:space="preserve" or equivalents}, and setting 
arbitrary non-paragraph separating line breaks <br>. Furthermore, the 
vocabulary needs to be able to identify places where line breaks are 
prohibited {<nobr> equivalent} or optionally allowed {<wbr> or equivalent}.

I note that <br> has been adopted in the vocabulary, which is a good 
thing. However, it seems to be limited to tables. It should be allowed 
in <t> elements, as well as possibly in other places where unstructured 
text is permitted. My main request is in <t> elements, however.

Extensive examples of the need are in draft-josefsson-pkix-textual (see 
draft-josefsson-pkix-textual-10 txt and pdf), which has been approved by 
the IESG for RFC publication. Further examples are in I-Ds and RFCs that 
include registration templates, including 
draft-ietf-appsawg-text-markdown-05 (media type registrations) and URN 
registrations of all kinds.

In pkix-textual, there are protocol elements that are mentioned in the 
spec-text ("text") where spacing and line-breaking are *absolutely 
critical, non-negotiable properties* of the textual productions. 
Furthermore, the text takes pains not to invent new and unnecessary terms 
out of whole cloth. Thus there are productions "-----BEGIN ", 
"-----END ", "-----", etc. that need to be kept exactly as-is, with no 
line breaks between any hyphens, and exactly one space between the 
"BEGIN" or "END" keyword and the end of the string.

I made heroic efforts to use existing xml2rfc tools to make these things 
work out, to no avail. Ultimately I gave up and modified the xml2rfc 
source myself to support various existing semi-documented (e.g., <spanx 
xml:space="preserve">) and novel commands.

The use of the NBSP (NO-BREAK SPACE, U+00A0) and NBHY (NON-BREAKING 
HYPHEN, U+2011) Unicode characters is inappropriate. The repertoire of 
these protocol elements is US-ASCII; if someone copied and pasted the 
"-----" and " " characters out of the (PDF or XML) document, they would 
get the incorrect code points. Furthermore, xml2rfc had a stubborn 
tendency to strip off whitespace characters from the ends of plain text 
XML fragments.
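
The copy-and-paste problem can be illustrated with a minimal Python 
sketch. The names ASCII_MARKER, SUBSTITUTED, describe and is_us_ascii are 
hypothetical and exist only for this illustration; the substituted string 
assumes a renderer swapped in U+2011 and U+00A0 to suppress line breaks.

```python
import unicodedata

# The production as written in the spec-text: five HYPHEN-MINUS, "BEGIN", one SPACE.
ASCII_MARKER = "-----BEGIN "

# Assumed result of a renderer substituting non-breaking characters.
SUBSTITUTED = "\u2011" * 5 + "BEGIN" + "\u00a0"


def describe(text):
    """List each character as 'U+XXXX NAME' so the code points are visible."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def is_us_ascii(text):
    """True when every character is in the US-ASCII repertoire."""
    return all(ord(ch) < 128 for ch in text)


print(describe(ASCII_MARKER)[0])   # U+002D HYPHEN-MINUS
print(describe(SUBSTITUTED)[0])    # U+2011 NON-BREAKING HYPHEN
print(is_us_ascii(ASCII_MARKER), is_us_ascii(SUBSTITUTED))

# Trimming the ends of a text fragment destroys the required trailing space.
print(repr(ASCII_MARKER.strip()))  # '-----BEGIN'
```

The two strings look alike when rendered but never compare equal, so a 
parser expecting the ASCII production would reject the pasted text.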

Authors know what they are doing, and if a protocol or an in-text 
discussion requires exactly one space, two spaces, ten spaces with 
hyphens, Unicode EM spaces or figure dashes or ideographic spaces or 
what-have-you, it is the author's choice to use those characters. The 
vocabulary really cannot prohibit these things at a technical level.

Regarding line breaks: I now have a lot more experience filling out 
registration templates. Many registration templates use line breaks to 
separate field names from field values, such as the RFC 6838-mandated text:

    Additional information:

      Deprecated alias names for this type:
      Magic number(s):
      File extension(s):
      Macintosh file type code(s):


Stuffing the entire registration template in a <figure><artwork> is 
absurd and borders on actually being insulting. When over 50% of the RFC 
is ensconced in a single figure, you know you have a problem. Templates 
such as media type and URN registrations are "structured spec-text", 
i.e., specification text intended for humans to read that is broken up 
in a regularized way. Whether a <br> or a new paragraph is appropriate 
depends on context; the editorial process can sort this out.

Note that if you allow authors to preserve spacing and line breaks, e.g., 
with @xml:space="preserve" or an equivalent, the author can always stuff 
a CRLF into the space-preserved text. So if you're going to allow one, 
you may as well allow the other one.
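
A small sketch of this point, using Python's standard xml.etree.ElementTree 
parser rather than xml2rfc itself. The <t> fragment is a hypothetical 
example, not text from any actual draft. Note that a conforming XML parser 
normalizes the CRLF to a single LF before handing the text to the 
application, so the break survives as a line feed.

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is bound to this namespace by the XML specification.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Hypothetical fragment: space-preserved text with a CRLF stuffed into it.
fragment = '<t xml:space="preserve">Magic number(s):\r\nFile extension(s):</t>'

elem = ET.fromstring(fragment)
space_mode = elem.get(XML_NS + "space")

# The line break is still present in the parsed text content.
lines = elem.text.split("\n")

print(space_mode)
print(lines)
```

In other words, the parser delivers the line break whether or not the 
vocabulary defines <br>; what remains is whether the formatter honors it.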

What ended up happening with a lot of these drafts, such as the xmlns 
and rdf URN registration and the text/markdown media type registration, 
is that I gave up on xml2rfc entirely and authored them in nroff. I 
encourage other authors to revolt similarly until <br> and other 
appropriate constructs are in the v3 vocabulary.

Because white-space controls can be a fundamental property of certain 
kinds of text, I encourage an attribute @white-space to apply to all 
text-containing elements, with similar values to CSS 
<http://www.w3.org/TR/CSS21/text.html#white-space-prop>: {normal}, 
{pre}, {nowrap}, {pre-wrap}, and {pre-line}.
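
As a rough illustration of how the five values differ, here is a 
simplified Python sketch of the collapsing behavior described in CSS 2.1. 
The attribute name white-space is the proposal above; the table 
WHITE_SPACE and the function apply_white_space are hypothetical names for 
this sketch. It models only whether newlines and runs of spaces are 
collapsed, and records (but does not perform) line wrapping.

```python
import re

# value: (collapse newlines, collapse spaces and tabs, wrapping allowed)
WHITE_SPACE = {
    "normal":   (True,  True,  True),
    "pre":      (False, False, False),
    "nowrap":   (True,  True,  False),
    "pre-wrap": (False, False, True),
    "pre-line": (False, True,  True),
}


def apply_white_space(text, value):
    """Return text with whitespace collapsed according to the given value."""
    collapse_newlines, collapse_spaces, _wrap_allowed = WHITE_SPACE[value]
    if collapse_newlines:
        # Line breaks become ordinary spaces.
        text = re.sub(r"\r\n|\r|\n", " ", text)
    if collapse_spaces:
        # Runs of spaces and tabs become one space.
        text = re.sub(r"[ \t]+", " ", text)
        # Drop a space left next to a preserved line break (pre-line).
        text = re.sub(r" ?\n ?", "\n", text)
    return text


sample = "Magic number(s):  \n   File extension(s):"
for value in WHITE_SPACE:
    print(value, repr(apply_white_space(sample, value)))
```

Under this model only {pre} and {pre-wrap} keep exact spacing, and only 
{pre}, {pre-wrap} and {pre-line} keep the author's line breaks.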

Nevertheless, I would be satisfied if white-space controls are limited 
to certain elements, such as the elements discussed in Improvement #3 
(forthcoming). I would be partially satisfied if white-space controls 
are limited to two options: "default" ≈ "normal" and "preserve" ≈ "pre" 
(which roughly correspond to the values of xml:space).

Sean

