[rfc-i] Bring Back the BR! Re: typesetting poetry

Sean Leonard dev+ietf at seantek.com
Mon Oct 12 07:40:47 PDT 2015

On 10/12/2015 1:15 AM, Miek Gieben wrote:
> [ Quoting <dev+ietf at seantek.com> in "Re: [rfc-i] Bring Back the BR! 
> Re: ..." ]
>>> On 9/5/2015 7:56 AM, Donald Eastlake wrote:
>>>> Section 1.1 of RFC 6325 is also a poem.
>>> See: a modern reference!
>> To clarify and modulate my outburst:
>> There is in fact a <br> (in v3 xml2rfc-15 and beyond), but it only 
>> appears as the child of a <th> or <td> element; it cannot appear in a 
>> <t> element (aka paragraph).
>> I suppose that means that poetry, and pseudocode, can appear inside 
>> tables (namely tables with one column and one row), but not inside 
>> "text" per-se. Inside <th> and <td>, <t> is mutually exclusive with 
>> the other elements including <br>. I suppose that if your poetry or 
>> pseudocode has multiple paragraph-like divisions (e.g., multiple 
>> stanzas, multiple verses, multiple function blocks), you need to put 
>> them in multiple cells or rows. Not really sure about the wisdom of 
>> that.
> <br> only solves the line-break. Leading whitespace (for instance) is
> also a problem. Basically: for poetry you'll need a paragraph-type where
> *all* whitespace is significant.

I advocated for this very thing about "700 e-mails" ago, but in any 
event, the grammar supports it. xml2rfc has always permitted 
@xml:space="preserve" in various places; however, implementations have 
not implemented it consistently, so various detractors have thought it 
wise to deprecate it.

The most normal and natural sense of "preserve" would be to preserve all 
white space. Compare this with the CSS white-space property. In keeping 
with that property, a discussion should also be had about line breaking 
to fill line boxes. Controlling line breaking to fill line boxes might 
be achieved with new markup, or with the many breaking format characters 
in Unicode.

I would like to quote from the XML 1.0 Fifth Edition Standard:

    S <http://www.w3.org/TR/2008/PER-xml-20080205/#NT-S> (white space)
    consists of one or more space (#x20) characters, carriage returns,
    line feeds, or tabs.

    White Space

    [3]  S  ::=  (#x20 | #x9 | #xD | #xA)+

    The presence of #xD in the above production is maintained purely for
    backward compatibility with the First Edition
    <http://www.w3.org/TR/1998/REC-xml-19980210>. As explained in *2.11
    End-of-Line Handling*
    <http://www.w3.org/TR/2008/PER-xml-20080205/#sec-line-ends>, all #xD
    characters literally present in an XML document are either removed
    or replaced by #xA characters before any other processing is done.
    The only way to get a #xD character to match this production is to
    use a character reference in an entity value literal.

          2.10 White Space Handling

    In editing XML documents, it is often convenient to use "white
    space" (spaces, tabs, and blank lines) to set apart the markup for
    greater readability. Such white space is typically not intended for
    inclusion in the delivered version of the document. On the other
    hand, "significant" white space that should be preserved in the
    delivered version is common, *for example in poetry and source code*
    /[emphasis mine]/.

    An XML processor
    <http://www.w3.org/TR/2008/PER-xml-20080205/#dt-xml-proc> /must/ always
    pass all characters in a document that are not markup through to the
    application. A validating XML processor
    <http://www.w3.org/TR/2008/PER-xml-20080205/#dt-validating> /must/ also
    inform the application which of these characters constitute white space
    appearing in element content.
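
To make the "pass all characters through" rule concrete, here is a minimal 
sketch using Python's standard xml.etree.ElementTree. The <t> element and 
its content are made up for illustration, and this is not how xml2rfc 
itself is implemented. It shows the parser handing every space and line 
feed to the application, exposing xml:space as an ordinary attribute, and 
turning a literal CR LF into LF while a character reference to #xD survives:

```python
import xml.etree.ElementTree as ET

# The xml: prefix is always bound to this namespace.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical paragraph; the CR LF pair is literally present in the document.
doc = '<t xml:space="preserve">  first line\r\n    second line</t>'
t = ET.fromstring(doc)

# All non-markup characters reach the application; CR LF has become LF.
text = t.text

# xml:space is only a signal; honouring it is the application's job.
space = t.get("{%s}space" % XML_NS)

# A character reference is the way to get a real #xD through the parser.
cr = ET.fromstring("<t>a&#xD;b</t>").text

print(repr(text), space, repr(cr))
```

Whatever happens to the white space after this point (collapsing, 
trimming, or preserving) is a decision of the application layer, not of 
the XML processor.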

What we see here is that white[ ]space is only SP, HT, CR, and LF. At 
the "application level", xml2rfc-23 says:

        Tools interpreting the XML described here will collapse horizontal
        whitespace and linebreaks to a single whitespace (except inside
        <artwork> and <sourcecode>), and will trim leading and trailing

The natural conclusion of this is that only SP, HT, CR, and LF are going 
to get smushed.
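
As a rough sketch of that reading (an assumption about what a tool 
following the quoted text would do, not the behavior of any particular 
xml2rfc release), the collapsing step could look like this in Python. The 
function name collapse_xml_whitespace is hypothetical. The character 
class is spelled out because Python's \s and a bare str.strip() also 
match Unicode spaces such as U+2003 and U+00A0, which is exactly what we 
do not want here:

```python
import re

# Exactly the four characters of the XML "S" production: SP, HT, CR, LF.
XML_WS = re.compile(r"[ \t\r\n]+")

def collapse_xml_whitespace(text: str) -> str:
    """Collapse runs of XML white space to one space and trim the ends.

    Other Unicode space characters are left untouched.
    """
    return XML_WS.sub(" ", text).strip(" \t\r\n")

# Two EM SPACEs of indentation, ragged ASCII spacing, and one NO-BREAK SPACE.
line = "\u2003\u2003Roses  are \n red,\u00a0violets"
collapsed = collapse_xml_whitespace(line)
print(ascii(collapsed))
```

The ASCII runs shrink to single spaces, while the leading EM SPACEs and 
the NO-BREAK SPACE come out the other side unchanged.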

Therefore, you are welcome to use all of these delightful space 
characters in the Unicode standard: 

May I suggest a few U+2003 EM SPACEs for your recipes: nice and big. Add 
to pot to taste. You may also consider a few dashes of U+00A0 NO-BREAK 
SPACE, which, unlike EM SPACE, does not allow a line break.
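
If you would rather not type those characters by hand, a small helper can 
do the seasoning. This is only a sketch: protect_indent and to_xml_text 
are hypothetical names, and writing the spaces as numeric character 
references is one choice among several (it just keeps them visible in the 
XML source).

```python
from xml.sax.saxutils import escape

EM_SPACE = "\u2003"

def protect_indent(line: str) -> str:
    """Replace each leading ASCII space with U+2003 EM SPACE."""
    stripped = line.lstrip(" ")
    return EM_SPACE * (len(line) - len(stripped)) + stripped

def to_xml_text(line: str) -> str:
    """Escape markup characters, then write non-ASCII characters as
    numeric character references."""
    escaped = escape(line)
    return "".join(c if ord(c) < 128 else "&#x%X;" % ord(c) for c in escaped)

verse = ["first line", "  second line, indented"]
for v in verse:
    print(to_xml_text(protect_indent(v)))
```

Each leading space becomes one &#x2003; in the output, so the 
indentation no longer depends on how a tool treats SP, HT, CR, and LF.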
