[rfc-i] rfc-interest Digest, Vol 89, Issue 9

John C Klensin john+rfc at jck.com
Wed Mar 21 16:55:57 PDT 2012

--On Wednesday, March 21, 2012 15:12:25 -0700 Leonard Rosenthol
<lrosenth at adobe.com> wrote:
> Joe - I think those are a great starting point for
> requirements.
> I just want to point out, as I did in my presentation, that I
> don't believe that a "one size fits all" is necessary.
> Meaning that the authoring format need not necessarily be the
> archival format.  In fact, I would say that it's important
> that they are NOT the same format because (as you are doing
> below), the requirements for each differ.

Leonard and others,

Yes, up to a point.   As others have pointed out less directly,
we have a historical requirement for being able to create new
documents by hacking away at older ones.   If the authoring
format and the archival format are different, then the authoring
format effectively has to meet the same archival requirements as
the archival format.  For example, we have to be able to
guarantee that editors and processors for the authoring format
will remain readily available as far into the future as we care
about (plus a bit), that the authoring format identify the tools
and versions needed to process it to the extent necessary, and
so on.  

We also have to worry about things staying in synch.  For
example, if a given source file is pushed through a given
processor or set of steps to produce PDF/A today, can it be
guaranteed that a processor that will be available and
functional a decade or two from now, when applied to the same
source, will produce the same PDF/A output?  If that condition
is not met, then we don't have a dual archival (source and PDF/A
output) format in practice: we have an archival PDF/A version
and a source version that can, if we are lucky, more or less
reproduce it.   
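
As a rough illustration of that condition (a sketch only: the
function names are hypothetical, and byte-for-byte equality is an
assumed, deliberately strict reading of "the same PDF/A output",
since real PDF producers often embed timestamps or identifiers
that change on every run):

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def still_in_synch(archived_output: bytes, regenerated_output: bytes) -> bool:
    """True only if regenerating from source reproduced the archived bytes.

    archived_output is the PDF/A stored when the document was published;
    regenerated_output is what a later processor produces from the same
    source.  Both are assumed to have been read into memory already.
    """
    return sha256_hex(archived_output) == sha256_hex(regenerated_output)
```

If a check of this kind fails for a processor available a decade
from now, the source is only an approximation of the archived
version, which is the situation described above.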

I note that, for different reasons, if we went back even four or
five years, xml2rfc and MSWord could not meet that condition
unless historical versions of executable code had been kept
around and continued to be executable.  It is not clear whether
HTML (even HTML3.x or 4) could meet that requirement either.

Once we relax that requirement, I note that there is an archival
format with which we have far more historical experience from
which to predict future usability than even PDF/A.   The
promises that have been made about PDF/A have a demonstrated
track record of around six or seven years.  I don't suggest
going there because the format is not well-suited for digital
distribution, but I note that we have many hundreds of years of
experience with documents printed in well-known type styles on
high-quality paper -- including the ability to reconstruct
source files that can produce similar output by type
style-sensitive OCR mechanisms.   

Again, I'm not suggesting that and would be violently opposed to
our regressing from ASCII to it, but I think it illustrates that
we need to be _really_ careful when we say "this archival format
is ok even though it effectively requires a separate source
format".  We need to be especially careful if it doesn't easily
support accurate source reconstruction and therefore depends on
archiving source (and everything needed to process the source).

In addition, while I'm actually a big fan of PDF/A for other
uses, I think the rather nice "Details Matter" article you cite
may not adequately cover the problem.  Adding to it the question
that Paul Hoffman (and maybe others) asked, I've noticed that
there are a rather large number of programs on the market for
various platforms, all of which make extravagant claims about
producing PDF files.  In my experience, a rather large fraction
of them can't render PDF/A properly and it is really hard to
find out whether or not they can do so.   Worse, very few
desktop systems make it easy to use different rendering or
processing programs for conventional PDF and PDF/A files.  As
long as we are essentially translating our very simple ASCII
files into PDFs, the differences are not likely to be
significant and the rendering situation is not likely to be
problematic.   But most of the people who want an "advanced"
archival format want advanced features of one sort or another.
As soon as one starts down that path, reasonable criteria for
accessibility and usability may suggest (at least for the
present and the near future) that PDF/A has to be treated as a
proprietary format that can be rendered accurately only by
Acrobat and a small (and generally unidentified) selection of
other tools.   

Sorry to be pessimistic about this, but...

