[rfc-i] Can the web be archived?

Nico Williams nico at cryptonector.com
Wed Jan 21 09:02:30 PST 2015


On Wed, Jan 21, 2015 at 08:47:55AM -0800, Sean Leonard wrote:
> I think that the RFC editor should make a cached facsimile (I avoid
> the word "copy") of the content and store it locally, possibly
> offline in a secure office or something.

Perhaps the RFC-Editor could assign a URN to every document it publishes
(RFCs) or that an RFC references, and then keep an index mapping each
such URN to its {original, updated} URLs.  Sure, such an index might get
out of date, but together with the bibliographic information it should
still be possible to find the documents' new locations.
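
A minimal sketch of what such an index might look like, purely for
illustration.  The class names, the method names, and the
"urn:example:..." identifier are all assumptions, not anything the
RFC-Editor has defined:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexEntry:
    """One referenced document: its citation plus every known locator."""
    citation: str        # bibliographic reference as printed in the RFC
    original_url: str    # URL cited at publication time
    updated_urls: List[str] = field(default_factory=list)  # newer locations, oldest first

    def current_url(self) -> str:
        """Return the most recently recorded location."""
        return self.updated_urls[-1] if self.updated_urls else self.original_url


class UrnIndex:
    """Hypothetical in-memory index of URN -> {original, updated} URLs."""

    def __init__(self) -> None:
        self._entries: Dict[str, IndexEntry] = {}

    def register(self, urn: str, citation: str, original_url: str) -> None:
        """Assign a URN to a newly referenced document."""
        if urn in self._entries:
            raise ValueError(f"URN already assigned: {urn}")
        self._entries[urn] = IndexEntry(citation, original_url)

    def add_location(self, urn: str, new_url: str) -> None:
        """Record a newer locator; the original URL is never overwritten."""
        self._entries[urn].updated_urls.append(new_url)

    def lookup(self, urn: str) -> IndexEntry:
        """Return the entry for a URN (raises KeyError if unknown)."""
        return self._entries[urn]


# Placeholder URN, citation and URLs, for illustration only.
index = UrnIndex()
index.register("urn:example:rfc-ref:1", 'Doe, J., "Some Spec", 2014',
               "http://example.com/spec.html")
index.add_location("urn:example:rfc-ref:1", "http://example.org/archive/spec.html")
print(index.lookup("urn:example:rfc-ref:1").current_url())
```

Keeping the original URL alongside the updates means the index never
says less than the reference already published in the RFC.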

Updating the index would be harder.  It's a job for a robot (which
presumably the RFC-Editor won't want to be in the business of running),
particularly if the RFC-Editor can keep a copy of the original so that
its contents can be partially matched against potential new versions at
new locations.
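
As a rough sketch of the partial-matching step, assuming the robot
already has the text of the kept original and of each candidate page:
the function names and the 0.8 threshold are assumptions, and difflib's
word-level ratio stands in for whatever matching a real robot would use.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional


def similarity(original: str, candidate: str) -> float:
    """Word-level similarity in [0, 1]; 1.0 means identical word sequences."""
    a = original.lower().split()
    b = candidate.lower().split()
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def best_candidate(original: str, candidates: Dict[str, str],
                   threshold: float = 0.8) -> Optional[str]:
    """Return the URL whose text best matches the original, or None.

    `candidates` maps URL -> fetched text.  The threshold is an arbitrary
    assumption and would need tuning against real documents.
    """
    best_url, best_score = None, threshold
    for url, text in candidates.items():
        score = similarity(original, text)
        if score >= best_score:
            best_url, best_score = url, score
    return best_url
```

A match found this way is only a suggestion for the index; someone (or
something) would still need to confirm it before recording the new
locator.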

Manual updates would require much labor (either to perform the updates
or to clean up spam), but if we needed them at all they could be farmed
out to volunteers.

(Surely such an index could not violate copyrights any more than the
referencing RFCs do, since it would contain no more public content than
the original bibliographic references published in the RFCs, plus newer
locators.)

> For documents that do not have a permissive copyright, just keep the
> document offline and allow parties to request looking at the
> facsimile in the office (or at IETF meetings) without making a new
> copy.

This.

> I suppose that one could make a fair-use argument for this purpose.
> I am not making that argument--that is up to the IETF's lawyers--but
> honestly nobody is really going to sue the IETF or anyone else if
> this is done selectively on an as-needed basis and the facsimiles
> are limited to archival purposes offline.

The liable entity here might be the RFC-Editor.

> I do not think that creating a new URI for the content is necessary.
> The original URI and the last accessed date are sufficient. In the
> legal community, when URIs are cited, the citation includes a "last
> accessed date"; these pieces of information are sufficient to
> identify the content as an historical fact (see the Blue Book).

An RFC-Editor-assigned URN might be convenient (odd, yes, but
convenient), at least for external documents whose publishers have not
assigned a URN.

> If you want to get all nerdy, you can include the most salient HTTP
> headers (Content-Type, Content-Length) and cryptographic hash (e.g.,
> SHA-256) in some archive related to the RFC. I would not advocate
> putting such information in the published RFC itself.

Indeed.
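
A sketch of what one such archive record could contain, assuming the
content has already been fetched.  The function names, the field names,
and the example URL are made up for illustration:

```python
import hashlib
import json
from typing import Dict, Mapping

# The headers Sean singled out; stored under lowercased names.
SALIENT_HEADERS = ("content-type", "content-length")


def make_archive_record(url: str, accessed: str,
                        headers: Mapping[str, str],
                        body: bytes) -> Dict[str, object]:
    """Build a record: original URI, last-accessed date, headers, SHA-256."""
    # HTTP header names are case-insensitive, so normalise before selecting.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "url": url,
        "last_accessed": accessed,
        "headers": {h: lowered[h] for h in SALIENT_HEADERS if h in lowered},
        "sha256": hashlib.sha256(body).hexdigest(),
    }


def matches_record(record: Mapping[str, object], body: bytes) -> bool:
    """Check whether `body` is byte-for-byte the content that was recorded."""
    return hashlib.sha256(body).hexdigest() == record["sha256"]


# Placeholder content and URL, for illustration only.
body = b"<html>example</html>"
record = make_archive_record(
    "http://example.com/spec.html",
    "2015-01-21",
    {"Content-Type": "text/html", "Content-Length": str(len(body))},
    body,
)
print(json.dumps(record, indent=2))
```

The hash only identifies the exact bytes retrieved on that date; any
later edit to the page, however small, produces a different value.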

Nico