
Found 4 records.

Status: Reported (4)

RFC5663, "Parallel NFS (pNFS) Block/Volume Layout", January 2010

Source of RFC: nfsv4 (tsv)

Errata ID: 4139

Status: Reported
Type: Technical

Reported By: Christoph Hellwig
Date Reported: 2014-10-23

Section 2.7 says:

<section doesn't exist yet>

It should say:

2.7.  Volatile write caches

   Many storage devices implement volatile write caches that require an
   explicit flush to persist the data from write operations to stable
   storage.  When a volatile write cache is used, the pNFS server must
   ensure the volatile write cache has been committed to stable storage
   before the LAYOUTCOMMIT operation returns.  An example of this
   behavior is SCSI devices with the "Write Cache Enable" bit set, which
   require a "SYNCHRONIZE CACHE (10)" or "SYNCHRONIZE CACHE (16)"
   operation to write back the storage device cache.

Notes:

RFC5663 currently doesn't acknowledge the existence of volatile write caches, but they are common in consumer or SMB storage systems. Add a section that requires the server to take care of them.
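The proposed requirement can be illustrated from the server side. Below is a minimal Python sketch, not a normative implementation: the `layoutcommit` handler name and `commit_ranges` parameter are hypothetical, and it assumes `dev_fd` is a file descriptor open on the backing block device. On Linux, `fsync()` on such a descriptor asks the kernel to write back the device's volatile write cache (for SCSI devices, via SYNCHRONIZE CACHE).

```python
import os

def layoutcommit(dev_fd: int, commit_ranges) -> None:
    """Hypothetical server-side LAYOUTCOMMIT handler (illustration only).

    The reply must not be sent until the data covered by commit_ranges
    has reached stable storage.  os.fsync() on a block-device file
    descriptor makes the kernel flush the device's volatile write cache,
    so it has to complete before this function returns.
    """
    # ... server-specific bookkeeping for commit_ranges would go here ...
    os.fsync(dev_fd)  # data is durable only once this returns
```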


Errata ID: 4141

Status: Reported
Type: Technical

Reported By: Christoph Hellwig
Date Reported: 2014-10-23

Section 2.2.2 says:


It should say:

   The volume size of a PNFS_BLOCK_VOLUME_SIMPLE volume must be obtained
   by the client from the storage subsystem, as no size is provided in
   the XDR.  All volumes listed in bsv_volumes of a
   struct pnfs_block_stripe_volume_info4 must be the same size.  If
   the size of the volumes listed in a stripe set does not align
   to the bsv_stripe_unit, the last stripe should be treated as
   having a size of volume size modulo the stripe size.
   The volume size of a PNFS_BLOCK_VOLUME_STRIPE volume is the sum
   of the volume sizes of each component listed in bsv_volumes.
   The volume size of a PNFS_BLOCK_VOLUME_CONCAT volume is the sum
   of the volume sizes of each component listed in bcv_volumes.

Notes:

RFC5663 provides no explanation of the volume types except for a few sparse comments in the XDR. Explain at least basic size related rules.
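The size rules proposed above amount to simple arithmetic, which a short sketch can make concrete (Python; the function names are mine, not from the RFC, and the rules are exactly those stated in the proposed text):

```python
def stripe_set_size(component_sizes: list[int]) -> int:
    # All members of a stripe set must be the same size; the striped
    # volume spans all of them.
    assert len(set(component_sizes)) == 1, "stripe members must match"
    return sum(component_sizes)

def concat_size(component_sizes: list[int]) -> int:
    # A concatenated volume's size is the sum of its bcv_volumes sizes.
    return sum(component_sizes)

def last_stripe_size(component_size: int, stripe_unit: int) -> int:
    # If the member size does not align to bsv_stripe_unit, the last
    # stripe is only the remainder; otherwise it is a full stripe unit.
    return component_size % stripe_unit or stripe_unit
```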


Errata ID: 4142

Status: Reported
Type: Technical

Reported By: Christoph Hellwig
Date Reported: 2014-10-23

Section 2.8 says:

<section does not exist yet>

It should say:

2.8.  Device-ID-to-device-address mapping changes

   A pNFS block volume layout server MAY signal
   device-ID-to-device-address changes to the client using the
   CB_NOTIFY_DEVICEID callback operation.

   If the change is compatible and does not require outstanding layouts
   to be recalled, the server can issue a notification of type
   NOTIFY_DEVICEID4_CHANGE.

   A device-ID-to-device-address mapping change signaled by
   NOTIFY_DEVICEID4_CHANGE must not change the storage-system-specific
   addressing of the volume, and can only add new storage to the
   existing device.  In particular, the following changes are allowed:

     o increasing the size of the underlying block device of a
       PNFS_BLOCK_VOLUME_SIMPLE volume.
     o increasing the size of a PNFS_BLOCK_VOLUME_SLICE volume if the
       underlying block device of the PNFS_BLOCK_VOLUME_SIMPLE volume
       it refers to is big enough to fit the new size.
     o increasing the size of each volume in bsv_volumes of a
       PNFS_BLOCK_VOLUME_STRIPE volume by the same amount.
     o increasing the size of the last volume in bcv_volumes of a
       PNFS_BLOCK_VOLUME_CONCAT volume.
     o adding new members to the end of bcv_volumes of a
       PNFS_BLOCK_VOLUME_CONCAT volume.

Notes:

Specify what device configuration changes can be supported without recalling layouts.
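The growth-only rules above can be expressed as a compatibility check a server might apply before choosing between NOTIFY_DEVICEID4_CHANGE and a layout recall. A sketch (hypothetical helper names; `old` and `new` are lists of component sizes before and after the change):

```python
def concat_change_compatible(old: list[int], new: list[int]) -> bool:
    # For a PNFS_BLOCK_VOLUME_CONCAT: all members but the last must be
    # unchanged, the old last member may only grow, and new members may
    # be appended at the end.
    if len(new) < len(old) or new[:len(old) - 1] != old[:-1]:
        return False
    return new[len(old) - 1] >= old[-1]

def stripe_change_compatible(old: list[int], new: list[int]) -> bool:
    # For a stripe set: every member must grow by the same non-negative
    # amount, so the members stay equal in size.
    if len(new) != len(old):
        return False
    deltas = {n - o for o, n in zip(old, new)}
    return len(deltas) == 1 and deltas.pop() >= 0
```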


Errata ID: 4140

Status: Reported
Type: Editorial

Reported By: Christoph Hellwig
Date Reported: 2014-10-23

Section 2.3.5 says:

   Block/volume class storage devices are not required to perform read
   and write operations atomically.  Overlapping concurrent read and
   write operations to the same data may cause the read to return a
   mixture of before-write and after-write data.  Overlapping write
   operations can be worse, as the result could be a mixture of data
   from the two write operations; data corruption can occur if the
   underlying storage is striped and the operations complete in
   different orders on different stripes.  When there are multiple
   clients who wish to access the same data, a pNFS server can avoid
   these conflicts by implementing a concurrency control policy of
   single writer XOR multiple readers.  This policy MUST be implemented
   when storage devices do not provide atomicity for concurrent
   read/write and write/write operations to the same data.

It should say:

   Block/volume class storage devices do not provide byte granularity
   access and can only perform read and write operations atomically at
   block granularity, and thus require read-modify-write cycles to write
   data smaller than the block size.  Overlapping concurrent read and
   write operations to the same data thus may cause the read to return
   a mixture of before-write and after-write data.  Additionally, data
   corruption can occur if the underlying storage is striped and the
   operations complete in different orders on different stripes.  When
   there are multiple clients who wish to access the same data, a pNFS
   server MUST avoid these conflicts by implementing a concurrency
   control policy of single writer XOR multiple readers for a given data
   region.

Notes:

No device classified as a block device can support concurrent writes at arbitrary byte granularity, so reword the section so as not to confuse the reader. Also make it explicit that the reader-XOR-writer policy only applies between different clients, as existing client implementations require that layouts not be recalled due to their own LAYOUTGET operations. Note that fixing this on the client side also isn't feasible, as the block layout unfortunately introduced its own extent concept instead of using layouts to describe individual I/O mappings.
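The single-writer-XOR-multiple-readers policy, applied per data region and only between different clients, can be sketched over byte ranges (hypothetical names; `outstanding` holds granted layouts as `(client, offset, length, is_write)` tuples):

```python
def ranges_overlap(a_off: int, a_len: int, b_off: int, b_len: int) -> bool:
    # Two half-open byte ranges intersect iff each starts before the
    # other ends.
    return a_off < b_off + b_len and b_off < a_off + a_len

def may_grant(outstanding, client, offset, length, is_write) -> bool:
    # Single writer XOR multiple readers, per overlapping data region.
    # Only layouts held by *other* clients conflict: a client's own
    # LAYOUTGET must never force a recall of its own layouts.
    for c, o, l, w in outstanding:
        if c == client:
            continue
        if (w or is_write) and ranges_overlap(o, l, offset, length):
            return False
    return True
```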

