RFC Errata
RFC 5663, "Parallel NFS (pNFS) Block/Volume Layout", January 2010
Note: This RFC has been updated by RFC 6688
Source of RFC: nfsv4 (wit)
Errata ID: 4140
Status: Rejected
Type: Editorial
Publication Format(s): TEXT
Reported By: Christoph Hellwig
Date Reported: 2014-10-23
Rejected by: Martin Stiemerling
Date Rejected: 2016-02-02
Section 2.3.5 says:
Block/volume class storage devices are not required to perform read and write operations atomically. Overlapping concurrent read and write operations to the same data may cause the read to return a mixture of before-write and after-write data. Overlapping write operations can be worse, as the result could be a mixture of data from the two write operations; data corruption can occur if the underlying storage is striped and the operations complete in different orders on different stripes. When there are multiple clients who wish to access the same data, a pNFS server can avoid these conflicts by implementing a concurrency control policy of single writer XOR multiple readers. This policy MUST be implemented when storage devices do not provide atomicity for concurrent read/write and write/write operations to the same data.
It should say:
Block/volume class storage devices do not provide byte granularity access and can only perform read and write operations atomically at block granularity, and thus require read-modify-write cycles to write data smaller than the block size. Overlapping concurrent read and write operations to the same data thus may cause the read to return a mixture of before-write and after-write data. Additionally, data corruption can occur if the underlying storage is striped and the operations complete in different orders on different stripes. When there are multiple clients who wish to access the same data, a pNFS server MUST avoid these conflicts by implementing a concurrency control policy of single writer XOR multiple readers for a given data region.
Notes:
No device classified as a block device can support concurrent writes at arbitrary byte granularity, so reword the section to not confuse the reader. Also make it explicit that the reader XOR writer policy only applies to different clients, as existing client implementations require layouts not to be recalled due to their own LAYOUTGET operations. Note that fixing this on the client also isn't feasible, as the block layout unfortunately decided to introduce its own extent concept instead of using layouts to describe individual I/O mappings.
--VERIFIER NOTES--
David Black: "The new text effectively states that block I/O operations are always atomic at block granularity. That is not correct for all SCSI devices. The existing text suffices to warn implementers about what can go wrong here."
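For readers unfamiliar with the "single writer XOR multiple readers" policy discussed in Section 2.3.5, the following is a minimal illustrative sketch in Python. It is not taken from RFC 5663 or from any server implementation. The names Grant, RegionPolicy, request, and release are hypothetical, and byte ranges stand in for layout ranges. The sketch assumes the reading given in the reporter's notes, under which conflicts are checked only between different clients. It simply refuses a conflicting request, whereas a real pNFS server would more likely recall the conflicting layouts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One hypothetical access grant held by a client on a byte range."""
    client: str
    offset: int
    length: int
    write: bool


def overlaps(a_off, a_len, b_off, b_len):
    """Return True if the half-open ranges [off, off + len) intersect."""
    return a_off < b_off + b_len and b_off < a_off + a_len


class RegionPolicy:
    """Single writer XOR multiple readers, per data region (illustrative)."""

    def __init__(self):
        self.grants = []

    def conflicts(self, client, offset, length, write):
        # A conflict needs: a different client, an overlapping range,
        # and at least one of the two accesses being a write.
        return [
            g for g in self.grants
            if g.client != client
            and overlaps(g.offset, g.length, offset, length)
            and (g.write or write)
        ]

    def request(self, client, offset, length, write):
        # Assumption: refuse on conflict instead of recalling layouts.
        if self.conflicts(client, offset, length, write):
            return False
        self.grants.append(Grant(client, offset, length, write))
        return True

    def release(self, client):
        # Drop every grant held by this client.
        self.grants = [g for g in self.grants if g.client != client]
```

In this sketch, two clients may hold read grants on the same range at once, but a write grant is issued only when no other client holds any overlapping grant. Requests from the same client never conflict with that client's own grants.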