RFC Errata
RFC 5661, "Network File System (NFS) Version 4 Minor Version 1 Protocol", January 2010
Note: This RFC has been obsoleted by RFC 8881
Note: This RFC has been updated by RFC 8178, RFC 8434
Source of RFC: nfsv4 (wit)
Errata ID: 2751
Status: Rejected
Type: Technical
Publication Format(s) : TEXT
Reported By: Ricardo Labiaga
Date Reported: 2011-03-21
Rejected by: Magnus Westerlund
Date Rejected: 2019-10-25
Throughout the document, when it says:
It should say:
12.5.4.1. LAYOUTCOMMIT and change/time_modify
becomes
12.5.4.2. LAYOUTCOMMIT and change/time_modify

12.5.4.2. LAYOUTCOMMIT and size
becomes
12.5.4.3. LAYOUTCOMMIT and size

12.5.4.3. LAYOUTCOMMIT and layoutupdate
becomes
12.5.4.4. LAYOUTCOMMIT and layoutupdate

Add new Section 12.5.4.1 Implications of LAYOUTCOMMIT on file layouts

For file layouts, WRITEs to a Data Server that return a stable_how4 value
of FILE_SYNC4 guarantee that data and file system metadata are on stable
storage. This means that a LAYOUTCOMMIT is not needed in order to make
the data and metadata visible to the metadata server and other clients.

For file layouts, when WRITE to the data server returns UNSTABLE4 or
DATA_SYNC4 and the NFL4_UFLG_COMMIT_THRU_MDS flag is set, the client MUST
send the COMMIT to the metadata server. A successful COMMIT to the
metadata server guarantees that data and file system metadata are on
stable storage. Therefore, any time that NFL4_UFLG_COMMIT_THRU_MDS is
set, a LAYOUTCOMMIT (of the byte range specified by the layout) is not
needed.

For file layouts, when NFL4_UFLG_COMMIT_THRU_MDS flag is not set, and
WRITE or COMMIT to the data server return DATA_SYNC4, the client MUST
send the LAYOUTCOMMIT to the metadata server in order to synchronize file
metadata.

The following table summarizes the rules when a LAYOUTCOMMIT is needed,
and the effects of a COMMIT to a data server and metadata server.

+------------+------------+------------+------------+----------+
| NFL4_UFLG_ | WRITE to   | Meaning of | Meaning    | LAYOUT   |
| COMMIT_    | DS returns | COMMIT to  | of COMMIT  | COMMIT   |
| THRU_MDS   |            | DS         | to MDS     | required |
+------------+------------+------------+------------+----------+
| Not Set    | UNSTABLE4  | DATA_SYNC4 | Nothing    | Yes      |
| Not Set    | DATA_SYNC4 | Nothing    | Nothing    | Yes      |
| Not Set    | FILE_SYNC4 | Nothing    | Nothing    | NO       |
| Set        | UNSTABLE4  | Nothing    | FILE_SYNC4 | NO       |
| Set        | DATA_SYNC4 | Nothing    | FILE_SYNC4 | NO       |
| Set        | FILE_SYNC4 | Nothing    | Nothing    | NO       |
+------------+------------+------------+------------+----------+

Note that a client can always demand FILE_SYNC4 or DATA_SYNC4 in WRITE's
arguments. Also note that specifying these stability levels may adversely
impact performance.

If a LAYOUTCOMMIT is required, it should be sent before CLOSE to maintain
close-to-open semantics. If required, it should be sent before LOCKU,
OPEN_DOWNGRADE, LAYOUTRETURN, and when the application issues fsync()
[25].

Again, if LAYOUTCOMMIT is required, it should be sent periodically to
keep the file size and modification time synchronized. This allows use
cases like tail -f [56] which copies its input file to the standard
output and updates the output as new lines become available in the input
file. It is up to the client implementation to determine how frequently
LAYOUTCOMMIT is issued. Possible policies include every N'th COMMIT to a
data server, every N'th unit of time, or after writing a stripe to a set
of data servers.

Even if a required LAYOUTCOMMIT is not issued by the client, the data
server and metadata servers have a set of responsibilities to fulfill in
order to guarantee data consistency:

1) Data servers MUST commit data and synchronize modification and size
   attributes with the metadata server before a layout is revoked as
   described in section 12.5.4.
2) Data servers SHOULD commit data and synchronize modification and size
   attributes with the metadata server after the metadata server reboots.
   In theory the client should commit the data, but this avoids the
   problem where both the client and metadata server crash at the same
   time.
3) The metadata server MAY periodically poll data servers to synchronize
   modification and size attributes.

Section 13.9.2.3 says:

For the NFSv4.1-based data storage protocol, it is necessary to
re-synchronize state such as the size attribute, and the setting of
mtime/change/atime.

Should say:

For the NFSv4.1-based data storage protocol, it may be necessary to
re-synchronize state such as the size attribute, and the setting of
mtime/change/atime.

Section 13.10 says:

For the case above, this means that a LAYOUTCOMMIT will be done at close
(along with the data WRITEs) and will update the file's size and change
attribute.

Should say:

For the case above, this means that, if necessary, a LAYOUTCOMMIT will be
done at close (along with the data WRITEs) and will update the file's
size and change attribute.

Section 18.3.4 says:

The COMMIT operation is similar in operation and semantics to the POSIX
fsync() [25] system interface that synchronizes a file's state with the
disk (file data and metadata is flushed to disk or stable storage).
COMMIT performs the same operation for a client, flushing any
unsynchronized data and metadata on the server to the server's disk or
stable storage for the specified file.

Should say:

The COMMIT operation is similar in operation and semantics to the POSIX
fsync() [25] system interface that synchronizes a file's state with the
disk (file data and metadata is flushed to disk or stable storage).
COMMIT performs the same operation for a client, flushing any
unsynchronized data and metadata on the server to the server's disk or
stable storage for the specified file.

When using pNFS, if a WRITE returned UNSTABLE4 and
NFL4_UFLG_COMMIT_THRU_MDS is not set, then the client MUST COMMIT to the
data server. The COMMIT may result in flushing the data but not the
metadata. In this case, the metadata MUST be flushed with a subsequent
LAYOUTCOMMIT to the metadata server. A complete set of pNFS rules for
flushing data and metadata is described in section 12.5.4.1.

Section 18.3.4 says:

The above description applies to page-cache-based systems as well as
buffer-cache-based systems. In the former systems, the virtual memory
system will need to be modified instead of the buffer cache.

Should say:

The above description applies to page-cache-based systems as well as
buffer-cache-based systems. In the former systems, the virtual memory
system will need to be modified instead of the buffer cache. Refer to
Section 12.5.4.1 for a discussion of the effects of data stability levels
on data servers or metadata servers.

Section 18.32.4 says:

However, since it is possible for a WRITE to be done with a special
stateid, the server needs to check for this case even though the client
should have done an OPEN previously.

Should say:

However, since it is possible for a WRITE to be done with a special
stateid, the server needs to check for this case even though the client
should have done an OPEN previously. Refer to Section 12.5.4.1 for a
discussion of the effects of data stability levels on data servers or
metadata servers.

Section 20.3.4 says:

In the case of modified data being written while the layout is held, the
client must use LAYOUTCOMMIT operations at the appropriate time; as
required LAYOUTCOMMIT must be done before the LAYOUTRETURN.

Should say:

In the case of modified data being written while the layout is held, the
client may be required to use LAYOUTCOMMIT operations at the appropriate
time; if LAYOUTCOMMIT is required, it must be done before the
LAYOUTRETURN.

Add new informative reference to Section 23.2

[56] The Open Group, "section 'tail' of The Open Group Base
     Specifications Issue 6 IEEE Std 1003.1, 2004 Edition, HTML Version
     (www.opengroup.org), ISBN 1931624453", 2004.

Notes:
A new section describing the implications of LAYOUTCOMMIT on file layouts is
defined in this errata, along with updates to existing sections of the spec.
The technical details in this errata were agreed upon at the IETF Interim
Meeting in Sunnyvale, CA on Feb 18-19, 2011.
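
As an illustration only, the decision table in the proposed Section
12.5.4.1 can be read as two small functions. The sketch below is not
taken from the RFC or from any NFS implementation: the names StableHow4,
commit_target, and layoutcommit_required are hypothetical, and only the
stable_how4 values and the six table rows come from the text above.

```python
from enum import IntEnum
from typing import Optional


class StableHow4(IntEnum):
    """Stability level returned by a WRITE (stable_how4 in NFSv4)."""
    UNSTABLE4 = 0
    DATA_SYNC4 = 1
    FILE_SYNC4 = 2


def commit_target(commit_thru_mds: bool, write_result: StableHow4) -> Optional[str]:
    """Hypothetical helper: where a follow-up COMMIT is sent, per the table.

    Returns "MDS", "DS", or None when no COMMIT is called for.
    """
    if write_result == StableHow4.FILE_SYNC4:
        # Data and metadata are already on stable storage.
        return None
    if commit_thru_mds:
        # Flag set: UNSTABLE4 or DATA_SYNC4 is committed through the MDS.
        return "MDS"
    if write_result == StableHow4.UNSTABLE4:
        # Flag not set: the data server must still flush the data.
        return "DS"
    # Flag not set and DATA_SYNC4: COMMIT adds nothing.
    return None


def layoutcommit_required(commit_thru_mds: bool, write_result: StableHow4) -> bool:
    """Hypothetical helper: the last column of the table."""
    if commit_thru_mds:
        # A COMMIT to the MDS (or FILE_SYNC4) already covers metadata.
        return False
    return write_result != StableHow4.FILE_SYNC4


if __name__ == "__main__":
    # Print all six rows in the same order as the table.
    for flag in (False, True):
        for result in StableHow4:
            print(
                "Set" if flag else "Not Set",
                result.name,
                commit_target(flag, result),
                layoutcommit_required(flag, result),
            )
```

In this reading, a LAYOUTCOMMIT is needed only when the flag is not set
and the WRITE returned something weaker than FILE_SYNC4, which matches
the two "Yes" rows of the table.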
--VERIFIER NOTES--
This errata was rejected on formal process grounds: an erratum is not allowed to change the WG consensus that existed at the time of publication, and the proposed change is also very extensive. This issue does need to be addressed in an update to the RFC.