RFC 9004: Updates for the Back-to-Back Frame Benchmark in RFC 2544
- A. Morton
Abstract
Fundamental benchmarking methodologies for network interconnect devices of interest to the IETF are defined in RFC 2544. This memo updates the procedures of the test to measure the Back-to-Back Frames benchmark of RFC 2544, based on further experience.
This memo updates Section 26.4 of RFC 2544.
Status of This Memo
This document is not an Internet Standards Track specification; it is published for informational purposes.
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are candidates for any level of Internet Standard; see Section 2 of RFC 7841.
Information about the current status of this document, any
errata, and how to provide feedback on it may be obtained at
https://
Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://
1. Introduction
The IETF's fundamental benchmarking methodologies are defined in [RFC2544], supported by the terms and definitions in [RFC1242]. [RFC2544] actually obsoletes an earlier specification, [RFC1944]. Over time, the benchmarking community has updated [RFC2544] several times, including the Device Reset benchmark [RFC6201] and the important Applicability Statement [RFC6815] concerning use outside the Isolated Test Environment (ITE) required for accurate benchmarking. Other specifications implicitly update [RFC2544], such as the IPv6 benchmarking methodologies in [RFC5180].

Recent testing experience with the Back-to-Back Frame test and benchmark in Section 26.4 of [RFC2544] indicates that an update is warranted [OPNFV-2017] [VSPERF-b2b]. In particular, analysis of the results indicates that buffer size matters when compensating for interruptions of software-packet processing, and this finding increases the importance of the Back-to-Back Frame characterization.

[RFC2544] provides its own requirements language consistent with [RFC2119], since [RFC1944] (which it obsoletes) predates [RFC2119]. All three memos share common authorship. Today, [RFC8174] clarifies the usage of requirements language, so the requirements language presented in this memo is expressed in accordance with [RFC8174]. The requirements are intended for those performing
2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
3. Scope and Goals
The scope of this memo is to define an updated method to unambiguously perform tests, measure the benchmark(s), and report the results for Back-to-Back Frames (as described in Section 26.4 of [RFC2544]).
The goal is to provide more efficient test procedures where possible and expand reporting with additional interpretation of the results. The tests described in this memo address the cases in which the maximum frame rate of a single ingress port cannot be transferred to an egress port without loss (for some frame sizes of interest).
Benchmarks as described in [RFC2544] rely on test conditions with constant frame sizes, with the goal of understanding what network-device capability has been tested. Tests with the smallest size stress the header-processing capacity.
Section 3 of [RFC8239] describes buffer-size testing for physical networking devices in a data center. Those methods measure buffer latency directly with traffic on multiple ingress ports that overload an egress port on the Device Under Test (DUT) and are not subject to the revised calculations presented in this memo. Likewise, the methods of [RFC8239] SHOULD be used for test cases where the egress-port buffer is the known point of overload.
4. Motivation
Section 3.1 of [RFC1242] describes the rationale for the Back-to-Back Frames benchmark. To summarize, there are several reasons that devices on a network produce bursts of frames at the minimum allowed spacing; it is, therefore, worthwhile to understand the DUT limit on the length of such bursts in practice. The same document also states:
Tests of this parameter are intended to determine the extent of data buffering in the device.
Since this test was defined, there have been occasional discussions of the stability and repeatability of the results, both over time and across labs. Fortunately, the Open Platform for Network Function Virtualization (OPNFV) project on Virtual Switch Performance (VSPERF) Continuous Integration (CI) [VSPERF-CI] testing routinely repeats Back-to-Back Frame tests to verify that test functionality has been maintained through development of the test-control programs. These tests were used as a basis to evaluate stability and repeatability, even across lab setups when the test platform was migrated to new DUT hardware at the end of 2016.
When the VSPERF CI results were examined [VSPERF-b2b], several aspects of the results were considered notable:
Further, if the Throughput tests of Section 26.1 of [RFC2544] are conducted as a prerequisite, the number of frame sizes required for Back-to-Back Frame benchmarking can be reduced to one or more of the small frame sizes, or the results for large frame sizes can be noted as invalid if tested anyway. These are the larger frame sizes for which the Back-to-Back Frame rate cannot exceed the frame header-processing rate (Throughput).
The material below provides the details of the calculation to estimate the actual buffer storage available in the DUT, using results from the Throughput tests for each frame size and the Max Theoretical Frame Rate for the DUT links (which constrain the minimum frame spacing).
In reality, there are many buffers and packet header-processing steps in a typical DUT; the calculations here use a simplified model of the DUT composed of a buffer and a forwarding function.
So, in the Back-to-Back Frame testing:
Knowledge of approximate buffer storage size (in time or bytes) may be useful in estimating whether frame losses will occur if DUT forwarding is temporarily suspended in a production deployment due to an unexpected interruption of frame processing (an interruption of duration greater than the estimated buffer time would certainly cause lost frames). In Section 6, the calculations for the correct buffer time use the combination of offered load at Max Theoretical Frame Rate and header-processing speed (Measured Throughput).
The presentation of OPNFV VSPERF evaluation and development of enhanced search algorithms [VSPERF-BSLV] was given and discussed at IETF 102. The enhancements are intended to compensate for transient processor interrupts that may cause loss at near-Throughput levels of offered load. Subsequent analysis of the results indicates that buffers within the DUT can compensate for some interrupts, and this finding increases the importance of the Back-to-Back Frame characterization.
5. Prerequisites
The test setup MUST be consistent with Figure 1 of [RFC2544], or Figure 2 of that document when the tester's sender and receiver are different devices. Other mandatory testing aspects described in [RFC2544] MUST be included, unless explicitly modified in the next section.
The ingress and egress link speeds and link-layer protocols MUST be specified and used to compute the Max Theoretical Frame Rate when respecting the minimum interframe gap.
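As a minimal sketch of that computation, assuming Ethernet links where every frame on the wire is preceded by an 8-byte preamble/start-of-frame delimiter and followed by the 12-byte minimum interframe gap, the Max Theoretical Frame Rate is the link bit rate divided by the bits each frame occupies. The function and constant names are hypothetical; other link-layer protocols need their own overhead values.

```python
# Sketch: Max Theoretical Frame Rate at minimum interframe spacing.
# Assumption: Ethernet, where every frame is preceded by 8 bytes of
# preamble/SFD and followed by a 12-byte minimum interframe gap.
PREAMBLE_SFD_BYTES = 8
MIN_INTERFRAME_GAP_BYTES = 12


def max_theoretical_frame_rate(link_bps: float, frame_size_bytes: int) -> float:
    """Return frames per second for back-to-back frames of one size."""
    wire_bytes = frame_size_bytes + PREAMBLE_SFD_BYTES + MIN_INTERFRAME_GAP_BYTES
    return link_bps / (wire_bytes * 8)


if __name__ == "__main__":
    # 10 Gb/s Ethernet with 64-byte frames: about 14.88 million frames/s.
    print(round(max_theoretical_frame_rate(10e9, 64)))
```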
The test results for the Throughput benchmark conducted according to Section 26.1 of [RFC2544] for all frame sizes RECOMMENDED by [RFC2544] MUST be available to reduce the list of tested frame sizes.
Note that:
The Back-to-Back Benchmark described in Section 3.1 of [RFC1242] MUST be measured directly by the tester, where buffer size is inferred from Back-to-Back Frame bursts and associated packet-loss measurements. Therefore, sources of frame loss that are unrelated to consistent evaluation of buffer size SHOULD be identified and removed or mitigated. Example sources include:
Mitigations applicable to some of the sources above are discussed in Section 6.2, with the other measurement requirements described below in Section 6.
6. Back-to-Back Frames
Objective: To characterize the ability of a DUT to process Back-to-Back Frames as defined in [RFC1242].
The procedure follows.
6.1. Preparing the List of Frame Sizes
From the list of RECOMMENDED frame sizes (Section 9 of [RFC2544]), select the subset of frame sizes whose Measured Throughput (during prerequisite testing) was less than the Max Theoretical Frame Rate of the DUT/test setup. These are the only frame sizes where it is possible to produce a burst of frames that causes the DUT buffers to fill and eventually overflow, producing one or more discarded frames.
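The selection rule can be sketched as a simple filter, assuming Ethernet framing (20 bytes of preamble/SFD and minimum gap per frame) and the Ethernet frame sizes recommended in Section 9 of [RFC2544]. The Throughput values in the example are hypothetical, as are the function names.

```python
# Sketch: choose the frame sizes that qualify for Back-to-Back Frame testing.
# Assumption: Ethernet overhead of 20 bytes per frame (preamble/SFD + gap).
RFC2544_ETHERNET_FRAME_SIZES = [64, 128, 256, 512, 1024, 1280, 1518]


def max_theoretical_frame_rate(link_bps: float, frame_size: int) -> float:
    return link_bps / ((frame_size + 20) * 8)


def select_frame_sizes(link_bps: float, measured_throughput: dict) -> list:
    """Keep sizes whose Measured Throughput (frames/s) is below line rate."""
    return [
        size
        for size in sorted(measured_throughput)
        if measured_throughput[size] < max_theoretical_frame_rate(link_bps, size)
    ]


if __name__ == "__main__":
    link_bps = 10e9
    # Hypothetical prerequisite results: line rate for most sizes,
    # header-processing limited for the two smallest sizes.
    throughput = {s: max_theoretical_frame_rate(link_bps, s)
                  for s in RFC2544_ETHERNET_FRAME_SIZES}
    throughput[64] = 9.0e6
    throughput[128] = 7.5e6
    print(select_frame_sizes(link_bps, throughput))  # [64, 128]
```

The strict inequality mirrors the text; a real harness may prefer to compare against line rate with a small measurement tolerance, which is a design choice outside this memo.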
6.2. Test for a Single Frame Size
Each trial in the test requires the tester to send a burst of frames (after idle time) with the minimum interframe gap and to count the corresponding frames forwarded by the DUT.
The duration of the trial includes three REQUIRED components:
The upper search limit for the time to send each burst MUST be configurable to values as high as 30 seconds (buffer time results reported at or near the configured upper limit are likely invalid, and the test MUST be repeated with a higher search limit).
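The time limit can be converted into a maximum burst length in frames by multiplying it by the Max Theoretical Frame Rate. The sketch below does that and flags results close to the limit; the 5% margin that defines "near" is a hypothetical choice, since this memo does not quantify it, and the function names are illustrative.

```python
# Sketch: translate the 30-second upper search limit into frames and flag
# results that land at or near that limit.
UPPER_SEARCH_LIMIT_S = 30.0


def max_burst_frames(max_rate_fps: float, limit_s: float = UPPER_SEARCH_LIMIT_S) -> int:
    """Longest burst (frames) that can be sent within the time limit."""
    return int(max_rate_fps * limit_s)


def near_search_limit(burst_frames: int, max_rate_fps: float,
                      limit_s: float = UPPER_SEARCH_LIMIT_S,
                      margin: float = 0.05) -> bool:
    """True if the burst send time is within `margin` (a fraction) of the limit.

    The 5% default margin is a hypothetical choice, not taken from the memo.
    """
    burst_time_s = burst_frames / max_rate_fps
    return burst_time_s >= limit_s * (1.0 - margin)


if __name__ == "__main__":
    rate = 10e9 / ((64 + 20) * 8)  # assumed 10 Gb/s Ethernet, 64-byte frames
    print(max_burst_frames(rate))
    print(near_search_limit(446_000_000, rate))  # True: repeat with a higher limit
```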
If all frames have been received, the tester increases the length of the burst according to the search algorithm and performs another trial.
If the received frame count is less than the number of frames in the burst, then the limit of DUT processing and buffering may have been exceeded, and the burst length for the next trial is determined by the search algorithm (the burst length is typically reduced, but see below).
Classic search algorithms have been adapted for use in benchmarking, where the search requires discovery of a pair of outcomes, one with no loss and another with loss, at load conditions within the acceptable tolerance or accuracy. Conditions encountered when benchmarking the infrastructure for network function virtualization require algorithm enhancement. Fortunately, the adaptation of Binary Search, and an enhanced Binary Search with Loss Verification, have been specified in Clause 12.3 of [TST009]. These algorithms can easily be used for Back-to-Back Frame benchmarking by replacing the offered load level with burst length in frames. Annex B of [TST009] describes the theory behind the enhanced Binary Search with Loss Verification algorithm.
There are also promising works in progress that may prove useful in Back-to-Back Frame benchmarking. [BMWG-MLRSEARCH] and [BMWG-PLRSEARCH] are two such examples.
Either the [TST009] Binary Search or Binary Search with Loss Verification algorithms MUST be used, and input parameters to the algorithm(s) MUST be reported.
The tester usually imposes a (configurable) minimum step size for burst length, and the step size MUST be reported with the results (as this influences the accuracy and variation of test results).
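The substitution of burst length for offered load can be illustrated with a plain binary search. This is a minimal sketch, not a transcription of Clause 12.3 of [TST009]: it omits the loss-verification step of the enhanced algorithm, and `run_trial` is a hypothetical tester hook that sends one burst and reports whether every frame was received. The `min_step` parameter plays the role of the configurable minimum step size.

```python
from typing import Callable


def search_back_to_back(run_trial: Callable[[int], bool],
                        max_burst: int,
                        min_step: int = 1) -> int:
    """Binary search for the longest burst (in frames) forwarded with no loss.

    run_trial(n) is a hypothetical tester hook: it sends n frames at the
    minimum interframe gap (after idle time) and returns True if all n
    frames were received.
    """
    if run_trial(max_burst):
        # No loss even at the upper limit: the result is likely invalid,
        # and the search should be repeated with a higher limit.
        return max_burst
    lossless, lossy = 0, max_burst  # invariant: lossless passes, lossy fails
    while lossy - lossless > min_step:
        burst = (lossless + lossy) // 2
        if run_trial(burst):
            lossless = burst  # all frames received: try longer bursts
        else:
            lossy = burst  # loss observed: try shorter bursts
    return lossless


if __name__ == "__main__":
    # Hypothetical DUT that drops frames for any burst above 30,000 frames.
    print(search_back_to_back(lambda n: n <= 30_000, max_burst=1_000_000))
```

With `min_step=1` the search ends on an adjacent lossless/lossy pair; a larger step trades accuracy for fewer trials, which is why the step size is reported.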
The original Section 26.4 of [RFC2544] definition is stated below:
The back-to-back value is the number of frames in the longest burst that the DUT will handle without the loss of any frames.
6.3. Test Repetition and Benchmark
On this topic, Section 26.4 of [RFC2544] requires:
The trial length MUST be at least 2 seconds and SHOULD be repeated at least 50 times with the average of the recorded values being reported.
Therefore, the Back-to-Back Frame benchmark is the average of burst-length values over repeated tests to determine the longest burst of frames that the DUT can successfully process and buffer without frame loss. Each of the repeated tests completes an independent search process.
In this update, the test MUST be repeated N times (the number of repetitions is now a variable that must be reported) for each frame size in the subset list, and each Back-to-Back Frame value MUST be made available for further processing (below).
6.4. Benchmark Calculations
For each frame size, calculate the following summary statistics for the longest Back-to-Back Frame values over the N tests:
- Average (Benchmark)
- Minimum
- Maximum
- Standard Deviation
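A minimal sketch of the per-frame-size summary over the N search results, using Python's `statistics` module. It assumes N >= 2 and uses the sample standard deviation, since this memo does not specify sample versus population form; the input values are hypothetical.

```python
import statistics


def summarize_back_to_back(values: list) -> dict:
    """Summary statistics for the longest lossless burst found in each of N tests.

    Assumes N >= 2; the sample standard deviation is used because the memo
    does not say which form to use.
    """
    return {
        "N": len(values),
        "average": statistics.mean(values),  # the benchmark
        "minimum": min(values),
        "maximum": max(values),
        "std_dev": statistics.stdev(values),
    }


if __name__ == "__main__":
    # Hypothetical Back-to-Back Frame values from N = 4 independent searches.
    print(summarize_back_to_back([30_000, 31_000, 29_500, 30_500]))
```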
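Further, calculate the Implied DUT Buffer Time and the Corrected DUT Buffer Time in seconds, as follows:

$$\text{Implied DUT Buffer Time} = \frac{\text{Average Back-to-Back Frames}}{\text{Max Theoretical Frame Rate}}$$

The formula above simply expresses the burst of frames in units of time.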
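Expressed in code, the Implied DUT Buffer Time is the average Back-to-Back Frame count divided by the Max Theoretical Frame Rate at which the burst was sent. The sketch assumes 10 Gb/s Ethernet with 64-byte frames and a hypothetical average of 30,250 frames; the function name is illustrative.

```python
def implied_buffer_time(avg_b2b_frames: float, max_rate_fps: float) -> float:
    """Average Back-to-Back Frame count expressed as seconds at line rate."""
    return avg_b2b_frames / max_rate_fps


if __name__ == "__main__":
    max_rate = 10e9 / ((64 + 20) * 8)  # assumed 10 Gb/s Ethernet, 64-byte frames
    # Hypothetical average of 30,250 frames: about 2.03 ms of buffering.
    print(implied_buffer_time(30_250, max_rate))
```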
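The next step is to apply a correction factor that accounts for the DUT's frame forwarding operation during the test (assuming the simple model of the DUT composed of a buffer and a forwarding function, described in Section 4):

$$\text{Corrected DUT Buffer Time} = \text{Implied DUT Buffer Time} - \frac{\text{Implied DUT Buffer Time} \times \text{Measured Throughput}}{\text{Max Theoretical Frame Rate}}$$

where:

Measured Throughput is the Throughput benchmark (from the prerequisite testing) for the frame size tested, expressed in frames per second.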
The term on the far right in the formula for Corrected DUT Buffer Time accounts for all the frames in the burst that were transmitted by the DUT while the burst of frames was sent in. These frames are not in the buffer, so the buffer size is more accurately estimated by excluding them. If Measured Throughput is not available, an acceptable approximation is the received frame rate measured during Back-to-Back Frame testing (see Forwarding Rate in [RFC2889]).
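The correction can be sketched two equivalent ways, following the model described above: while the burst arrives (for the Implied DUT Buffer Time), the DUT forwards frames at the Measured Throughput, and those forwarded frames are excluded from the buffer estimate. The second function counts frames explicitly as a cross-check. All numeric inputs are hypothetical, and the function names are illustrative.

```python
def corrected_buffer_time(implied_s: float, measured_throughput_fps: float,
                          max_rate_fps: float) -> float:
    """Remove the frames the DUT forwarded while the burst was arriving."""
    return implied_s - implied_s * measured_throughput_fps / max_rate_fps


def corrected_from_frames(avg_b2b_frames: float, measured_throughput_fps: float,
                          max_rate_fps: float) -> float:
    """Same estimate, computed by counting frames left in the buffer."""
    burst_duration_s = avg_b2b_frames / max_rate_fps
    forwarded = burst_duration_s * measured_throughput_fps
    return (avg_b2b_frames - forwarded) / max_rate_fps


if __name__ == "__main__":
    max_rate = 10e9 / ((64 + 20) * 8)  # assumed 10 Gb/s Ethernet, 64-byte frames
    throughput = 9.0e6  # hypothetical Measured Throughput, frames/s
    implied = 30_250 / max_rate  # hypothetical average burst, in seconds
    print(corrected_buffer_time(implied, throughput, max_rate))
    print(corrected_from_frames(30_250, throughput, max_rate))
```

Both calls print the same value (about 0.80 ms here), which is the Implied DUT Buffer Time reduced by the fraction Measured Throughput / Max Theoretical Frame Rate.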
7. Reporting
The Back-to-Back Frame results SHOULD be reported in the format of a table with a row for each of the tested frame sizes. There SHOULD be columns for the frame size and the resultant average frame count for each type of data stream tested.
The number of tests averaged for the benchmark, N, MUST be reported.
The minimum, maximum, and standard deviation across all complete tests SHOULD also be reported (they are referred to as "Min,Max,Std").
The Corrected DUT Buffer Time SHOULD also be reported.
If the tester operates using a limited maximum burst length in frames, then this maximum length SHOULD be reported.
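One way to lay out these reporting items is sketched below: a header line with N, the burst-length step size, and the maximum burst length (when limited), followed by one row per tested frame size. The row dictionary keys and the column layout are hypothetical choices, not a format defined by this memo.

```python
def format_b2b_report(rows: list, n_tests: int, step_frames: int,
                      max_burst_frames=None) -> str:
    """Render one table row per tested frame size.

    Each row is a dict with hypothetical keys: size, avg, min, max, std,
    corrected_s.
    """
    lines = [f"N = {n_tests}, burst-length step size (frames) = {step_frames}"]
    if max_burst_frames is not None:
        lines.append(f"Maximum burst length (frames) = {max_burst_frames}")
    lines.append(f"{'Frame size':>10} {'Average':>10} {'Min':>8} {'Max':>8} "
                 f"{'Std':>8} {'Corrected DUT Buffer Time (s)':>30}")
    for r in rows:
        lines.append(f"{r['size']:>10} {r['avg']:>10.1f} {r['min']:>8} {r['max']:>8} "
                     f"{r['std']:>8.1f} {r['corrected_s']:>30.6g}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical results for a single qualifying frame size.
    print(format_b2b_report(
        [{"size": 64, "avg": 30_250, "min": 29_500, "max": 31_000,
          "std": 645.5, "corrected_s": 0.00080336256}],
        n_tests=4, step_frames=1, max_burst_frames=446_428_571))
```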
Static and configuration parameters (reported with Table 1):
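If the tester has a specific (actual) frame rate of interest (less than the Throughput rate), it is useful to estimate the buffer time at that actual frame rate:

$$\text{Actual Buffer Time} = \frac{\text{Max Theoretical Frame Rate}}{\text{Actual Frame Rate}} \times \text{Corrected DUT Buffer Time}$$

and report this value, properly labeled.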
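A minimal sketch of this estimate, assuming it scales the Corrected DUT Buffer Time by the ratio of the Max Theoretical Frame Rate to the actual frame rate: the same buffered frames represent proportionally more time at a lower rate. The function name and example rates are hypothetical.

```python
def actual_buffer_time(corrected_s: float, max_rate_fps: float,
                       actual_rate_fps: float) -> float:
    """Scale the Corrected DUT Buffer Time to a lower frame rate of interest.

    Assumes the number of buffered frames is unchanged, so at a lower frame
    rate those frames represent proportionally more time.
    """
    if actual_rate_fps <= 0:
        raise ValueError("actual frame rate must be positive")
    return corrected_s * max_rate_fps / actual_rate_fps


if __name__ == "__main__":
    max_rate = 10e9 / ((64 + 20) * 8)  # assumed 10 Gb/s Ethernet, 64-byte frames
    # Hypothetical Corrected DUT Buffer Time and actual rate of 5 million frames/s.
    print(actual_buffer_time(0.00080336256, max_rate, 5.0e6))
```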
8. Security Considerations
Benchmarking activities as described in this memo are limited to technology characterization.
The benchmarking network topology will be an independent test setup and MUST NOT be connected to devices that may forward the test traffic into a production network or misroute traffic to the test management network. See [RFC6815].
Further, benchmarking is performed on an "opaque-box" (a.k.a. "black-box") basis, relying solely on measurements observable external to the Device or System Under Test (DUT/SUT).
The DUT developers are commonly independent from the personnel and institutions conducting benchmarking studies. DUT developers might have incentives to alter the performance of the DUT if the test conditions can be detected. Special capabilities SHOULD NOT exist in the DUT/SUT specifically for benchmarking purposes. Procedures described in this document are not designed to detect such activity. Additional testing outside of the scope of this document would be needed and has been used successfully in the past to discover such malpractices.
Any implications for network security arising from the DUT/SUT SHOULD be identical in the lab and in production networks.
9. IANA Considerations
This document has no IANA actions.
10. References
10.1. Normative References
- [RFC1242] Bradner, S., "Benchmarking Terminology for Network Interconnection Devices", RFC 1242, DOI 10.17487/RFC1242, <https://www.rfc-editor.org/info/rfc1242>.
- [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
- [RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices", RFC 2544, DOI 10.17487/RFC2544, <https://www.rfc-editor.org/info/rfc2544>.
- [RFC6985] Morton, A., "IMIX Genome: Specification of Variable Packet Sizes for Additional Testing", RFC 6985, DOI 10.17487/RFC6985, <https://www.rfc-editor.org/info/rfc6985>.
- [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/info/rfc8174>.
- [RFC8239] Avramov, L. and J. Rapp, "Data Center Benchmarking Methodology", RFC 8239, DOI 10.17487/RFC8239, <https://www.rfc-editor.org/info/rfc8239>.
- [TST009] ETSI, "Network Functions Virtualisation (NFV) Release 3; Testing; Specification of Networking Benchmarks and Measurement Methods for NFVI", Rapporteur: A. Morton, ETSI GS NFV-TST 009 v3.4.1, <https://www.etsi.org/deliver/etsi_gs/NFV-TST/001_099/009/03.04.01_60/gs_NFV-TST009v030401p.pdf>.
10.2. Informative References
- [BMWG-MLRSEARCH] Konstantynowicz, M., Ed. and V. Polák, Ed., "Multiple Loss Ratio Search for Packet Throughput (MLRsearch)", Work in Progress, Internet-Draft, draft-ietf-bmwg-mlrsearch-00, <https://tools.ietf.org/html/draft-ietf-bmwg-mlrsearch-00>.
- [BMWG-PLRSEARCH] Konstantynowicz, M., Ed. and V. Polák, Ed., "Probabilistic Loss Ratio Search for Packet Throughput (PLRsearch)", Work in Progress, Internet-Draft, draft-vpolak-bmwg-plrsearch-03, <https://tools.ietf.org/html/draft-vpolak-bmwg-plrsearch-03>.
- [OPNFV-2017] Cooper, T., Rao, S., and A. Morton, "Dataplane Performance, Capacity, and Benchmarking in OPNFV", <https://wiki.anuket.io/download/attachments/4404001/VSPERF-Dataplane-Perf-Cap-Bench.pdf?version=1&modificationDate=1621191833500&api=v2>.
- [RFC1944] Bradner, S. and J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices", RFC 1944, DOI 10.17487/RFC1944, <https://www.rfc-editor.org/info/rfc1944>.
- [RFC2889] Mandeville, R. and J. Perser, "Benchmarking Methodology for LAN Switching Devices", RFC 2889, DOI 10.17487/RFC2889, <https://www.rfc-editor.org/info/rfc2889>.
- [RFC5180] Popoviciu, C., Hamza, A., Van de Velde, G., and D. Dugatkin, "IPv6 Benchmarking Methodology for Network Interconnect Devices", RFC 5180, DOI 10.17487/RFC5180, <https://www.rfc-editor.org/info/rfc5180>.
- [RFC6201] Asati, R., Pignataro, C., Calabria, F., and C. Olvera, "Device Reset Characterization", RFC 6201, DOI 10.17487/RFC6201, <https://www.rfc-editor.org/info/rfc6201>.
- [RFC6815] Bradner, S., Dubray, K., McQuaid, J., and A. Morton, "Applicability Statement for RFC 2544: Use on Production Networks Considered Harmful", RFC 6815, DOI 10.17487/RFC6815, <https://www.rfc-editor.org/info/rfc6815>.
- [VSPERF-b2b] Morton, A., "Back2Back Testing Time Series (from CI)", <https://wiki.anuket.io/display/HOME/Traffic+Generator+Testing#TrafficGeneratorTesting-AppendixB:Back2BackTestingTimeSeries(fromCI)>.
- [VSPERF-BSLV] Rao, S. and A. Morton, "Evolution of Repeatability in Benchmarking: Fraser Plugfest (Summary for IETF BMWG)", <https://datatracker.ietf.org/meeting/102/materials/slides-102-bmwg-evolution-of-repeatability-in-benchmarking-fraser-plugfest-summary-for-ietf-bmwg-00>.
- [VSPERF-CI] Tahhan, M., "OPNFV VSPERF CI", <https://wiki.anuket.io/display/HOME/VSPERF+CI>.
Acknowledgments
Thanks to Trevor Cooper, Sridhar Rao, and Martin Klozik of the VSPERF
project for many contributions to the early testing [VSPERF-b2b]. Yoshiaki Itou has also investigated the topic
and made useful suggestions. Maciek Konstantynowicz and Vratko Polák also
provided many comments and suggestions based on extensive integration
testing and resulting search