Report from the IAB Workshop on Analyzing IETF Data (AID) 2021
University of Amsterdam, mail@nielstenoever.net
University of Cambridge, corinnecath@gmail.com
Ericsson, mirja.kuehlewind@ericsson.com
University of Glasgow, csp@csperkins.org
Keywords: data science, data analysis
The "Show me the numbers: Workshop on Analyzing IETF Data (AID)" workshop was convened by the Internet Architecture Board (IAB) from November 29 to December 2, 2021 and hosted by the IN-SIGHT.it project at the University of Amsterdam; however, it was converted to an online-only event. The workshop was organized into two discussion parts with a hackathon activity in between. This report summarizes the workshop's discussion and identifies topics that warrant future work and consideration.
Note that this document is a report on the proceedings of the workshop. The views and positions documented in this report are those of the workshop participants and do not necessarily reflect IAB views and positions.
Status of This Memo
This document is not an Internet Standards Track specification; it is
published for informational purposes.
This document is a product of the Internet Architecture Board
(IAB) and represents information that the IAB has deemed valuable
to provide for permanent record. It represents the consensus of the Internet
Architecture Board (IAB). Documents approved for publication
by the IAB are not candidates for any level of Internet Standard; see
Section 2 of RFC 7841.
Information about the current status of this document, any
errata, and how to provide feedback on it may be obtained at
.
Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
() in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document.
Table of Contents
Introduction
Workshop Scope and Discussion
  Tools, Data, and Methods
  Observations on Affiliation and Industry Control
  Community and Diversity
  Publications, Process, and Decision Making
  Environmental Sustainability
Hackathon Report
Position Papers
  Tools, Data, and Methods
  Observations on Affiliation and Industry Control
  Community and Diversity
  Publications, Process, and Decision Making
  Environmental Sustainability
Informative References
Data Taxonomy
Program Committee
Workshop Participants
IAB Members at the Time of Approval
Acknowledgments
Authors' Addresses
Introduction
The IETF, as an international Standards Developing Organization (SDO), hosts a diverse set of data about the IETF's history and development, current standardization activities, Internet protocols, and the institutions that comprise the IETF. A large portion of this data is publicly available, yet it is underutilized as a tool to inform the work in the IETF or the broader research community that is focused on topics like Internet governance and trends in information and communication technologies (ICT) standard setting.
The aim of the "IAB Workshop on Analyzing IETF Data (AID) 2021" workshop was to study how IETF data is currently used, to understand what insights can be drawn from that data, and to explore open questions around how that data may be further used in the future.
These questions can inform a research agenda drawing from IETF data that fosters further collaborative work among interested parties, ranging from academia and civil society to industry and IETF leadership.
Workshop Scope and Discussion
The workshop was organized with two all-group discussion slots at the beginning and the end of the workshop. In between, the workshop participants organized hackathon activities based on topics identified during the initial discussion and in submitted position papers. The following topic areas were identified and discussed.
Tools, Data, and Methods
The IETF holds a wide range of data sources. The main ones used are the mailing list archives, RFCs, and the datatracker. The latter provides information on participants, authors, meeting proceedings, minutes, and more. Furthermore, there are statistics for the IETF websites, the working group GitHub repositories, and the IETF survey data. There was discussion about the utility of download statistics for the RFCs themselves from different repos.
IETF participants and researchers interested in the work of the IETF have produced a wide range of tools to analyze this data. Two projects that presented their work at the workshop were BigBang and Sodestream's IETFdata library. 
The RFC Prolog Database was described in a submitted paper; see . These projects could provide additional insight into existing IETF statistics and datatracker statistics, e.g., gender-related information. Privacy issues and the implications of making such data publicly available were discussed as well.
The datatracker itself is a community tool that welcomes contributions, for example, additions to the existing interfaces or directly to the statistics page; see the Datatracker Database Overview. At the time of the workshop, instructions about how to set up a local development environment could be found at IAB AID Workshop Data Resources. Questions or discussion about the datatracker and possible enhancements can be sent to tools-discuss@ietf.org.
Observations on Affiliation and Industry Control
A large portion of the submitted position papers indicated interest in researching questions about industry control in the standardization process (as opposed to individual contributions in a personal capacity), where industry control covers both a) technical contributions and the ability to successfully standardize these contributions and b) competition for leadership roles. To assess these questions, investigating participant affiliations, including "indirect" affiliations (e.g., by tracking funding and changes in affiliation), was discussed. The need to model company characteristics or stakeholder groups was also discussed.
Discussion about the analysis of IETF data showed that affiliation dynamics are hard to capture due to the specifics of how the data is entered and because of larger social dynamics. On the data-capture side, affiliation is an open text field, so people write their affiliation in different ways (e.g., with different capitalization, spacing, or word separation). A common data format could contribute to analyses that compare SDO performance and behavior of actors inside and across standards bodies. 
To help with this, a draft data model was developed during the hackathon portion of the workshop; the data model can be found in .
Furthermore, there is the issue of mergers, acquisitions, and subsidiary companies. There is no authoritative exogenous source of variation for affiliation changes, so hand-collected and curated data is used to analyze changes in affiliation over time. While this approach is imperfect, conclusions can be drawn from the data. For example, when a small organization joins a large organization through a merger or acquisition, there is a statistically significant increase in the likelihood of an individual being put in a working group chair position (see the document by Baron and Kanevskaia).
Community and Diversity
The workshop participants were highly interested in using existing data to better understand who the current IETF community is. They were also interested in the community's diversity and how to potentially increase it and thereby increase inclusivity, e.g., understanding if there are certain factors that "drive people away" and why. Inclusivity and transparency about the standardization process are generally important to keep the Internet and its development process viable. As commented during the workshop discussion, when measuring and evaluating different angles of diversity, it is also important to understand the actual goals that are intended when increasing diversity, e.g., in order to increase competence (mainly technical diversity from different companies and stakeholder groups) or relevance (also regional diversity and international footprint).
The discussion on community and diversity ranged from methods that draw on novel text mining, time series clustering, graph mining, and psycholinguistic approaches for understanding the consensus mechanism to more speculative approaches about what it would take to build a feminist Internet. 
The discussion also covered the data needed to measure who is in the community and how diverse it is.
The discussion highlighted that part of the challenge is defining what diversity means and how to measure it, or as one participant highlighted, defining "who the average IETFer is". There was a question about what to do about missing data or non-participating or underrepresented communities, such as women, individuals from the African continent, and network operators. In terms of how IETF data is structured, various researchers mentioned that it is hard to track conversations because mail threads split, merge, and change. The ICANN-at-large model came up as an example of how to involve relevant stakeholders in the IETF that are currently not present. Conversely, it is also interesting for outside communities (especially policy makers) to get a sense of who the IETF community is and to be kept updated.
The human element of the community and diversity was highlighted. In order to understand the IETF community's diversity, it is important to talk to people (beyond text analysis). In order to ensure inclusivity, individual participants must make an effort to, as one participant recounted, tell others that their participation is valuable.
Publications, Process, and Decision Making
A number of submissions focused on the RFC publication process, on the
development of standards and other RFCs in the IETF, and on how the IETF
makes decisions.
This included work on technical decisions about the content of the
standards, on procedural and process decisions, and on questions around
how we can understand, model, and perhaps improve the standards process.
Some of the work considered what makes an RFC successful, how RFCs are
used and referenced, and what we can learn about the importance of a topic
by studying the RFCs, Internet-Drafts, and email discussions.
There were three sets of questions to consider in this area. The first set related to the success and failure of standards and considered:
What makes a successful and good RFC?
What makes the process of making RFCs successful?
How are RFCs used and referenced once published?
Discussion considered how to better understand the path from an Internet-Draft to an RFC, to see if there are specific factors that lead to successful
development of an Internet-Draft into an RFC. Participants explored the extent to
which this depends on the seniority and experience of the authors, on the
topic and IETF area, on the extent and scope of mailing list discussion, and other
factors, to understand whether success of an Internet-Draft can be predicted and
whether interventions can be developed to increase the likelihood of
success for work.
The second set of questions focused on decision making:
How does the IETF make design decisions?
What are the bottlenecks in effective decision making?
When is a decision made? And what is the decision?
Difficulties here lie in capturing decisions and the results of consensus calls early in the process, and understanding the factors that lead to effective decision making.
Finally, there were questions regarding what can be learned about protocols by
studying IETF publications, processes, and decision making. For example:
Are there insights to be gained around how security concerns are discussed and considered in the development of standards?
Is it possible to verify correctness of protocols and detect ambiguities?
What can be learned by extracting insights from implementations and activities on implementation
efforts?
Answers to these questions will come from analysis of IETF emails, RFCs and Internet-Drafts, meeting minutes, recordings, GitHub data, and external data such as surveys.
Environmental Sustainability
The final discussion session considered environmental sustainability. Topics included the IETF's role with respect to climate change, both in terms of the environmental impact of the way the IETF develops standards and in terms of the environmental impact of the standards the IETF develops.
Discussion started by considering how sustainable IETF meetings are, focusing on the amount of carbon dioxide (CO2) emissions IETF meetings are responsible for and how the IETF can be made more sustainable. Analysis looked at the home locations of participants, meeting locations, and the carbon footprint of air travel and remote attendance to estimate the CO2 costs of an IETF meeting. While the analysis is ongoing, initial results suggest that the costs of holding multiple in-person IETF meetings per year are likely unsustainable in terms of CO2 emissions.
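The estimation approach described above (participant home locations, a meeting location, and air travel) can be illustrated with a minimal sketch. This is not the analysis presented at the workshop: the emission factor, the example coordinates, and the function names are hypothetical placeholders, and remote attendance is not modeled.

```python
import math

# Hypothetical emission factor in kg CO2 per passenger-kilometre.
# An illustrative placeholder, not a value from the workshop analysis.
ASSUMED_KG_CO2_PER_KM = 0.1

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def meeting_travel_co2_kg(home_locations, meeting_location,
                          kg_per_km=ASSUMED_KG_CO2_PER_KM):
    """Sum round-trip flight emissions over all in-person participants."""
    total = 0.0
    for lat, lon in home_locations:
        one_way = great_circle_km(lat, lon, *meeting_location)
        total += 2 * one_way * kg_per_km
    return total


if __name__ == "__main__":
    # Made-up (latitude, longitude) pairs, not real attendance data.
    homes = [(52.37, 4.90), (55.86, -4.25), (37.77, -122.42)]
    venue = (48.21, 16.37)
    total = meeting_travel_co2_kg(homes, venue)
    print(f"Estimated travel emissions: {total:.0f} kg CO2")
```

A great-circle distance understates real flight paths and ignores connections, so a sketch like this gives at best a lower-bound style estimate under the assumed emission factor.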
The extent to which climate impacts are
considered during the development and standardization of Internet
protocols was discussed. RFCs and Internet-Drafts of active working groups
were reviewed for relevant keywords to highlight the extent to
which climate change, energy efficiency, and related topics were
considered in the design of Internet protocols. This review revealed the limited
extent to which these topics have been considered. There is ongoing work to get
a fuller picture by reviewing meeting minutes and mail archives as well, but initial
results show only limited consideration of these important issues.
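A keyword review of this kind can be sketched as follows. The keyword list, the function names, and the in-memory document mapping are assumptions made for illustration; the report does not state which terms or tooling were actually used.

```python
import re
from collections import Counter

# Hypothetical keyword list; the terms used in the workshop review
# are not given in this report.
ASSUMED_KEYWORDS = ["climate change", "energy efficiency",
                    "sustainability", "carbon"]


def count_keywords(text, keywords=ASSUMED_KEYWORDS):
    """Count case-insensitive, whole-phrase keyword occurrences in one document."""
    counts = Counter()
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        counts[kw] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def documents_mentioning(docs, keywords=ASSUMED_KEYWORDS):
    """Return the sorted names of documents that mention at least one keyword.

    `docs` maps a document name to its plain text.
    """
    return sorted(
        name for name, text in docs.items()
        if sum(count_keywords(text, keywords).values()) > 0
    )
```

Whole-phrase matching keeps the count conservative (for example, "decarbonization" does not count as "carbon"), at the cost of missing inflected or hyphenated variants.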
Hackathon Report
The middle two days of the workshop were organized as a hackathon. The aims of the hackathon were to 1) acquaint people with the different data sources and analysis methods, 2) seek to answer some of the questions that came up during presentations on the first day of the workshop, and 3) foster collaboration among researchers to grow a community of IETF data researchers.
At the end of Day 1, the plenary presentation day, people were invited to divide themselves into groups and select their own respective facilitators. All groups had their own work space and could use their own communication methods and channels. Furthermore, daily check-ins were organized during the two hackathon days. On the final day, the hackathon groups presented their work in a plenary session.
According to the co-chairs, the objectives of the hackathon were met, and the output significantly exceeded expectations. It allowed more interaction than academic conferences and produced some actual research results by people who had not collaborated before the workshop.
Future workshops that choose to integrate a hackathon could consider asking participants to submit issues and questions beforehand (potentially as part of the position papers or the sign-up process) to facilitate the formation of groups.
Position Papers
Tools, Data, and Methods
Sebastian Benthall, "Using Complex Systems Analysis to Identify Organizational Interventions"
Stephen McQuistin and Colin Perkins, "The ietfdata Library"
Marc Petit-Huguenin, "The RFC Prolog Database"
Jari Arkko, "Observations about IETF process measurements"
Observations on Affiliation and Industry Control
Justus Baron and Olia Kanevskaia, "Competition for Leadership Positions in Standards Development Organizations"
Nick Doty, "Analyzing IETF Data: Changing affiliations"
Don Le, "Analysing IETF Data Position Paper"
Elizaveta Yachmeneva, "Research Proposal"
Community and Diversity
Priyanka Sinha, Michael Ackermann, Pabitra Mitra, Arvind Singh, and Amit Kumar Agrawal, "Characterizing the IETF through its consensus mechanisms"
Mallory Knodel, "Would feminists have built a better internet?"
Wes Hardaker and Genevieve Bartlett, "Identifying temporal trends in IETF participation"
Lars Eggert, "Who is the Average IETF Participant?"
Emanuele Tarantino, Justus Baron, Bernhard Ganglmair, Nicola Persico, and Timothy Simcoe, "Representation is Not Sufficient for Selecting Gender Diversity"
Publications, Process, and Decision Making
Michael Welzl, Carsten Griwodz, and Safiqul Islam, "Understanding Internet Protocol Design Decisions"
Ignacio Castro et al., "Characterising the IETF through the lens of RFC deployment"
Carsten Griwodz, Safiqul Islam, and Michael Welzl, "The Impact of Continuity"
Paul Hoffman, "RFCs Change"
Xue Li, Sara Magliacane, and Paul Groth, "The Challenges of Cross-Document Coreference Resolution in Email"
Amelia Andersdotter, "Project in time series analysis: e-mailing lists"
Mark McFadden, "A Position Paper by Mark McFadden"
Environmental Sustainability
Christoph Becker, "Towards Environmental Sustainability with the IETF"
Daniel Migault, "CO2eq: Estimating Meetings' Air Flight CO2 Equivalent Emissions: An Illustrative Example with IETF meetings"
Informative References
"Analysing IETF Position Paper", Article 19
"Analyzing IETF Data: Changing affiliations"
"Document Statistics"
"Who is the Average IETF Participant?"
"Welcome to BigBang's documentation!", BigBang
"CO2eq: Estimating Meetings' Air Flight CO2 Equivalent Emissions: An Illustrative Example with IETF meeting"
"Using Complex Systems Analysis to Identify Organizational Interventions", Information Law Institute
"Characterizing the IETF through its consensus mechanisms"
"The Impact of Continuity"
"The Challenges of Cross-Document Coreference Resolution in Email"
"Datatracker Database Overview", for the IAB AID Workshop
"IAB AID Workshop Data Resources"
"Datatracker", IETF
"Statistics", IETF
"Understanding Internet Protocol Design Decisions"
"Project in time series analysis: e-mailing lists"
"Towards Environmental Sustainability with the IETF"
"Would feminists have built a better internet?"
"Representation is Not Sufficient for Selecting Gender Diversity"
"RFCs", IETF
"Web analytics", IETF
"IETF Data", Internet Protocols Laboratory, commit c53bf15
"The ietfdata Library"
"Competition for Leadership Positions in Standards Development Organizations"
"Mail Archive", IETF
"Observations about IETF process measurements"
"A Position Paper"
"The RFC Prolog Database"
"Research Proposal"
"Characterising the IETF through the lens of RFC deployment"
"RFCs Change"
"IETF Community Survey 2021", IETF
"Identifying temporal trends in IETF participation"
Data Taxonomy
A Draft Data Taxonomy for SDO Data:
Organization:
    Organization Subsidiary
    Time
    Email domain
    Website domain
    Size
    Revenue, annual
    Number of employees
    Org - Affiliation Category (Labels) ; 1 : N
        Association
        Advertising Company
        Chipmaker
        Content Distribution Network
        Content Providers
        Consulting
        Cloud Provider
        Cybersecurity
        Financial Institution
        Hardware vendor
        Internet Registry
        Infrastructure Company
        Networking Equipment Vendor
        Network Service Provider
        Regional Standards Body
        Regulatory Body
        Research and Development Institution
        Software Provider
        Testing and Certification
        Telecommunications Provider
        Satellite Operator
    Org - Stakeholder Group : 1 - 1
        Academia
        Civil Society
        Private Sector -- including industry consortia and associations; state-owned and government-funded businesses
        Government
        Technical Community (IETF, ICANN, ETSI, 3GPP, oneM2M, etc.)
        Intergovernmental organization
SDO:
    Membership Types (SDO)
    Members (Organizations for some, individuals for others...)
    Membership organization
        Regional SDO
            ARIB
            ATIS
            CCSA
            ETSI
            TSDSI
            TTA
            TTC
        Consortia
Country of Origin:
    Country Code
    Number of Participants
Patents
    Organization
    Authors - 1 : N - Persons/Participants
    Time
    Region
    Patent Pool
    Standard Essential Patent
        If so, for which standard
Participant (An individual person)
    Name
    1 : N - Emails
        Time start / time end
    1 : N : Affiliation
        Organization
        Position
        Time start / end
    1 : N : Affiliation - SDO
        Position
        SDO
        Time
    Email Domain (personal domain)
    (Contribution data is in other tables)
Document
    Status of Document
        Internet Draft
        Work Item
        Standard
    Author -
        Name
        Affiliation - Organization
        Person/Participant
        (Affiliation from Authors only?)
Data Source - Provenance for any data imported from an external data set
Meeting
    Time
    Place
    Agenda
    Registrations
        Name
        Email
        Affiliation
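As an illustration only, part of the draft taxonomy (an Organization, and a Participant with time-bounded 1 : N affiliations) could be expressed as Python dataclasses, together with a simple normalizer for the free-text affiliation field. The class names, field names, and normalization rule below are assumptions chosen for this sketch and are not part of the hackathon output.

```python
import re
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Organization:
    name: str
    email_domain: Optional[str] = None
    stakeholder_group: Optional[str] = None  # 1 - 1, e.g. "Academia"
    categories: List[str] = field(default_factory=list)  # 1 : N labels


@dataclass
class Affiliation:
    organization: Organization
    position: Optional[str] = None
    start: Optional[date] = None  # None means "unknown start"
    end: Optional[date] = None    # None means "still current"


@dataclass
class Participant:
    name: str
    emails: List[str] = field(default_factory=list)
    affiliations: List[Affiliation] = field(default_factory=list)

    def affiliation_on(self, day):
        """Return the first affiliation whose time range covers `day`, or None."""
        for aff in self.affiliations:
            started = aff.start is None or aff.start <= day
            not_ended = aff.end is None or day <= aff.end
            if started and not_ended:
                return aff
        return None


def normalize_affiliation(raw):
    """Fold case, punctuation, and spacing so free-text variants compare equal."""
    cleaned = re.sub(r"[\W_]+", " ", raw.lower())
    return " ".join(cleaned.split())
```

This normalizer only addresses capitalization, punctuation, and spacing; variants that differ in word separation (such as "ExampleCorp" versus "Example Corp"), as well as mergers and subsidiaries, would still need curated mappings.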
Program Committee
The workshop Program Committee members were (Chair, University of Amsterdam), (Chair, IRTF, University of Glasgow), (Chair, Oxford Internet Institute), (IAB, Ericsson), (IAB, Huawei), and (IAB, USC/ISI).
Workshop Participants
IAB Members at the Time of Approval
Internet Architecture Board members at the time this document was
approved for publication were:
Acknowledgments
The Program Committee wishes to extend its thanks to for logistics support and to for note-taking.
We would like to thank the Ford Foundation for their support that made participation of , , and possible (grant number, 136179, 2020).
Efforts put in this workshop by were made possible through funding from the Dutch Research Council (NWO) through grant MVI.19.032 as part of the program 'Maatschappelijk Verantwoord Innoveren (MVI)'.
Efforts in the organization of this workshop by were supported in part by the UK Engineering and Physical Sciences Research Council under grant EP/S036075/1.
Authors' Addresses
University of Amsterdam, mail@nielstenoever.net
University of Cambridge, corinnecath@gmail.com
Ericsson, mirja.kuehlewind@ericsson.com
University of Glasgow, csp@csperkins.org