Network Service YANG Modules [RFC8199] describe the configuration, state data, operations, and notifications of abstract representations of services implemented on one or multiple network elements.¶
Service orchestrators use Network Service YANG Modules to infer network-wide configuration and, therefore, to invoke the appropriate device modules (Section 3 of [RFC8969]).
Knowing that a configuration is applied doesn't imply that the provisioned service instance is up and running as expected.
For instance, the service might be degraded because of a failure in the network, its quality might fall below expectations, or a service function may be reachable at the IP level but fail to provide its intended function.
Thus, the network operator must monitor the service's operational data at the same time as the configuration (Section 3.3 of [RFC8969]).
To fuel that task, the industry has been standardizing on telemetry to push network element performance information (e.g., [RFC9375]).¶
A network administrator needs to monitor its network and services as a whole, independently of the management protocols.
With different protocols come different data models and different ways to model the same type of information.
When network administrators deal with multiple management protocols, the network management entities have to perform the difficult and time-consuming job of mapping data models,
e.g., mapping the model used for configuration to the model used for monitoring when separate models or protocols are used.
This problem is compounded by a large, disparate set of data sources (e.g., MIB modules, YANG data models [RFC7950], IP Flow Information Export (IPFIX) information elements [RFC7011], syslog plain text [RFC5424], Terminal Access Controller Access-Control System Plus (TACACS+) [RFC8907], RADIUS [RFC2865], etc.).
In order to avoid this data model mapping, the industry converged on model-driven telemetry to stream the service operational data, reusing the YANG data models used for configuration.
Model-driven telemetry greatly facilitates the notion of closed-loop automation, whereby events and updated operational states streamed from the network drive remediation change back into the network.¶
However, it proves difficult for network operators to correlate the service degradation with the network root cause,
for example, "Why does my layer 3 virtual private network (L3VPN) fail to connect?" or "Why is this specific service not highly responsive?"
The reverse, i.e., which services are impacted when a network component fails or degrades, is also important for operators,
for example, "Which services are impacted when this specific optic decibel milliwatt (dBm) begins to degrade?",
"Which applications are impacted by an imbalance in this Equal-Cost Multipath (ECMP) bundle?", or "Is that issue actually impacting any other customers?"
This task usually falls under the so-called "Service Impact Analysis" functional block.¶
This document defines an architecture implementing Service Assurance for Intent-based Networking (SAIN).
Intent-based approaches are often declarative, starting from a statement of "The service works as expected" and trying to enforce it.
However, some already-defined services might have been designed using a different approach.
Aligned with Section 3.3 of [RFC7149], and instead of requiring a declarative intent as a starting point,
this architecture focuses on already-defined services and tries to infer the meaning of "The service works as expected".
To do so, the architecture works from an assurance graph, deduced from the configuration pushed to the device for enabling the service instance.
If the SAIN orchestrator supports it, the service model (Section 2 of [RFC8309]) or the network model (Section 2.1 of [RFC8969]) can also be used to build the assurance graph.
In that case, if the service model also includes the declarative intent, the SAIN orchestrator can rely on the declared intent instead of inferring it.
The assurance graph may also be explicitly completed to add an intent not exposed in the service model itself.¶
The assurance graph of a service instance is decomposed into components, which are then assured independently.
The top of the assurance graph represents the service instance to assure, and its children represent components identified as its direct dependencies; each component can have dependencies as well.
Components involved in the assurance graph of a service are called subservices.
The SAIN orchestrator updates the assurance graph automatically when the service instance is modified.¶
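As an illustration only, such an assurance graph can be modeled as a directed acyclic graph in which each subservice lists its direct dependencies. The types and instance names below are hypothetical and are not taken from the YANG modules of [RFC9418]:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Subservice:
    """One node of the assurance graph: a component assured independently."""
    subservice_type: str   # e.g., "service-instance", "interface", "device"
    instance_id: str
    dependencies: tuple = ()  # direct dependencies (children in the graph)

def transitive_dependencies(node):
    """All subservices the given node depends on, directly or indirectly."""
    seen, stack = [], list(node.dependencies)
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.append(dep)
            stack.extend(dep.dependencies)
    return seen

# Hypothetical graph for one service instance: the service instance sits at
# the top; its children are the components it directly depends on.
device = Subservice("device", "PE1")
interface = Subservice("interface", "PE1/GigabitEthernet0/0", (device,))
l3vpn = Subservice("service-instance", "l3vpn-blue", (interface,))
```

When the service instance is modified, the orchestrator would rebuild this graph from the new configuration, so the set of assured subservices always tracks the deployed service.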
When a service is degraded, the SAIN architecture will highlight where in the assurance graph to look, as opposed to going hop by hop to troubleshoot the issue.
More precisely, the SAIN architecture will associate with each service instance a list of symptoms originating from specific subservices, corresponding to components of the network.
These components are good candidates for explaining the source of a service degradation.
Not only can this architecture help to correlate service degradation with network root causes/symptoms, but it can also deduce from the assurance graph the list of service instances impacted by a component degradation or failure.
This added value informs the operational team where to focus its attention for maximum return.
Indeed, the operational team is likely to give priority to the degrading or failing components that impact the highest number of customers, especially those with Service-Level Agreement (SLA) contracts involving penalties in case of failure.¶
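The two directions of the analysis can be sketched as graph walks. In this minimal sketch (subservice names and symptom strings are hypothetical; the actual interfaces are the YANG modules of [RFC9418]), walking down from a degraded service yields root-cause candidates, while walking up from a degraded component yields the impacted service instances:

```python
# Hypothetical assurance graph: each subservice maps to its direct
# dependencies, plus a per-subservice symptom report.
graph = {
    "l3vpn-blue": ["interface-PE1-ge0"],
    "l3vpn-red": ["interface-PE1-ge0"],
    "interface-PE1-ge0": ["optic-PE1-ge0"],
    "optic-PE1-ge0": [],
}
symptoms = {"optic-PE1-ge0": ["rx power degraded (-28 dBm)"]}

def root_cause_candidates(service):
    """Walk down: subservices reporting symptoms explain the degradation."""
    stack, found = [service], {}
    while stack:
        node = stack.pop()
        if symptoms.get(node):
            found[node] = symptoms[node]
        stack.extend(graph[node])
    return found

def impacted_services(component, services=("l3vpn-blue", "l3vpn-red")):
    """Walk up: every service whose dependency closure contains the component."""
    def depends_on(node):
        return node == component or any(depends_on(d) for d in graph[node])
    return [s for s in services if depends_on(s)]
```

In this example, both VPN instances share the same interface, so a degrading optic surfaces as a symptom for each of them, which is exactly the information an operational team needs to prioritize the repair.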
This architecture provides the building blocks to assure both physical and virtual entities and is flexible with respect to services and subservices of (distributed) graphs and components (Section 3.7).¶
The architecture presented in this document is implemented by a set of YANG modules defined in a companion document [RFC9418].
These YANG modules properly define the interfaces between the various components of the architecture to foster interoperability.¶