How to achieve end-to-end network visibility?

The need for adaptive service assurance

With the move to 5G, service providers face three immediate challenges: 1) networks are becoming orders of magnitude more complex, 2) network visibility is lost as they virtualize and go cloud-native, and 3) users have ever-higher expectations of quality of experience (QoE). Network complexity is driven by factors such as control and user plane separation (CUPS), 5G standalone’s cloud-native network functions, service-based architecture (SBA), and network densification. As a result, more parts of the telecom stack are generating more performance, fault and telemetry data. This leaves most service providers overwhelmed and unable to respond to and resolve service degradations promptly.

Yet the need for timely identification and resolution of network performance impairments has never been greater. The business cases driving 5G investment require levels of throughput, responsiveness, and availability far above what service providers have typically committed to. This time, moreover, those levels are guaranteed by service level agreements (SLAs), so there is an explicit cost when the network fails to meet them.

While the move to a software-based 5G core was predicated on simplifying the telecom stack, the deployment options on offer only serve to complicate the effort. The eventual move to Open RAN, which decomposes previously purpose-built hardware into a multi-vendor IT-centric tech stack, will further complicate the picture. Service providers have a pressing need to understand the performance of Kubernetes-based orchestration environments, as well as individual node/pod and container metrics before attempting to relate them to the customer experience.

In short, the old ways of identifying and resolving performance impairments no longer work at scale. Today’s service assurance tools can no longer assure service in a way that enables service providers to deliver on their customer commitments and their service level agreements. Tools designed for 4G and earlier networks can handle neither the volume of data the network now generates nor the multiplication of data sources. 5G performance impairments cross domains, layers and services, making it difficult to achieve end-to-end network visibility or to empower the right teams to resolve problems.

A new approach to 5G service assurance is needed, one that we call adaptive service assurance. The key word is “adaptive”. Why does that matter? There are several reasons why an adaptive approach to service assurance is now necessary:

Only way to achieve end-to-end network visibility across a broad range of data sources
Only way to deal with the volume of data being generated
Only way to cover complexity across physical, virtualized and cloud-native networks
Only way to scale to address multiplication of network function instances and densification of the network footprint
Only way to correlate network and IT infrastructure performance with customer QoE
More cost-effective deployment model, particularly for consumption-based cloud-native solutions

Achieving adaptive service assurance

We have established the need for a new kind of service assurance, one that adapts to a service provider’s specific telecom and IT stack and the performance, fault and telemetry data that it generates. Let’s now consider how service providers can go about achieving an adaptive service assurance posture for their organizations.

Use existing data sources, supplementing them where possible

First, make use of the data that is currently available. This can come from existing EXFO or third-party probes and agents, such as passive probes or synthetic testing solutions. It can also be sourced from OSS and BSS systems that offer northbound data feeds or API access.

Next, that data should be supplemented with new sources from along the service path and from within the tech/infrastructure stack, including virtual workloads. With customer experience being so critical to the success of 5G, it’s important to have network visibility at different points from core to RAN and up and down the IT stack.

Finally, service providers should consider adding entirely new sources of data, such as social media sentiment data, vehicular traffic data, or street-level crowd data, to understand the conditions at a specific time and place in the network.

Sample data adaptively

The old approach of collecting and storing everything in case it might be needed later for analysis is no longer tenable. In a cloud-native world where telecom providers host some (or all) of their network functions on hyperscale platforms and consumption charges are the norm, storing everything is no longer economically feasible. Hence the need for adaptive data sampling: collecting only the data that is required to correlate network performance with the customer experience.
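As a rough illustration of what adaptive sampling can look like, the Python sketch below keeps a small fraction of records while a KPI is healthy and progressively more as it degrades. The class name, the 5% base rate, the 20 ms latency target and the quadratic scaling rule are all assumptions made for this example; they do not describe any particular product.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveSampler:
    """Keeps a fraction of records; the fraction grows when a KPI degrades."""
    base_rate: float = 0.05          # hypothetical: keep 5% of records when healthy
    max_rate: float = 1.0            # keep every record during a severe incident
    latency_target_ms: float = 20.0  # hypothetical latency objective

    def rate_for(self, observed_latency_ms: float) -> float:
        """Scale the sampling rate with how far latency exceeds the target."""
        if observed_latency_ms <= self.latency_target_ms:
            return self.base_rate
        overshoot = observed_latency_ms / self.latency_target_ms  # always > 1.0 here
        return min(self.max_rate, self.base_rate * overshoot ** 2)

    def keep(self, record_index: int, observed_latency_ms: float) -> bool:
        """Deterministic 1-in-N selection so the example is reproducible."""
        stride = max(1, round(1 / self.rate_for(observed_latency_ms)))
        return record_index % stride == 0


sampler = AdaptiveSampler()
healthy = sum(sampler.keep(i, 10.0) for i in range(100))    # latency under target
degraded = sum(sampler.keep(i, 40.0) for i in range(100))   # latency at 2x target
incident = sum(sampler.keep(i, 200.0) for i in range(100))  # latency at 10x target
print(healthy, degraded, incident)
```

In practice the trigger would more likely be a KQI or an alarm than a single latency value, and selection would often be random or per-subscriber, but the principle is the same: the collection rate follows network conditions.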

Adaptive transport and storage

Due to the sheer volume of data, let alone the economics of cloud-native environments, it’s important to limit the volume of data that needs to be transported over the network. Instead of transporting raw data from the RAN to the core for analysis, analyze the data locally and then stream the resulting KPIs and KQIs over the network. Done right, transport volumes can be reduced by as much as 90%, and in some cases more. This approach keeps the raw performance data local, while maintaining its availability for deep troubleshooting.
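The idea of analyzing locally and streaming only KPIs can be sketched in a few lines of Python. The function name, field names and 60-second window below are assumptions chosen for illustration; a real deployment would define its own KPI and KQI set.

```python
import json
from statistics import mean, quantiles


def summarize_window(cell_id: str, latencies_ms: list[float],
                     bytes_sent: list[int], window_s: int = 60) -> dict:
    """Reduce one window of raw samples collected at the edge to a few KPIs."""
    return {
        "cell_id": cell_id,
        "window_s": window_s,
        "samples": len(latencies_ms),
        "latency_avg_ms": round(mean(latencies_ms), 2),
        # quantiles(..., n=20) returns 19 cut points; the last is the 95th percentile
        "latency_p95_ms": round(quantiles(latencies_ms, n=20)[-1], 2),
        "throughput_mbps": round(sum(bytes_sent) * 8 / window_s / 1e6, 3),
    }


# Hypothetical raw samples for one cell over one window; these stay at the edge.
raw_latencies = [10.0] * 95 + [50.0] * 5
raw_bytes = [1500] * 100
raw_payload = json.dumps(
    [{"latency_ms": l, "bytes": b} for l, b in zip(raw_latencies, raw_bytes)]
)

# Only this summary is streamed toward the core.
kpi_payload = json.dumps(summarize_window("cell-1", raw_latencies, raw_bytes))
print(len(raw_payload), len(kpi_payload))
```

The saving comes from sending one small summary per window instead of every raw record. How large it is depends on the window length and on how many KPIs are kept, so the figures printed here say nothing about a production network.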

Adaptive data collection

Where and when data is collected is also an important consideration. It’s critical to be able to instantiate monitoring points and test agents as network conditions change. In addition to deploying probes and agents all along the service path, it’s important to instrument the service mesh, where inter-service communication takes place, as well as the full compute stack. In the case of the latter, there is value in being able to tie resource usage (e.g., CPU, network card, memory) to network performance and customer experience.
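One simple way to tie resource usage to network performance is to check how strongly the two move together. The sketch below computes a Pearson correlation between CPU load and latency for a single workload. The sample values are invented for illustration, and correlation is only one of several possible techniques; it is not presented here as the method any specific platform uses.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-minute samples for one pod hosting a user-plane function.
cpu_percent = [35.0, 40.0, 55.0, 70.0, 85.0, 95.0]
latency_ms = [8.0, 9.0, 11.0, 15.0, 22.0, 31.0]

r = pearson(cpu_percent, latency_ms)
if r > 0.8:  # hypothetical threshold for flagging a likely relationship
    print(f"latency tracks CPU load closely (r={r:.2f}); check pod resources")
```

A high coefficient does not prove that CPU saturation causes the latency, but it tells the operations team which layer of the stack to look at first.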

Adaptive to infrastructure

An adaptive service assurance platform is one that works with the network and IT infrastructure that are already in place. It should feature cloud-native components, which essentially future-proof it and facilitate deployment and Kubernetes-based orchestration on the major hyperscalers’ cloud environments, as well as on discrete platforms such as Oracle Cloud and Red Hat OpenShift. It should also be deployable in hybrid contexts and on virtual machines hosted on premises. With respect to the RAN, it should work with single-vendor and multi-vendor environments, with native parsing of data generated by equipment from Ericsson, Nokia, Samsung, Huawei, ZTE, and Corning, among others.

Adaptive to work scenarios and use cases

Finally, an adaptive service assurance platform meets the needs of various service provider teams as well as use cases. Ideally, it will feature a shared user experience, with common monitoring, reporting and troubleshooting tools. These have been shown to improve the productivity of personnel as well as reduce time to resolution. Use cases enabled by an approach that crosses network domains, layers and services include:

QoE-optimized 5G transport
5G transport QoS assurance
Xhaul fiber monitoring
Mobile edge computing (MEC) assurance
5G infrastructure assurance
Private network assurance

The EXFO adaptive service assurance platform

To meet the needs of adaptive service assurance, EXFO has developed a platform that offers the following characteristics:

Source-agnostic data ingestion with support for data from network functions, infrastructure, probes, parsers, EXFO service assurance solutions, OSS and BSS systems, and third-party service assurance solutions.
Adaptive data collection, with the ability to collect and store data at the network edge, streaming only the data required to create the KPIs and KQIs used in dashboards, monitoring and alerting, while keeping the original data accessible for troubleshooting drilldowns. Adaptive data collection reduces the cost of probe data capture, transport and storage by as much as 90%, and in some cases more.
A rich data-processing layer with the ability to correlate, enrich and stitch data records to provide unique insight into actual network performance and the lived customer experience.
A suite of applications that work together to provide a unified, intuitive user experience for troubleshooting and reporting across multiple service assurance use cases.
Platform-agnostic deployment with support for the three major hyperscaler cloud platforms and Kubernetes environments, Oracle Cloud and Red Hat OpenShift, and virtualized and bare-metal scenarios.
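To make the "correlate, enrich and stitch" step in the list above more concrete, here is a minimal Python sketch that joins control-plane and user-plane records on a session identifier and then enriches the result with site metadata. The record layouts, field names and inventory structure are hypothetical, and the sketch is not a description of how the EXFO platform is implemented.

```python
def stitch_and_enrich(control_records, user_records, cell_inventory):
    """Join control- and user-plane records per session, then add cell metadata."""
    # Group user-plane flow records by the session they belong to.
    flows_by_session = {}
    for rec in user_records:
        flows_by_session.setdefault(rec["session_id"], []).append(rec)

    stitched = []
    for ctrl in control_records:
        flows = flows_by_session.get(ctrl["session_id"], [])
        cell = cell_inventory.get(ctrl["cell_id"], {})
        stitched.append({
            "session_id": ctrl["session_id"],
            "subscriber": ctrl["subscriber"],
            "cell_id": ctrl["cell_id"],
            "site": cell.get("site", "unknown"),  # enrichment from inventory
            "flow_count": len(flows),
            "avg_throughput_mbps": (
                sum(f["throughput_mbps"] for f in flows) / len(flows)
                if flows else None
            ),
        })
    return stitched


# Hypothetical inputs from three different sources.
control = [
    {"session_id": "s1", "subscriber": "sub-001", "cell_id": "c10"},
    {"session_id": "s2", "subscriber": "sub-002", "cell_id": "c99"},
]
user = [
    {"session_id": "s1", "throughput_mbps": 40.0},
    {"session_id": "s1", "throughput_mbps": 60.0},
]
inventory = {"c10": {"site": "downtown-east"}}

for row in stitch_and_enrich(control, user, inventory):
    print(row)
```

The stitched record is what lets a single view relate a subscriber's experience to a specific site and session, rather than leaving that information spread across separate feeds.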

The need for a new type of telecom service assurance: adaptive service assurance delivers end-to-end network visibility

Mark Hiseman