QoS Measurement Standpoint

Timing accuracy

Accurate multipoint measurements require precise synchronization of the clocks in the measuring devices. If, for example, time stamps are carried in packets, careful clock synchronization is needed between the end devices in order to calculate delay in both directions; this clock synchronization acts as the shared knowledge of the separate measurement points.

Timing accuracy is not a trivial problem, since computer real-time clocks are inaccurate and might drift several seconds in a day. This is definitely not accurate enough, since (at least) millisecond-order accuracy is usually needed in network-delay measurement.

Network time protocol (NTP) is a cost-effective way to reach synchronization. The maximum accuracy for time synchronization using NTP is about one millisecond. To reach it, the NTP server must be in the same local area network (LAN) as the NTP client in order to minimize delay and jitter between the server and the client. NTP is also quite slow to obtain accurate synchronization; a stabilization time of about an hour is sometimes needed.
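As a rough illustration of why delay between client and server matters, the sketch below applies the standard NTP on-wire calculation to the four time stamps of a single request/reply exchange. The function name and the sample values are illustrative assumptions and are not taken from any particular NTP implementation.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one NTP exchange.

    t0: client transmit time (client clock)
    t1: server receive time  (server clock)
    t2: server transmit time (server clock)
    t3: client receive time  (client clock)
    All values are in seconds.
    """
    # Offset of the server clock relative to the client clock.
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    # Round-trip network delay, excluding the server's processing time.
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Hypothetical exchange: client clock 5 ms behind, 1 ms one-way delay each way.
offset, delay = ntp_offset_and_delay(100.000, 100.006, 100.0065, 100.0025)
```

The offset estimate is exact only when the path delay is the same in both directions; any asymmetry shows up as an error of up to half the round-trip delay, which is one reason a server in the same LAN gives the best results.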

Network address translation

Network address translation (NAT) is the process of modifying network address information in datagram packet headers while in transit across a traffic routing device for the purpose of remapping a given address space into another.

NAT affects measurements in IP networks: because it masquerades source IP addresses, two-point measurements that rely on matching IP addresses between measurement points become difficult or even impossible to perform.

Today, NAT is most often used in conjunction with network masquerading (or IP masquerading), which is a technique that hides an entire address space, usually consisting of private network addresses (RFC 1918), behind a single IP address in another, often public, address space.

This mechanism is implemented in a routing device that uses stateful translation tables to map the ‘hidden’ addresses into a single address and then rewrites the outgoing Internet protocol (IP) packets on exit, so that they appear to originate from the router. In the reverse communications path, responses are mapped back to the originating IP address using the rules (i.e., the state) stored in the translation tables. The translation table rules established in this fashion are flushed after a short period without new traffic refreshing their state.

As described, the method only allows transit traffic through the router when it originates in the masqueraded network, since this is what establishes the translation table entries. However, most NAT devices today allow the network administrator to configure translation table entries for permanent use. This feature is often referred to as "static NAT" or "port forwarding" and allows traffic originating in the "outside" network to reach designated hosts in the masqueraded network.
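The translation-table behavior described above can be sketched as a toy model. This is a minimal sketch, not how any real NAT device is implemented: the class name, the port-allocation scheme, the 60-second timeout and the choice to refresh entries on traffic in both directions are all assumptions made for illustration.

```python
class MasqueradingNat:
    """Toy model of a masquerading NAT with a stateful translation table."""

    def __init__(self, public_ip, idle_timeout=60.0, first_port=40000):
        self.public_ip = public_ip
        self.idle_timeout = idle_timeout      # seconds without traffic (assumed value)
        self._next_port = first_port
        self._by_private = {}                 # (private_ip, private_port) -> public_port
        self._by_public = {}                  # public_port -> entry dict

    def add_static(self, public_port, private_ip, private_port):
        """Permanent entry ("static NAT" / port forwarding); never flushed."""
        self._by_private[(private_ip, private_port)] = public_port
        self._by_public[public_port] = {
            "private": (private_ip, private_port), "last_seen": None, "static": True}

    def _expire(self, now):
        """Flush dynamic entries that have seen no traffic for idle_timeout."""
        stale = [port for port, e in self._by_public.items()
                 if not e["static"] and now - e["last_seen"] > self.idle_timeout]
        for port in stale:
            del self._by_private[self._by_public[port]["private"]]
            del self._by_public[port]

    def outbound(self, private_ip, private_port, now):
        """Rewrite an outgoing packet's source; creates state if none exists."""
        self._expire(now)
        key = (private_ip, private_port)
        if key not in self._by_private:
            while self._next_port in self._by_public:   # skip ports already in use
                self._next_port += 1
            self._by_private[key] = self._next_port
            self._by_public[self._next_port] = {
                "private": key, "last_seen": now, "static": False}
        port = self._by_private[key]
        if not self._by_public[port]["static"]:
            self._by_public[port]["last_seen"] = now
        return self.public_ip, port

    def inbound(self, public_port, now):
        """Map a reply back to the private host, or None if no state exists."""
        self._expire(now)
        entry = self._by_public.get(public_port)
        if entry is None:
            return None
        if not entry["static"]:
            entry["last_seen"] = now          # assumption: replies refresh the entry
        return entry["private"]
```

In this model, an inbound packet is dropped (returns None) unless an outbound packet created state first or a static entry was configured, which mirrors the behavior described in the text.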

Quality-of-Service metrics

The most essential general-purpose quality-of-service (QoS) performance metrics are:

Delay
Jitter
Throughput
Packet loss

Another somewhat different, yet important, metric is data-transmission efficiency, i.e., how much control information is needed in order to get the actual data through. This metric is not usually taken into consideration, even though it is quite essential for minimizing power consumption in power-limited wireless devices.

Jitter is also related to delay, since it is understood as the variation of the delay. Jitter has several definitions, e.g., the maximum variation of the delay, but the most common one is probably the standard deviation of the delay. Jitter can be calculated from the exact delay measurements, but not from the average values. Jitter is easier to calculate than absolute delay, since it only needs the delay difference between sequential packets, but not absolute clock synchronization of the measurement points. Jitter can be controlled with buffers, but this is done at the expense of delay.
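The jitter definitions above can be sketched in a few lines. The function names and the sample values (in milliseconds) are illustrative assumptions; the last function shows why a constant clock offset between the measurement points does not affect the delay difference between sequential packets.

```python
import statistics


def jitter_stddev(delays):
    """Jitter as the (population) standard deviation of the delay samples."""
    return statistics.pstdev(delays)


def jitter_max_variation(delays):
    """Jitter as the maximum variation of the delay."""
    return max(delays) - min(delays)


def delay_differences(send_times, recv_times):
    """Delay difference between sequential packets.

    send_times come from the sender's clock and recv_times from the
    receiver's clock; a constant offset between the clocks cancels out.
    """
    delays = [r - s for s, r in zip(send_times, recv_times)]
    return [later - earlier for earlier, later in zip(delays, delays[1:])]


# Hypothetical samples: true delays of 10, 12, 11 and 15 ms,
# with the receiver clock running 1000 ms ahead of the sender clock.
send = [0, 20, 40, 60]
recv = [1010, 1032, 1051, 1075]
differences = delay_differences(send, recv)
```

Note that the average delay alone (12 ms in this sample) says nothing about the spread, which is why jitter has to be computed from the individual delay measurements.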

Packet loss is more or less an independent metric in comparison to the other metrics, even though some approximation of it can be drawn from raw throughput if the offered traffic load is known. Lost packets are usually detected via sequence numbers and automatic repeat request (ARQ) methods. Packet loss also affects the other metrics. For example, consider an application-layer jitter measurement, where jitter is measured directly from sequentially arriving packets. If lower layers fail to deliver some packets correctly, jitter increases because of the gaps left by the missing packets. It might also be the case that the application-layer packets are all delivered correctly, but jitter still increases because of erroneous lower-layer packets and the retransmissions performed by ARQ methods; in this case, delay also increases.
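Loss detection via sequence numbers can be sketched as follows. This is a minimal sketch that assumes the sequence numbers do not wrap around and that losses before the first or after the last received packet are not counted; the function names are hypothetical.

```python
def find_lost(received_seqs):
    """Return the sequence numbers missing between the first and last one seen.

    Duplicates and reordering are tolerated because only set membership matters.
    """
    seen = set(received_seqs)
    if not seen:
        return []
    return [seq for seq in range(min(seen), max(seen) + 1) if seq not in seen]


def loss_ratio(received_seqs):
    """Fraction of packets lost within the observed sequence-number range."""
    seen = set(received_seqs)
    if not seen:
        return 0.0
    expected = max(seen) - min(seen) + 1
    return (expected - len(seen)) / expected


# Hypothetical arrival order with reordering (7 before 6) and a duplicate (4).
lost = find_lost([1, 2, 4, 7, 6, 4])
```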

One-way delay

One-way delay (OWD) is the time a packet takes to travel between measurement points. For example, OWD can be used to locate network bottlenecks, i.e., to find which part of the network causes most of the overall delay. OWD measurement is also useful for verifying that the network operates properly under congestion when QoS methods such as differentiated services (DiffServ) are used to ensure good QoS for critical applications.

Jitter

Jitter characterizes the variation in network delay; it is generally computed as the variation of the OWD between two consecutive packets (see Figure 2). Jitter manifests as distortion of voice or video when consecutive packets arrive at irregular intervals, and severe jitter causes jittery or shaky voice quality, reducing intelligibility. Each component of network delay can contribute to jitter:

Propagation delay can vary as network topology changes—when a link fails, for example, or when a lower-layer network’s topology changes—which causes a sudden peak in jitter. Current IP backbone research suggests that these occurrences are more common than generally believed.
Switching delay can vary because some packets might require more processing than others. However, this effect is becoming less of a consideration because packet switching is increasingly implemented via hardware pipelines whose switching-delay characteristics are deterministic.
Scheduling delay variation occurs as scheduling queues oscillate between empty and full.

Jitter buffers (also known as "play-out buffers") remove delay variation by turning variable network delays into constant delays at destination end systems.

Jitter buffers add to end-to-end delay; therefore, networks engineered to support low-delay services (such as VoIP) should also be engineered for low jitter. Adaptive jitter buffers aim to keep the additional delay to a minimum by dynamically tuning the jitter buffer size to the lowest acceptable value. The algorithms that adaptive jitter buffers use, however, can place constraints on the maximum rate of change in jitter between consecutive packets.
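The trade-off between buffer size and delay can be illustrated with a fixed (non-adaptive) play-out buffer. This is a minimal sketch under stated assumptions: the first packet is held for a fixed time after it arrives, later packets are played at the sender's original spacing, and a packet that arrives after its play-out time counts as late. The function name and the sample times (in milliseconds) are hypothetical.

```python
def playout_schedule(send_times, recv_times, buffer_delay):
    """Return a (playout_time, on_time) pair per packet for a fixed jitter buffer.

    Only differences between send_times are used, so the sender and receiver
    clocks do not need to be synchronized.
    """
    if not send_times:
        return []
    first_playout = recv_times[0] + buffer_delay
    schedule = []
    for sent, received in zip(send_times, recv_times):
        playout_time = first_playout + (sent - send_times[0])
        schedule.append((playout_time, received <= playout_time))
    return schedule


# Packets sent every 20 ms; the last one is delayed much more than the others.
send = [0, 20, 40, 60]
recv = [50, 75, 88, 135]
small_buffer = playout_schedule(send, recv, buffer_delay=10)
large_buffer = playout_schedule(send, recv, buffer_delay=30)
```

With the 10 ms buffer the last packet misses its play-out time; with the 30 ms buffer every packet is on time, at the cost of 20 ms of extra end-to-end delay.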

Jitter can be classified into the following types:

Constant jitter: The variation in delay is more or less constant.
Transient jitter: An unnatural incremental delay, sometimes affecting only single packets.
Short-term delay variation: This occurs due to changing routes and exhibits increasing delay for some packets as well as an increase in packet-to-packet delay.

Examples of jitter sources:

System packet scheduling delay: This is a transient jitter. VoIP softphones often experience jitter because more than one program may be running on the CPU, which slows packet processing and introduces jitter in transmission time.
Congestion in the LAN: This is a transient jitter that occurs for short durations and is governed by the maximum back-off time and the delay between packets. If the VoIP endpoint cannot access the LAN and the back-off time limit is reached, or if another packet is ready for transmission, the previous packet may be dropped. 10 Mbit/s Ethernet has a high back-off time compared to the VoIP packet spacing; hence, the jitter limits are governed more by the packet spacing and are usually in the range of 10 to 30 milliseconds.
Firewall routers: These cause transient delays as well as short-term variations. Firewall routers, such as double socket routers, terminate an IP flow on the outer side of the firewall and re-establish it on the inner side. This helps in regulating the payload that gets forwarded to the inner networks; however, it leads to variable delay.
Access links: These lead to short-term variations and are often responsible for jitter because they constitute a bottleneck in the network. Because ISDN and cable modem links have limited bandwidth, the jitter introduced by access links can be severe, sometimes up to 30 milliseconds of delay for each packet.
Load sharing: Load sharing between IP service providers can lead to a constant jitter. Sometimes, multiple access links are routed through one IP service provider, and this can lead to jitter if the delays across the links differ.
Load sharing by an IP service: This can lead to a constant jitter. When IP service providers route traffic over more than one internal route in order to even out the load on the network, the difference in delay on each route can lead to jitter.
Load sharing within routers: This results in a constant jitter. In order to support high capacity, some routers employ a multiprocessing approach in which packets are processed in multiple parallel queues, which can introduce low levels of jitter due to short-term differences in queue size.
Routing table updates: These can lead to transient jitter. Routers perform periodic updates in order to ascertain packet priority and dispatch high-priority packets first; this can delay the transmission of some packets, and some packets can occasionally experience very high delays.
Route flapping: This causes transient jitter and can be traced to varying levels of congestion and link breakdowns; route flapping occurs when a routing table is updated and is characterized by a low-frequency oscillation.
Timing drifts: These cause transient jitter and can result in ‘jitter buffer events’, in which the buffer either overflows or has excess capacity; the timing can be reset if an NTP server is used.

Bandwidth and throughput

Each IP telephony call requires a minimum quantity of bandwidth. If the network throughput does not meet the minimum requirements, the call cannot be established and any ongoing calls will be discontinued. IP services are commonly sold with defined bandwidth that reflects the services’ access-link capacity. However, defined bandwidth is not always the same as achieved throughput. Throughput characterizes the available user bandwidth between the defined network’s ingress and egress points. The requirement for this service level agreement (SLA) parameter is obvious for point-to-point services such as virtual wires, as defined by the Pseudo Wire Emulation Edge to Edge working group within the IETF. For multipoint-to-multipoint services, such as IP virtual private networks, the SLA definition must exclude cases in which throughput loss is due to customer-based aggregation.

Achieved throughput for TCP/IP traffic largely depends on packet-loss probability and round-trip delay time. Consequently, achieved throughput might not relate to contracted access bandwidth.
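To make this dependence concrete, the sketch below uses a widely cited approximation for steady-state TCP throughput (often attributed to Mathis et al.), in which throughput is proportional to the segment size and inversely proportional to the round-trip time and the square root of the loss probability. The approximation is brought in here for illustration and is not part of the original text; the function name and the constant's default value are assumptions, and real TCP stacks can deviate considerably from this model.

```python
import math


def approx_tcp_throughput_bps(mss_bytes, rtt_s, loss_prob, c=math.sqrt(1.5)):
    """Approximate upper bound on TCP throughput in bits per second.

    throughput ~ (MSS / RTT) * (C / sqrt(p)), with C of roughly 1.22 in the
    simplest form of the model (an assumption; other variants use other values).
    """
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if not 0.0 < loss_prob <= 1.0:
        raise ValueError("loss_prob must be in (0, 1]")
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_prob))


# Hypothetical path: 1460-byte segments, 100 ms RTT, 1 % packet loss.
estimate = approx_tcp_throughput_bps(1460, 0.1, 0.01)
```

Under this model, the hypothetical path above yields roughly 1.4 Mbit/s regardless of how much access bandwidth has been contracted; halving the round-trip time doubles the estimate, while quadrupling the loss probability halves it.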

Packet loss

Loss characterizes the packet drops that occur between defined network ingress and egress points. Network congestion results in the loss of voice and data packets across the network, disrupting speech and creating signaling problems. VoIP codecs commonly support concealment algorithms, which can hide the effects of losing up to 30 ms of voice samples. The loss of two or more consecutive 20 ms voice samples thus results in a noticeable degradation of voice quality.

QoS measurement methods

Passive multipoint measurement

The passive method works as follows: QoS agents are connected non-intrusively (in parallel) to the network path under test. As packets pass the measurement points, they are captured and analyzed by the QoS agents. Capture and analysis results, such as network flows, packet IDs and packet time-stamps, are transmitted to the other measurement points/QoS agents via a separate measurement connection. To calculate delay, flows are matched using IP addresses, protocols and ports. Packets are matched using packet IDs; delay is calculated from the packet time-stamps.

This method is referred to as ‘passive’ because the actual streams are not modified; however, it is not fully passive, because measurement data needs to be transported between the measurement points. The passive method also has a big drawback: it does not work in many real networks, because network flows cannot be matched if there is a NAT between the measurement points. To use passive multipoint measurement, there must be no NAT between the measurement points, or NATs must be bypassed using some kind of tunneling, such as mobile IP.
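The matching step of the passive method can be sketched as below. The record layout, the field names and the use of microsecond time stamps are assumptions made for illustration; the time stamps are taken to come from synchronized clocks.

```python
from typing import NamedTuple


class Capture(NamedTuple):
    """One captured packet as reported by a QoS agent (hypothetical layout)."""
    src: str
    dst: str
    proto: str
    sport: int
    dport: int
    packet_id: int
    timestamp_us: int


def match_delays(captures_a, captures_b):
    """Match packets seen at point A and point B and return their delays.

    Flows are matched on addresses, protocol and ports; packets within a flow
    are matched on packet ID. Returns {match_key: delay_in_microseconds}.
    """
    def key(capture):
        return capture[:6]          # everything except the time stamp

    seen_at_a = {key(c): c.timestamp_us for c in captures_a}
    delays = {}
    for c in captures_b:
        k = key(c)
        if k in seen_at_a:
            delays[k] = c.timestamp_us - seen_at_a[k]
    return delays


point_a = [
    Capture("10.0.0.5", "198.51.100.7", "udp", 5004, 5004, 1, 1000),
    Capture("10.0.0.5", "198.51.100.7", "udp", 5004, 5004, 2, 21000),
]
point_b = [
    Capture("10.0.0.5", "198.51.100.7", "udp", 5004, 5004, 1, 4500),
    # Same packet 2, but a NAT has rewritten the source address and port.
    Capture("203.0.113.1", "198.51.100.7", "udp", 40000, 5004, 2, 25200),
]
delays = match_delays(point_a, point_b)
```

Only the first packet is matched in this example; the second one passed through a NAT, so its flow key at point B no longer agrees with the one recorded at point A.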

Active multipoint measurement

In the active measurement method, measurement points are connected to the network path under test so that traffic traverses a measurement device. Delay is calculated from the probe packets sent between QoS agents; each probe packet contains a transmit time-stamp, from which the delay can be calculated. Other tests, such as throughput tests, are also possible, because the QoS agents are connected serially to the network path.

The measurement connection is established between the measuring devices, so measurement can also be performed in a NAT environment.
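A probe packet of this kind can be sketched as follows. The payload layout (a 32-bit sequence number followed by a 64-bit transmit time in nanoseconds) is a made-up format for illustration, not a standard one, and the resulting delay is only meaningful if the clocks of the two QoS agents are synchronized.

```python
import struct
import time

# Hypothetical probe payload: sequence number (uint32) and transmit
# time in nanoseconds (uint64), both in network byte order.
PROBE_FORMAT = "!IQ"
PROBE_SIZE = struct.calcsize(PROBE_FORMAT)


def build_probe(seq, tx_time_ns=None):
    """Build a probe payload carrying the transmit time-stamp."""
    if tx_time_ns is None:
        tx_time_ns = time.time_ns()
    return struct.pack(PROBE_FORMAT, seq, tx_time_ns)


def one_way_delay_ns(payload, rx_time_ns=None):
    """Return (sequence number, one-way delay in ns) for a received probe."""
    if rx_time_ns is None:
        rx_time_ns = time.time_ns()
    seq, tx_time_ns = struct.unpack(PROBE_FORMAT, payload[:PROBE_SIZE])
    return seq, rx_time_ns - tx_time_ns


# Fixed time stamps are used here so the example is reproducible.
probe = build_probe(7, tx_time_ns=1_000_000_000)
seq, delay_ns = one_way_delay_ns(probe, rx_time_ns=1_004_500_000)
```

Because the time stamp and sequence number travel inside the payload, the receiving agent does not need to match flows by address, which is why address rewriting by a NAT does not disturb the calculation.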

Single-point measurements provide end-to-end performance information, which can be used to determine QoS. This setup makes it possible to measure round-trip time (RTT), i.e., the time from the initiation of the service request to the reception of the service reply. This information is a valuable QoS metric and gives direct insight into the total performance of the system. From the RTT ($t_{RTT}$), it is easy to calculate the average throughput of the system if the amount of transferred data ($N_d$) is known:

$$\text{average throughput} = \frac{N_d}{t_{RTT}}$$

In this way, delay and throughput are related, but the relation is not necessarily strict. Consider, e.g., a single-point passive measurement, where passing traffic is measured at some point of a network. In this case, throughput can be easily measured, but one-way delay cannot.
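The RTT-based throughput calculation for a single measurement point can be sketched as below. The function names are hypothetical, and `request` stands for any callable that performs the service request and returns the reply.

```python
import time


def measure_rtt(request):
    """Time one request/reply exchange; returns (reply, rtt_seconds)."""
    start = time.perf_counter()
    reply = request()
    return reply, time.perf_counter() - start


def average_throughput_bps(transferred_bytes, rtt_seconds):
    """Average throughput in bits per second: transferred data divided by RTT."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return transferred_bytes * 8 / rtt_seconds


# 125,000 bytes transferred with a round-trip time of 0.5 s.
throughput = average_throughput_bps(125_000, 0.5)   # 2,000,000 bit/s
```

In practice, `measure_rtt` would wrap a real request (for example, a file download), and the length of the reply would be passed to `average_throughput_bps` together with the measured RTT.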

In addition, network flow accounting can be used to monitor QoS from a single network point. NetFlow is a technology developed for flow accounting; in NetFlow-based accounting, the network flows are captured and collected from a single network point, and the collected data is then stored in a flow database for post-processing. NetFlow technology efficiently provides the metering base for a key set of applications, including network traffic accounting, usage-based network billing, network planning, dial services, network monitoring, outbound marketing, and data mining.
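The general idea of flow accounting can be sketched as a per-flow aggregation of packet and byte counts. This is a generic illustration only: the record layout and function name are assumptions, and the sketch does not reproduce the actual NetFlow record format or export protocol.

```python
from collections import defaultdict


def account_flows(packets):
    """Aggregate packets into per-flow packet and byte counters.

    packets: iterable of (src, dst, proto, sport, dport, length_bytes) tuples.
    Returns {flow_key: {"packets": n, "bytes": total}}.
    """
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src, dst, proto, sport, dport, length in packets:
        record = flows[(src, dst, proto, sport, dport)]
        record["packets"] += 1
        record["bytes"] += length
    return dict(flows)


observed = [
    ("10.0.0.5", "198.51.100.7", "udp", 5004, 5004, 200),
    ("10.0.0.5", "198.51.100.7", "udp", 5004, 5004, 200),
    ("10.0.0.6", "198.51.100.9", "tcp", 51000, 443, 1500),
]
flow_table = account_flows(observed)
```

A table like this, stored with time stamps, is the kind of data a flow database would hold for later processing such as billing or capacity planning.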


EXFO