Network Performance Metrics
Network performance monitoring is the process used to track, evaluate and diagnose the performance of a network. With the variety of devices, technologies and network environments continuing to expand, the definition of optimal performance can vary significantly.
Network performance metrics are the measurable outputs that indicate how the infrastructure and services are operating as a part of short-term and long-term network performance evaluations. Real-time analysis of these metrics allows teams to identify potential problems on the network and prioritize IT resources and response according to impact. Over time network performance metrics support a long-term understanding of end-user demands and help in building an adaptive network that meets future business needs.
Network Performance Metrics Challenges
The sheer size and complexity of modern networks can lead to monitoring challenges for network teams. Recent trends in network architecture have made pinpointing internal and external encumbrances on network performance a more daunting task.
Cloud adoption has accelerated at a pace that even industry insiders had not anticipated. With nearly half of overall network volume traversing the external cloud, visibility has become increasingly difficult. Proxy-in-the-cloud solutions put the responsibility on network performance analysts to decipher millions of conversations between the cloud and the original host. The type of cloud service, be it SaaS, IaaS or another delivery model, dictates the network performance evaluation methods and best metric choices to assess it.
The IoT and Device Overload
The Internet of Things (IoT) is projected to produce 5 quintillion bytes of data per day across over 30 million connected devices within the next year. With all the anticipation and potential, the inherent challenges to network performance monitoring are often overlooked. Connected devices are expanding into new arenas like farm machinery, medicine, manufacturing and utilities. The networks that connect these disparate devices will be taxed beyond anything previously encountered. The variety of IoT formats and protocols adds complexity to network traffic baselining and prioritization. Network performance metrics and tools must adapt to meet this challenge as well.
NetOps and SecOps
Network Operations (NetOps) and Security Operations (SecOps) share the common objective of keeping network traffic flowing efficiently and securely. The separation of these two important functions can lead to challenges in network performance monitoring and metric selection. With network speeds, cloud adoption and encrypted data volumes all on the rise, visibility for both Ops categories has been hindered. As a result, running NetOps and SecOps in independent silos becomes increasingly wasteful and exacerbates existing challenges, since the two teams often rely on the same instrumentation points and use cases.
These challenges can be addressed through increased collaboration, including the use of custom-designed network packet brokers (NPBs) to intelligently pre-process and distribute traffic, clear definition of hand-off and escalation points, and automation practices that improve interaction between NetOps and SecOps. A collaborative approach allows both teams to leverage shared insights and improves response and remediation.
With increased network size, speed and diversity, many of the greatest challenges lie in determining essential network performance metrics for assessing system performance and the relative weights these metrics should be assigned.
Network Monitoring Best Practices
The rapid pace of network architecture transformation coupled with an influx of new network performance monitoring tools can lead to confusion and indecision. For any network monitoring program, implementation of logical best practices should be considered a prerequisite to prudent metric selection.
Baselining network performance is high on the list of recommended practices. A network performance baseline is the set of metrics used to define normal working conditions on a network infrastructure. This baseline sets the standard against which future performance data will be compared, so it must be defined accurately. Careful analysis of traffic flow and utilization patterns is a key element of reliable baselining.
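As a minimal sketch, a baseline can be as simple as the mean and standard deviation of historical metric samples, with an alert threshold a few deviations above normal. The function names and the three-sigma threshold below are illustrative assumptions, not a standard:

```python
from statistics import mean, stdev

def baseline(samples):
    """Derive a simple performance baseline (mean plus an upper
    alert threshold) from historical metric samples."""
    mu = mean(samples)
    sigma = stdev(samples)
    # Flag values more than 3 standard deviations above normal.
    return {"mean": mu, "threshold": mu + 3 * sigma}

def is_anomalous(value, bl):
    """True when a new observation exceeds the baseline threshold."""
    return value > bl["threshold"]

# Example: latency samples (ms) gathered during normal operation
bl = baseline([20, 22, 19, 21, 20, 23, 18, 22])
print(is_anomalous(95, bl))  # a 95 ms spike falls well outside the baseline
```

In practice the baseline would be recomputed periodically per metric, per segment, and often per time-of-day, since "normal" traffic at 3 a.m. differs from 3 p.m.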
As architecture becomes more complex, understanding the underlying composition of the network in detail is another essential best practice. The traditional infrastructure, consisting of hubs, switches, routers and workstations, should be comprehensively indexed. This level of detail should also apply to any devices and applications running on the network, wireless networks, wide area networks (WANs), local area networks (LANs) and virtual LANs. Visibility into virtualized and wireless elements in concert with conventional network components is an essential precursor of service health and accurate network monitoring. Anything that touches an application, either directly or indirectly, can impact the user experience, so end-to-end visibility and analytics, with the user first and foremost, are hallmarks of elite network performance monitoring.
Performance monitoring tool selection is another area where planning and best practice implementation can lead to greater efficiency. Tools should be selected with the most useful metrics in mind, rather than monitoring based on the capabilities of your current tools. Integrated dashboards that display multiple data types, including historical information, trends, alerts and real-time usage, are invaluable to network performance monitoring and optimized metric application. Fault detection and performance workflows are ideally driven by the same tool so that relevant metrics can be correlated to diverse conditions automatically, leading to speedier assessment of end-user impact and root cause analysis.
To make any smart interface relevant, a focus on data fidelity is essential. This includes the accuracy, timeliness, completeness and reliability of the data being translated into actionable reporting and alerts at the GUI level. Performance monitoring tools with superior precision and unaltered packet capture capabilities can drive down mean time to resolve (MTTR) and enhance the overall end user experience.
Network Performance Metrics Selection
A diverse array of network performance metrics are available to IT teams. Carefully selecting the most meaningful metrics to monitor can improve network availability, optimize traffic flow and improve quality on a continual basis. These informative metrics provide barometers of system performance and leading indicators for potential problems.
One effective measure of network efficiency is channel utilization, the fraction of a communication channel's transmission capacity that carries data packets. Increased utilization rates can detrimentally affect link and application performance. By monitoring utilization continually, with granularity in the millisecond range to capture elusive spikes, congestion issues can be anticipated and network expansion can be proactively planned. A short-term rise in utilization can be a quick indication of serious security or performance issues, while long-term growth informs capacity planning decisions.
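Channel utilization is straightforward to compute from byte counters: the bits carried during a measurement interval divided by the link's capacity over that same interval. A minimal sketch, with an illustrative function name and figures:

```python
def channel_utilization(bytes_transferred, link_capacity_bps, interval_s):
    """Fraction of a link's transmission capacity carrying data
    over a measurement interval."""
    bits = bytes_transferred * 8  # counters report bytes; capacity is in bits
    return bits / (link_capacity_bps * interval_s)

# 450 MB observed over 60 s on a 1 Gbit/s link
util = channel_utilization(450_000_000, 1_000_000_000, 60)
print(f"{util:.0%}")  # 6%
```

Note that the interval length matters: a 6% average over a minute can hide millisecond-scale microbursts at 100% utilization, which is why the text recommends millisecond-range granularity.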
Packet loss can occur simply because buffers are not infinite, and packets that arrive to find a full buffer are discarded. Packets are also subject to damage or loss due to other factors. When loss is resolved through retransmission, the added traffic can amplify congestion rather than correct it. For this reason, monitoring packet loss on a continual basis is a recommended best practice. The Transmission Control Protocol (TCP) retransmission rate is an indicator of packet loss and therefore an effective network performance metric for monitoring network health, since retransmissions beyond a small percentage will manifest as degraded application performance.
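On Linux, for example, the kernel's TCP counters in /proc/net/snmp (OutSegs and RetransSegs) can feed a simple rate calculation. The helper below is an illustrative sketch, not a standard API:

```python
def retransmission_rate(segments_retransmitted, segments_sent):
    """TCP retransmission rate as a percentage of segments sent,
    e.g. from deltas of the RetransSegs and OutSegs kernel counters."""
    if segments_sent == 0:
        return 0.0
    return 100.0 * segments_retransmitted / segments_sent

# 1,250 retransmissions out of 500,000 segments over the sample window
print(retransmission_rate(1_250, 500_000))  # 0.25 (%)
```

The counters are cumulative, so a monitor would sample them at the start and end of each window and compute the rate from the differences.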
Round Trip Time
The round-trip time (RTT) is typically measured in milliseconds and is a measure of the amount of time it takes for a server to respond to a client packet. The RTT can range from a few milliseconds under optimal conditions to several seconds when network problems have developed. Bandwidth, security, traffic and hardware issues could all be potential causes of increased RTT, so baselining and monitoring this network performance metric continuously can help identify threats in real time.
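One lightweight way to approximate RTT from an ordinary host is to time a TCP three-way handshake. This sketch (the function name and the port 443 default are assumptions) measures connect time rather than a true ICMP round trip:

```python
import socket
import time

def tcp_rtt_ms(host, port=443, timeout=3.0):
    """Approximate round-trip time by timing a TCP three-way handshake.
    Returns the connect time in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; handshake complete
    return (time.perf_counter() - start) * 1000.0
```

A first measurement may include DNS resolution and other one-time costs, so repeating the probe several times and taking the median gives a steadier figure for baselining.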
Jitter is the variation in delay (latency) for received packets, which causes some packets to take longer than others to travel between the same two points in a network. The causes of jitter include network congestion, timing drift and route changes. In real-time communication applications, excessive jitter can lead to audio and/or video artifacts that degrade quality. Since jitter is a commonly observed network phenomenon, continuously tracking the delay variation between successive packets is a wise practice.
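Given a series of per-packet latency measurements, jitter can be estimated as the mean absolute difference between successive values, a simplified cousin of the smoothed interarrival jitter defined in RFC 3550. The function name here is illustrative:

```python
def mean_jitter_ms(latencies_ms):
    """Jitter as the mean absolute difference between successive
    packet latencies (a simplified variant of RFC 3550 jitter)."""
    if len(latencies_ms) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(latencies_ms, latencies_ms[1:])]
    return sum(diffs) / len(diffs)

# Five packets with latencies in milliseconds
print(mean_jitter_ms([30, 32, 29, 35, 31]))  # 3.75
```

For interactive voice, jitter in this range is typically absorbed by a playout buffer; sustained values well above the baseline are what warrant an alert.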
Latency is a network performance metric quantified in time units and refers to any form of delay in communication that occurs over a network. This metric is tracked bi-directionally between the host and the server. Potential contributing factors to network latency include security processes, router errors, storage or disk access delays and software malfunctions. Beyond a perceptible threshold, latency can significantly impact the overall user experience.
End-User Experience Score
End-user experience scoring is an invaluable network performance metric, as the numeric score provides a comprehensive rating from the user perspective. Along with the 0-10 relative value, intuitive problem descriptions and performance visualizations provide real-time data for any user transaction. Using adaptive intelligence to monitor network conditions from the user perspective, advanced algorithms distill the most pertinent standard network metrics into a single value that can accelerate issue diagnosis and resolution.
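VIAVI's actual scoring algorithm is proprietary. Purely to illustrate the idea of distilling several standard metrics into a single 0-10 value, here is a hypothetical score in which the worst-performing metric relative to its budget dominates; every name and budget below is invented for the sketch:

```python
def experience_score(rtt_ms, jitter_ms, loss_pct,
                     rtt_budget=200.0, jitter_budget=30.0, loss_budget=1.0):
    """Hypothetical 0-10 experience score: each metric consumes a share
    of its budget, and the worst offender drives the final score."""
    penalties = [
        min(rtt_ms / rtt_budget, 1.0),
        min(jitter_ms / jitter_budget, 1.0),
        min(loss_pct / loss_budget, 1.0),
    ]
    return round(10.0 * (1.0 - max(penalties)), 1)

print(experience_score(40, 3, 0.05))   # healthy network scores high
print(experience_score(350, 60, 2.5))  # badly degraded network scores 0
```

Taking the maximum penalty rather than an average reflects the user's perspective: one saturated metric ruins the experience even when the others look fine.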
The Importance of User Experience
An appropriate analogy lies in modern medical technology, which now gives physicians real-time data on virtually every bodily system and function. Despite this impressive byproduct of modern medicine, “how do you feel?” remains the most important question in the medical profession today, as always.
This same concept can be applied to network performance analysis metrics. With thousands of potential network performance metrics available for monitoring, the proliferation of data can be overwhelming, so focusing on the user first is the best way to ensure value-added practices and satisfied customers.
Data of any kind become more relevant within the context of user experience and satisfaction, regardless of any numeric improvement observed. This can be accomplished in numerous ways including real user-provided input on quality of experience (QoE), analysis of simulated user or synthetic transactions, or selectively tailoring metrics to coincide with the outputs deemed most essential to user experience.
Network performance metrics like latency and jitter become increasingly valuable when performance thresholds correspond to real-world performance degradation. Observer Apex provides individual and logically grouped experience scores across multi-dimensional variables. Unique to VIAVI, it is the first multi-dimensional score to measure the impact of performance on the end-user experience, eliminating the guesswork of toggling between multiple network performance monitoring metrics.
Network Performance Metrics Conclusion
The Cloud, Gigabit Internet, and IoT have quickly moved from the drawing board to the real world on a massive scale. While these developments continue to enable new applications that enhance our lives in countless ways, the challenges in identifying effective network performance metrics have multiplied along with the technical complexity. Focusing on the user experience has always been the best way to navigate the overwhelming choices in data collection and prioritization. The user is perhaps the most effective real-time monitoring solution of all. Whether you are encountering a split-second performance blip or a slowly evolving trend of degradation, tying these observations to the user experience can improve baseline accuracy and response times while helping to predict unwanted events in the future.