10 Gb Monitoring: Learning from a Fortune 500 Company
Upgrading from Gigabit to 10 Gb is a current initiative for many IT teams. Yet many teams continue to rely on older monitoring tools and methods to manage performance on their new higher-speed links. In this article, we’ll look at a Fortune 500 retailer’s migration to 10 Gb and the critical adjustments the team made to its performance management strategies and tools to ensure on-time application delivery at higher speeds.
When shifting to 10 Gb, the retailer’s network team faced challenges in the following areas:
- Accessing traffic
- Monitoring at 10 Gb speeds
- Understanding overall performance
1) Accessing Traffic
Problem: The primary ways to give monitoring tools access to network traffic are port spanning, aggregation switches, and TAPs. While the retailer had relied on spanning to access its gigabit network, span ports were scarcer in the 10 Gb environment. Spanning on 10 Gb links also meant a greater chance of dropped packets, and error packets would be filtered out by the span port.
Solution: To overcome these issues, the retailer used a combination of TAPs and aggregation switches. TAPs ensured every packet was copied and delivered to the monitoring devices. Aggregation switches allowed the network team to combine multiple lightly utilized links onto a single 10 Gb analysis device for more cost-effective monitoring.
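The feasibility check behind aggregating lightly utilized links is simple arithmetic: the combined peak traffic of the tapped links must fit within the capacity of the single monitoring port, with headroom for bursts. A minimal sketch (the function name and headroom figure are illustrative assumptions, not the retailer's actual tooling):

```python
# Hypothetical sketch: decide whether several lightly utilized links can be
# aggregated onto one 10 Gb monitoring port without oversubscribing it.

MONITOR_PORT_GBPS = 10.0

def fits_on_monitor_port(peak_utilizations_gbps, headroom=0.8):
    """Return True if the summed peak traffic stays within the monitoring
    port's capacity, reserving (1 - headroom) of it as a burst margin."""
    return sum(peak_utilizations_gbps) <= MONITOR_PORT_GBPS * headroom

# Four 1 Gb links peaking at 300-600 Mb/s each fit comfortably:
print(fits_on_monitor_port([0.3, 0.45, 0.6, 0.5]))  # True
# Two heavily loaded 10 Gb links do not:
print(fits_on_monitor_port([7.0, 6.5]))  # False
```

In practice this is why only lightly utilized links are good aggregation candidates: a single busy 10 Gb link can saturate the monitoring port on its own.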
2) Monitoring at 10 Gb Speed
Problem: The retailer’s network team relied on a combination of open-source and commercial software analysis tools to manage gigabit networks. In their full-duplex 10 Gb environment, these tools were overwhelmed.
Solution: To manage 10 Gb performance, the retailer purchased long-term packet capture appliances that could capture and save 10 Gb traffic to disk at line rate. When troubleshooting 10 Gb links, access to the packets was essential for quick and accurate resolution.
“With their older open-source tools, events or errors would pop up, but they wouldn’t have any real indication of the source of the problem. Using the Network Instruments GigaStor™ long-term capture appliance, they could go back to the packets after the incident and ascertain in detail why the problem occurred.”
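The core idea behind long-term, retrospective capture is a ring buffer on disk: the most recent capture files are retained and the oldest are evicted, so the packet history always covers the last retention window. A simplified sketch of that rotation logic (the class and file names are illustrative, not GigaStor internals):

```python
# Hypothetical sketch of ring-buffer capture-file rotation: keep the most
# recent N files so the stored packet history always spans the newest window.
from collections import deque

class CaptureRing:
    def __init__(self, max_files):
        self.max_files = max_files
        self.files = deque()

    def rotate(self, filename):
        """Register a newly written capture file; return the evicted file
        (to be deleted from disk), or None if the ring is not yet full."""
        self.files.append(filename)
        if len(self.files) > self.max_files:
            return self.files.popleft()
        return None

ring = CaptureRing(max_files=3)
for name in ["cap0.pcap", "cap1.pcap", "cap2.pcap", "cap3.pcap"]:
    evicted = ring.rotate(name)

print(evicted)            # cap0.pcap (oldest file dropped)
print(list(ring.files))   # ['cap1.pcap', 'cap2.pcap', 'cap3.pcap']
```

Sizing the ring is the key operational decision: retention must be long enough that packets from an incident are still on disk when the team investigates.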
3) Understanding Overall Performance
Problem: The retailer had previously been troubleshooting and managing performance within its three largest data centers independently of one another. Without an aggregated view of performance across the centers, it was difficult to assess the scale of problems and prioritize troubleshooting efforts. The IT team also lacked specific details about critical applications, making it hard to isolate the source of problems to the network or the application. In the 10 Gb environment, tracking applications was a nightmare without in-depth analytics.
Solution: The network team implemented a high-level reporting solution to aggregate performance across the three critical data centers. With a view of overall performance, they could immediately assess the scope and impact of problems. In addition, they used baselines to establish benchmarks for key applications and set alarms to alert the team to significant deviations in performance. With an analysis platform capable of application transaction analysis, they could also track specific application metrics and error details to isolate issues within the application.
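The baseline-and-alarm approach described above can be sketched in a few lines: learn a mean and standard deviation from historical measurements, then flag new samples that deviate significantly from that baseline. This is an illustrative sketch of the general technique, with made-up sample data and a common 3-sigma threshold, not the retailer's actual alarm logic:

```python
# Hypothetical sketch of baseline-and-alarm logic: benchmark an application
# metric from history, then alarm on significant deviations from it.
from statistics import mean, stdev

def build_baseline(samples_ms):
    """Return (mean, standard deviation) of historical response times."""
    return mean(samples_ms), stdev(samples_ms)

def is_anomalous(value_ms, baseline, n_sigma=3.0):
    """Flag a sample that deviates more than n_sigma from the baseline."""
    mu, sigma = baseline
    return abs(value_ms - mu) > n_sigma * sigma

history = [42, 40, 45, 41, 43, 44, 39, 42]  # normal response times (ms)
baseline = build_baseline(history)

print(is_anomalous(120, baseline))  # True: would raise an alarm
print(is_anomalous(44, baseline))   # False: within normal variation
```

The benefit is exactly what the article describes: instead of waiting for users to complain, the team is alerted as soon as a key application drifts outside its established benchmark.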
Rather than relying on a reactive approach to troubleshooting, the retailer developed a proactive monitoring strategy while implementing 10 Gb. The new analysis platform could keep pace with the higher network speeds and allowed the network team to manage performance proactively.