4 Critical Cloud Monitoring Metrics
No doubt most of you are dealing with externally hosted or cloud applications. To successfully address problems and maintain performance, you’ll need to stay on top of these four categories of cloud metrics.
1. User Experience
In the world of externally hosted services, end-user experience is the only thing you can control. To measure quality of experience, you need to think like an end user. Focus on service availability and performance metrics, and monitor response times from the user's perspective. For HTTP traffic and specific URLs, metrics to track include:
- Application and server response times
- Network delay
- Server requests
- Application requests
- Application and server availability
- Successful transmissions
- Client and server errors
Next, consult Service Level Agreements (SLAs) for guidance on additional metrics. Finally, place probes close to user locations so that measurements more accurately reflect the performance users are experiencing.
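As a rough illustration of how the metrics listed above might be rolled up, the following sketch aggregates probe measurements into availability, average response time, and error counts. The `ProbeSample` record, the `summarize` function, and the convention that a status of 0 means "no response" are assumptions made for this example, not part of any particular monitoring product.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ProbeSample:
    """One measurement taken by a probe near the user (hypothetical record)."""
    url: str
    status: int          # HTTP status code; 0 is used here to mean "no response"
    response_ms: float   # request-to-response time as seen by the probe


def summarize(samples):
    """Aggregate probe samples into user-facing metrics."""
    total = len(samples)
    # Treat 2xx and 3xx responses as successful transmissions.
    ok = [s for s in samples if 200 <= s.status < 400]
    return {
        "requests": total,
        "availability_pct": 100.0 * len(ok) / total if total else 0.0,
        "avg_response_ms": mean(s.response_ms for s in ok) if ok else None,
        "client_errors": sum(1 for s in samples if 400 <= s.status < 500),
        "server_errors": sum(1 for s in samples if s.status >= 500),
        "no_response": sum(1 for s in samples if s.status == 0),
    }
```

Running one summary per probe location makes it easier to see whether a slowdown affects all users or only those in one region.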
2. Performance Benchmarks
Some best practices don’t change whether the application is hosted internally or externally. Establish internal baselines for normal service behavior, covering cloud service utilization, number of concurrent users, overall cloud service response time, and response time for specific transactions. Also, incorporate any relevant SLA thresholds into baseline reports.
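One way to turn historical measurements into a baseline and compare new measurements against both the baseline and an SLA threshold is sketched below. The function names, the choice of the 95th percentile as the upper bound of "normal", and the sample values are illustrative assumptions; the right statistic depends on your service and SLA.

```python
from statistics import mean, quantiles


def build_baseline(history_ms):
    """Derive 'normal' behavior from historical response times in milliseconds.

    Needs at least two data points.
    """
    cuts = quantiles(history_ms, n=100)  # 99 cut points; index 94 is the 95th percentile
    return {"mean_ms": mean(history_ms), "p95_ms": cuts[94]}


def evaluate(sample_ms, baseline, sla_ms):
    """Classify one new measurement against the SLA and the baseline."""
    if sample_ms > sla_ms:
        return "sla_breach"
    if sample_ms > baseline["p95_ms"]:
        return "above_baseline"
    return "normal"
```

Checking the SLA first means a breach is never reported as a mere deviation from baseline, even when the SLA threshold is tighter than the observed 95th percentile.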
3. Internal Infrastructure and Network
It may be obvious, but when dealing with the cloud vendor, the internal network will always be guilty until proven innocent. Track and trend performance and availability metrics for servers, routers, switches, and other service components. For servers, this includes metrics like CPU utilization, memory usage, and disk space. For routers and switches, keep tabs on port, CPU, and memory utilization. Also, record client and server response times for your internal network. Finally, be sure to monitor specific protocol transactions for issues. For one Network Instruments customer we’ll discuss later, this was the evidence he needed to show a cloud vendor that the error was on their side.
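Trending a utilization metric can be as simple as fitting a straight line to evenly spaced samples and projecting when a limit would be reached. The sketch below uses an ordinary least-squares slope; the function names and the assumption of evenly spaced samples are choices made for this example.

```python
def trend_per_sample(values):
    """Least-squares slope of evenly spaced samples (change per sample interval)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def samples_until(values, limit):
    """Estimate how many more sample intervals until the metric reaches `limit`.

    Returns None when the metric is flat or falling.
    """
    slope = trend_per_sample(values)
    if slope <= 0:
        return None
    return max(0.0, (limit - values[-1]) / slope)
```

For example, with daily disk-usage readings of 50, 55, 60, 65, and 70 percent, the slope is 5 points per day, so a 90 percent limit would be about four days away if the trend holds.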
4. Availability and Route Monitoring
Once the internal network is ruled out, how do you determine the problem location? Set up analysis tools to regularly perform an operation with the cloud service via synthetic transactions. This is more complex than a ping, and should mimic user interactions with the service. From these results, your tools or network team can determine availability and uptime. If your tools track the route, you can also pinpoint where delay might be occurring for a specific problem.
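A synthetic transaction can be modeled as an ordered series of named steps, each timed and checked for failure, with availability computed over repeated runs. The sketch below is a minimal illustration under that assumption; `run_transaction`, `availability_pct`, and the step names in the usage note are hypothetical, and the callables would wrap whatever real requests your service requires.

```python
import time


def run_transaction(steps):
    """Run (name, callable) steps in order, stopping at the first failure.

    A step fails if its callable raises an exception.
    """
    timings = {}
    for name, action in steps:
        start = time.perf_counter()
        try:
            action()
        except Exception as exc:
            return {"ok": False, "failed_step": name,
                    "error": str(exc), "timings_ms": timings}
        timings[name] = (time.perf_counter() - start) * 1000.0
    return {"ok": True, "failed_step": None, "error": None, "timings_ms": timings}


def availability_pct(results):
    """Percentage of transaction runs that completed every step."""
    if not results:
        return 0.0
    return 100.0 * sum(1 for r in results if r["ok"]) / len(results)
```

A run might be defined as `[("login", do_login), ("lookup", do_lookup)]`, where each callable performs the same request a user would. Recording which step failed, along with per-step timings, narrows down where in the interaction the service is breaking.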
Unlike most applications, cloud services may be managed by a department outside of IT, which can add new management complexities to performance monitoring. This was the case for a major US retailer and Network Instruments customer when the cloud vendor’s techs blamed the retailer’s network for causing service problems. The human resources department, which was in charge of managing the service, immediately turned to the network team to resolve the issue.
As their network engineer explained, “the externally-hosted program we used to verify that accounts had sufficient funds was locking up and freezing. Using pings and synthetic transactions, we were unable to get back the requested data from the site. With the GigaStor retrospective analysis appliance, we were able to verify that our data was going out, but we weren’t seeing the expected data coming back. We shared this information with the provider, and they went back and detected that they had a misconfiguration issue on their side and were responding to us on the wrong IP. Since proving this with GigaStor, things have been running problem free.”