How to Measure SLA: 4 Important Types of Metrics
A well-written Service Level Agreement (SLA) is an important part of the relationship between a client and a service provider, and most clients and providers agree on this. What some service providers don’t know is exactly how to measure SLA performance.
There is often a lot of confusion between the SLA metrics that define contractual agreements and the wide range of key performance indicators (KPIs) you can also use to monitor operations. They are both important, but they are not the same.
What Is an SLA?
An SLA is a contract between a customer and a provider that documents the service being provided, so that both parties agree on it and are protected. It contains the terms, metrics, and protocols that both parties agree will determine whether the specified level of service is being delivered. An SLA also establishes what happens when service levels aren’t met.
SLAs depend on accurate metrics. Without the right metrics in place, neither you nor your clients can be sure that you are providing what you agreed on. If you can’t measure the results, you can’t improve on them.
4 Types of SLA Metrics to Track
Which SLA metrics you track depends on the services you are providing. You can track many types of metrics, but you shouldn’t go overboard. It is important to review your operations, determine which metrics matter the most, and prioritize them. Tracking a bunch of metrics you don’t really need will only make the process more complex and your SLA reporting harder to understand.
1. Availability
The availability of a resource is the percentage of time it is working for visitors. This number will never be 100%, but it should be as close to that number as possible. Here are two metrics used to measure availability:
- Uptime: The percentage of time a resource is up and responds.
- Service availability: The percentage of time a resource responds with the expected response.
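The difference between the two metrics can be sketched in a few lines of Python. This is a minimal illustration, assuming each monitoring check is stored as a hypothetical `Check` record with two flags: whether the resource responded at all, and whether the response was the expected one. The record layout, function names, and sample data are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Check:
    """One monitoring check result (hypothetical record layout)."""
    responded: bool  # the resource answered at all
    expected: bool   # the answer matched the expected response


def uptime_pct(checks):
    """Percentage of checks where the resource was up and responded."""
    return 100.0 * sum(c.responded for c in checks) / len(checks)


def availability_pct(checks):
    """Percentage of checks that returned the expected response."""
    good = sum(c.responded and c.expected for c in checks)
    return 100.0 * good / len(checks)


# Illustrative data: 1,000 checks, 1 with no response, 4 with a wrong response.
checks = (
    [Check(True, True)] * 995
    + [Check(True, False)] * 4
    + [Check(False, False)] * 1
)

print(f"Uptime: {uptime_pct(checks):.1f}%")
print(f"Service availability: {availability_pct(checks):.1f}%")
```

Service availability can never be higher than uptime, because a response has to arrive before it can be judged correct.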
2. Downtime
99.9% uptime is pretty good if the downtime is spread out over the year. It allows about 9 hours of downtime a year, or about 10 minutes every week. But the same figure could also hide a single 2-hour outage, and you wouldn’t know it unless you are also calculating downtime and setting up alerts for when a server has been down for too long.
- Downtime: The exact amount of time a server has been down.
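The arithmetic behind those figures, and the reason to track individual outages, can be worked through in Python. This is a sketch with a made-up outage log; the function names and the one-hour alert threshold are assumptions chosen for the example.

```python
from datetime import datetime, timedelta

HOURS_PER_YEAR = 365 * 24       # 8,760 in a non-leap year
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080


def allowed_downtime(uptime_pct):
    """Return (hours per year, minutes per week) of downtime a target permits."""
    down_fraction = 1 - uptime_pct / 100
    return down_fraction * HOURS_PER_YEAR, down_fraction * MINUTES_PER_WEEK


def longest_outage(outages):
    """Return the longest single outage from a list of (start, end) pairs."""
    return max((end - start for start, end in outages), default=timedelta(0))


hours, minutes = allowed_downtime(99.9)
print(f"{hours:.2f} hours per year, {minutes:.1f} minutes per week")

# Hypothetical outage log: one long incident and two short blips.
outages = [
    (datetime(2024, 3, 1, 2, 0), datetime(2024, 3, 1, 4, 0)),        # 2 hours
    (datetime(2024, 6, 9, 13, 0), datetime(2024, 6, 9, 13, 5)),      # 5 minutes
    (datetime(2024, 11, 20, 8, 30), datetime(2024, 11, 20, 8, 33)),  # 3 minutes
]
ALERT_THRESHOLD = timedelta(hours=1)  # assumed limit for a single outage

total = sum((end - start for start, end in outages), timedelta(0))
worst = longest_outage(outages)
print(f"Total downtime: {total}, longest outage: {worst}")
if worst > ALERT_THRESHOLD:
    print("A single outage exceeded the alert threshold")
```

In this sample the total downtime stays well inside the yearly allowance for 99.9%, yet one outage still ran for two hours, which is the case a percentage alone would hide.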
3. Response Time
Response time is also known as latency and is the time it takes a response to return after a request is made. It should be kept low so that it doesn’t hurt the user experience. Slow services mean slow websites. Here are some ways response time is tracked:
- First Byte: The time between the browser making a request and receiving the first byte of the response.
- First Paint: The time elapsed before the browser renders significant content on the screen.
- Time to Interactive (TTI): The time elapsed before a user can see and actually interact with a website. This will always take longer than first paint.
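Whichever of these timings is collected, a target is easier to check against a summary of many samples than against a single request. The sketch below shows one hedged way to do that in Python, using a nearest-rank percentile over hypothetical first-byte timings. The sample values and the 200 ms target are assumptions, not figures from any real service.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical time-to-first-byte samples in milliseconds, one per request.
ttfb_ms = [
    120, 135, 110, 140, 125, 130, 118, 122, 128, 133,
    127, 121, 119, 138, 131, 126, 124, 129, 900, 136,
]
P95_TARGET_MS = 200  # assumed SLA target for the 95th percentile

p50 = percentile(ttfb_ms, 50)
p95 = percentile(ttfb_ms, 95)
slowest = max(ttfb_ms)

print(f"median: {p50} ms, p95: {p95} ms, slowest: {slowest} ms")
print("Within target" if p95 <= P95_TARGET_MS else "Target missed")
```

Percentiles describe what most requests experience, while the maximum keeps the one slow request in view instead of letting it distort an average.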
4. Error Rate
It is also important to measure the number of failed requests to each resource, not only to ensure that service level agreements are being upheld but also to surface potential bugs and misconfigurations. Error metrics to track include:
- HTTP Errors: The percentage of requests that returned an HTTP error code.
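As a rough sketch, the HTTP error percentage can be computed from a list of response status codes. The example below assumes that any 4xx or 5xx status counts as an error, which is a choice an SLA would need to spell out. The function names and the sample log are hypothetical.

```python
from collections import Counter


def http_error_rate(status_codes):
    """Percentage of requests that returned a 4xx or 5xx status (assumed definition)."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if code >= 400)
    return 100.0 * errors / len(status_codes)


def errors_by_class(status_codes):
    """Count error responses per class, e.g. {'4xx': 12, '5xx': 8}."""
    counts = Counter(f"{code // 100}xx" for code in status_codes if code >= 400)
    return dict(counts)


# Hypothetical log of 1,000 responses.
status_codes = [200] * 970 + [301] * 10 + [404] * 12 + [500] * 5 + [503] * 3

print(f"HTTP error rate: {http_error_rate(status_codes):.1f}%")
print(f"Breakdown: {errors_by_class(status_codes)}")
```

Splitting the count by class helps with the second goal mentioned above: a burst of 4xx responses often points to a broken link or a misconfiguration, while 5xx responses point to failures on the server side.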
Providing Accurate SLA Metrics
SLA reporting doesn’t have to be hard or complicated, but SLA obligations need to be taken seriously. This begins with building accurate reporting into your operations. You can start with a good foundation of basic check types and actionable reporting, then move on to more advanced tools like real user monitoring (RUM) to provide granular reports. It is also good practice to display public SLA pages and run scheduled reports, so you can show stakeholders your SLA history whenever it’s needed. SLA metrics are baked into Uptime.com’s system, so reports are only a few clicks away.
Ready to begin? Try out Uptime.com for free now.