5 Ways to Report on Your SLA Obligations

Service Level Agreements are designed to foster trust between your customers and your business. They help define the maximum amount of downtime your team finds acceptable. While they can have legal repercussions, SLAs are fundamentally about trust.

Your customers use your service because you’re the best at what you do. They remain loyal because they trust you to do what they need. Retention in SaaS is very fickle, and competition in certain spaces is quite stiff. Milliseconds really can make the difference, and if you want to save as many of those split seconds as possible, then these 5 ways to report on your SLA will help.

Create a Diverse Monitoring Suite

Comprehensive reporting begins with the data you collect. If you don’t build a strong monitoring suite, you’re letting downtime slip through the cracks. We recommend these basic check types for virtually every user:

  • HTTP(S) Checks for basic uptime monitoring
  • SSL and WHOIS checks to ensure your important certificates and your domain don’t expire

And we suggest these advanced check types for more in-depth performance and uptime reporting:

  • Transaction checks to focus on login flows and key customer pathways
  • Real User Monitoring for actual user performance data you can use to improve
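To make the idea of a basic HTTP(S) check concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not how Uptime.com implements its checks: the function name `check_http`, the 10-second default timeout, and the rule that any 2xx or 3xx status counts as "up" are all assumptions made for this example.

```python
import time
import urllib.error
import urllib.request


def check_http(url, timeout=10.0):
    """Request `url` once and report whether it answered and how quickly.

    Hypothetical helper for illustration only.
    """
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status code.
        status = err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, TLS error, and so on.
        status = None
    elapsed = time.monotonic() - started
    # Assumption: any 2xx or 3xx response counts as "up".
    up = status is not None and 200 <= status < 400
    return {"url": url, "up": up, "status": status, "seconds": elapsed}
```

Calling `check_http("https://example.com/")` on a schedule and storing each result gives you the raw uptime and response time data that the rest of this post builds on.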

While these checks are running, we have a host of tools anyone can use to spot check and improve their website. Our Page Speed Test is a good place for anyone to start their monitoring journey.

Find a location and run a test to see how well your site actually responds.

Tip: Run this test multiple times on the same URL from various locations, then average the results. Use that average to define your Target Response Time SLA (secs).
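The averaging step in the tip above can be sketched in a few lines of Python. The location names and timings below are made-up sample values, and `target_response_time` is a hypothetical helper name; substitute the numbers you actually collect.

```python
from statistics import mean


def target_response_time(samples_by_location):
    """Average every timing (in seconds) across all locations."""
    all_samples = [
        seconds
        for samples in samples_by_location.values()
        for seconds in samples
    ]
    if not all_samples:
        raise ValueError("need at least one timing sample")
    return round(mean(all_samples), 2)


# Hypothetical results: three test runs from each of three locations.
samples = {
    "us-east": [0.8, 1.0, 0.9],
    "eu-west": [1.2, 1.1, 1.3],
    "ap-south": [1.6, 1.5, 1.4],
}

print(target_response_time(samples))
```

With these sample values the average comes out to 1.2 seconds, which would be a reasonable starting point for the Target Response Time SLA (secs) field.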

Group Systems Into a Single Check

We typically recommend that customers adopt a system of tags they can use to organize checks into systems, system owners, or generally categorize based on functionality. But Group Checks can take this powerful system of organization one step further.

By grouping multiple check types into a single check, you unlock some powerful options for alerting and reporting. Set group behavior to define when the group is considered “Down” so you can respond to wider system outages and report when multiple checks go down.

Or, collect the uptimes from several checks grouped together and report uptime as an average of those values.
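As a rough sketch of both ideas, the Python below averages the uptime of several checks and applies one possible "Down" rule. The check names, the uptime figures, and the `min_down` threshold are assumptions for illustration; the actual group behavior options are whatever you configure on the Group Check.

```python
def group_uptime(uptimes):
    """Average the uptime percentages of the checks in a group."""
    if not uptimes:
        raise ValueError("a group needs at least one check")
    return sum(uptimes.values()) / len(uptimes)


def group_is_down(states, min_down=2):
    """Treat the group as down once `min_down` member checks are down.

    `states` maps a check name to True (up) or False (down).
    The threshold rule is an assumed example of a group behavior.
    """
    down_count = sum(1 for is_up in states.values() if not is_up)
    return down_count >= min_down


# Hypothetical monthly uptime percentages for one system.
uptimes = {"http": 99.9, "api": 99.5, "transaction": 100.0}
states = {"http": True, "api": False, "transaction": False}

print(round(group_uptime(uptimes), 2))
print(group_is_down(states))
```

Here the three checks average out to roughly 99.8% uptime, and the group counts as down because two of its three checks are failing.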

Accuracy Matters

The state of a system is more than the state of a single check, even if that check is critical to system operation. Reporting on a group’s average uptime often helps with high-level reporting on a single system. Engineering usually cares most about the granular detail, while Marketing or Leadership likely want the simplest answer to “how much downtime have we experienced?”

This more holistic form of reporting allows better accuracy when reporting on the state of systems or applications, with the ability to go granular and report on data from individual check types.

Collect Real User Metrics on Performance

Real User Monitoring (RUM) requires an embedded HTML snippet and works like a tracking pixel or Google Analytics. The script “fires” when a user reaches a page that contains it, and tracks the performance of their session throughout their journey on your website.

RUM gives you a lot more than simply global performance statistics. RUM tracks pageviews and performance against a customizable Apdex threshold. Apdex stands for Application Performance Index, an open standard that helps convert raw performance data into a visualization of customer satisfaction.

In addition, RUM gives you detailed statistics on AJAX performance and errors encountered. If something is broken, a combination of RUM and Transaction checks will catch and report it.
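For readers curious how an Apdex score is derived, here is a short sketch of the standard formula: samples at or under the threshold T count as satisfied, samples between T and 4T count as tolerating (at half weight), and anything slower counts as frustrated. The function name and the sample load times are ours, chosen for illustration.

```python
def apdex(response_times, threshold):
    """Compute an Apdex score between 0 and 1.

    satisfied:  t <= T
    tolerating: T < t <= 4T (counted at half weight)
    frustrated: t > 4T (counted as zero)
    """
    if not response_times:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(
        1 for t in response_times if threshold < t <= 4 * threshold
    )
    return (satisfied + tolerating / 2) / len(response_times)


# Hypothetical page load times in seconds, scored against T = 0.5s.
load_times = [0.3, 0.4, 0.5, 0.9, 1.5, 2.5]
print(round(apdex(load_times, threshold=0.5), 2))
```

In this sample, three pageviews are satisfied, two are tolerating, and one is frustrated, giving (3 + 2/2) / 6, or about 0.67. Lowering the threshold makes the score stricter, which is why a customizable threshold matters.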

Schedule and Deliver Detailed Reports

What are you going to do with this wealth of data once it’s generated? Our SLA reporting is built to visually convey the state of your systems, set to any recurrence you need. Typical intervals are weekly or monthly, but we also allow daily, yearly, and just about every interval in between.

SLA reporting is a matter of accountability, and data is recorded automatically based on the performance of your checks. Some checks, like Malware/Blacklist checks, don’t report on performance because they run once per day. But you can still get a sense of total uptime and control which checks are included in the metrics.

Speaking of SLA, you might have noticed the Target SLA % and Target Response Time SLA (secs) fields above. The values you set here define the acceptable range for your SLA, so ensure you pick something manageable and attainable.
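One way to judge whether a Target SLA % is attainable is to translate it into a downtime budget. The sketch below is plain arithmetic rather than anything specific to our reports, and the 30-day default period is an assumption you should adjust to your own reporting interval.

```python
def downtime_budget_minutes(target_sla_percent, period_days=30):
    """Minutes of downtime allowed in a period for a given uptime target."""
    if not 0 <= target_sla_percent <= 100:
        raise ValueError("target must be a percentage between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - target_sla_percent) / 100


for target in (99.0, 99.9, 99.99):
    minutes = downtime_budget_minutes(target)
    print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime")
```

A 99.9% target over 30 days leaves roughly 43 minutes of allowable downtime, while 99.99% leaves only about 4 minutes, so each extra "nine" is a much tighter commitment.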

Those values also define the color-coded thresholds for your performance graphs and reports:

Target SLA % has these thresholds:

  • Above the Target SLA % (Blue)
  • Halfway to Target SLA % (Orange)
  • Below the Target SLA % (Red)

While Target Response Time SLA (secs) has these thresholds:

  • Below 70% of Target Response Time SLA (Blue)
  • 70% – 100% of Target Response Time SLA (Orange)
  • Above 100% of Target Response Time SLA (Red)
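The response time thresholds above map to a simple comparison. This is a sketch of the logic as described in the list, not the reporting code itself; in particular, treating exactly 70% and exactly 100% as Orange is our assumption about how the boundaries fall.

```python
def response_time_color(measured_secs, target_secs):
    """Map a measured response time to a report color."""
    if target_secs <= 0:
        raise ValueError("target must be a positive number of seconds")
    ratio = measured_secs / target_secs
    if ratio < 0.7:
        return "blue"    # below 70% of the target
    if ratio <= 1.0:
        return "orange"  # 70% to 100% of the target (boundaries assumed)
    return "red"         # above 100% of the target


# Hypothetical measurements against a 2-second target.
for measured in (1.0, 1.6, 2.5):
    print(measured, response_time_color(measured, target_secs=2.0))
```

Against a 2-second target, a 1.0-second response is Blue, 1.6 seconds is Orange, and 2.5 seconds is Red.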

Create a Public-Facing Status Page

Finally, we come to the increasingly important status page. As customers begin relying on and using your service, they expect you to be up. But if you’re down, they won’t waste any time trying to figure out why before reporting the downtime on social media. If you’re a large organization, this pain is even worse when it makes the news cycle.

Public-facing status pages give you the opportunity to control the conversation with transparency, customized to fit your brand.

Incident management is becoming an important component of devops, and teams usually have some kind of “war room” or “disaster prep” process for resolving an issue. Maybe it’s an all-hands operation, or maybe the system owner takes charge. Either way, part of your planning should be accurate and clear communication with your users.

We speak from experience: when a service is down, you NEED to respond. But you can do better for your users with some simple updates. Offer status page subscription options, like email or RSS, and your users won’t even need to visit the page. They’ll just know.

Tying it All Together

If you take your SLA obligations seriously – and you should – then reporting accurately on the state of your systems should be built into your operations. The foundation is good data, but how you present that data matters.

Devops is great at devops, but SLA reporting does not need to be a thorn in your side.

Start with a good foundation of checks and build those checks into actionable reporting. Create the higher-level reporting you need, and use tools like RUM to get granular where devops can have the biggest impact.

Ready to get started? Try out the Uptime.com free trial now.


Richard Bashara is Uptime.com's lead content marketer, working on technical documentation, blog management, content editing and writing. His focus is on building engagement and community among Uptime.com users. Richard brings almost a decade of experience in technology, blogging, and project management to help Uptime.com remain the industry leading monitoring solution for both SMBs and enterprise brands. He resides in California, enjoys collecting and restoring arcade machines, and photography.
