Uptime.com’s Guide to Weathering Outage Season
It’s already been a stormy quarter, with notable outages exceeding 240 hours. This spring saw two substantial cloud provider incidents: Atlassian’s 9-day outage and shorter outages at Cloudflare. As reliance on cloud-based tools and services increases, you should be asking: what are the best ways to monitor your site and make sure the data you report accurately reflects your site’s downtime and SLAs?
Setting SLA Thresholds for Automatic Reporting
Whether you’ve defined an SLA or a baseline for performance, we have the tools you need to compare KPIs. In your check’s advanced settings, you can define your Target SLA% and your target response time in seconds. Customizing these fields automatically adjusts the SLA reporting thresholds for that check.
Pro Tip: Use Bulk Actions to update SLA targets for a large group of checks at once.
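To make a Target SLA% concrete, it can help to translate it into a downtime budget. The sketch below is plain arithmetic, not Uptime.com's reporting logic; the function names and the 30-day default period are assumptions chosen for illustration.

```python
def allowed_downtime_minutes(target_sla_percent: float, period_days: int = 30) -> float:
    """Downtime budget, in minutes, implied by a target SLA over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - target_sla_percent) / 100.0


def meets_sla(downtime_minutes: float, target_sla_percent: float,
              period_days: int = 30) -> bool:
    """True if the measured uptime percentage is at or above the target."""
    total_minutes = period_days * 24 * 60
    uptime_percent = 100.0 * (total_minutes - downtime_minutes) / total_minutes
    return uptime_percent >= target_sla_percent
```

Under these assumptions, a 99.9% target over 30 days leaves a budget of roughly 43 minutes, so 30 minutes of recorded downtime would meet the target and 60 minutes would not.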
Filter Out False Positives
In the same advanced settings window, you can adjust your Sensitivity and Number of Retries. These two fields control how many of your check’s configured locations need to fail, and how many times they need to fail, for the check to be considered down.
Failsafes like this help rule out minor server connection issues, or other temporary outliers that would otherwise register as measurable downtime.
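The interaction between the two settings can be sketched in a few lines. This is an illustrative model only, assuming that Sensitivity is the number of locations that must fail and that a location counts as failed only when its first attempt and every retry fail; the function names and the `probes` mapping are hypothetical, and Uptime.com's actual evaluation may differ.

```python
from typing import Callable, Dict


def location_failed(probe: Callable[[], bool], retries: int) -> bool:
    """A location fails only if the first attempt and every retry fail.

    probe() is a hypothetical callable that returns True on success.
    """
    for _ in range(retries + 1):
        if probe():
            return False
    return True


def check_is_down(probes: Dict[str, Callable[[], bool]],
                  sensitivity: int, retries: int) -> bool:
    """The check is down when at least `sensitivity` locations fail."""
    failed = sum(1 for probe in probes.values() if location_failed(probe, retries))
    return failed >= sensitivity
```

With a sensitivity of 2, a single location that fails once and then recovers on its retry would not mark the check as down, which is the kind of temporary outlier these settings are meant to filter.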
Use Locations & Real-Time Analysis to Gain Insight
The Probe Server Locations you configure determine which servers your check tests from. Cloudflare outages don’t always affect all servers, and it’s rare for every region to go down. Configuring multiple locations for your monitoring checks collects data that shows which locations fail during an outage, so you can determine whether downtime is site-related or regional.
To analyze this data, Uptime.com offers tools like Real-Time Analysis and real-time check status, which surface timestamped alert data and location status.
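As a rough illustration of that reasoning, per-location results can be reduced to a simple heuristic: if every location fails, the problem is probably with the site itself; if only some fail, a regional issue is more likely. The function name, the input shape, and the labels below are assumptions made for this sketch, not part of any Uptime.com API.

```python
from typing import Dict


def classify_outage(location_ok: Dict[str, bool]) -> str:
    """Classify an outage from a mapping of location name -> check passed.

    Heuristic only: a partial failure suggests a regional problem,
    a failure everywhere suggests the site itself is down.
    """
    if not location_ok:
        return "no data"
    failed = [name for name, ok in location_ok.items() if not ok]
    if not failed:
        return "up"
    if len(failed) == len(location_ok):
        return "likely site-related"
    return "likely regional"
```

For example, a result set where only one of three locations fails would be labeled "likely regional", which is a prompt to look at the failing location's timestamps before treating it as site downtime.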
Monitor Third-Party Providers
It’s hard to identify third-party downtime when you only monitor your own site. Unless you have SLAs that make these external companies accountable to you for their uptime, you will have no insight into their availability, because these services don’t report to your system.
First, it’s important to remember that SLAs are a two-way street, and you should have these agreements with your providers and tools.
Second, it’s easy to put basic monitoring checks in place so you are alerted if something you rely on goes down. A basic HTTP(S) check monitors a URL or endpoint for a 200 OK status, and this simple check can provide useful data and alerting for the services you rely on. Here’s how to set one up in 3 minutes.
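To show what such a check does conceptually, here is a minimal sketch using only Python’s standard library. It is not how Uptime.com implements its HTTP(S) check; the function name and the 10-second default timeout are assumptions, and note that `urllib` follows redirects, so a redirect that ends in a 200 counts as a pass.

```python
import http.client
import urllib.request


def http_check(url: str, timeout: float = 10.0) -> bool:
    """Return True if a GET request to `url` ends with HTTP status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, HTTPError (4xx/5xx), timeouts, and
        # connection failures; ValueError covers malformed URLs.
        return False
```

Run on a schedule against a provider’s endpoint, a function like this returns `False` for error statuses, timeouts, and unreachable hosts alike, which is enough to trigger an alert and prompt a closer look.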
Our final piece of advice is to subscribe to the status pages of the services you use (including ours). Status page alerts not only let you know what is happening but can also tell you why, or how far along the investigation is.
If your company offers a service and you house your monitoring with Uptime.com, take advantage of our customizable and brandable Status Page solution to keep your users informed.
Pro Tip: Stay tuned to Uptime.com. We’re constantly developing new ways to monitor third-party services 😉
Minute-by-minute Uptime checks.
Start your 21-day free trial with no credit card required at Uptime.com.