How to Use Uptime Monitoring Tools to Check Website Uptime
One of the central questions we ponder in our work is: what does uptime mean in an interconnected world? You can do everything to ensure 100% reliability, yet still fail. How is this possible? Shouldn’t there be enough redundancy to ensure nothing breaks unless it is actively broken?
That’s another way of saying technology is great when it works.
Our applications work because we made them work, and they remain up because we work hard to strengthen infrastructure and maintain our Service Level Agreements with users. Today, we’re going to look at how to use uptime monitoring tools to check the uptime of a website in service of SLA fulfillment.
What does DevOps need? How does monitoring work? Along the way, we’ll uncover some useful technical terms and set your team up for success.
Let’s get started!
What are Uptime Monitoring Tools?
You have lots of internal signals that things are going well: you can see bandwidth consumption, measure CPU, and you have visibility over your internal processes. But how do you know your customers see that everything is up and running?
How do you measure the uptime of your website?
Automated Monitoring | Check a Site’s Uptime Continuously
Automated monitoring has become a critical part of DevOps infrastructure at nearly every level. If you manage an application, you need uptime monitoring tools to tell you not just how it performs but whether it’s up. The alternative is hearing about an outage through your customer base, which is less than ideal.
Uptime.com provides 18 different check types, with more to come, which can be used to provide an external perspective to your monitoring in fulfillment of your Service Level Agreements. We are an independent Service Level Indicator (SLI), meaning our probe servers are a reliable source of downtime details.
Is it up or down? What is broken? What happened, and which error is being reported? With a combination of external and internal monitoring, you can know the answer in the time it takes to read an error message.
SLIs | Utilizing Effective Uptime Monitoring Tools for Rapid Alerting
Uptime.com can probe your server in various ways. For example, an HTTP(S) check will make sure the status code returned is 200 OK, while an SSL check will verify your SSL certificate and its details.
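To make the idea concrete, here is a minimal sketch of what an HTTP(S) check and an SSL check could look like using only Python's standard library. It is not how Uptime.com implements its checks. The function names (http_check, days_left, ssl_check) and the 14-day expiry threshold are assumptions chosen for illustration.

```python
import socket
import ssl
import time
import urllib.request


def http_check(url, timeout=10.0):
    """Return True only if the URL ultimately answers with HTTP status 200.

    Note: urlopen follows redirects, so a redirect that ends in 200 passes.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # HTTP 4xx/5xx errors, DNS failures, refused connections, and
        # timeouts all derive from OSError and are treated as "down".
        return False


def days_left(not_after, now=None):
    """Days between `now` and a certificate's notAfter string."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def ssl_check(hostname, port=443, min_days=14, timeout=10.0):
    """Return True if the certificate validates and is not close to expiry.

    min_days is a hypothetical threshold, not a recommended value.
    """
    context = ssl.create_default_context()  # verifies chain and hostname
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=hostname) as tls:
                cert = tls.getpeercert()
    except OSError:
        # Covers handshake and certificate verification failures as well
        # as plain network errors.
        return False
    return days_left(cert["notAfter"]) >= min_days
```

Calling http_check("https://example.com/") and ssl_check("example.com") from a machine outside your own network approximates the external perspective described above, though a real monitoring service runs such probes from many locations.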
Other checks can be used to measure performance or user actions, such as Real User Monitoring and Transaction checks.
What’s the difference between synthetic and real user monitoring?
Let’s say your app checks the date and time upon sign-in to determine what the user sees on their dashboard, such as news items or important alerts. Synthetic monitoring would provide insight into every component at work, verifying elements like images or usernames, along with the date and time functionality, for a fuller picture of every step in a goal funnel.
Real user monitoring would provide a glimpse into how all of these components perform based on real user sessions. Combine these check types to take performance optimization to the next level. You will be able to see the effect your updates have on server performance, with an accurate gauge of what the average user experience is like.
If you are also running internal monitoring, such as a probe from a private location, you will have even greater technical detail to inform your response. Payment APIs are a prime example. If you know your shopping cart is down, and you have eyes on the internal components using those APIs, you increase observability and shorten your time to respond.
That’s effective monitoring!
How SLAs Enforce Website Uptime
Let’s examine the Service Level Agreement (SLA) in some detail to see how it helps enforce website uptime. SLAs exist all over business and provide protection to both parties.
How SLAs Protect DevOps
Let’s consider for a moment your service from the customer perspective. They are paying you money for the ability to use the service. If it goes down, they can’t use the service. Your service is something they rely on.
They rely on it because you grow it and improve it. You serve the end user.
Your SLA is there to give you an objective score to shoot for. You’ve got a bare minimum set of objectives laid out and you know which stars need to align to get to those goals, but you also know how much breathing room you have.
SLAs can have legal repercussions. Failing to uphold an SLA is extremely costly for multiple reasons:
- Users may abandon the service
- Users or outside entities can bring litigation
SLAs are a protective measure that holds your service accountable, because the customer needs you. Do your best to fulfill them. An SLA can’t be met in part. It’s not as simple as saying you’re only accountable for X% over your SLA. When you fail, you fail.
Thresholds and Error Budgeting
Good SLAs have built-in thresholds, sometimes called “error budgets”. You can think of these as allotments of acceptable downtime. You don’t want to eat too much of your error budget at once, so you should build some stopgaps into it.
First, you need to define a major downtime incident. For some, that’s an hour; for others, four hours or longer. Based on our reporting from 2019, we know that 35% of the internet’s top-performing websites have downtime at or near the 10-day mark, and incidents can range from minutes to hours. 10 days is a lot of downtime, so keep these numbers in mind as you build your thresholds.
It’s better to underpromise and overdeliver in some respects, but you have to balance that against your industry’s standards. 99.9% uptime is more than just a hollow marketing cliché.
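The arithmetic behind an error budget is simple enough to sketch. The example below assumes a 99.9% target measured over a 30-day window and a one-hour definition of a major incident. Those numbers, and the function names error_budget and budget_report, are illustrative assumptions rather than values taken from any particular SLA.

```python
from datetime import timedelta


def error_budget(sla_percent, period):
    """Allowed downtime for a given uptime target over a period."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    return period * (1 - sla_percent / 100)


def budget_report(sla_percent, period, incidents, major_threshold):
    """Summarize how much of the error budget a list of incidents has used.

    incidents is a list of timedelta durations; major_threshold is the
    duration at which an incident counts as "major" for your team.
    """
    budget = error_budget(sla_percent, period)
    spent = sum(incidents, timedelta())
    return {
        "budget": budget,
        "spent": spent,
        "remaining": budget - spent,  # negative means the SLA is breached
        "major_incidents": [i for i in incidents if i >= major_threshold],
    }


# Hypothetical month: two short outages against a 99.9% target.
report = budget_report(
    sla_percent=99.9,
    period=timedelta(days=30),
    incidents=[timedelta(minutes=5), timedelta(minutes=20)],
    major_threshold=timedelta(hours=1),
)
```

At 99.9% over 30 days the budget works out to roughly 43 minutes, so the two outages in this hypothetical month consume more than half of it without either one qualifying as a major incident. That is the case for building stopgaps below the major-incident threshold.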
The bottom line on SLAs is that they allow you to remain feature-rich, but with some boundaries so you’re not feature-rich and broken. SLAs are not objects of fear. They help guide your organization to make good decisions in service of the end user.
Checking Your Website Uptime
Without a customer-facing perspective on your infrastructure, you’re really only getting a small glimpse into what’s working. It’s not efficient to schedule a human to check this infrastructure, and spot checking won’t catch outages while you’re asleep or at lunch.
As infrastructure grows in complexity, so too do the challenges of monitoring it. The only viable solution for SLA fulfillment is the deployment of an automated SLI system that continuously checks a URL for uptime.
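As a rough illustration of what continuous checking means in practice, the sketch below runs a check on a fixed interval, records each result, and compares the observed uptime with an SLA target. The names run_checks, uptime_percent, and meets_sla are hypothetical, as is the 60-second default interval. A production SLI system would also need multiple probe locations, alerting, and persistent storage, none of which is shown here.

```python
import time


def run_checks(check, count, interval_seconds=60.0, sleep=time.sleep):
    """Call check() `count` times, pausing between calls.

    check is any zero-argument callable returning True when the site is up.
    A check that raises is recorded as a failure rather than stopping the loop.
    """
    results = []
    for i in range(count):
        try:
            results.append(bool(check()))
        except Exception:
            results.append(False)
        if i < count - 1:
            sleep(interval_seconds)
    return results


def uptime_percent(results):
    """Share of successful checks, as a percentage."""
    if not results:
        raise ValueError("no check results recorded")
    return 100.0 * sum(results) / len(results)


def meets_sla(results, sla_percent=99.9):
    """True if the observed uptime is at or above the SLA target."""
    return uptime_percent(results) >= sla_percent
```

The sleep parameter is injectable so the loop can be exercised in tests without waiting. In real use, check would wrap an external probe of your URL, and the results would feed the error budget you set for your SLA.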