How is Uptime Calculated?
Any modern organization depends heavily on the health of its network and servers. If a server goes down, it can seriously impair a business’s ability to provide services and keep clients and customers from getting work done.
If network admins don’t know a server went down, the problem can quickly worsen. No one may realize there is a problem until the support lines are loaded with calls, and then everyone has to scramble, first to find the issue and then to fix it.
Uptime monitoring can prevent this scramble. It can also ensure that your service providers are living up to their Service Level Agreements (SLAs) and that your customers can actually interact with your website 99.9, 99.99, or 99.999 percent of the time.
What are the five nines?
The “five nines” refer to a measure of a system’s reliability, where the system is expected to be available and functioning correctly 99.999% of the time. This level of reliability is often considered the gold standard for mission-critical systems that require near-perfect uptime, such as telecommunications networks, financial trading platforms, and healthcare systems.
Achieving five nines of uptime requires a robust and redundant infrastructure, along with a comprehensive disaster recovery plan to minimize downtime in case of any unforeseen events.
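To make those percentages concrete, here is a minimal Python sketch that converts an availability target into a downtime budget. The function name and the 365-day year are assumptions made for illustration, not part of any particular monitoring tool.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def allowed_downtime_seconds(availability_percent: float,
                             period_seconds: int = SECONDS_PER_YEAR) -> float:
    """Return the downtime budget, in seconds, for a target availability."""
    return period_seconds * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_seconds(target) / 60
    print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, five nines leaves a little over five minutes of downtime per year, which is why it demands so much redundancy.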
What is server uptime monitoring?
Server monitoring lets network admins know the instant a server starts having issues so they can fix the problem sooner. But just what is server monitoring? There are a few types, but in this instance, server uptime monitoring is a set of tools that checks that your servers are available to visitors.
Alerts from uptime monitoring tools will let admins know when a server is down. This service will also calculate the uptime for your servers. Server uptime is usually a number from 99% to 99.999%, which indicates the percentage of time your server is up.
But unless you know how this number is calculated, it is hard to know exactly what it means to your business.
How uptime is calculated
Basic uptime calculation is pretty simple to understand. You take the number of seconds that a server was down in a specific time frame and divide it by the total number of seconds you were monitoring the server during that same time frame.
Multiply the result by 100 to get the downtime percentage. To get the uptime percentage, subtract the downtime percentage from 100. This is usually a number of 99% or higher.
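The calculation above can be sketched in a few lines of Python. The function names and the one-week example are hypothetical, chosen only to illustrate the arithmetic.

```python
def downtime_percentage(downtime_seconds: float, monitored_seconds: float) -> float:
    """Downtime as a percentage of the monitored time frame."""
    if monitored_seconds <= 0:
        raise ValueError("monitored_seconds must be positive")
    return downtime_seconds / monitored_seconds * 100


def uptime_percentage(downtime_seconds: float, monitored_seconds: float) -> float:
    """Uptime is whatever is left after subtracting the downtime percentage."""
    return 100 - downtime_percentage(downtime_seconds, monitored_seconds)


# Hypothetical example: one hour of downtime during a 7-day monitoring window.
one_week = 7 * 24 * 60 * 60
print(f"Uptime: {uptime_percentage(3600, one_week):.3f}%")
```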
But an uptime monitor will only check at specific intervals. Between those check intervals, uptime is like Schrödinger’s cat. The server could be up. The server could be down. No one really knows.
If uptime is your goal, not just a number to show your customers, check intervals should be short. For example, with a 10-minute check interval, a server could go down a few seconds after a check. Customers would then be without service for almost the whole interval, plus the time it takes to find and fix the issue.
Shortening your check interval results in more accurate uptime calculations and allows admins to respond to outages more quickly.
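A small simulation shows how the check interval changes the number a monitor reports. This is a sketch under simplifying assumptions: checks are instantaneous, they run at fixed times starting at zero, and the outage window is made up for the example.

```python
def measured_uptime(outages, window_seconds, interval_seconds):
    """Uptime percent as seen by a monitor that checks every interval_seconds.

    outages is a list of (start, end) times in seconds; a check fails
    only if it lands inside one of those windows.
    """
    check_times = range(0, window_seconds, interval_seconds)
    failed = sum(
        1 for t in check_times
        if any(start <= t < end for start, end in outages)
    )
    return 100 * (len(check_times) - failed) / len(check_times)


# Hypothetical 6-minute outage inside a 1-hour window.
outages = [(65, 425)]
for interval in (600, 60):
    print(f"{interval:>3}s interval: {measured_uptime(outages, 3600, interval)}% uptime")
```

With these made-up numbers, the 10-minute monitor reports 100% because no check falls inside the outage, while the 1-minute monitor reports 90%, which matches the true figure.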
The top 3 metrics you should track
- Availability (Uptime): the percentage of time that a system or service is operational and accessible to users. It is calculated by dividing the total uptime by the total time the system was supposed to be available.
- Mean Time Between Failures (MTBF): the average time between two consecutive failures of a system or service. It is calculated by dividing the total uptime by the number of failures.
- Mean Time to Repair (MTTR): the average time it takes to repair a failed system or service. It is calculated by dividing the total downtime by the number of failures.
These metrics provide an overall picture of a system’s reliability, and they help organizations identify areas for improvement to enhance their system’s performance and uptime.
Suppose you monitored your website over the course of one month, or 30 days. During that period, your website experienced three outages, each lasting 30 minutes, for a total of 90 minutes (5,400 seconds) of downtime. The uptime and downtime percentages, along with MTBF and MTTR, work out as follows:
- Total Time the Website Was Down:
- 5,400 seconds
- Total Time the Website Was Monitored:
- 2,592,000 seconds (30 days x 24 hours x 60 minutes x 60 seconds)
- Downtime Percentage:
- 5,400 seconds / 2,592,000 seconds ≈ 0.0021 = 0.21%
- Availability (Uptime) Percentage:
- 100% – 0.21% = 99.79%
- Mean Time Between Failures (MTBF):
- (43,200 minutes monitored – 90 minutes down) / 3 outages = 14,370 minutes, or roughly 10 days
- Mean Time to Repair (MTTR):
- 90 minutes / 3 outages = 30 minutes
By using this calculation, you can assess the website’s reliability over an extended period and make necessary adjustments to improve its performance.
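The same worked example can be reproduced in Python. This is a minimal sketch: the `ReliabilityReport` class and `reliability_report` function are names invented for illustration, and MTBF follows the definition given earlier (total uptime divided by the number of failures).

```python
from dataclasses import dataclass


@dataclass
class ReliabilityReport:
    availability_percent: float
    mtbf_seconds: float
    mttr_seconds: float


def reliability_report(outage_durations_seconds, monitored_seconds):
    """Compute availability, MTBF, and MTTR from a list of outage durations."""
    failures = len(outage_durations_seconds)
    if failures == 0:
        raise ValueError("MTBF and MTTR are undefined without at least one outage")
    total_downtime = sum(outage_durations_seconds)
    total_uptime = monitored_seconds - total_downtime
    return ReliabilityReport(
        availability_percent=100 * total_uptime / monitored_seconds,
        mtbf_seconds=total_uptime / failures,
        mttr_seconds=total_downtime / failures,
    )


# Three 30-minute outages over a 30-day monitoring period.
report = reliability_report([30 * 60] * 3, 30 * 24 * 60 * 60)
print(f"Availability: {report.availability_percent:.2f}%")
print(f"MTBF: {report.mtbf_seconds / 86400:.2f} days")
print(f"MTTR: {report.mttr_seconds / 60:.0f} minutes")
```

For this input the sketch gives about 99.79% availability, an MTBF of about 9.98 days, and an MTTR of 30 minutes.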
What about response time?
A server may be responding, but if it takes 20 seconds for a page to load, that is a problem too. It only takes a visitor 0.05 seconds to decide whether they are going to bounce or not. So uptime is important, but if your site is slow, you will still lose visitors.
Again, the solution here is monitoring. With Real User Monitoring (RUM), you get reporting on how visitors interact with your website. Load testing can help you find some issues with response time, but load testing won’t show you how this is affecting actual users.
Using RUM, you can gauge user experience on your site and find the areas that could be improved, so that response times drop and fewer visitors bounce. You can also set a Target Response Time SLA and a Target SLA % to cover all the bases and ensure accountability.
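One way to read a response time target paired with a target percentage is "at least X% of requests should finish within Y seconds." The sketch below checks that reading against a list of measured response times. The function name, the sample data, and this interpretation are all assumptions; it does not describe how any specific monitoring product evaluates its SLA settings.

```python
def response_time_sla(response_times_seconds, target_response_seconds, target_percent):
    """Return (achieved_percent, met) for a response time target.

    achieved_percent is the share of samples at or under the target time.
    """
    if not response_times_seconds:
        raise ValueError("at least one response time sample is required")
    within_target = sum(
        1 for t in response_times_seconds if t <= target_response_seconds
    )
    achieved_percent = 100 * within_target / len(response_times_seconds)
    return achieved_percent, achieved_percent >= target_percent


# Hypothetical samples, in seconds, including one 20-second page load.
samples = [0.4, 0.6, 0.5, 2.3, 0.7, 0.5, 0.9, 20.0, 0.6, 0.8]
achieved, met = response_time_sla(samples, target_response_seconds=1.0, target_percent=95)
print(f"{achieved:.1f}% of requests met the target; SLA met: {met}")
```

A server can pass every uptime check and still fail a target like this, which is why response time is worth tracking separately.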
Ensuring you get 99.9% uptime
While you can tell your customers that uptime is 99% or even 99.999%, it will only make them feel good until they can’t get to their website. Your customers expect and rely on you to be up and ready.
But unless you are monitoring your servers yourself, you can’t really be sure you’re getting what you paid for or that your customers are getting the high availability your company promised. This is where Uptime.com’s website monitoring system can help.
Uptime.com provides SLI (Service Level Indicator) monitoring with SLA (Service Level Agreement) reporting. SLI monitoring means it is designed to continuously monitor your system’s integrity, and SLA reporting means you know exactly what percentage of time your servers are up and how quickly they are responding. It gives you the power to respond more quickly to downtime issues and provides the reports you need to hold service providers accountable.
Don’t let a bunch of 9s on an SLA fool you into complacency, and don’t lose sleep over uptime and service availability.
Minute-by-minute Uptime checks.
Start your 14-day free trial with no credit card required at Uptime.com.