Why Your Status Page Matters and How to Use It
When an outage hits your service, everybody starts talking. Your engineers are talking about what caused the problem, and how to fix it; your management is asking about when it’ll be fixed; and your customers are telling the world that they’re not happy.
But there’s an even more important conversation you should be having: communicating with your users about the issue. Every online service should have a public-facing status page where users can check if there’s a problem, what services it’s affecting, and how long it’s anticipated to last.
The first place a user will look to find out whether your site is down is your status page. The second place will be somewhere they can vent, where you cannot control the message.
Status page updates should be baked into your incident response, so that from the moment you begin your diagnosis or make an assessment, your users have full transparency alongside you.
Today, we’ll walk through how to use a status page effectively for communication.
What Does a Status Page Accomplish?
We’ve all had the frustrating experience, as users reported during the recent LastPass outage, of seeing a vendor status page that says everything’s fine, when we know it’s not.
We are aware of and actively investigating reports from some LastPass customers who are experiencing issues and receiving errors when attempting to log in. At this time no service issues have been identified.
— LastPass Status (@LastPassStatus) January 20, 2020
A status page that says “All Systems Operational” when the reality is anything but is effectively useless.
The first role of a status page is to convey the true state of your services, up or down, in the simplest possible terms. Users don’t want to have to click around for additional information, so if there’s an incident going on, it should be highly visible on the page.
What Makes a Good Status Update?
A good status update says what’s down, why it’s down, and when it’ll be back up. It’s a roadmap that informs the user when each bump in the road is expected.
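As a rough illustration of the "what, why, when" structure described above, here is a minimal Python sketch. The `StatusUpdate` class, its field names, and the message wording are hypothetical and not taken from any particular status page product; it also assumes the expected resolution time is a timezone-aware datetime.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class StatusUpdate:
    """One public status update (hypothetical structure, for illustration only)."""

    affected: str  # what's down
    cause: str  # why it's down, as far as is currently known
    expected_resolution: Optional[datetime] = None  # when it should be back, if known

    def render(self) -> str:
        """Build a plain-language message covering what, why, and when."""
        if self.expected_resolution is None:
            eta = "We don't have an estimated resolution time yet."
        else:
            # Assumes a timezone-aware datetime; it is shown in UTC.
            when = self.expected_resolution.astimezone(timezone.utc)
            eta = "We expect service to be restored by " + when.strftime("%H:%M UTC") + "."
        return f"{self.affected} is currently unavailable. Cause: {self.cause}. {eta}"
```

Leaving `expected_resolution` empty produces an honest "no estimate yet" message rather than a guessed time.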
Status pages also serve another important role: visibility in organic search. When someone searches for your brand, your status page is likely to be one of the top results. Savvy brands that communicate effectively during an outage can set the terms of the conversation.
Good status updates are also key to great incident management. The last place you want to hear about an outage is on social media. If you bake status updates into your incident management process, keeping users up to date will become a natural part of the flow.
Trust Building Through Status Pages
How can companies use public status pages to build trust and communicate promptly with their users? Bitfield Consulting’s John Arundel, a site reliability consultant with decades of experience in the industry, says that the first lesson is not to host your status page on your own infrastructure:
“When trouble strikes, your status page is the one thing that absolutely has to work. So it makes sense to decouple the status page from your own site. Don’t get high on your own supply.” Arundel also points out that honesty is the best policy. “When users come to your status page, they *know* you’re down. That’s why they’re there. So you may as well say so.”
Status Pages are More Efficient Than a Support Ticket
Status pages also reduce the pressure on your support folks, because instead of opening a zillion separate support tickets, customers can get the information they need right from the page. And Arundel’s insider tip for SREs is to set up an alert that fires when traffic spikes on your status page. “If lots of people are suddenly hitting the status page, chances are there’s a problem you don’t know about.”
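One way to approximate the traffic-spike alert Arundel describes is to compare the latest request count for the status page against recent history. The sketch below is only an illustration: the function name, the per-interval counts, and all three thresholds are assumptions, and a real setup would more likely express the rule in whatever monitoring tool you already use.

```python
from statistics import mean, stdev
from typing import Sequence


def is_traffic_spike(
    baseline_counts: Sequence[int],
    current_count: int,
    min_count: int = 50,
    min_ratio: float = 3.0,
    min_sigma: float = 3.0,
) -> bool:
    """Return True if current_count looks like a spike against recent history.

    baseline_counts holds status page requests per interval for recent
    "normal" intervals. The thresholds are illustrative assumptions,
    not recommended values.
    """
    if len(baseline_counts) < 2:
        return False  # not enough history to judge
    avg = mean(baseline_counts)
    spread = stdev(baseline_counts)
    # The min_count floor keeps a very quiet page from alerting on a handful of hits.
    if current_count < min_count:
        return False
    # Require both a large multiple of the average and a large deviation from it.
    return current_count >= min_ratio * avg and current_count >= avg + min_sigma * spread
```

With a baseline of around 100 requests per interval, a jump to 400 would be flagged, while 130 would not.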
When your monitoring alerts you to a problem that affects customers, the first thing you should do is update your status page to let people know about the incident, Arundel advises. “Get out in front of the problem and tell users what is happening, and what it means for them. Will their data be affected? Are there alternate services they can use? Have your social media channels point users to the status page as the place to look for information.”
Public or Private? How Should Your Status Page Work?
This question is really one of visibility, and who should have it. If you’re a small startup with big ideas, a single internal status page is probably enough to track the uptime of your services. It’s succinct, so you can stay agile, and a good status page service will let you provision access for your team.
As you scale, you have to start thinking about how to communicate during outages. Think about the worst-case outage you could have. How much worse does that look without any way of talking to your customers about it? A public status page can help a lot here.
Ultimately, both private and public status pages are valuable. Customers need to know if the system is down; your engineers need to know why. The best-prepared companies have communication channels for both internal and external stakeholders.
Status Updates | What to Say, When to Say It
While the incident is still in progress, regular status updates can really help. “Update your customers at least hourly, even if it’s only to say that you know there’s a problem and you’re working on it,” Arundel suggests. Show genuine empathy for affected users and reassure them that you’re taking the issue seriously; be human, not a corporate spokesdroid. Don’t try to weasel out of responsibility; plain speaking earns respect.
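The hourly cadence can be enforced with a simple reminder check during an open incident. This is a minimal sketch, assuming you record the time of the last public update; the names `UPDATE_INTERVAL` and `update_overdue` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# "At least hourly", per the advice above; adjust to your own policy.
UPDATE_INTERVAL = timedelta(hours=1)


def update_overdue(
    last_update: datetime,
    now: datetime,
    interval: timedelta = UPDATE_INTERVAL,
) -> bool:
    """Return True when the last public update is at least `interval` old."""
    return now - last_update >= interval


if __name__ == "__main__":
    last = datetime(2020, 1, 20, 14, 0, tzinfo=timezone.utc)
    if update_overdue(last, datetime.now(timezone.utc)):
        print("Reminder: post a status page update.")
```

Whoever holds the communications role during the incident could run a check like this on a timer, so the reminder does not depend on anyone's memory.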
After the incident, make sure you reach out to all customers who might have been affected, explain what happened, and let them know what you’re doing to make sure it doesn’t happen again.
Arundel says that some companies don’t like to publish detailed post-incident reviews on their status pages, but this attitude is a mistake. “Owning your problems makes you look good, not bad. It shows that you take uptime seriously, that you have a process in place for improving it, and that you’re honest and transparent with customers. We appreciate that.”
You can also use your status page to publish your uptime stats, and performance against your service level objectives. Don’t be afraid of the numbers. If they’re great, tell the world! If they’re not good, work to make them better. Either way, your status page tells your users everything they need to know about you.
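The arithmetic behind uptime stats and service level objectives is simple enough to sketch. The function names below are hypothetical, and the example assumes downtime is measured in minutes over a fixed reporting period.

```python
def uptime_percent(total_minutes: float, downtime_minutes: float) -> float:
    """Percentage of the period during which the service was up."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def error_budget_remaining(
    total_minutes: float,
    downtime_minutes: float,
    slo_percent: float,
) -> float:
    """Minutes of downtime still allowed before the SLO is missed.

    A negative result means the objective has already been missed.
    """
    allowed = total_minutes * (1.0 - slo_percent / 100.0)
    return allowed - downtime_minutes
```

For a 30-day month (43,200 minutes) with 21.6 minutes of downtime, this works out to 99.95% uptime. Against a 99.9% objective, which allows 43.2 minutes, about half of the error budget would remain.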
There’s one final person who needs to see what’s on your status page, and that’s you. Physicist Richard Feynman said that, in science, “The first principle is that you must not fool yourself, and you are the easiest person to fool.” If your status page makes for uncomfortable reading, don’t close the tab; open a conversation instead. If, despite all your team’s best efforts, the service just isn’t reliable, what’s wrong? What could you do differently to improve it? When your efforts to take site reliability to the next level start paying off, the status page will be the first place you’ll see the results.
What’s Your Status?
Incident reporting must become second nature in your incident management process. When you’re responding to an outage, someone on your team should be thinking about how to update the public. The more empowered this person is to update the status page, the less of a burden falls on the rest of the team.
Anyone who has been through a major downtime event understands that those hardest at work on the problem need the most isolation to get the job done right. Someone on the team must take the public-facing role so the others can work effectively.
A status page provides succinct access to what users need to know most: is it up or down? The rest is details, but those details control the narrative. Baking this communication into your incident response is a critical component of downtime resolution. So when the conversation about your site reliability starts, make sure your voice is heard too.