What Causes False Positive Alerts?
According to Orca Security’s 2022 Cloud Security Report, 59% of respondents received over 500 alerts a day, with more than 42% of them being false positive alerts. And 62% of respondents said the alert load has contributed to employee turnover.
With numbers like these, it’s no wonder developers dread false positive alerts. They waste time, energy, and money in every technology space, whether cloud or web services.
It’s time to change that.
In this article, we dive into what causes these false positive alerts and what you can do about them.
What Is Server Monitoring?
Server monitoring is the practice of checking your web servers with HTTP checks, connection or transaction requests, and other methods to inspect their health, performance, and downtime status. When configured well, the alerts from this monitoring system notify you of site outages and errors that affect customer experience.
False positive alerts, however, indicate an issue when there isn’t one. The causes of these types of alerts usually come in three forms:
- Misconfigured Checks: When checks are configured too strictly, they can be too sensitive and alert you to every temporary error that should not be considered a real issue.
- Location Anomalies: Server regions can have localized issues that affect network availability, such as natural disasters, bad weather, and power outages, which can trigger false positives.
- Self-Resolving Issues: Servers have brief blips, such as during scheduled downtime, and often recover quickly on their own.
How to Prevent False Positive Alerts
False positive alerts are costly, but server monitoring tools like Uptime.com offer features that can significantly reduce how many you receive. The tips below show how to use them.
Use Escalations and Intervals
You can configure your alerts to trigger only if the check stays in a “DOWN” status for longer than a specific time interval. This gives the problem time to resolve itself if it can and prevents checks from DDoSing your site with rapid requests.
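As a rough illustration of the idea, the sketch below holds back an alert until a check has been “DOWN” for longer than a chosen interval. The function name `should_alert` and the five-minute threshold are assumptions made for this example, not Uptime.com settings.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical threshold: how long a check must stay DOWN before anyone is alerted.
ALERT_AFTER = timedelta(minutes=5)


def should_alert(down_since: Optional[datetime], now: datetime,
                 alert_after: timedelta = ALERT_AFTER) -> bool:
    """Return True only when the check has been DOWN for longer than alert_after."""
    if down_since is None:  # the check is currently UP
        return False
    return now - down_since > alert_after


went_down = datetime(2024, 1, 1, 12, 0)
print(should_alert(went_down, went_down + timedelta(minutes=2)))  # False: may still self-resolve
print(should_alert(went_down, went_down + timedelta(minutes=6)))  # True: outage has persisted
```

A short blip that recovers within the interval never reaches anyone's inbox, which is the point of the delay.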
You can also set an escalation chain where you notify different contacts based on how long an alert has been active. A common escalation chain can look like this:
- 0 min: Check goes into “DOWN” status
- 6 mins: Developers are alerted in a company messaging app like Slack, Microsoft Teams, or Google Chat
- 10 mins: Developers are alerted by phone or SMS
- 20 mins: Senior Tech Lead is alerted
- 30 mins: Team Manager is alerted
This spreads the responsibility among workers using different communication methods to make sure the alert is handled.
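The example chain above can be sketched as a simple lookup table. This is only an illustration: the names `ESCALATION_CHAIN` and `contacts_to_notify` are made up for this sketch, and a real monitoring tool would configure the chain through its own settings.

```python
from typing import List, Tuple

# Escalation steps mirroring the example chain: (minutes DOWN, who is notified).
ESCALATION_CHAIN: List[Tuple[int, str]] = [
    (6, "developers via chat"),
    (10, "developers via phone or SMS"),
    (20, "senior tech lead"),
    (30, "team manager"),
]


def contacts_to_notify(minutes_down: float) -> List[str]:
    """Return every contact whose escalation step has been reached."""
    return [contact for threshold, contact in ESCALATION_CHAIN
            if minutes_down >= threshold]


print(contacts_to_notify(3))   # nobody yet, the check may still recover
print(contacts_to_notify(12))  # developers by chat, then by phone or SMS
```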
Set the Correct Level of Sensitivity and Retries
Sensitivity and retry levels control when and how your servers trigger alerts. Uptime.com provides configurable advanced checks to prevent users from being notified of false positives. These checks allow you to customize the level of sensitivity of alerts and automated retries.
Sensitivity is the number of monitoring locations that must fail before the status is considered “DOWN.” If you select three locations to monitor, you can set the sensitivity so that an alert fires only when two of the three are down. This way, you’re alerted only if the system is truly in danger of going offline, not when a single location is down.
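A minimal sketch of the sensitivity rule, assuming three monitored locations and a threshold of two failures. The location names and the `is_down` helper are hypothetical and exist only for this example.

```python
from typing import Dict


def is_down(location_failed: Dict[str, bool], sensitivity: int) -> bool:
    """Treat the check as DOWN only when at least `sensitivity` locations fail.

    location_failed maps a location name to True if that location's check failed.
    """
    failed_count = sum(1 for failed in location_failed.values() if failed)
    return failed_count >= sensitivity


# Hypothetical results from three monitoring locations.
one_failure = {"us-east": True, "eu-west": False, "ap-south": False}
two_failures = {"us-east": True, "eu-west": True, "ap-south": False}

print(is_down(one_failure, sensitivity=2))   # False: a single location is not enough
print(is_down(two_failures, sensitivity=2))  # True: two of three locations agree
```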
Retries are the number of times the monitoring system will retry a connection before considering a location’s status as “DOWN.” A retry level set to 1 could cause a high number of false positives, because the failure may be a temporary error in the monitoring system’s connection that will self-heal. Setting the retry threshold to 2 would bring a healthier balance.
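The effect of retries can be sketched as follows. One assumption to note: this example counts the total number of attempts (`max_attempts`), which may differ from how a given monitoring tool counts retries. The `make_flaky_probe` helper is a hypothetical stand-in for a real connection check.

```python
from typing import Callable


def location_is_down(probe: Callable[[], bool], max_attempts: int) -> bool:
    """Call probe up to max_attempts times; DOWN only if every attempt fails."""
    for _ in range(max_attempts):
        if probe():  # probe returns True when the connection succeeds
            return False
    return True


def make_flaky_probe(failures_before_success: int) -> Callable[[], bool]:
    """Build a stand-in probe that fails a fixed number of times, then succeeds."""
    remaining = {"failures": failures_before_success}

    def probe() -> bool:
        if remaining["failures"] > 0:
            remaining["failures"] -= 1
            return False
        return True

    return probe


# A single transient failure is reported as DOWN with one attempt...
print(location_is_down(make_flaky_probe(1), max_attempts=1))  # True
# ...but is absorbed when a second attempt is allowed.
print(location_is_down(make_flaky_probe(1), max_attempts=2))  # False
```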
Analyze the Root Cause
Sometimes what you think is a false positive may actually be a real problem that needs to be resolved. The only way to know the difference is to drill down on the details of the alert.
Take a situation where five locations monitor a check and you receive an alert. You may think only one location has gone offline, but in reality, several locations may have been down for hours or days, and the latest failure simply happened to set off the alert. In this case, what looks like a false positive is actually pointing to a real issue.
This is where a root cause analysis tool can help you.
Uptime.com provides real-time and root-cause analysis that presents alert metrics and drills down on the chronological series of events (including response codes, return values, and screenshots) that led to the alert trigger. Tools like these help you determine whether an apparent false positive is in fact a real issue and, if so, how to resolve it.
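To make the five-location scenario concrete, the sketch below lists how long each failing location has been down, longest first. It is not how Uptime.com's analysis works internally; the location names, timestamps, and the `down_durations` helper are invented for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple


def down_durations(down_since: Dict[str, Optional[datetime]],
                   now: datetime) -> List[Tuple[str, timedelta]]:
    """Return (location, time spent DOWN) for failing locations, longest first."""
    durations = [(location, now - since)
                 for location, since in down_since.items()
                 if since is not None]  # None means the location is UP
    return sorted(durations, key=lambda item: item[1], reverse=True)


now = datetime(2024, 1, 3, 9, 0)
# Hypothetical state of five locations at the moment the alert fires.
state = {
    "us-east": now - timedelta(minutes=1),  # the failure that triggered the alert
    "eu-west": now - timedelta(days=2),     # quietly down for two days
    "ap-south": now - timedelta(hours=5),   # quietly down for five hours
    "us-west": None,
    "sa-east": None,
}

for location, duration in down_durations(state, now):
    print(f"{location}: down for {duration}")
```

Seeing two locations with long-standing failures next to the one-minute failure shows that the alert reflects an ongoing problem, not a blip.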
No More False Positives
We’ve gone over what causes false positive alerts and how to prevent them in the future. With this knowledge by your side, you can now save money, time, and resources and get longer nights of peaceful, alert-free sleep.