How Website Monitoring Fits Into CI/CD Methodologies
Big features make headlines, but they are hard to ship unless the team stays focused on the finish line. Even the best-laid projects can fall victim to scope creep as more features are added and the idea gets refined. What’s on paper looks great! What’s being worked on, who knows…
Continuous integration (CI) and continuous delivery (CD) refer to a set of operating principles and a collection of practices that enable application development teams to deliver code changes more frequently and reliably. In other words: deploy more, plan better, and work at a smaller scale.
- Continuous integration is a coding philosophy and set of practices that drive development teams to implement small changes and check in code to version control repositories frequently.
- Continuous delivery picks up where continuous integration ends. CD automates the delivery of applications to selected infrastructure environments.
These two principles give your team agility: you make incremental changes that build toward a larger end goal while safeguarding your codebase. But you cannot continuously deliver on a system that is constantly malfunctioning, and you cannot continuously integrate small changes when each change brings new problems. That is a strategy for madness, not business.
You can turn this ship around with some focus on what matters to you and your organization. Today, we will break down how to approach monitoring and alerting in a way that allows for iteration, and lets you sleep at night.
The Hidden Challenge with CI/CD
What we like about CI/CD is the incremental approach. You’re always deploying something, and if you do need fixes they become a simple matter with a rapid release schedule.
The part that can be tricky is the ever-changing nature of your systems.
You probably aren’t running the same systems, environments, or even the same code in every instance. As any amateur developer knows, changes to code always require testing to ensure the code executes as intended. As any experienced developer knows, QA cannot anticipate every user interaction. Even well-trained teams working for major enterprises miss bugs that seem “so obvious” to the end user.
When you’re under a rapid deployment schedule, and a part you don’t manage breaks while you’re on call, you could find yourself in a bind (to put it lightly).
In short: It’s going to break at 3 AM. So, how are you going to deal with that?
Structuring Web Monitoring
Without getting into the complexities of SLA fulfillment, let’s assume you’re Jon/Jill Startup and you just want to make sure users have access, and your deployments aren’t breaking anything critical. You could just monitor your homepage. That will do the trick to tell you if your website is down, but it won’t tell you anything about what the user sees once logged in.
Transaction Checking + RUM | The Smart Choice for Complex Systems
What if you had a user who 24/7 ran through a series of steps, and reported back to you which step (if any) failed to pass his rigorous standards?
That’s the essence of a Transaction check: 24/7 coverage that alerts you to the precise problem encountered. Transaction checks are the smart implementation because they offer more than reassurance that a function is working as intended. They tell you which step went wrong, so you know where to look when it breaks. For your on-call engineer, that information is invaluable during a 3 AM fire.
When you add real user monitoring (RUM) to the mix, you get the benefit of real user data. Deploy your RUM code alongside your projects to track performance. As your application grows, your reporting will start to reflect how it has matured.
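The idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how any monitoring product implements it: the step functions (`load_login_page`, `submit_credentials`, `open_dashboard`) are hypothetical stand-ins for real browser or HTTP actions, and the last one is rigged to fail so the result shows which step broke.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class TransactionResult:
    passed: bool
    failed_step: Optional[str] = None  # name of the first step that failed
    error: Optional[str] = None        # short reason, handy in an alert


def run_transaction(steps: List[Tuple[str, Callable[[], None]]]) -> TransactionResult:
    """Run the steps in order and stop at the first one that raises."""
    for name, action in steps:
        try:
            action()
        except Exception as exc:  # any failure ends the transaction
            return TransactionResult(False, failed_step=name, error=str(exc))
    return TransactionResult(True)


# Hypothetical steps; a real check would drive a browser or send requests.
def load_login_page() -> None:
    pass  # pretend the page loaded


def submit_credentials() -> None:
    pass  # pretend the login succeeded


def open_dashboard() -> None:
    raise RuntimeError("dashboard returned HTTP 500")  # simulated failure


result = run_transaction([
    ("load login page", load_login_page),
    ("submit credentials", submit_credentials),
    ("open dashboard", open_dashboard),
])
print(result.passed, result.failed_step, result.error)
```

Because the runner stops at the first failing step and records its name, an alert built from `result` can point the on-call engineer at the dashboard rather than at the login flow.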
Solving the Enterprise CI/CD Problem
We know from our reporting that when left unchecked, downtime festers. One of the primary goals for DevOps, then, is creating total visibility. For Jon/Jill Startup, a seemingly finite number of services can potentially go down. For Quincy J. Enterprise, things get a little more complex.
Dear old Quincy probably has outages right now while he’s reading this. He needs to worry about in-house infrastructure, third-party services, and a host of other networking components, each adding complexity and each able to fail at any point in the day.
Enterprise is like synchronized swimming: minor outages serving as training for the big moment where extreme coordination is needed. All the while managing new builds and allocating resources to support the growing infrastructure.
Primarily, website monitoring is going to do the most good in an enterprise environment if it is easy to deploy and requires little upkeep over time. Being agile doesn’t leave a ton of time to spend on researching new ways to do something. It’s important the monitoring you use is something your team can work with.
Casting the Net
Let’s think about which early indicators might tell us a problem is coming. API server metrics that track the speed and responses of requests are a good starting point. But as infrastructure grows, it becomes equally important to automate monitoring for new services.
With our REST API, developers can build these check creation requests into their workflow. Simply use a check creation endpoint, such as the HTTP(S) check below:
{
  "name": "My Test Check",
  "contact_groups": ["Tier 1 Support"],
  "locations": ["US East", "US West", "US Central"],
  "tags": ["API endpoints"],
  "msp_interval": 3,
  "msp_sensitivity": 2,
  "msp_num_retries": 2
}
You can even add a Response time to measure against your SLA. Bake this into your deployments to create a check alongside each release.
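As a rough sketch of what baking this into a deployment could look like, the Python below wraps the payload above in a POST request using only the standard library. The endpoint URL is a placeholder, and the `Token` authorization header format is an assumption; take the real values from the API documentation. The request is built but not sent, so the snippet runs without network access.

```python
import json
import urllib.request

# Placeholder endpoint (assumption): replace with the real check creation URL.
CHECK_CREATE_URL = "https://monitoring.example.invalid/api/checks/add-http/"


def build_check_request(payload: dict, token: str,
                        url: str = CHECK_CREATE_URL) -> urllib.request.Request:
    """Wrap a check definition in a JSON POST request (not yet sent)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Header format is an assumption; confirm it in the API docs.
            "Authorization": f"Token {token}",
        },
    )


# Same field values as the example check above.
payload = {
    "name": "My Test Check",
    "contact_groups": ["Tier 1 Support"],
    "locations": ["US East", "US West", "US Central"],
    "tags": ["API endpoints"],
    "msp_interval": 3,
    "msp_sensitivity": 2,
    "msp_num_retries": 2,
}

request = build_check_request(payload, token="YOUR_API_TOKEN")
print(request.get_method(), request.full_url)
```

In a deploy script, the final step would be something like `urllib.request.urlopen(request, timeout=10)`, run after the release succeeds so the new service is monitored from its first minute.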
Oversight and Accountability
We have solved the problem of creating and monitoring new infrastructure. And if we took the time to set our SLA value above, we’ve also solved accountability. We’ll know how long a service has been down, and the uptime percentage for the period. From there, we can add components directly to an internal status page our team has created. Again, directly from the API:
{
  "name": "string",
  "description": "string",
  "is_group": true,
  "group_id": 0,
  "service_id": 0,
  "status": "operational|major-outage|partial-outage|degraded-performance|under-maintenance",
  "auto_set_status": "major-outage|partial-outage|degraded-performance|under-maintenance"
}
Adjust status and auto_set_status to determine how a check’s downtime affects the component’s status: a partial outage, degraded performance, or a major outage.
You can also assign the component to a group, useful when you have several deployments that are linked together in functionality or share the same resources.
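A small helper can keep these component definitions consistent and catch typos in the status values before anything reaches the API. The sketch below is a hedged illustration: the allowed values are copied from the example above, while the meaning of `service_id` (the check the component follows), the choice of `"is_group": False` for an ordinary component, and leaving `group_id` out when there is no group are all assumptions. `build_component` is a hypothetical name.

```python
from typing import Optional

# Allowed values copied from the component example above.
STATUSES = {
    "operational",
    "major-outage",
    "partial-outage",
    "degraded-performance",
    "under-maintenance",
}
AUTO_SET_STATUSES = STATUSES - {"operational"}


def build_component(name: str, service_id: int, auto_set_status: str,
                    group_id: Optional[int] = None,
                    description: str = "") -> dict:
    """Build a status page component payload, rejecting unknown statuses."""
    if auto_set_status not in AUTO_SET_STATUSES:
        raise ValueError(f"unsupported auto_set_status: {auto_set_status!r}")
    component = {
        "name": name,
        "description": description,
        "is_group": False,            # assumption: a plain component, not a group
        "service_id": service_id,     # assumption: ID of the check to follow
        "status": "operational",      # starting status before any downtime
        "auto_set_status": auto_set_status,
    }
    if group_id is not None:          # assumption: omit when ungrouped
        component["group_id"] = group_id
    return component


component = build_component(
    "Checkout API", service_id=101, auto_set_status="partial-outage", group_id=7
)
print(component)
```

Here the IDs `101` and `7` are made up; in practice they would come from the responses to earlier check and group creation requests.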
Tying Monitoring to Deployments: A Winning Strategy for CI/CD
Yes, the extra step adds some work, but programmatically adding monitoring as you deploy can be streamlined if you nail down the settings you need up front. You won’t often need more than one or two check types for a deployment, so you can easily recycle the same settings for the next operation.
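One way to recycle settings is to keep a single set of defaults and override only what differs per deployment. The sketch below reuses the field values from the earlier example check; the helper name `check_for_deployment` and the service names are hypothetical.

```python
from copy import deepcopy

# Shared settings, using the values from the example check earlier.
DEFAULT_CHECK = {
    "contact_groups": ["Tier 1 Support"],
    "locations": ["US East", "US West", "US Central"],
    "msp_interval": 3,
    "msp_sensitivity": 2,
    "msp_num_retries": 2,
}


def check_for_deployment(name: str, tags: list, **overrides) -> dict:
    """Copy the shared defaults, then apply per-deployment values."""
    payload = deepcopy(DEFAULT_CHECK)  # copy so the defaults are never mutated
    payload["name"] = name
    payload["tags"] = list(tags)
    payload.update(overrides)          # e.g. a tighter interval for one service
    return payload


orders_check = check_for_deployment("Orders API", ["API endpoints"])
billing_check = check_for_deployment("Billing API", ["API endpoints"], msp_interval=1)
print(orders_check["msp_interval"], billing_check["msp_interval"])
```

The deep copy matters: without it, appending a location for one deployment would silently change the defaults used by every later one.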