Operational Excellence Means Automation

Share on:

People use the term “operational excellence” in a lot of different ways. In its vaguest sense, it means continuous improvement as applied to operations. But you’re interested in what it means in the context of technology operations. And I’m here to tell you that it means automation.

Operational Excellence is one of the five pillars of the AWS Well-architected Framework. The AWS whitepaper lists six design principles for achieving operational excellence. I’ve paraphrased these principles for clarity. Here they are:

Define everything as code

This is easily the most obvious. Turn everything into code that can be automatically executed by a machine. This includes the building of infrastructure, application deployments, testing, recovery, and anything that requires or benefits from being defined in a runbook. If it’s a repeatable process, code it and let a machine do it.

Documentation as input and output

The delightful side-effect of defining everything as code is that code can serve as documentation. It becomes trivial to have a machine take code as input, execute it, and then generate some pretty documentation based on a template. The resulting documentation can then be used by another machine. All automatically, of course.

Changes should be as small and frequent as possible

Without getting into the rationale behind this, the point is that the only way to make small changes as frequently as possible is to use automation. Pushing a code change to a repo whence it’s automagically built, deployed, and documented is faster than doing any of that manually. Reversing a change automatically is faster, too.

Look for things to automate

If you’re not automating something, and you can, then do it. Of course, you should avoid automating a bad process. Fix the process and automate it. And if there’s nothing to automate right now, keep looking, because changes will inevitably bring opportunities for automation.

Inject failures

Break things to cause failures. If recovering from those failures requires manual intervention, automate the recovery steps.

Tell other people in the organization to automate

The idea is to share what you’ve learned with others. Of course, what you’ve learned is that automation is the key to achieving operational excellence. So just keep it simple and tell them to automate.

But isn’t operational excellence more than just automation?

What operational excellence actually looks like depends on the organization. But no matter how you slice it, you’re closer to operational excellence if you automate than you are if you don’t. So yes, there is more to it than just automation, just as there’s more to driving than going from point A to point B. But if operational excellence is the goal, you need the vehicle to get there, and the only vehicle that will do it is automation.