Operational Excellence Means Automation

People use the term “operational excellence” in a lot of different ways. In its vaguest sense, it means continuous improvement as applied to operations. But you’re interested in what it means in the context of technology operations. And I’m here to tell you that it means automation.

Operational Excellence is one of the five pillars of the AWS Well-Architected Framework. The AWS whitepaper lists six design principles for achieving operational excellence. I’ve paraphrased these principles for clarity. Here they are:

Define everything as code

This is easily the most obvious. Turn everything into code that can be automatically executed by a machine. This includes the building of infrastructure, application deployments, testing, recovery, and anything that requires or benefits from being defined in a runbook. If it’s a repeatable process, code it and let a machine do it.

Documentation as input and output

The delightful side effect of defining everything as code is that code can serve as documentation. It becomes trivial to have a machine take code as input, execute it, and then generate some pretty documentation based on a template. The resulting documentation can then be used by another machine. All automatically, of course.

Changes should be as small and frequent as possible

Without getting into the rationale behind this, the point is that the only way to make small changes as frequently as possible is to use automation. Pushing a code change to a repo whence it’s automagically built, deployed, and documented is faster than doing any of that manually. Reversing a change automatically is faster, too.

Look for things to automate

If you’re not automating something, and you can, then do it. Of course, you should avoid automating a bad process. Fix the process first, then automate it. And if there’s nothing to automate right now, keep looking, because changes will inevitably bring opportunities for automation.

Inject failures

Break things to cause failures. If recovering from those failures requires manual intervention, automate the recovery steps.

Tell other people in the organization to automate

The idea is to share what you’ve learned with others. Of course, what you’ve learned is that automation is the key to achieving operational excellence. So just keep it simple and tell them to automate.

But isn’t operational excellence more than just automation?

What operational excellence actually looks like depends on the organization. But no matter how you slice it, you’re closer to operational excellence if you automate than you are if you don’t. So yes, there is more to it than just automation, just as there’s more to driving than going from point A to point B. But if operational excellence is the goal, you need the vehicle to get there, and the only vehicle that will do it is automation.

Why I Don’t Teach The OSI Model

I recently got an email from a viewer of my Practical Networking course who asked how the TCP/IP networking terms I used mapped to the Open Systems Interconnection (OSI) model.

First, a bit of background. The OSI model is a generic networking model that is supposed to describe conceptually how networks carry data. For the last four decades or so, 99.9% of all computer networking curricula for beginners have started by rehashing the OSI model.

When I first started out learning networking, I paid my dues by memorizing the 7 layers of the OSI model: application, presentation, session, transport, network, data link, and physical. But I found it almost useless in understanding how modern TCP/IP networks actually work.

When I began teaching networking, I found that it was clearer to simply explain things without ever explaining the OSI model. It’s an approach that’s worked well, as evidenced by the many compliments I’ve gotten on my networking courses and books.

The sad fact is that you don’t need to know the OSI model. All you need to know is how people use the terms. Here you go:

Layer 1 – Physical

The electrical signaling, physical connections, the bits. “We have a layer 1 problem” sometimes means “a rat chewed through the cable” or “it’s raining and the humidity is attenuating the signal.”

Layer 2 – Data link

Ethernet technologies, including MAC addresses, Ethernet frames, VLANs and VLAN tags; serial encapsulation such as the point-to-point protocol (PPP). Much of the time “the problem is at layer 2” means “it’s in the wrong VLAN”.

Layer 3 – Network

IP addressing, IP routing, and address resolution protocol (ARP); IPv6, neighbor discovery (ND), and the like. “We have a layer 3 problem” can mean “we have a routing problem” or “someone put in the wrong IP address.”

Layer 4 – Transport

Transmission Control Protocol (TCP) and User Datagram Protocol (UDP), including TCP and UDP port numbers. Incidentally, few people use this in conversation. Instead, they say “layer 7” when they mean layer 4, which brings us to…

Layer 7 – Application

Technically, this is just the data payload that the network carries. Strangely, in troubleshooting conversations, “a layer 7 problem” often means “a firewall is blocking that port”, referring to a TCP or UDP port number, distinctly a layer 4 problem. The confusion arises from the fact that most standard applications have a registered port number they use. For example, TCP port 80 is for the HTTP application, so people use the two interchangeably.

What about the other layers?

Nobody uses them. Seriously. In TCP/IP networks, session and presentation are rolled up into the application layer, which is itself just the data that you’re sending across the network. In fact, when you think about it, it makes perfect sense. What’s the point of a network? To transport data. What’s the highest layer that actually is part of the network infrastructure? That’s right, the transport layer.

Using AWS Systems Manager to Upgrade WordPress

After years of manually upgrading my self-hosted WordPress installation, I decided it was finally time to apply some DevOps principles (namely automation) to this process.

This site runs on an EC2 instance on AWS, so I decided to use AWS Systems Manager (aka SSM). I started out by creating the following Command Document (which happens to be in YAML format because JSON is ugly):

---
schemaVersion: "2.2"
description: "Download and install WordPress"
mainSteps:
- action: "aws:runShellScript"
  name: "example"
  inputs:
    runCommand:
    - "wget https://wordpress.org/latest.zip"
    - "mv latest.zip /var/www/html"
    - "cd /var/www/html"
    - "service httpd stop"
    - "unzip -o latest.zip"
    - "service httpd start"
    - "rm -f latest.zip"

The Command Document executes the bash commands in the runCommand section. It downloads the latest version of WordPress, stops Apache, unzips the files, restarts Apache, and then cleans up.
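To rehearse these steps outside of Systems Manager first, here is a minimal sketch of the same sequence as a standalone shell script. It is not the Command Document itself: the DRY_RUN switch, the run helper, the upgrade_wordpress function name, and the /tmp/latest.zip staging path are all assumptions added for illustration. The one behavioral difference is that it tests the archive before stopping Apache, so a bad download is caught while the site is still up.

```shell
#!/bin/sh
# Sketch only: mirrors the runCommand steps above, with a dry-run mode.
# DRY_RUN, run(), upgrade_wordpress() and the /tmp staging path are
# illustrative assumptions, not part of the original Command Document.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

upgrade_wordpress() {
  wp_webroot="${1:-/var/www/html}"
  wp_archive="/tmp/latest.zip"

  run wget -q -O "$wp_archive" https://wordpress.org/latest.zip
  run unzip -tq "$wp_archive"        # test the archive before touching Apache
  run service httpd stop
  run unzip -o -q "$wp_archive" -d "$wp_webroot"
  run service httpd start
  run rm -f "$wp_archive"
}

upgrade_wordpress "$@"
```

By default the script only prints what it would do. Running it as root with DRY_RUN=0 performs the upgrade, and an optional first argument overrides the web root.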

SSM uses an agent to carry out the bash commands. My instance runs Amazon Linux, which comes with the agent preinstalled, so I didn’t need to install it.

Systems Manager can execute the Command Document at regular intervals to keep up with the typical WordPress release schedule of every 1-2 months. I can also trigger it manually if there’s a security or bugfix release I need.
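One way to express both the schedule and the manual trigger from the command line is a State Manager association plus send-command. The post doesn’t say which scheduling mechanism is in use, so treat this as a sketch under stated assumptions: the document name Upgrade-WordPress, the instance ID, and the rate expression are placeholders, and the script prints the AWS CLI commands by default instead of running them. Check the Systems Manager cron and rate expression reference for the schedules an association accepts.

```shell
#!/bin/sh
# Sketch only: schedule the Command Document, or trigger it once.
# DOC_NAME, INSTANCE_ID and SCHEDULE are hypothetical placeholders.
set -eu

DRY_RUN="${DRY_RUN:-1}"                            # 1 = only print the commands
DOC_NAME="${DOC_NAME:-Upgrade-WordPress}"          # assumed document name
INSTANCE_ID="${INSTANCE_ID:-i-0123456789abcdef0}"  # placeholder instance ID
DEFAULT_SCHEDULE='rate(30 days)'                   # roughly the WordPress release cadence
SCHEDULE="${SCHEDULE:-$DEFAULT_SCHEDULE}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Recurring runs: a State Manager association on a schedule.
schedule_upgrade() {
  run aws ssm create-association \
    --name "$DOC_NAME" \
    --targets "Key=InstanceIds,Values=$INSTANCE_ID" \
    --schedule-expression "$SCHEDULE"
}

# One-off run, e.g. for a security or bugfix release.
upgrade_now() {
  run aws ssm send-command \
    --document-name "$DOC_NAME" \
    --targets "Key=InstanceIds,Values=$INSTANCE_ID"
}

# With no argument the script only defines the functions.
case "${1:-}" in
  schedule) schedule_upgrade ;;
  now)      upgrade_now ;;
esac
```

Pass schedule or now as the first argument to print the corresponding command, and set DRY_RUN=0 once the placeholders have been replaced with real values.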

To avoid catastrophe, I have the Amazon Data Lifecycle Manager for EBS Snapshots take daily snapshots of the instance, just in case something goes terribly wrong with an upgrade.

Architecting for Security on AWS

My latest course “Architecting for Security on AWS” is now available on Pluralsight!

You’ll learn how to secure your data and AWS services using a defense-in-depth approach, including:

  • Protecting your AWS credentials using identity and access management
  • Capturing and analyzing logs using CloudTrail, CloudWatch, and Athena
  • Implementing network and instance security
  • Encrypting data at rest and in transit
  • Setting up data backup, replication, and recovery

Go check it out!

AWS Networking Deep Dive Courses

Puzzled by networking on AWS? Check out my AWS networking deep dive series!

AWS Networking Deep Dive: Route 53 DNS

Configure Route 53 for any domain name, and configure health checks and routing policies.

AWS Networking Deep Dive: Virtual Private Cloud (VPC)

Create secure and scalable VPCs. Implement multi-VPC topologies, build peering connections, network address translation, and more.

AWS Networking Deep Dive: Elastic Load Balancing (ELB)

Securely configure load balancing for any public or private application. Implement HTTPS, path-based routing, and idle timeouts.

AWS Networking Deep Dive: Route 53 DNS

Many of you have been asking for months when my Route 53 course would release. Well, it’s finally here! AWS Networking Deep Dive: Route 53 DNS is now available on Pluralsight.

Topics covered include:

  • Configuring Route 53 to work with any domain name, even one registered with a different registrar
  • DNS concepts and how Route 53 fits in with the internet’s domain name system
  • Creating public hosted zones, health checks, and routing policies
  • Using private hosted zones with multiple VPCs

101 Public DNS Servers Sorted by Speed

You probably know the popular Google DNS server IP addresses by heart: 8.8.8.8 and 8.8.4.4. Before those were around you might have even used Level3’s 4.2.2.1 and 4.2.2.2. Of course, everyone else uses these too, which means these popular servers are under a pretty heavy load.

Fortunately, there are faster public DNS servers out there. Much faster.

101 DNS Servers

I’ve compiled a list of 101 public DNS servers (PDF), sorted in order of fastest to slowest (for me).

A few things to keep in mind

This is not an exhaustive list of all public name servers, nor are these necessarily the fastest servers that exist. But if you’re using one of the more popular public name servers, you can easily see how other servers rank against those in terms of speed.

Not all DNS servers behave the same way. Some will return intentionally incorrect responses, usually if the query is for a malicious domain. Others will return inconsistent results, which can be problematic if you’re testing for recently changed records.

One name server in particular seemed to rate-limit my queries, and this behavior seemed to change based on the query type. For instance, queries for ANY (all record types) would time out, while queries for SOA records would work. After waiting a little while and trying again, the server answered all my queries quickly.

The lesson here is to test the server thoroughly and get familiar with its quirks before using it everywhere.
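A quick way to do that kind of testing is to time each server with dig for more than one record type. The script below is a minimal sketch and not the method used to build the list: the function names, the example.com test name, and the RUN_DNS_TIMING switch are assumptions, and the server addresses are simply the ones mentioned earlier in this post. It relies on dig printing a ";; Query time: N msec" line.

```shell
#!/bin/sh
# Sketch only: time a few public resolvers with dig, per record type.
# Function names, the test name and RUN_DNS_TIMING are illustrative assumptions.
set -eu

# Pull the number out of dig's ";; Query time: 23 msec" line (stdin -> stdout).
parse_query_time() {
  awk '/Query time:/ { print $4 }'
}

# Print "<server> <type> <ms>", or "timeout" when dig reports no query time.
time_server() {
  ts_server="$1"
  ts_name="$2"
  ts_type="$3"
  ts_ms="$(dig +tries=1 +time=2 "@$ts_server" "$ts_name" "$ts_type" | parse_query_time || true)"
  printf '%s %s %s\n' "$ts_server" "$ts_type" "${ts_ms:-timeout}"
}

# Only hit the network when explicitly asked to.
if [ "${RUN_DNS_TIMING:-0}" = "1" ]; then
  for server in 8.8.8.8 8.8.4.4 4.2.2.1 4.2.2.2; do
    for rtype in A SOA; do
      time_server "$server" example.com "$rtype"
    done
  done | sort -k3,3n   # fastest first; "timeout" rows sort to the top
fi
```

Run it with RUN_DNS_TIMING=1 and repeat the run a few times, since a single measurement says little about a server that rate-limits or answers inconsistently.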

Installing PowerShell Core on Amazon Linux

In preparation for my latest course in the AWS Networking Deep Dive series, I wanted to install PowerShell Core on an Amazon Linux instance to test out cross-platform compatibility for some scripts.

Specifically, I wanted to see if I could use methods in the System.Net.Dns class to perform name resolution. The dnsclient PowerShell module provides some cmdlets for this very purpose, but that module is Windows-only, and I needed something that would work across different platforms.

To my surprise, it wasn’t as easy as just running sudo yum -y install powershell. Fortunately, it wasn’t as difficult as building from source. Here’s what I did:

Install the dependencies

sudo yum install -y curl libunwind libicu libcurl openssl libuuid.x86_64

Download the installation script

This script just fetches the tarball and extracts it to /opt/microsoft/powershell.

wget https://raw.githubusercontent.com/PowerShell/PowerShell/master/docker/InstallTarballPackage.sh

Set the script to be executable

chmod +x InstallTarballPackage.sh

Run the script, specifying the PowerShell version (6.0.1) and package tarball as the arguments:

sudo ./InstallTarballPackage.sh 6.0.1 powershell-6.0.1-linux-x64.tar.gz

If you want to install a different version (such as the latest), refer to the releases on the PowerShell repo for the version number and tarball name.

Run PowerShell!

The command is pwsh, as in “Present Working SHell” (clever points). On my instance, I had to run it with sudo:

sudo pwsh

Get Your .NET On

The whole point of this exercise was to see if I could use .NET to perform DNS name resolution without any of the cmdlets in the Windows-only dnsclient module. Did it work? Let’s see.

PowerShell v6.0.1
Copyright (c) Microsoft Corporation. All rights reserved.
 
https://aka.ms/pscore6-docs
Type 'help' to get help.
PS /home/ec2-user> [System.Net.Dns]::GetHostAddresses("benpiper.com").IPAddressToString 
52.205.213.4

Yes indeed! Of course, I can still use the usual PowerShell tricks to extract just the data I want:

PS /home/ec2-user> [System.Net.Dns]::GetHostByName("pluralsight.com") | Select-Object AddressList
 
AddressList
-----------
{54.213.174.143, 35.164.44.204, 52.39.160.43}

I can also drill down to pick out just the first IP address in the list:

PS /home/ec2-user> ([System.Net.Dns]::GetHostByName("pluralsight.com")).AddressList[0].IpAddressToString
54.213.174.143

Run it again, and I get a different address:

PS /home/ec2-user> ([System.Net.Dns]::GetHostByName("pluralsight.com")).AddressList[0].IpAddressToString
52.39.160.43

Looks like round-robin DNS! But will this command work cross-platform? Let’s try it on my Windows 10 machine:

PS C:\Users\admin> ([System.Net.Dns]::GetHostByName("pluralsight.com")).AddressList[0].IpAddressToString
35.164.44.204

Yes! This is exactly why I chose PowerShell. The same command that works on Linux also works on Windows, which makes it perfect for an OS-agnostic course.

Ready to learn more PowerShell? Sign up for a free trial with Pluralsight and get unlimited access to every course in their humongous library!

Is Social Media Bad?

Most of us have tossed around the idea of restricting our social media consumption, or even giving it up altogether. It’s not that we don’t like it. We love it, sometimes too much. Yet something about social media just seems inherently wrong. What is it?

Social is Not a Neutral Tool

People often say that social media is just a tool, and like any other tool, it can be abused, but it can also be used for good. After mulling this over for several months, I have to disagree. Social media is not a tool. It’s not neutral. And that has nothing to do with the platform. Social can’t be neutral because it’s composed of people, and people are not neutral.

Think of it this way. Imagine you’re at your favorite hangout. Maybe it’s a coffee shop, restaurant, the library, zoo, whatever. You’re having a good time, when suddenly, a large group of people appears. They all start talking to each other, LOUDLY, and what they’re saying is seriously ticking you off. They’re spewing some of the most unpleasant, irritating, obnoxious garbage you’ve ever heard.

What do you do? Most likely you’d put on your headphones, if you have any, or you’d leave. Yeah, you might engage some of the people for a while, if that’s your personality. But would you purposely subject yourself to that noxious experience day after day? Probably not.

And yet, when it comes to social media, many continually subject themselves to that kind of toxic social interaction multiple times a day.

Technical Solutions Don’t Work

Since before social media was big, people have tried to come up with a technical solution to this problem. Banning, shadowbanning, muting, blocking, and throttling have all been tried.

But none of it has worked, or even helped much. If anything, social media interaction has gotten worse, not better. These solutions are predicated on the notion that social media is just a neutral platform, and if we enforce the right rules, we can maintain that neutrality. But that’s a false notion. Again, social media can’t be neutral because people are not neutral.

The only way for social media to work is for people to properly police their own behavior. This is exactly what happens in real life social situations. Nobody dares to walk into a noisy restaurant and launch into a profanity-laced tirade against a perfect stranger. But much of the time, that inhibition is driven by self-preservation rather than a moral imperative. Remove the risk of getting physically assaulted, and many won’t hesitate to say the vilest, ugliest stuff. That’s what social media does.

Bad Behavior is Contagious

Is social media bad? Not inherently. But it’s not inherently good either. People choose to behave in morally good or bad ways. When immoral speech spills over into social media, it spreads like a cancer. We love to think of ourselves as being in control of our own thoughts and choosing our own influences. But that’s just not true. Bad company corrupts good morals, as the Apostle Paul said, and being exposed to trash on social media day after day does affect you, even if you don’t consciously realize it.

How often have you gotten viscerally angry at something you read on Twitter or Facebook? Sure, you can get mad reading something on any website or even in a book. But those occurrences are few and far between. On social, they’re the norm. When you saw something that made you really mad, how long did you stew about it? Did your mood affect your interactions with other people?

This domino effect isn’t unique to social, of course, but it is amplified. Our use of social media is 180 degrees out of phase. For our own sanity, it should comprise seconds, maybe minutes of our day. Our one-on-one and in-person interactions should be the bulk. That tweet that made you burning mad should be a once-a-week event. Most of your disagreements with another person should be hashed out one-on-one, privately, not in a public forum with spectators.

Is Social Media Worth It?

Should you stop using social media? I think that’s the wrong question. A better question is: why should you use it? What value does it hold for you? And is it worth the price you pay in terms of time, sanity, and relationships with others?

It’s not unusual to see someone take a break from social, usually for a week or two, but sometimes a month or more. This is common and doesn’t usually raise any eyebrows. But if you heard someone say, “I’m taking a break from all social interaction for a month!” you’d immediately think something was amiss. This is evidence that instinctively, we know something is off about social. You only take long breaks from something when you detect that it’s not healthy to continue at your current pace.

As ironic as it seems, cutting back on social media would probably make everyone more social.