Finding Suspicious Traffic using CloudWatch Logs Insights and VPC Flow Logs

While playing around with AWS CloudWatch Logs Insights to analyze VPC flow logs, I thought of a couple of fun ways to identify (probably) malicious traffic.

Finding Vulnerability Scanners

These are the guys that hammer your box looking for anything from silly SQL injection attacks (so 2005) to CSRF vulnerabilities. The tell: look for hosts that reuse the same source port.

The Query

filter (srcPort > 1024 and srcAddr != "private-IP") |
stats count(*) as records by srcAddr,srcPort |
sort records desc |
limit 5

The Results

Suspicious traffic from the same source port
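
If you'd rather run the query from a terminal than from the console, the AWS CLI can do it. Here's a minimal sketch; the log group name is an assumption, so swap in your own:

# Query the last 24 hours of flow logs (epoch seconds; GNU date syntax).
aws logs start-query \
  --log-group-name "my-vpc-flow-logs" \
  --start-time $(date -d '-1 day' +%s) \
  --end-time $(date +%s) \
  --query-string 'filter (srcPort > 1024 and srcAddr != "private-IP") | stats count(*) as records by srcAddr,srcPort | sort records desc | limit 5'

# start-query returns a queryId; use it to fetch the results.
aws logs get-query-results --query-id "<queryId>"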

Finding Port Scanners

They just want to know if anybody’s listening. The tell: sending packets to a bunch of closed ports.

The Query

filter (action="REJECT") |
stats count_distinct(dstPort) as portcount by srcAddr |
sort portcount desc |
limit 5

The Results

The same source sending packets to a bunch of different ports
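
By the way, both of these queries assume your VPC flow logs are already landing in CloudWatch Logs. If they aren't, something like the following should get them flowing; the VPC ID, log group name, and IAM role ARN are placeholders you'd replace with your own:

# Publish flow logs for a VPC to a CloudWatch Logs log group.
# The IAM role must allow the flow logs service to write to CloudWatch Logs.
aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-0123456789abcdef0 \
  --traffic-type ALL \
  --log-group-name "my-vpc-flow-logs" \
  --deliver-logs-permission-arn arn:aws:iam::111111111111:role/flow-logs-role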

Science is About Discovering the Truth

As someone who works in IT, I hear and read a lot of comments about science. One common but unfortunate claim is that “science is not about finding truth.” While I won’t get into the underlying philosophical reasons behind this claim, I do want to at least respond to it on its face.

Etymology of the word “science”

The word science comes from the Latin scientia, meaning knowledge.

Plato said that knowledge is “justified true belief.” I’m not a big fan of Plato, but this is a good definition. Put another way, knowledge is a belief that (a) actually is true and (b) you have good reason to believe is true. That’s less concise, but it hits all the important points.

If that’s not convincing, we could just skip to Encyclopedia Britannica, which says:

In general, a science involves a pursuit of knowledge covering general truths or the operations of fundamental laws.[1]

What’s the point of science?

As a practical matter, if science isn’t about finding truth, then why should anyone care about it at all? If the purpose of science isn’t to discover truth, then it’s nothing more than fictional storytelling.

Science should be about finding truth. The concept of truth is, at its core, a fundamental component of logic. The proposition that 2 + 2 = 4 is either true or false. Some have said that science deals with facts and not truth, but this is a distinction without a difference. Science has to make decisions about facts and come to conclusions based on them. Saying that science deals with facts and not truth is like saying math deals with numbers but not equations. It’s, well, false.

The imprecise language of “science communicators” doesn’t help

Although scientists carefully think about their craft, many “science lovers” and “science communicators” do not. They throw around words like “facts” in completely wrong ways. One of the more common clichés is, “gravity is a fact.” Gravity is a force. It’s no more a “fact” than electromagnetism. A fact would be a measurement of gravitational energy. This might sound nit-picky, but when you’re dealing with science, nit-pickiness is important. You can’t just fudge definitions and assume everyone understands what you mean. But this is exactly what happens in popular science.

It’s probably science?

There’s been a shift towards saying that science deals not with certainty but only with probabilities. The insinuation is that the probability of a scientific claim is never 100%. Hence, you’ll see a scientific claim couched with a “probably” and usually with the disclaimer that it’s “the best explanation”. Take, for example, this bit from the National Institutes of Health:

Depression, like other mental illnesses, is probably caused by a combination of biological, environmental, and social factors, but the exact causes are not yet known.[2]

You’ll likely never see a research paper that claims depression is certainly, without a doubt caused by a combination of those factors. The only assurance you get is that it’s probably the best explanation.

But now you have another problem: What’s the probability that it really is the best explanation? Perhaps some obscure researcher has a better explanation and hasn’t published it yet. The notion of “a best explanation” isn’t possible when you are only allowed to deal in probabilities. The “best explanation” then becomes “probably the best explanation.” So the whole thing falls apart.

Earth is certainly, and not probably, a sphere

Let’s take a more concrete example. There’s a 100% probability that Earth is a sphere. It’s not 99.9% or 99.0% or anything less. In fact, we can just forget probabilities altogether. It’s a scientific certainty that Earth is round. But as soon as you adopt the belief that science deals only in probabilities, you can no longer claim with 100% certainty that it’s a sphere. Instead, you’d have to say, “It’s probably a sphere” or “Earth being a sphere is the best explanation for why it looks round.” That’s ridiculous!

Knowledge is necessarily certain

Science means knowledge, and knowledge is necessarily certain. If you think you know something but aren’t certain, then you don’t truly know it.

Part of the solution, then, to the watering down of the word “science” is to avoid misusing it. That means, to the chagrin of many, removing the moniker of “science” from disciplines that don’t always deal in certainties, i.e. knowledge. That means psychology, anthropology, and history aren’t science, just to name a few. That doesn’t mean they’re less valuable or less worthy of study; it just means they don’t meet the strict criteria of science.

Studying for the AWS Certified Solutions Architect: Associate Exam (SAA-C01)

Study Guides

The AWS Certified Solutions Architect Study Guide: Associate SAA-C01 Exam 2nd Edition ($30) by David Clinton and me covers more than you need to know to pass the exam. If you don’t believe me, just click the link and look at the reviews on Amazon.

If you are fairly new to AWS, you’re better off starting with the AWS Certified Cloud Practitioner Study Guide: CLF-C01 Exam, also by David Clinton and yours truly. Even if you don’t plan to take the entry-level Cloud Practitioner exam, this book will give you a solid foundation on which to build.

Both study guides include hundreds of assessment questions and answers as well as online access.

Video Courses

The following courses are on Pluralsight. In addition to videos, you get exercise files so that you can follow along with the demonstrations, access to discussion boards, and graded assessments. If you’re not a subscriber, you can still take advantage of a 10-day free trial. After that it’s $35/month or $299/year.

If you don’t have a solid networking background, you may find AWS networking a bit confusing. To get you up to speed, I’ve created three AWS networking deep-dive courses:

AWS Networking Deep Dive: Virtual Private Cloud (VPC)
AWS Networking Deep Dive: Elastic Load Balancing (ELB)
AWS Networking Deep Dive: Route 53 DNS

The Solutions Architect: Associate exam focuses heavily on the five pillars of the AWS Well-Architected Framework. The following courses cover them:

Architecting for Reliability on AWS
Architecting for Security on AWS
Architecting for Performance Efficiency on AWS
Architecting for Cost on AWS
Architecting for Operational Excellence on AWS

Operational Excellence Means Automation

People use the term “operational excellence” in a lot of different ways. In its vaguest sense, it means continuous improvement as applied to operations. But you’re interested in what it means in the context of technology operations. And I’m here to tell you that it means automation.

Operational Excellence is one of the five pillars of the AWS Well-Architected Framework. The AWS whitepaper lists six design principles for achieving operational excellence. I’ve paraphrased these principles for clarity. Here they are:

Define everything as code

This is easily the most obvious. Turn everything into code that can be automatically executed by a machine. This includes the building of infrastructure, application deployments, testing, recovery, and anything that requires or benefits from being defined in a runbook. If it’s a repeatable process, code it and let a machine do it.
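
To make that concrete, here's one hedged sketch. If your infrastructure lives in a CloudFormation template (the template and stack names here are hypothetical), a single command turns "build the network" into something a machine does:

# Create or update a stack from a version-controlled template.
# The same command works in a pipeline, so no human has to click anything.
aws cloudformation deploy \
  --template-file vpc.yaml \
  --stack-name my-network \
  --capabilities CAPABILITY_NAMED_IAM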

Documentation as input and output

The delightful side-effect of defining everything as code is that code can serve as documentation. It becomes trivial to have a machine take code as input, execute it, and then generate some pretty documentation based on a template. The resulting documentation can then be used by another machine. All automatically, of course.

Changes should be as small and frequent as possible

Without getting into the rationale behind this, the point is that the only way to make small changes as frequently as possible is to use automation. Pushing a code change to a repo whence it’s automagically built, deployed, and documented is faster than doing any of that manually. Reversing a change automatically is faster, too.

Look for things to automate

If you’re not automating something, and you can, then do it. Of course, you should avoid automating a bad process. Fix the process and automate it. And if there’s nothing to automate right now, keep looking, because changes will inevitably bring opportunities for automation.

Inject failures

Break things to cause failures. If recovering from those failures requires manual intervention, automate the recovery steps.
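
A crude sketch of the idea, assuming a throwaway instance ID and a test environment:

# Deliberately stop an instance and see what recovery takes.
# If getting back to healthy requires manual steps, automate them.
aws ec2 stop-instances --instance-ids i-0123456789abcdef0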

Tell other people in the organization to automate

The idea is to share what you’ve learned with others. Of course, what you’ve learned is that automation is the key to achieving operational excellence. So just keep it simple and tell them to automate.

But isn’t operational excellence more than just automation?

What operational excellence actually looks like depends on the organization. But no matter how you slice it, you’re closer to operational excellence if you automate than you are if you don’t. So yes, there is more to it than just automation, just as there’s more to driving than going from point A to point B. But if operational excellence is the goal, you need the vehicle to get there, and the only vehicle that will do it is automation.

Why I Don’t Teach The OSI Model

I recently got an email from a viewer of my Practical Networking course who asked how the TCP/IP networking terms I used mapped to the Open Systems Interconnection (OSI) model.

First, a bit of background. The OSI model is a generic networking model that is supposed to describe conceptually how networks carry data. Within the last four decades or so, 99.9% of all computer networking curricula for beginners have started by rehashing the OSI model.

When I first started out learning networking, I paid my dues by memorizing the 7 layers of the OSI model: application, presentation, session, transport, network, data link, and physical. But I found it almost useless in understanding how modern TCP/IP networks actually work.

When I began teaching networking, I found that it was clearer to simply explain things without ever explaining the OSI model. It’s an approach that’s worked well, as evidenced by the many compliments I’ve gotten on my networking courses and books.

The sad fact is that you don’t need to know the OSI model. All you need to know is how people use the terms. Here you go:

Layer 1 – Physical

The electrical signaling, physical connections, the bits. “We have a layer 1 problem” sometimes means “a rat chewed through the cable” or “it’s raining and the humidity is attenuating the signal.”

Layer 2 – Data link

Ethernet technologies, including MAC addresses, Ethernet frames, VLANs and VLAN tags; serial encapsulation such as the Point-to-Point Protocol (PPP). Much of the time “the problem is at layer 2” means “it’s in the wrong VLAN”.

Layer 3 – Network

IP addressing, IP routing, and the Address Resolution Protocol (ARP); IPv6, Neighbor Discovery (ND), and the like. “We have a layer 3 problem” can mean “we have a routing problem” or “someone put in the wrong IP address.”

Layer 4 – Transport

The Transmission Control Protocol (TCP) and User Datagram Protocol (UDP), including TCP and UDP port numbers. Incidentally, few people use this in conversation. Instead, they say “layer 7” when they mean layer 4, which brings us to…

Layer 7 – Application

Technically, this is just the data payload that the network carries. Strangely, in troubleshooting conversations, “a layer 7 problem” often means “a firewall is blocking that port”, which refers to a TCP or UDP port number and is distinctly a layer 4 problem. The confusion arises from the fact that most standard applications have a registered port number they use. For example, TCP port 80 is for the HTTP application, so people use the two interchangeably.

What about the other layers?

Nobody uses them. Seriously. In TCP/IP networks, session and presentation are rolled up into the application layer, which is itself just the data that you’re sending across the network. In fact, when you think about it, it makes perfect sense. What’s the point of a network? To transport data. What’s the highest layer that actually is part of the network infrastructure? That’s right, the transport layer.

Using AWS Systems Manager to Upgrade WordPress

After years of manually upgrading my self-hosted WordPress installation, I decided it was finally time to apply some DevOps principles (namely automation) to this process.

This site runs on an EC2 instance on AWS, so I decided to use AWS Systems Manager (aka SSM). I started out by creating the following Command Document (which happens to be in YAML format because JSON is ugly):

---
schemaVersion: "2.2"
description: "Download and install WordPress"
mainSteps:
- action: "aws:runShellScript"
  name: "example"
  inputs:
    runCommand:
    - "wget https://wordpress.org/latest.zip"
    - "mv latest.zip /var/www/html"
    - "cd /var/www/html"
    - "service httpd stop"
    - "unzip -o latest.zip"
    - "service httpd start"
    - "rm -f latest.zip"

The Command Document executes the bash commands in the runCommand section. It downloads the latest version of WordPress, stops Apache, unzips the files, restarts Apache, and then cleans up.

SSM uses an agent to carry out the bash commands. My instance runs Amazon Linux which comes with the agent preinstalled, so I didn’t need to install it.

Systems Manager can execute the Command Document at regular intervals to keep up with the typical WordPress release schedule of every 1-2 months. I can also trigger it manually if there’s a security or bugfix release I need.
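
Here's roughly what both of those look like with the AWS CLI; the document name and instance ID are assumptions:

# Trigger the upgrade on demand against a specific instance.
aws ssm send-command \
  --document-name "UpgradeWordPress" \
  --targets "Key=instanceids,Values=i-0123456789abcdef0"

# Or have State Manager run it on a schedule (every 30 days here).
aws ssm create-association \
  --name "UpgradeWordPress" \
  --targets "Key=instanceids,Values=i-0123456789abcdef0" \
  --schedule-expression "rate(30 days)"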

To avoid catastrophe, I have the Amazon Data Lifecycle Manager for EBS Snapshots take daily snapshots of the instance, just in case something goes terribly wrong with an upgrade.

Architecting for Security on AWS

My latest course “Architecting for Security on AWS” is now available on Pluralsight!

You’ll learn how to secure your data and AWS services using a defense-in-depth approach, including:

  • Protecting your AWS credentials using identity and access management
  • Capturing and analyzing logs using CloudTrail, CloudWatch, and Athena
  • Implementing network and instance security
  • Encrypting data at rest and in transit
  • Setting up data backup, replication, and recovery

Go check it out!

AWS Networking Deep Dive Courses

Puzzled by networking on AWS? Check out my AWS networking deep dive series!

AWS Networking Deep Dive: Route 53 DNS

Configure Route 53 for any domain name, and configure health checks and routing policies.


AWS Networking Deep Dive: Virtual Private Cloud (VPC)

Create secure and scalable VPCs. Implement multi-VPC topologies, peering connections, network address translation, and more.


AWS Networking Deep Dive: Elastic Load Balancing (ELB)

Securely configure load balancing for any public or private application. Implement HTTPS, path-based routing, and idle timeouts.