Finding Suspicious Traffic using CloudWatch Log Insights and VPC Flow Logs

While playing around with AWS CloudWatch Log Insights to analyze VPC flow logs, I thought of a couple of fun ways to identify (probably) malicious traffic.

Finding Vulnerability Scanners

These are the guys that hammer your box looking for anything from silly SQL injection attacks (so 2005) to CSRF vulnerabilities. The tell: look for hosts that reuse the same source port.

The Query

filter (srcPort > 1024 and srcAddr != "private-IP") |
stats count(*) as records by srcAddr,srcPort |
sort records desc |
limit 5

The Results

Suspicious traffic from the same source port

Finding Port Scanners

They just want to know if anybody’s listening. The tell: sending packets to a bunch of closed ports.

The Query

filter (action="REJECT") |
stats count_distinct(dstPort) as portcount by srcAddr |
sort portcount desc |
limit 5

The Results

The same source sending packets to a bunch of different ports

Architecting for Security on AWS

My latest course “Architecting for Security on AWS” is now available on Pluralsight!

You’ll learn how to secure your data and AWS services using a defense-in-depth approach, including:

  • Protecting your AWS credentials using identity and access management
  • Capturing and analyze logs using CloudTrail, CloudWatch, and Athena
  • Implementing network and instance security
  • Encrypting data at rest and in-transit
  • Setting up data backup, replication, and recovery

Go check it out!

Understanding the Meltdown Attack

This month, security researchers released a whitepaper describing the Meltdown attack, which allows anyone to read the full physical memory of a system by exploiting a vulnerability in Intel processors. If that sounds bad, that’s because it is. It means that if you’re running workloads on a public cloud provider, and you don’t have a dedicated server, an attacker can read what your workloads are putting into memory. This includes passwords, private keys, credit card numbers, your cat’s middle name, etc.

How Meltdown works

As a way of eking out every ounce of speed, modern processors perform out-of-order execution. Rather than executing a program one instruction at a time, the processor fetches multiple instructions at once and places them in an instruction queue. It then executes each instruction as soon as possible (and usually out-of-order) and stores each instruction’s output (if there is any) in a cache.

Now here’s the kicker. An instruction can do nasty things. Out-of-order execution runs instructions that the program isn’t allowed to run. Take the following assembly instruction:

; rcx = kernel address
 mov al, byte [rcx]

This copies one byte of data from the kernel memory and places it in a temporary portion of CPU memory, called a register. With in-order processing, the CPU would not allow this instruction to execute.

But with out-of-order execution, the CPU does execute the instruction, and it stores the resulting byte value in a temporary cache. The CPU then raises an exception and terminates the program.

But the byte value is still stored in the cache. This is where the flaw in Intel’s microcode lies. The CPU should clear the cache as soon as the exception is raised. But it doesn’t. It leaves it there, vulnerable to another type of side-channel attack called a cache attack.

Cache attacks have been around for years, and there are several different types which an hacker can use in conjunction with Meltdown to figure out the data stored in cache. For a great explanation of how these attacks work, check out Bert Hubert’s article on Spectre and Meltdown. It’s trivial for an attacker to pull one off, quickly.

Only Intel Processors were Shown to be Vulnerable

The researchers were not able to duplicate the results on AMD or ARM processors, which also use out-of-order execution. They speculate that they theoretically could, but they were not able to at the time they released the paper. You certainly shouldn’t assume that all non-Intel processors are immune to Meltdown.

Disabling out-of-order execution isn’t the answer

In order to be effective, the Meltdown attack depends on out-of-order execution as a necessary but not sufficient condition. Thus, simply having out-of-order execution enabled is not enough to make Meltdown possible. Disabling out-of-order execution does prevent Meltdown, but the performance impact is substantial.

What about Spectre?

Spectre is not the same as Meltdown. Although Spectre works against all CPUs, including Intel, AMD, and ARM, pulling off a Spectre attack is trickier. But both are equally bad and rely on out-of-order execution, which is why you usually see them lumped together.

You don’t need to replace your hardware

You don’t need to replace your hardware, and anyone who says you do is either trying to tell you something. Now for the obvious part. To mitigate Meltdown and Spectre, install the latest security patches for your operating system, BIOS, and CPU firmware — after sufficient testing, of course.

[amazon_link asins=’1484200659,1484224027′ template=’ProductCarousel’ store=’benpiperblog-20′ marketplace=’US’ link_id=’f5492b54-f283-11e7-b66c-83ded4902b58′]

4 Inconvenient but Effective Security Measures

Security usually requires sacrificing convenience (or money). So naturally, we tend to get away with as little security as possible. But if you’re a glutton for punishment, here are 4 very inconvenient but highly effective measures you can take right now to protect yourself from the  evils lurking on the interwebs.

Disable JavaScript

Yeah, I know. Every site made since the Web 2.0 days needs JavaScript just for a text input field to work right. It’s a shame, really. But disabling JavaScript isn’t an all-or-nothing deal. Browser extensions lets you allow JavaScript for sites you trust and block them for all others. If you’re still using Firefox (the most obnoxious browser today), you can use the NoScript extension. Chrome users, check out uMatrix.

Disable XSS

Cross-site scripting (XSS) occurs when you go to one website and it loads JavaScript from a different domain. Sadly, this practice has become normal with the advent of CDNs. What makes it risky is not so much the cross-site request, but the fact that it happens without you knowing it. You don’t see that is loading Nasty.js from someone’s hijacked blog site. Using one of the script blocking extensions I just mentioned will warn you about XSS and let you decide whether to allow it. I’ve stopped several malicious scripts this way over the years.

Block wide categories of websites

Taking a whitelist approach is too inconvenient, as there are just way too many sites out there to keep up with. Your next best option is to use content-based filtering to block websites by category. I use OpenDNS to achieve this, and below is my current list of blocked categories. As you can see, it’s pretty broad.

This list covers some sites I don’t want blocked, so I allow those on a case-by-case basis. You might notice that some of these categories tend to carry malware more than others, so blocking them wholesale is a pretty effective way to avoid fallout from clicking the wrong link.

Don’t use the app

Smartphones normalized a concept that would’ve been considered bizarre just a few years ago: installing an app for every website you use regularly. We’ve got Twitter, Facebook, Gmail, LinkedIn, etc. Can you imagine installing “the MySpace app” on your Windows XP machine in 2005? Some sites just don’t need an app. You can browse to them on your phone and they work fine.

When you install an app, you usually give it permissions to various system resources – photos, call logs, camera, microphone, etc. Chances are the core functionality doesn’t require most of those. If that app has a vulnerability – or worse, malicious code – then you’ve just turned your phone into a neat little hacker toolkit.

Blockchain is a Passing Fad

Whenever a tech fad comes to an end, it becomes so obvious why it failed. Yet during the hype, it’s easy to miss the problems lurking just below the surface. I want to explore some of the problems I see with public blockchain and why I think it’s not going to live up to the hype.

Blockchain can’t track real things

Whenever a new technology comes along, there’s always a temptation to use it in ways above and beyond it was originally intended. Blockchain came to popularity because of Bitcoin, and as Bitcoin grew, people became fascinated by its underlying technology.

But what made Bitcoin popular wasn’t the technology. The whole idea behind Bitcoin was to create a global currency that didn’t have a central monetary authority. Blockchain was just a good means to achieve that.

The fact that blockchain works well for cryptocurrency doesn’t mean that it works well for any sort of transactional database. The idea of digitally moving funds from one account to another doesn’t translate to moving goods along a supply chain. Why not? Because with Bitcoin, the blockchain is currency that you’re moving around. You can’t separate a Bitcoin from the blockchain. If you do, the Bitcoin ceases to exist.

When it comes to supply chain, you’re only moving representations of goods, not the goods themselves. This is a key distinction that people miss. You can assert that a certain string of data represents a tangible thing in the real world, but now that linkage is based on your assertion, not on the blockchain itself. Hence, the blockchain doesn’t add much value.

Anyone can create a blockchain

A blockchain is a database, and as anyone who has dealt with those knows, a database is worthless if no one uses it. There are hundreds, probably thousands of different blockchains. If people who work together every day can’t even agree on where to eat lunch, how is everyone in the world going to agree on a single blockchain for any given application? It’s not going to happen.

Ridiculous bandwidth and storage requirements

Right now blockchain is being touted as a security panacea, especially for IoT. There’s just one big problem: IoT devices have small storage and bandwidth capacity, and blockchain requires enormous amounts of storage and bandwidth.

Inconvenient but not more secure

Security always requires giving up some convenience. But the inverse isn’t necessarily true. When I go to pay for my coffee, I can use a piece of plastic, cash, or scan a barcode on my phone. It’s convenient and mostly secure. But if I want to pay with Bitcoin or some other cryptocurrency, I have to drop some bits onto a blockchain and wait minutes or even hours for the hivemind to “confirm” my transaction.

And what benefit do I get in return? Nothing. No, it’s worse than nothing. I lose my ability to dispute the transaction or get a refund because blockchains are designed to be unchangeable (aka immutable).

Controlled by anonymous

Public blockchains are not inherently decentralized. Distributed, yes. Decentralized, no.

When dealing with a credit or debit card, your bank is in charge of keeping track of the transactions. When dealing with cash, keeping up with your spending is entirely up to you. But when it comes to blockchain, thousands of anonymous strangers are in charge of your transactions.

These anonymous strangers are divided into two groups. You’ve got the developers who create and maintain the software required to interact with the blockchain. This gives them the power to change it in any way they see fit, as well as allow or disallow other people to use it (this actually happened recently with the Bitcoin Core/Cash split).

The other group is the people running the nodes which perform validation of blockchain transactions. Ideally this would be a diverse group of honest people spread all over the world. But the reality is that anyone with enough money (e.g. gov’t) can purchase the compute power to comprise the majority of nodes. Whoever controls the majority of nodes controls the blockchain.

This is arguably the biggest strike against public blockchains because it’s not just a theoretical possibility. It’s already happened. 70% of Bitcoin mining is done in China, only 1% in the US.

Ripe for attack

Even if you assume that most people are honest and will operate clean nodes, there’s still the small problem of security. Imagine that former Soviet spies Boris and Natasha develop a worm targeting a particular blockchain implementation like Ethereum. They’re so 1337 that they manage to infect 80% of the nodes, allowing them to inject bogus data into the chain and validate it.

Don’t underestimate the fallout of this. Even if the participants discover the attack quickly, the damage has already been done. Everyone else now has to face the ugly decision of whether to trust a blockchain they know has already been compromised. This isn’t just a theoretical scenario. Something similar already happened with Ethereum. It resulted in the developers forking the Ethereum chain. That’s why we now have two Ethereums (ETH and ETC).

Architected insecurely

The Boris and Natasha scenario might sound a little bit too spy-movie-ish, but the nature of a public blockchain requires it to be open to the internet. This isn’t a private database locked down behind layers of security. It’s a peer-to-peer app that is more than happy to accept your malformed TCP packet.

Does that mean it’s impossible to implement a secure, public blockchain? No. But it does mean that it’s much, much harder than to just use a private database behind more proven layers of security. Once again, why not just use a traditional database? Blockchain doesn’t offer enough of an advantage to outweigh the risks.

It’s Time to Stop Using the Term Network Function Virtualization (NFV)

I think it’s time to stop using the term “network function virtualization”. Why? Because it doesn’t exist, at least not in the way the term suggests. The term is a category error, and when people try to make sense of the term, confusion and frustration ensue.

Think of it like this: what’s the difference between a “virtual network function” and a “non-virtual network function”? For example, how is “virtual IP forwarding” different than “non-virtual IP forwarding?” Answer: it’s not.

So what then exactly is network function virtualization?

The Right Idea, The Wrong Term

The European Telecommunications Standards Institute, which arguably coined the term NFV, said the following in a 2012 whitepaper (emphasis mine):

Network Functions Virtualisation aims to address these problems by leveraging standard IT virtualisation technology to consolidate many network equipment types onto industry standard high volume servers

Look at the bold text. How does one consolidate many network equipment types onto commodity servers? Let’s add some specifics to make it more concrete. How does one consolidate a firewall, router, switch, and load-balancer onto a server? By implementing those network functions in software and putting that software on the server.

But here’s the problem with calling that “network function virtualization”: virtualization has nothing to do with implementing network functions in software. In the early days of the Internet, routers (gateways as they were called back then) ran on commodity x86 machines with no virtualization (with the exception, maybe, of virtual memory).

Network functions don’t need virtualizing, and in fact, can’t be virtualized. But the term NFV suggests otherwise.

And that’s where the confusion started….

NFV is like dividing by zero: undefined

Conceptually, NFV is just implementing network functions in software. That’s easy enough to understand. And yet it’s hard to find an actual definition of it anywhere. Instead, you’ll see a lot of hand-wavy things like this:

NFV is a virtual networking concept…
NFV is a network architecture concept that uses the technologies of IT virtualization…

Hence the letters “N” and “V”. And then you have those who gave up on a definition and just went straight for the marketing lingo:

NFV is the next step…
…is the future…
…is the progression/evolution…

Others get closer by hinting at what NFV does, but stop short of actually saying what it is:

NFV consolidates multiple network functions onto industry standard equipment

This seems to be pretty close, but where’s the virtualization part come in? Let’s try this blurb from Angela Karl at TechGenix:

[NFV lets] service providers and operators… abstract network services, including things such as load balancing, into a software that can run on basic server.

Bingo. NFV is not virtualizaton at all. It’s an abstraction of network functions!

NFV is Abstraction, not Virtualization

Before you accuse me of splitting hairs, let me explain the distinction between virtualization and abstraction. Put simply, virtualization is an imitation, while abstraction is a disguise.

Virtualization is an imitation

When you virtualize something, you’re creating an imitation of the thing you’re virtualizing.

For example, when you create a virtual disk in your favorite hypervisor, you’re hiding the characteristics of the underlying storage (disk geometry, partition info, formatting, interface, etc.). But in the same motion, you give the virtual disk the same types of characteristics: disk geometry, partition info, formatting, interface, and so on. To put it in programming lingo, the properties are the same, but the values are different.

Virtualization preserves the underlying properties and doesn’t add any property that’s not already there. Have you ever pinged a virtual disk? Probably not, because virtual disks, like real disks, don’t have network stacks.

Virtualization also preserves the behavior of the thing being virtualized. That’s why you can “shut down” and “power off” virtual machines and “format” and “repartition” virtual disks.

Now try fitting NFV into this definition of virtualization. How do you “virtually route” or “virtually block” a packet? It’s a category error.

Abstraction is a disguise

When you create an abstraction, you’re creating a disguise. Unlike virtualization, with abstraction you’re changing some of the properties of the thing you’re abstracting. You’re taking something and dressing it up to look and act completely different.

Swap space is a good example of an abstraction. It’s data on storage that looks and acts like random access memory (but way slower). Before the days of SSDs, swap was stored on spinning disks which were read and written sequentially. This is completely different than memory which can be read and written randomly. Swap space is a file (Windows) or partition (Linux) disguised as RAM.

The Case for Abstracting Network Functions

Let’s bring this around to networking. What’s it mean to abstract network functions like IP routing and traffic filtering? More importantly, why would you want to? Why not just use virtual routers, switches, and firewalls?

Simply put, virtualized network devices don’t scale. The reasons for this are too numerous to list here, but suffice it to say that TCP/IP and Ethernet networks have a lot of built-in waste and aren’t the most efficient. This is why cloud providers do network function abstraction to an extreme. It’s utterly necessary. Let’s take Amazon AWS as an example.

In AWS, an instance has a virtual network interface. But what’s that virtual network interface connected to? A virtual switch? Nope. Virtual router? Try again. A virtual firewall. Negative. Virtual routers, switches, and firewalls don’t exist on the AWS platform. So the question remains: what’s that virtual NIC connected to?

The answer: nothing. The word “connected” here is a virtual concept borrowed from the real world. You “connect” NICs to switches. In your favorite hypervisor, you “connect” a vNIC to a vSwitch.

But there are no virtual switches or routers in this cloud. They’ve been abstracted into network functions. AWS presents this as if you’re connecting a virtual interface to a “subnet” rather than a router. That’s because AWS has abstracted IP routing away from you, leaving you with nothing to “connect” to. After all, we’re dealing with data. Not devices. Not even virtual devices. So what happens? The virtual NIC passes its traffic to some software that performs network functions. This software does a number of things:

  • Switching – It looks at the Ethernet frame and checks the destination MAC address. If the frame contains an ARP request seeking the default gateway, it replies.
  • Traffic Filtering – If it’s a unicast for the default gateway, it looks at the IP header and checks the destination against the security group rules, NACLs, and routing rules.
  • Routing – If it needs to forward the packet, it forwards it (although forwarding may simply consist of passing it off to another function.)

This is a massive oversimplification, of course, but you get the idea. There’s no reason to “virtualize” anything here because all you’re doing is manipulating bits!

Overvirtualizing the Network

It’s possible to over-virtualize. To give an analogy, suppose you wanted to write a calculator application (let’s call it a virtual calculator). You’d draw a little box with numbers and operators, and let the user click the buttons to perform a calculation. Now imagine that you also decided to write a “virtual hand” application that virtually pressed buttons on the virtual calculator. That would be ridiculous, but that’s essentially what happens when you connect two virtual network devices together.

There an especially great temptation to do this in the cloud. Folks may spin up virtual firewalls, cluster them together, connect them to virtual load-balancers, IDSes, and whatnot. That’s not bad or technically wrong, but in many cases it’s just unnecessary. All of those network functions can be performed in software, without the additional complexity of virtual NICs connecting to this and that.

The Difference Between a Virtual Network Device and a Network Function

When it comes to the cloud, it’s not always clear what you’re looking at. Here are some questions I ask to figure out whether a thing in the cloud is a virtual device or just a abstracted network function:

Is there an obvious real world analog?

There’s a continuum here. An instance has a clear real world analog: a virtual machine. An Internet gateway sounds awfully like the router your ISP puts at your site, but “connecting” to it is a bit hand-wavy. You don’t get a next-hop IP or interface. Instead, your next hop is igw- followed by some gibberish. That smacks of an abstraction to me.

Can you view the MAC address table or create bogus ARP entries?

If  you can, it’s a virtual device (maybe just a Linux VM). If not, it’s likely some voodoo done in software.

Can you blackhole routes?

In AWS you can create blackhole routes, although people usually do it by accident. You can create a route with an internet gateway as a next hop, then delete the gateway. But can you create a route pointing to null0? If not, you have an abstraction, not a virtual device.

Does the TTL get decremented at each hop?

A TTL in an overlay can get decremented based on the hops in the underlay. But what I’m talking about here is not decrementing the TTL when you normally would. AWS doesn’t decrement the TTL at each hop. If you were to get into a routing loop, you’d have a nasty problem. Hence, AWS doesn’t allow transitive routing through its VPCs. So if your TTLs don’t go down at each hop, as with AWS, you’re probably dealing with an abstraction.