I think it’s time to stop using the term “network function virtualization”. Why? Because it doesn’t exist, at least not in the way the term suggests. The term is a category error, and when people try to make sense of the term, confusion and frustration ensue.
Think of it like this: what’s the difference between a “virtual network function” and a “non-virtual network function”? For example, how is “virtual IP forwarding” different than “non-virtual IP forwarding?” Answer: it’s not.
So what then exactly is network function virtualization?
The Right Idea, The Wrong Term
The European Telecommunications Standards Institute, which arguably coined the term NFV, said the following in a 2012 whitepaper (emphasis mine):
Network Functions Virtualisation aims to address these problems by leveraging standard IT virtualisation technology to consolidate many network equipment types onto industry standard high volume servers
Look at the bold text. How does one consolidate many network equipment types onto commodity servers? Let’s add some specifics to make it more concrete. How does one consolidate a firewall, router, switch, and load-balancer onto a server? By implementing those network functions in software and putting that software on the server.
But here’s the problem with calling that “network function virtualization”: virtualization has nothing to do with implementing network functions in software. In the early days of the Internet, routers (gateways as they were called back then) ran on commodity x86 machines with no virtualization (with the exception, maybe, of virtual memory).
Network functions don’t need virtualizing, and in fact, can’t be virtualized. But the term NFV suggests otherwise.
And that’s where the confusion started….
NFV is like dividing by zero: undefined
Conceptually, NFV is just implementing network functions in software. That’s easy enough to understand. And yet it’s hard to find an actual definition of it anywhere. Instead, you’ll see a lot of hand-wavy things like this:
NFV is a virtual networking concept…
NFV is a network architecture concept that uses the technologies of IT virtualization…
Hence the letters “N” and “V”. And then you have those who gave up on a definition and just went straight for the marketing lingo:
NFV is the next step…
…is the future…
…is the progression/evolution…
Others get closer by hinting at what NFV does, but stop short of actually saying what it is:
NFV consolidates multiple network functions onto industry standard equipment
This seems to be pretty close, but where’s the virtualization part come in? Let’s try this blurb from Angela Karl at TechGenix:
[NFV lets] service providers and operators… abstract network services, including things such as load balancing, into a software that can run on basic server.
Bingo. NFV is not virtualizaton at all. It’s an abstraction of network functions!
NFV is Abstraction, not Virtualization
Before you accuse me of splitting hairs, let me explain the distinction between virtualization and abstraction. Put simply, virtualization is an imitation, while abstraction is a disguise.
Virtualization is an imitation
When you virtualize something, you’re creating an imitation of the thing you’re virtualizing.
For example, when you create a virtual disk in your favorite hypervisor, you’re hiding the characteristics of the underlying storage (disk geometry, partition info, formatting, interface, etc.). But in the same motion, you give the virtual disk the same types of characteristics: disk geometry, partition info, formatting, interface, and so on. To put it in programming lingo, the properties are the same, but the values are different.
Virtualization preserves the underlying properties and doesn’t add any property that’s not already there. Have you ever pinged a virtual disk? Probably not, because virtual disks, like real disks, don’t have network stacks.
Virtualization also preserves the behavior of the thing being virtualized. That’s why you can “shut down” and “power off” virtual machines and “format” and “repartition” virtual disks.
Now try fitting NFV into this definition of virtualization. How do you “virtually route” or “virtually block” a packet? It’s a category error.
Abstraction is a disguise
When you create an abstraction, you’re creating a disguise. Unlike virtualization, with abstraction you’re changing some of the properties of the thing you’re abstracting. You’re taking something and dressing it up to look and act completely different.
Swap space is a good example of an abstraction. It’s data on storage that looks and acts like random access memory (but way slower). Before the days of SSDs, swap was stored on spinning disks which were read and written sequentially. This is completely different than memory which can be read and written randomly. Swap space is a file (Windows) or partition (Linux) disguised as RAM.
The Case for Abstracting Network Functions
Let’s bring this around to networking. What’s it mean to abstract network functions like IP routing and traffic filtering? More importantly, why would you want to? Why not just use virtual routers, switches, and firewalls?
Simply put, virtualized network devices don’t scale. The reasons for this are too numerous to list here, but suffice it to say that TCP/IP and Ethernet networks have a lot of built-in waste and aren’t the most efficient. This is why cloud providers do network function abstraction to an extreme. It’s utterly necessary. Let’s take Amazon AWS as an example.
In AWS, an instance has a virtual network interface. But what’s that virtual network interface connected to? A virtual switch? Nope. Virtual router? Try again. A virtual firewall. Negative. Virtual routers, switches, and firewalls don’t exist on the AWS platform. So the question remains: what’s that virtual NIC connected to?
The answer: nothing. The word “connected” here is a virtual concept borrowed from the real world. You “connect” NICs to switches. In your favorite hypervisor, you “connect” a vNIC to a vSwitch.
But there are no virtual switches or routers in this cloud. They’ve been abstracted into network functions. AWS presents this as if you’re connecting a virtual interface to a “subnet” rather than a router. That’s because AWS has abstracted IP routing away from you, leaving you with nothing to “connect” to. After all, we’re dealing with data. Not devices. Not even virtual devices. So what happens? The virtual NIC passes its traffic to some software that performs network functions. This software does a number of things:
- Switching – It looks at the Ethernet frame and checks the destination MAC address. If the frame contains an ARP request seeking the default gateway, it replies.
- Traffic Filtering – If it’s a unicast for the default gateway, it looks at the IP header and checks the destination against the security group rules, NACLs, and routing rules.
- Routing – If it needs to forward the packet, it forwards it (although forwarding may simply consist of passing it off to another function.)
This is a massive oversimplification, of course, but you get the idea. There’s no reason to “virtualize” anything here because all you’re doing is manipulating bits!
Overvirtualizing the Network
It’s possible to over-virtualize. To give an analogy, suppose you wanted to write a calculator application (let’s call it a virtual calculator). You’d draw a little box with numbers and operators, and let the user click the buttons to perform a calculation. Now imagine that you also decided to write a “virtual hand” application that virtually pressed buttons on the virtual calculator. That would be ridiculous, but that’s essentially what happens when you connect two virtual network devices together.
There an especially great temptation to do this in the cloud. Folks may spin up virtual firewalls, cluster them together, connect them to virtual load-balancers, IDSes, and whatnot. That’s not bad or technically wrong, but in many cases it’s just unnecessary. All of those network functions can be performed in software, without the additional complexity of virtual NICs connecting to this and that.
The Difference Between a Virtual Network Device and a Network Function
When it comes to the cloud, it’s not always clear what you’re looking at. Here are some questions I ask to figure out whether a thing in the cloud is a virtual device or just a abstracted network function:
Is there an obvious real world analog?
There’s a continuum here. An instance has a clear real world analog: a virtual machine. An Internet gateway sounds awfully like the router your ISP puts at your site, but “connecting” to it is a bit hand-wavy. You don’t get a next-hop IP or interface. Instead, your next hop is
igw- followed by some gibberish. That smacks of an abstraction to me.
Can you view the MAC address table or create bogus ARP entries?
If you can, it’s a virtual device (maybe just a Linux VM). If not, it’s likely some voodoo done in software.
Can you blackhole routes?
In AWS you can create blackhole routes, although people usually do it by accident. You can create a route with an internet gateway as a next hop, then delete the gateway. But can you create a route pointing to
null0? If not, you have an abstraction, not a virtual device.
Does the TTL get decremented at each hop?
A TTL in an overlay can get decremented based on the hops in the underlay. But what I’m talking about here is not decrementing the TTL when you normally would. AWS doesn’t decrement the TTL at each hop. If you were to get into a routing loop, you’d have a nasty problem. Hence, AWS doesn’t allow transitive routing through its VPCs. So if your TTLs don’t go down at each hop, as with AWS, you’re probably dealing with an abstraction.