It’s Time to Stop Using the Term Network Function Virtualization (NFV)

I think it’s time to stop using the term “network function virtualization”. Why? Because it doesn’t exist, at least not in the way the term suggests. The term is a category error, and when people try to make sense of the term, confusion and frustration ensue.

Think of it like this: what’s the difference between a “virtual network function” and a “non-virtual network function”? For example, how is “virtual IP forwarding” different than “non-virtual IP forwarding?” Answer: it’s not.

So what then exactly is network function virtualization?

The Right Idea, The Wrong Term

The European Telecommunications Standards Institute, which arguably coined the term NFV, said the following in a 2012 whitepaper:

Network Functions Virtualisation aims to address these problems by leveraging standard IT virtualisation technology to consolidate many network equipment types onto industry standard high volume servers

Look at that last phrase. How does one consolidate many network equipment types onto commodity servers? Let’s add some specifics to make it more concrete. How does one consolidate a firewall, router, switch, and load-balancer onto a server? By implementing those network functions in software and putting that software on the server.

But here’s the problem with calling that “network function virtualization”: virtualization has nothing to do with implementing network functions in software. In the early days of the Internet, routers (gateways as they were called back then) ran on commodity x86 machines with no virtualization (with the exception, maybe, of virtual memory).

Network functions don’t need virtualizing, and in fact, can’t be virtualized. But the term NFV suggests otherwise.

And that’s where the confusion started….

NFV is like dividing by zero: undefined

Conceptually, NFV is just implementing network functions in software. That’s easy enough to understand. And yet it’s hard to find an actual definition of it anywhere. Instead, you’ll see a lot of hand-wavy things like this:

NFV is a virtual networking concept…
NFV is a network architecture concept that uses the technologies of IT virtualization…

Hence the letters “N” and “V”. And then you have those who gave up on a definition and just went straight for the marketing lingo:

NFV is the next step…
…is the future…
…is the progression/evolution…

Others get closer by hinting at what NFV does, but stop short of actually saying what it is:

NFV consolidates multiple network functions onto industry standard equipment

This seems to be pretty close, but where does the virtualization part come in? Let’s try this blurb from Angela Karl at TechGenix:

[NFV lets] service providers and operators… abstract network services, including things such as load balancing, into a software that can run on basic server.

Bingo. NFV is not virtualization at all. It’s an abstraction of network functions!

NFV is Abstraction, not Virtualization

Before you accuse me of splitting hairs, let me explain the distinction between virtualization and abstraction. Put simply, virtualization is an imitation, while abstraction is a disguise.

Virtualization is an imitation

When you virtualize something, you’re creating an imitation of the thing you’re virtualizing.

For example, when you create a virtual disk in your favorite hypervisor, you’re hiding the characteristics of the underlying storage (disk geometry, partition info, formatting, interface, etc.). But in the same motion, you give the virtual disk the same types of characteristics: disk geometry, partition info, formatting, interface, and so on. To put it in programming lingo, the properties are the same, but the values are different.

Virtualization preserves the underlying properties and doesn’t add any property that’s not already there. Have you ever pinged a virtual disk? Probably not, because virtual disks, like real disks, don’t have network stacks.

Virtualization also preserves the behavior of the thing being virtualized. That’s why you can “shut down” and “power off” virtual machines and “format” and “repartition” virtual disks.
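
If you want to see that for yourself, here’s a quick sketch on a Linux box using QEMU’s qemu-img (assuming the qemu and nbd tools are installed; the file name is arbitrary):

# Create a 10 GB virtual disk backed by an ordinary file
qemu-img create -f qcow2 /tmp/demo-disk.qcow2 10G

# Expose it as a block device...
modprobe nbd max_part=8
qemu-nbd --connect=/dev/nbd0 /tmp/demo-disk.qcow2

# ...then partition and format it exactly as you would a real disk
fdisk /dev/nbd0
mkfs.ext4 /dev/nbd0p1

Same properties as a physical disk, different values.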

Now try fitting NFV into this definition of virtualization. How do you “virtually route” or “virtually block” a packet? It’s a category error.

Abstraction is a disguise

When you create an abstraction, you’re creating a disguise. Unlike virtualization, with abstraction you’re changing some of the properties of the thing you’re abstracting. You’re taking something and dressing it up to look and act completely different.

Swap space is a good example of an abstraction. It’s data on storage that looks and acts like random access memory (but way slower). Before the days of SSDs, swap was stored on spinning disks which were read and written sequentially. This is completely different than memory which can be read and written randomly. Swap space is a file (Windows) or partition (Linux) disguised as RAM.
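
You can watch the disguise being put on in real time on a Linux box. A minimal sketch (the file name and size are arbitrary):

# Carve a 1 GB file out of the filesystem and disguise it as RAM
fallocate -l 1G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile

# The kernel now reports the file right alongside real memory
free -h
cat /proc/swaps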

The Case for Abstracting Network Functions

Let’s bring this around to networking. What does it mean to abstract network functions like IP routing and traffic filtering? More importantly, why would you want to? Why not just use virtual routers, switches, and firewalls?

Simply put, virtualized network devices don’t scale. The reasons for this are too numerous to list here, but suffice it to say that TCP/IP and Ethernet networks have a lot of built-in waste and aren’t the most efficient. This is why cloud providers do network function abstraction to an extreme. It’s utterly necessary. Let’s take Amazon AWS as an example.

In AWS, an instance has a virtual network interface. But what’s that virtual network interface connected to? A virtual switch? Nope. Virtual router? Try again. A virtual firewall? Negative. Virtual routers, switches, and firewalls don’t exist on the AWS platform. So the question remains: what’s that virtual NIC connected to?

The answer: nothing. The word “connected” here is a virtual concept borrowed from the real world. You “connect” NICs to switches. In your favorite hypervisor, you “connect” a vNIC to a vSwitch.

But there are no virtual switches or routers in this cloud. They’ve been abstracted into network functions. AWS presents this as if you’re connecting a virtual interface to a “subnet” rather than a router. That’s because AWS has abstracted IP routing away from you, leaving you with nothing to “connect” to. After all, we’re dealing with data. Not devices. Not even virtual devices. So what happens? The virtual NIC passes its traffic to some software that performs network functions. This software does a number of things:

  • Switching – It looks at the Ethernet frame and checks the destination MAC address. If the frame contains an ARP request seeking the default gateway, it replies.
  • Traffic Filtering – If it’s a unicast for the default gateway, it looks at the IP header and checks the destination against the security group rules, NACLs, and routing rules.
  • Routing – If it needs to forward the packet, it forwards it (although forwarding may simply consist of passing it off to another function.)

This is a massive oversimplification, of course, but you get the idea. There’s no reason to “virtualize” anything here because all you’re doing is manipulating bits!
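
You can even see the abstraction in the API itself. When you create a “virtual NIC” (an elastic network interface) with the AWS CLI, you hand it a subnet and a security group; there’s no switch, router, or firewall object to plug it into. A sketch (the IDs are placeholders):

# Create an ENI in a subnet, protected by a security group --
# there's nothing to "connect" it to
aws ec2 create-network-interface \
    --subnet-id subnet-0123456789abcdef0 \
    --groups sg-0123456789abcdef0 \
    --description "demo ENI"

The security group, which looks an awful lot like a firewall, is just another parameter on the interface.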

Overvirtualizing the Network

It’s possible to over-virtualize. To give an analogy, suppose you wanted to write a calculator application (let’s call it a virtual calculator). You’d draw a little box with numbers and operators, and let the user click the buttons to perform a calculation. Now imagine that you also decided to write a “virtual hand” application that virtually pressed buttons on the virtual calculator. That would be ridiculous, but that’s essentially what happens when you connect two virtual network devices together.

There’s an especially great temptation to do this in the cloud. Folks may spin up virtual firewalls, cluster them together, connect them to virtual load-balancers, IDSes, and whatnot. That’s not bad or technically wrong, but in many cases it’s just unnecessary. All of those network functions can be performed in software, without the additional complexity of virtual NICs connecting to this and that.

The Difference Between a Virtual Network Device and a Network Function

When it comes to the cloud, it’s not always clear what you’re looking at. Here are some questions I ask to figure out whether a thing in the cloud is a virtual device or just an abstracted network function:

Is there an obvious real world analog?

There’s a continuum here. An instance has a clear real world analog: a virtual machine. An Internet gateway sounds awfully like the router your ISP puts at your site, but “connecting” to it is a bit hand-wavy. You don’t get a next-hop IP or interface. Instead, your next hop is igw- followed by some gibberish. That smacks of an abstraction to me.
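
To make it concrete, here’s what adding a default route to a VPC route table looks like with the AWS CLI (the IDs are placeholders). Notice the “next hop” is an igw- identifier, not an IP address or an interface:

aws ec2 create-route \
    --route-table-id rtb-0123456789abcdef0 \
    --destination-cidr-block 0.0.0.0/0 \
    --gateway-id igw-0123456789abcdef0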

Can you view the MAC address table or create bogus ARP entries?

If you can, it’s a virtual device (maybe just a Linux VM). If not, it’s likely some voodoo done in software.

Can you blackhole routes?

In AWS you can create blackhole routes, although people usually do it by accident. You can create a route with an internet gateway as a next hop, then delete the gateway. But can you create a route pointing to null0? If not, you have an abstraction, not a virtual device.
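
For contrast, here’s what a deliberate blackhole route looks like on a traditional (or virtual) Cisco router, something you can’t express in a VPC route table (the prefix is just an example):

! Silently discard anything destined for 198.51.100.0/24
ip route 198.51.100.0 255.255.255.0 Null0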

Does the TTL get decremented at each hop?

A TTL in an overlay can get decremented based on the hops in the underlay. But what I’m talking about here is not decrementing the TTL when you normally would. AWS doesn’t decrement the TTL at each hop. If you were to get into a routing loop, you’d have a nasty problem. Hence, AWS doesn’t allow transitive routing through its VPCs. So if your TTLs don’t go down at each hop, as with AWS, you’re probably dealing with an abstraction.
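
An easy way to check is a quick traceroute between two instances in different subnets of the same VPC (the address below is a placeholder). Traceroute depends on the TTL being decremented at each hop, so a hop that doesn’t decrement the TTL simply never shows up:

# From an instance in subnet A to an instance in subnet B
traceroute -n -m 5 10.0.2.25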

 

Why People Haven’t Adopted IPv6 (And Why You Should Learn It Anyway)

It’s 2017, and if you haven’t learned IPv6 yet, well, you’re not the only one. In December 2016, IPv6 (as we know it today) turned 18 years old. Children who were in the womb when RFC 2460 was being drafted are now old enough to vote, get married, and purchase firearms in some states.

In honor of IPv6’s 18th birthday, allow me to share my theories on why people have been so slow to adopt IPv6. And why you still should consider learning it.

The “Lame name” theory

IPv6 terminology makes it sound like a new version of IPv4, but it’s not. It’s a totally different protocol with a similar name. If you’re familiar with the confusion between Java and JavaScript, you know what I’m talking about. People who set out to learn IPv6 are disappointed when they find out it’s almost nothing like IPv4.

The “Let’s split DHCP in half and spread its most popular functions across two protocols” theory

DHCP for IPv4 can provide clients with IP addresses, DNS servers, default gateways, TFTP servers, and pretty much anything else. DHCPv6 doesn’t have an option for providing a default gateway. If you want to push a default gateway to IPv6 clients, you have to rely on router advertisements, the same mechanism SLAAC uses.
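
Here’s a minimal sketch of what that looks like on a Cisco router (the interface and prefix are examples). The router advertisement carries the gateway, namely the router’s link-local address, and the other-config flag tells clients to go ask DHCPv6 for everything else, such as DNS servers:

ipv6 unicast-routing
!
interface GigabitEthernet0/0
 ipv6 address 2001:db8:1::1/64
 ! RAs go out automatically once IPv6 routing and an address are configured;
 ! this flag points clients at DHCPv6 for the remaining options
 ipv6 nd other-config-flag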

The “all things to all people, places, animals, plants” theory

IPv4 has only a few address types that anyone actually uses. Colloquially, they’re public, private (RFC 1918 addresses like 192.168.1.1), and multicast (which includes broadcast). IPv6 has approximately one zillion different address types, including unique-local, link-local, unspecified, and global unicast. Although there are technical justifications for some of these, the plethora of address types makes no sense to anyone who doesn’t deeply understand why “layer 2” is even in the IT lexicon.
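
For reference, here are the address types and well-known prefixes you’ll bump into most often:

::/128       unspecified
::1/128      loopback
fe80::/10    link-local
fc00::/7     unique-local
ff00::/8     multicast
2000::/3     global unicast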

The “IPv4 apocalypse” theory

We’ve all heard the constant chicken-little talk about how we have to move to IPv6 yesterday or the internet will die. Driving this is the myth that all IPv4 addresses are gone. They’re not, and the U.S. government is sitting on tens of thousands of addresses it’s never going to use. What really happened was that in 2011, the Internet Assigned Numbers Authority (IANA) assigned the last of its available IP address space to the regional internet registries (RIRs), which are responsible for doling out addresses. But the IPv4 addresses didn’t just go away. They still exist, and many of them are unused and can be reassigned.

The “NAT is a tool of the devil” theory

If you ever want to have fun, go on any IT forum and ask, “Why do we need IPv6 when we have NAT?” Actually, don’t. That would be trolling. But if you were to ask that question, you’d probably get a few responses hating on IPv4 NAT as a tool of the devil, which IPv6 will save us from… except it does NAT, too.

The “Why do I need both again?” theory

Implementing IPv6 almost always requires a multihomed, dual-stack deployment, with hosts running IPv4 and IPv6 side by side, which people figured out about 30 years ago was a bad idea because it confuses everybody. IT admins translate this as, “More work for me.”

The “Because we can” theory

There are enough IPv6 addresses for every cell in your body to have its own internet. Seriously? This, like NAT, is another non-reason to adopt it. Yes, it’s cool that I can give my Uncle Milton’s ant farm its own Internet. But as far as business justification goes, nope.

Why you might want to learn IPv6 (hint: money)

Although IPv6 has been poorly marketed, it’s still worth learning. In fact, I believe in IPv6 so strongly that I’ve created several Pluralsight courses on configuring and troubleshooting it.

Here are three big reasons to consider adding IPv6 to your set of skills:

  • It’s like a sports team. The big boys are rooting for it. I’m talking about Cisco, Juniper, ISPs, Google, et alia. They want to see it win, and they’ll pay to make it happen. If you know IPv6, you can be on the receiving end of some of those payments.
  • The confusion and complexity around IPv6 has made experts that much more valuable to companies who have already invested in IPv6 infrastructure.
  • If you know IPv4, IPv6 isn’t that hard to learn once you realize that it’s a distinct protocol and not a new version of IPv4.

For further IPv6 learning, check out my Pluralsight courses:

Troubleshooting IPv6 at the desktop:

Practical Networking

Configuring and troubleshooting IPv6 on Cisco routers:

Basic Networking for CCNP Routing and Switching 300-101 ROUTE
Troubleshooting Cisco Networks: IPv6 Routing Protocols for CCNP R&S 300-135 TSHOOT

You failed your CCNP exam. Now what?

You took one of the Cisco CCNP Routing and Switching certification exams. You went to the exam center, sat down, and started the exam. About 2 hours later, you saw the dreaded news appear on the screen:

You didn’t pass.

I’ve failed certification exams in the past, so I can relate to the facepalm-worthy feeling you get when you realize you dropped a couple of Benjamins on an exam that you just failed. I know the feeling of wanting to give up, the thought that this whole certification thing is stupid, and the desire to assign blame to whomever or whatever led to your failure.

Failing certification exams is a reality for any IT professional. And from what I’ve seen, sadly, not many people handle failure very well. I want to talk through this.

This isn’t meant to be a pep talk or a “you’ll do better next time” motivational speech. Neither is it meant to be an assignment of blame to you or anyone else. Rather, it’s a cold, hard look at why you failed, and how you can pass next time... or the time after that.

Why you failed

I’ve taken a lot of Cisco certification exams and read a lot of Cisco books over the years, and I’ve noticed a pattern. Cisco likes to play off of common misconceptions and little-known technical facts. Here’s a made-up but representative example:

Two switches are connected via an 802.1Q trunk. You delete the switched virtual interface for VLAN 1 but both switches still exchange CDP messages. What will prevent CDP messages from traversing VLAN 1 without affecting Cisco IP phones?

Select the best answer:

A. Prune VLAN 1 from the trunk

B. Disable VLAN 1

C. Disable CDP globally

D. Disable CDP on the trunk

E. None of these

If you’ve watched my Pluralsight course series on the CCNP SWITCH exam, you’ll recall that you can’t disable VLAN 1 or prune it from a trunk. Well, you can try to prune it, but CDP messages will still pass. But do you disable CDP globally or just on the trunk interface? This is where obscure knowledge comes in. Cisco IP phones use CDP to get voice VLAN information, so disabling CDP globally is out. That leaves only two answers: disable CDP on the trunk interface or none of the above. Disabling CDP on the trunk interface will certainly stop the CDP messages from moving between the switches, and it won’t affect Cisco IP phones since CDP messages never travel beyond the directly connected link.
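
For what it’s worth, here’s what answer D looks like in the configuration (the interface name is an example):

! Disable CDP only on the inter-switch trunk; phones on their access ports
! keep learning the voice VLAN via CDP
interface GigabitEthernet1/0/24
 no cdp enable

(Answer C would be "no cdp run" in global configuration mode, which kills CDP everywhere, phone ports included.)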

Now here’s the thing: I made that question and answer up on the fly. You have to be able to do that if you want to do well on the exam.

The exam blueprint is like The Oracle, and sometimes just as wrong

If you remember The Matrix movies, you’ll remember the Oracle, a computer program that supposedly knows all. After seeing the Oracle for the first time, Neo asks Morpheus how accurate the Oracle’s “prophecies” are. Morpheus responds with something to the effect of, “Try not to think of it in terms of right and wrong. The Oracle is a guide to help you find the path.” Not surprisingly, it turned out the Oracle was kinda wrong on some stuff.

Well, the blueprint is a lot like that. It has stuff that never shows up on any exam. This is mainly because if the exam covered the entire blueprint, it would be 8 hours long. It also leaves off some topics that do appear on the exam. The lesson here is don’t depend on the exam blueprint. Make sure you know the topics for prerequisite and related exams. If you’re taking CCNP SWITCH, make sure you know the topics for ROUTE. If you’re taking TSHOOT, make sure you know ROUTE and SWITCH. Of course, make sure you know all the CCNA R&S topics upside down and backwards.

Each exam blueprint is a guide. It’s a guide to the other exam blueprints.

How to pass next time... or the time after

If you’ve already taken a CCNP exam, the next time you go in to take the same exam, you’re technically “brain dumping” parts of it. I’m not talking about cheating. I mean you’ve seen the exam already, and you have a feel for what the questions are like. If you’ve got lots of time and money, you can take the same exam over and over again, getting slightly better each time until you pass. I don’t recommend this strategy, not just because it’s expensive, but because it puts you in the super awkward situation of telling others how many times you took the exam. Trying until you pass is respectable, but you should have some serious expertise to show for it. If I’m interviewing you and it took you 5 tries to pass a CCNP exam, I’m going to grill you hard on the technical questions.

If you want to have a great chance of passing the next time, then study for the certification one step higher than the one you want to attain. If you’re studying for the CCNA, act like you’re studying for the CCNP. If you want the CCNP, act like you’re studying for the CCIE. Obviously the topics are different. You don’t need to study multicast in-depth for your CCNP. But for the topics that overlap, it’s better to overshoot than aim for the bare minimum.

New book! Learn Cisco Network Administration in a Month of Lunches

The pre-release of my new book, Learn Cisco Network Administration in a Month of Lunches, is available from Manning Publications’ early access program.

The book is a tutorial designed for beginners who want to learn how to administer Cisco switches and routers. Set aside a portion of your lunch hour every day for a month, and you’ll start learning practical Cisco Network administration skills faster than you ever thought possible.

Citrix Web Interface 5.4: Error occurred while making the requested connection

I recently ran into a bizarre issue with users not being able to launch applications from a very old Citrix Presentation Server 4.0 farm when trying to launch from Citrix Web Interface 5.4. They were getting the eminently unhelpful, “An error occurred while making the requested connection.”

In the web interface application logs, I noticed this:

An error of type IMA with an error ID of 0x80000003 was reported from the Citrix XML Service at address (servername)

And this:

The farm MyFarm has been configured to use launch references, but a launch reference was not received from the Citrix XML Service. Check that the farm supports launch references or disable launch reference requests.

To resolve this, I modified C:\inetpub\wwwroot\Citrix\XenApp\conf\WebInterface.conf on the Web Interface servers and changed the RequireLaunchReference directive as follows:
RequireLaunchReference=Off
(It was set to On)

And it worked. Supposedly, that directive must be set to Off when using Web Interface 5.4 with PS 4.0. But I’d been running for years with it set to On, and it worked fine until recently. Another Citrix mystery.

Want more Citrix tips and tricks? Watch my course Citrix NetScaler 10: Design and Deployment!

Net Neutrality is a Scam

One of the biggest scams of the Internet is in full swing right now. You may have heard of it. It’s called “net neutrality.”

Fundamentally, net neutrality is about preventing Internet service providers (ISPs) from throttling or blocking traffic or providing paid prioritization of certain content. In addition, specific rules proposed by FCC Chairman Tom Wheeler would allow the FCC to arbitrate peering disputes between carriers. Traditionally, carriers have connected their networks to each other for a nominal cost or none at all, the idea being that the mutual benefit of using each other’s network for transit is payment enough. The proposed FCC rules, however, would turn this once amicable arrangement into a litigious battleground that could result in the destabilization of the Internet’s backbone.

I recall an article from a 1997 issue of Wired magazine which predicted the collapse of the Internet would be caused by increased growth without the infrastructure to support it. That never happened, in part due to technical innovation which kept up with growth, but also because ISPs and backbone carriers were able to throttle traffic during peak times to ensure everyone could have reasonably fast and reliable internet access.

Now, almost 20 years later, we’re looking at potential regulation that will micromanage how ISPs manage and build out their networks. As a network engineer, I understand the need to throttle or simply block certain types of traffic. But unfortunately, the technical facts have gotten lost amidst the raw politicization of the net neutrality debate.

I recently saw a graphic put out by the pro-net neutrality group “Battle for the Net” that shows a picture of the United States Senate and a caption that asks, “Does your state have the Internet’s worst enemy?” It then proceeds to list all the Senators who are supposedly trying to “kill Net Neutrality.” And this is the problem with the net neutrality movement. It’s purely political and devoid of any thoughtful technical or practical discussion. Organizations like Battle for the Net don’t bother to make a case for net neutrality. They assume that it is an absolute good and that being for the Internet means being for net neutrality. The discussion has devolved from a debate into a marketing battle plagued by word games and politics.

Net neutrality advocates have adopted the language that this is “a battle for the Internet” and an effort to “keep the internet open.” Apparently, by breaking decades of precedent and giving the FCC more power to control what Internet service providers do, the Internet will somehow become better. The narrative they put forth is that the big bad cable companies with their zillions of dollars are trying to make end users’ Internet experience slow and expensive, and are fighting valiant efforts to “keep the internet free.” (Never mind the fact that the cable companies gave us broadband Internet and brought us out of the dial-up era to begin with.) This David versus Goliath theme is great for stirring emotions, but it falls flat in the face of a little bit of scrutiny. Google, whose income is more than double that of Comcast, is strongly in favor of “net neutrality” regulations. So is Netflix. And Facebook.

Regardless of where you stand on net neutrality, one thing is certain: this is not about big money corporations versus the gentle folks of the Internet. It is about giant corporations duking it out for power, control, and government favor. As usual, the politics of net neutrality has turned the debate into more of a sporting event where everyone roots for his own team no matter what. But it’s actually worse than that. If you’re against net neutrality, some will perceive you as being anti-Internet or against Internet freedom. I find this both amusing and disturbing: amusing, because the notion that giving the FCC unprecedented regulatory power over the Internet will somehow increase freedom is absurd; and disturbing, because so many have blindly taken sides in this debate without understanding its implications or what it’s even about.

One such implication is privacy. How will the FCC ensure that ISPs are complying with the new regulations and not throttling or blocking certain types of traffic? The only way to know is by looking at the traffic, which can only be done with detailed logs of what an ISP’s users are doing. This goes beyond what websites you visited or how many gigabytes you downloaded. This gets down to individual connections. What IP address and port did you connect to? What protocol were you using? Certainly, these things can be logged now, and in fact probably are. But the difference is that, as of now, the FCC has no authority to demand such logs. With net neutrality regulations in place, they will, and they will also have the power to exact fines if ISPs fail to retain logs for a certain period of time.

So, you will be able to BitTorrent without restriction, but Uncle Sam is probably going to know about it. Of course, this is already happening with the NSA pretty much spying on everything. But again, the difference is that instead of spying secretly, the collection of your Internet activity will be open and shameless. That may not bother you. Honestly, it doesn’t really bother me. The point is that net neutrality regulations come with some pretty long and tangled strings attached. And it’s wise to unravel them and see where they lead before throwing in your support for the wolf in sheep’s clothing.

How to Make NetApp Use the Correct Interface for iSCSI

If you’re familiar with networking you know that when a device is directly connected to two separate IP networks, traffic destined for one of those networks should egress on the interface that is directly connected to that network. For example, if your storage appliance is directly connected to the 172.16.1.0/24 network, and you want to send a packet to a device with the IP of 172.16.1.55, traffic should egress on the interface connected to that network. Unfortunately, in the case of some NetApp filers, this does not always happen.

I ran into a peculiar issue when trying to force NetApp’s Snapmirror to replicate across a specific interface, only to be met with an ugly “Snapmirror error: cannot connect to source filer (Error: 13102)”. I confirmed with NetApp support that the Snapmirror configuration was correct for what I was trying to accomplish.

To troubleshoot, I started a packet trace on the destination filer using the command:

pktt start all -d /etc

I then kicked off the snapmirror initialization, waited for it to fail, then stopped the packet trace with

pktt stop all

Since I directed the trace files to be placed in /etc on the filer I just browsed to the hidden etc$ CIFS share on the filer and opened the traces in Wireshark. What I found was that the traffic that should have been egressing on the iSCSI VIF was actually going out on the LAN VIF. Not only that, the filer was using its iSCSI address on the LAN VIF! I’m always hesitant to label every quirk a “bug,” but this is definitely not correct behavior.

The remedy was as simple as adding a route statement similar to this:

route add inet 172.16.1.0/24 172.16.2.1 1

where 172.16.1.0/24 is the iSCSI network I want to traverse to reach the Snapmirror partner, and 172.16.2.1 is the gateway on my locally connected iSCSI network. The 1 specifies the cost metric for the route, which will always be 1 unless you need to add additional gateways.

To make the change permanent, simply add the route statement to the /etc/rc file on the filer.
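
The /etc/rc entry is just the same command you ran interactively, so using the example addresses above, the line would look like this:

route add inet 172.16.1.0/24 172.16.2.1 1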

Special thanks to NetApp’s Scott Owens for pointing me in the right direction on this.

Using IRQbalance to Improve Network Throughput in XenServer

If you are running XenServer 5.6 FP1 or later, there is a little trick you can use to improve network throughput on the host.

By default, XenServer uses the netback process to process network traffic, and each host is limited to four instances of netback, with one instance running on each of dom0’s vCPUs. When a VM starts, each of its VIFs (Virtual InterFaces) is assigned to a netback instance in a round-robin fashion. While this results in a pretty even distribution of VIFs-to-netback processes, it is extremely inefficient during times of high network load because the CPU is not being fully utilized.

For example, suppose you have four VMs on a host, with each VM having one VIF. VM1 is assigned to netback instance 0, which is tied to vCPU0; VM2 is assigned to netback instance 1, which is tied to vCPU1; and so on. Now suppose VM1 experiences a very high network load. Netback instance 0 is tasked with handling all of VM1’s traffic, and vCPU0 is the only vCPU doing work for netback instance 0. That means the other three vCPUs are sitting idle, while vCPU0 does all the work.

You can see this phenomenon for yourself by doing a cat /proc/interrupts from dom0’s console. You’ll see something similar to this:


(The screenshot doesn’t show it, but the first column of highlighted numbers is CPU0, the second is CPU1, and so on. The numbers represent the quantity of interrupt requests.)

If you’ve ever troubleshot obscure networking configurations in the physical world, you’ve probably run into a router or firewall whose CPU was being asked to do so much that it was causing a network slowdown. Fortunately in this case, we don’t have to make any major configuration changes or buy new hardware to fix the problem.

All we need to do to increase efficiency in this scenario is to evenly distribute the VIFs’ workloads across all available CPUs. We could manually do this at the bash prompt, or we could just download and install irqbalance.
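
If you’re curious, the manual approach amounts to writing CPU masks into /proc. The IRQ number below is a placeholder; find the real ones in /proc/interrupts:

# Pin IRQ 1234 to vCPU1 (the mask is a hex bitmap: 1=CPU0, 2=CPU1, 4=CPU2, ...)
echo 2 > /proc/irq/1234/smp_affinity

irqbalance just does this for you, continuously.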

irqbalance is a Linux daemon that automatically distributes interrupts across all available CPUs and cores. To install it, issue the following command at the dom0 bash prompt:

yum install irqbalance --enablerepo base

You can either restart the host or manually start the service/daemon by issuing:

service irqbalance start

Now restart your VMs and do another cat /proc/interrupts. This time you should see something like this:

That’s much better! Try this out on your test XenServer host(s) first and see if you can tell a difference. Citrix has a whitepaper titled Achieving a fair distribution of the processing of guest network traffic over available physical CPUs (that’s a mouthful) that goes into more technical detail about netback and irqbalance.

Deploying Citrix Access Gateway VPX with Web Interface 5.4 – CAG Setup with RADIUS

Deploying CAG with Web Interface 5.4 is actually very easy; there are just some “gotchas” that you have to be ready for. This is a guide to help you avoid those snags and pitfalls that commonly occur with a CAG VPX and Web Interface integration.

I recommend getting the Citrix Access Gateway VPX Getting Started Guide and HDX Remote Access Guide with Citrix Access Gateway VPX Express if you don’t already have them. The former document contains some inaccuracies but has some useful reference info as well. The latter takes you through the fundamental setup of the CAG VPX and gets you to the web administration console, where most of the meaty configuration will take place.

There are some assumptions I’m making with this guide since it is based on my own requirements. They are:

  • The CAG VPX has two virtual NICs: an external one to service external users, and an internal one for management and communication with the XenApp servers
  • Two logon points will be configured: One that allows user authentication to take place at the web interface, and another that uses RADIUS
  • The CAG VPX will not reside in a DMZ. If your situation requires it to reside in a DMZ, setting it up is trivial once you’ve gotten everything else working.
  • You’ve already got the CAG VPX appliance imported and running, but not configured
  • You have your web interface server set up, with no websites configured.
  • You have installed the CAG VPX license on a Citrix license server

Also, a word of warning: Configuring CAG VPX with Web Interface is like playing chess. Every move you make will affect all of your successive moves. In areas where I suspect your setup might require you to deviate from this guide, I’ll offer some pointers to help you make the right move.

Let’s get started! First, follow the Getting Started guide to configure the management interface for the CAG via the VM console. If you are unsure about a setting, just take the default by hitting Enter.

Once you have your management IP assigned, you’ll need to access the web administration console by browsing to https://[IP address]/lp/adminlogonpoint . Log in with the default username and password “admin”. You’ll see a nice dashboard with two dials and some nasty looking red X’s. Click on the Management tab. This will take you to the Networking portion of the System Administration menu group.

Here, enter the CAG’s hostname as an FQDN. You’ll see a list of your network interfaces (eth0, eth1, etc.) with one of them having the management IP address you assigned. To the right, there are four checkboxes labelled Internal, External, Appliance Failover, and Management.

Moving from left to right, the interface that will be used to connect to your XenApp servers should have the Internal checkbox checked.
The interface for management should already have the Management checkbox checked.
The interface that will be receiving external requests from end users should have the External checkbox checked.

If need be, your Internal and External interfaces can be the same. Make sure your DNS servers have an entry for your Internal IP!

Under the Default Gateway section, select the interface the CAG should use to route traffic for subnets to which it is not directly connected. This will probably be your External interface. Remember, the CAG has a direct connection to the subnet your XenApp servers are on, so it doesn’t need a gateway to get to those. But it does need a gateway to get back to your external users who are connecting from the Internet!

Click Save and restart the appliance using the big Restart button on the top right.

Log back into the web administration console and browse to Management > Name Server Providers. Enter your DNS servers and DNS suffixes.

Now go to Static Routes in the System Administration menu. Do you need a static route? If you plan on putting the CAG VPX into a DMZ later, go ahead and enter your static routes. Gateways you specify for static routes will take precedence over the default gateway you specified earlier. Remember, it does not hurt to add them now.

Now browse down a few rows to Licensing and click Configure. Select the Licensing type and Remote Server for the licensing server. Enter the FQDN or IP of your Citrix licensing server, and click Save. The CAG will attempt to grab its licenses and, upon a successful retrieval, it will display them as shown:

NB: If the CAG is unable to retrieve the licenses, I recommend stopping and troubleshooting until it is able to successfully pick up the licenses. You can continue your configuration, but you will not be able to test it until the license issue is resolved.

Moving right along, click on Authentication Profiles under the Access Control menu group. It’s time to add a RADIUS authentication profile! But before you do that, you have to set up a RADIUS server. I recommend reading How to Configure Radius Authentication/Authorization on Windows 2008 for Use on Citrix Access Gateway Standard Edition. (One caveat, however: don’t perform steps 13-17 in the KB article because they’re unnecessary and will cause problems.) Click Add and enter a name for the Authentication Profile. Click New and add your RADIUS server(s) and shared secret. Leave everything under Group Authorization as-is. You’re relying on the RADIUS server to check group authorization.

Now go to XenApp or XenDesktop under Applications and Desktops and enter the IP ranges of clients that can access XenApp servers via ICA and CGP. I really don’t know why there isn’t a checkbox that allows you the equivalent of a “permit ip all”, but there isn’t.

Next, click Secure Ticket Authority. These settings are arguably the most common cause of application launch issues. Select your STA servers carefully, and make sure all your XenApp servers have unique STA IDs! If you are running Provisioning Services and streaming XenApp, read up on that before proceeding. Once you are sure your STA IDs are unique, click New and enter the FQDN of each XenApp server that will be providing STA services. By default, the connection type is Secure, but I’m guessing your XenApp servers are not using SSL for STA traffic, so select Unsecure. Leave everything else as-is and click Add. Note the servers you selected here, because you will need them later.

At this point you can go ahead and restart the CAG VPX appliance, because it’s now time to do some work on the Web Interface (WI) side. Log into your WI server, launch the Citrix Web Interface Management console, and create a new site.

Should we do the easy one or the hard one first? Trick question. They’re both easy! We’ll set up a site to be used with RADIUS authentication. Click Create Site under the Actions pane on the right, name the website however you wish and click Next. Select “At Access Gateway” as the point of authentication and click Next. Here you’ll be greeted with an intimidating looking Authentication Service URL field. But as I said, this is easy! Just enter https://[cag-FQDN]/CitrixAuthService/AuthService.asmx and click Next, Next. After a few moments, your site will be created.

Right-click the site in the XenApp Web Sites list and select Server Farms. Configure your XenApp servers like you normally would in WI. There is nothing here unique to CAG.

Now right-click the site again and select Secure Access. Select the only item in the list and click Edit. Change Access method to Gateway direct and click Next. Next you’ll be asked for the address of the CAG. Enter the FQDN of the CAG, and optionally enable or disable session reliability. If enabled, you can request tickets from two STAs. (A word on this option: when you set up a site for Citrix Receiver, this checkbox must be unchecked. The site you are creating now cannot be used for Receiver, so don’t worry about it here if you plan to use Receiver.) Click Next.

Remember I said to note the XenApp servers you entered into the CAG VPX as your STA servers? Web Interface wants to know about these servers too. Click Add and enter the URL of the first XenApp server you entered into the CAG in the following format:

http://xenapp1.baconfactory.net/scripts/ctxsta.dll

I still do not know why it doesn’t just ask for the FQDN and assume the rest like the CAG does, but that’s how it is. Do this for all STA servers you entered into the CAG, and in the same order. Check, double check, and triple check the URLs! Also optionally change the “Bypass failed servers for” option to 1 minute. Click Finish.

Are we done? Almost. The CAG should be back up now, so log back into it. Click Logon Points under Access Control and click New. Now don’t be intimidated by all the settings. We only care about four things here. Enter the name of the logon point. Select this name carefully because it is what users will have to type in to connect to the CAG. If you enter “cag-logonpoint1” then users will have to go to https://yourcag.yourdomain.com/lp/cag-logonpoint1 which just looks ugly! Under Type select Basic. In the Web Interface field, enter the URL of the web interface site you created (no trailing slash). Under Authentication Profiles, select the RADIUS authentication profile you created earlier. Finally, check the “Single sign-on to web interface” check box and click Save.

Now, we are almost ready to test. But to save ourselves from a disappointing moment of temporary CAG dysfunction, click on Secure Ticket Authority again. Do you see unique STA IDs populated next to each of your XenApp servers? If not, troubleshoot until you do. If so, it’s time to test!

Browse to the logonpoint you just created. If you named your logonpoint “test1” and your CAG’s FQDN is yourcag.yourdomain.com, browse to https://yourcag.yourdomain.com/lp/test1.

Do you get an SSL certificate error? Probably so. I intentionally did not cover installing a certificate because it introduces another level of complexity into the configuration. SSL Certificates are dependent on the hostname, and the hostname you use to connect to the CAG to enumerate and launch apps has to match up with the hostname on the SSL certificate. The certificate also must be signed by a trusted certificate authority. Unless you have your own certificate authority, getting a signed certificate can be a pain. Unfortunately, connecting to CAGs using untrusted SSL certs causes a lot of problems. You may encounter some of these problems or you may not. Test anyway. If you do run into problems, the good news is that the heavy lifting of configuring the CAG is done.

Now, it’s time for a moment of truth. Once you’ve gotten a login prompt, log in with an AD account that has appropriate authorization. You may need to specify the user in UPN or down-level domain format. It depends on your AD environment, but one of those should work. If you have trouble authenticating, first check the logs on the RADIUS server to make sure the denial is not occurring there. If you continue to have issues, it’s time to get acquainted with what will become your new best friend: the CAG debug log. The CAG debug log is at your service at https://yourcag.yourdomain.com/admin/d/?req=DebugLog . Watch it for any FLEXnet or STA errors.

Once you get logged in, try launching a published app. If all goes well, you should see something like…

Citrix XenApp6 0x80060016 Error In PowerShell

I ran into a little snag when executing some XenApp PowerShell commands. Certain commands like Get-XAFarm and Get-XAAdministrator would always give an “0x80060016” error. Here is an example and the fix:

PS C:\Windows\system32> Get-XAFarm
Get-XAFarm : Error reading the current administrator data (0x80060016)
At line:1 char:11
+ Get-XAFarm <<<<
+ CategoryInfo : InvalidResult: (:) [Get-XAFarm], CitrixException
+ FullyQualifiedErrorId : GetCitrixAdminType,Citrix.XenApp.Commands.GetFarmCmdlet

Typically this error code in Citrix indicates a problem with IMA. But in this case it was even simpler than that: IMA couldn’t resolve the hostname of the database server hosting the data store. Make sure that the correct DNS suffixes are being applied so IMA can find the server, and if that fails, just add it to the hosts file and try again.
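
If you do end up going the hosts file route, the entry on the XenApp server uses the standard format (the IP and server name here are made up):

# C:\Windows\System32\drivers\etc\hosts
10.0.0.25    sqldb01.corp.example.com    sqldb01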