Troubleshooting MTU size over IPSEC VPN

I recently deployed a couple of wireless access points to two sites that connect to our main office over IPSEC VPN. After a recent firmware update to the wireless controller, both access points got stuck in a provisioning loop and appeared to have difficulty communicating with the controller. Both APs repeatedly disconnected due to a “heartbeats lost” error.

Connectivity between the main office and the remote sites appeared fine. Both access points were reachable via ping and ssh. I set up a packet debug on both sites’ firewalls and saw traffic going back and forth between the access points and the controller, and both access points appeared on the controller status window, alternating between “Provisioning” and “Disconnected”.

Needless to say I was slightly baffled.

I opened a ticket with the wireless vendor and (very quickly) received an answer. The MTU for CAPWAP traffic between the access points and the controller is hard set by the controller to 1500*. With these sites connected via IPSEC, full-size packets were going to be fragmented because of the overhead that IPSEC adds to traffic passing between sites.

I needed to lower the MTU size on the controller, but to what value? IPSEC doesn’t seem to have a ‘fixed’ header size due to the different encryption options that can be used. So how do I find out exactly how much our particular IPSEC configuration is adding?

ping -f

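From a Windows command prompt, the -f flag sets the Don’t Fragment (DF) bit on the ICMP packet, so nothing along the path is allowed to fragment it. Combined with the -l flag, which sets the size of the ICMP payload being sent, this lets you probe for the largest packet a path will carry.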

So, assuming a standard Ethernet MTU of 1500, and accounting for an 8-byte ICMP header and a 20-byte IP header, I should be able to send an ICMP payload of 1472 bytes, but 1473 should be too large:

C:\Users\netcanuck>ping 172.16.32.1 -f -l 1472

Pinging 172.16.32.1 with 1472 bytes of data:
Reply from 172.16.32.1: bytes=1472 time=3ms TTL=251
Reply from 172.16.32.1: bytes=1472 time=4ms TTL=251
Reply from 172.16.32.1: bytes=1472 time=4ms TTL=251
Reply from 172.16.32.1: bytes=1472 time=3ms TTL=251

C:\Users\netcanuck>ping 172.16.32.1 -f -l 1473

Pinging 172.16.32.1 with 1473 bytes of data:
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
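The 1472-byte limit is just the MTU minus the two headers. Here is a minimal Python sketch of that arithmetic; the function and constant names are my own, and it assumes a plain IPv4 header with no options:

```python
# Header sizes assumed for a plain IPv4 ICMP echo (no IP options).
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def max_icmp_payload(mtu):
    """Largest ping payload that fits in one unfragmented packet for a given MTU."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


print(max_icmp_payload(1500))  # 1472
```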

Excellent! So now to test across our IPSEC tunnel:

C:\Users\netcanuck>ping 172.16.68.1 -f -l 1472

Pinging 172.16.68.1 with 1472 bytes of data:
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.

Now this makes sense. The 1500-byte MTU does not account for the IPSEC overhead, so a full-size packet no longer fits through the tunnel.

After some testing with different packet sizes I hit on the magic number: 1384 bytes. At 1385 the packets were again rejected as being too large. So some quick math:

ICMP payload: 1384 bytes

ICMP header: 8 bytes

IP header: 20 bytes

Subtotal: 1412 bytes

This leaves 88 bytes (1500 minus 1412) as the IPSEC overhead for our particular configuration. I should be able to set the MTU size on the controller to 1412 and the access points should resume functioning normally.
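The same math, worked backwards from the largest payload that got through, can be sketched in Python. The names here are hypothetical, and the result only describes the tunnel that was measured, since the overhead depends on the encryption options in use:

```python
IPV4_HEADER_BYTES = 20  # plain IPv4 header, no options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo header


def usable_mtu(max_payload):
    """Largest IP packet the path carried, given the largest ping payload that passed."""
    return max_payload + ICMP_HEADER_BYTES + IPV4_HEADER_BYTES


def tunnel_overhead(link_mtu, max_payload):
    """Bytes consumed by the tunnel: the link MTU minus what actually fit."""
    return link_mtu - usable_mtu(max_payload)


print(usable_mtu(1384))             # 1412
print(tunnel_overhead(1500, 1384))  # 88
```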

I did in fact set the MTU to 1400 – I like nice, round numbers – and sure enough both access points resumed proper communication with the controller.

What I Learned Today

Sometimes the simple tools are easy to overlook. A standard Windows command prompt and ping with the -f flag are a quick and easy way to diagnose MTU and fragmentation issues across a VPN tunnel.
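I found the magic number by trial and error, but the search could be scripted. This is a rough sketch, not something I ran at the time: find_max_payload is a plain binary search over payload sizes, and windows_df_ping is a hypothetical helper that shells out to the same ping -f -l command and assumes English-language output containing “TTL=” on a successful reply.

```python
import subprocess


def find_max_payload(probe, low=0, high=1472):
    """Return the largest size in [low, high] for which probe(size) is True.

    Assumes every size below the limit passes and every size above it fails.
    Returns None if even the smallest size fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid       # mid fits; the answer is mid or larger
        else:
            high = mid - 1  # mid is too big; the answer is smaller
    return low


def windows_df_ping(host, size):
    """Send one DF-flagged ping of the given payload size (Windows ping syntax)."""
    result = subprocess.run(
        ["ping", "-n", "1", "-f", "-l", str(size), host],
        capture_output=True,
        text=True,
    )
    # Assumption: a successful echo reply line includes "TTL=".
    return "TTL=" in result.stdout


# Example use against the far side of the tunnel:
# print(find_max_payload(lambda size: windows_df_ping("172.16.68.1", size)))
```

With an upper bound of 1472 the search needs about a dozen pings, compared with however many guesses it takes by hand.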

* It appears from the support documentation for this particular wireless vendor that the MTU size should be 1450 by default, which would take into account at least some overhead and explains why these access points were working fine until now. The firmware update seems to have changed this to 1500.

The Problem With “Free”.

It’s rare to have a day go by during which I don’t hear or read about some product that a vendor is now ‘giving away’ or moving to a ‘freemium’ model. In some of the more contentious verticals in the IT industry this seems to be a key tactic for winning new customers and providing value-add for existing ones.

I’m not in marketing or sales, so I can only assume that the premise behind these gratuitous offerings is to have potential customers try the product, fall in love with it, and then want to add more of that company’s products to their infrastructure. There is also a tiny voice in my head that suggests these organizations might want their ‘free’ product to become so critical to your operation that, should they decide to charge a fee or licensing for it at some point in the future, you’d be forced to pay because it has become something you simply couldn’t live without.

Ultimately the short or long-term goal of offering these products doesn’t really matter. What matters is there is a very big problem with these free products:

They’re free.

They don’t generate revenue, at least directly, for the vendor providing them. This means they are, in all aspects, simply a cost center, a money sink. An expense that perhaps proves the old saying that “you have to spend money to make money”. But the real issue here for you or me as a potential user or implementer of these products is that it is very difficult to get any support.

Hello, Bonjour

This particular rant (sorry, blog post) is centered around one such product that everybody seems to be racing to give away. If you, like me, work in an environment that is moving to support the BYOD craze and have anything other than one large, flat network, then Apple’s Bonjour is probably driving you nuts and causing you to sprout gray hair, if you have any left.

Because this particular protocol and all of its relatives (mDNS, Zeroconf) can’t communicate across layer 3 boundaries (they rely on link-local multicast, which routers do not forward), when someone on your BYOD wifi wants to talk to the Apple TV on your corporate wifi, you need something to broker that connection. Enter the Bonjour Gateway (BG).

Aerohive was first to announce and make available their BG product in early 2012. It is built into their HiveOS on any Aerohive access point, or as a virtual machine that will run on VMware. It’s free up to 2 instances of the virtual appliance. I don’t know what the cost might be for anyone wishing to use more than 2, but I would imagine this is an opportunity to sell actual Aerohive hardware to a potential customer.

Cisco has included it as part of their Wireless LAN Controller (WLC) software beginning with version 7.4. This isn’t free, per se, but is obviously a valuable addition for any existing customer.

In January 2013 Ruckus announced their SmartWay™ technology as “beyond bonjour bridging”, and said it would be available in Q2. Again, this is only free in the sense that existing customers would not have to pay for the software upgrade to their existing controllers.

A quick Google search for other vendor offerings shows that pretty much everyone in the wireless space is offering support for Bonjour in some way.

I may be wrong about this but it seems to me that providing a solution for this issue in enterprise networks is/was a priority for each of these vendors. Why then has my experience with getting one of these platforms working been such a disaster?

Aerohive

If you don’t already follow Andrew von Nagy on Twitter (@revolutionwifi), you should. He is a true wifi evangelist and an excellent resource for keeping up-to-date on all things 802.11. His twitter feed was very active with the announcement of the release of Aerohive’s BG.

Working in a K-12 education environment we had already identified this as a need. Staff and students wanted to take advantage of AirPrint and AirPlay and we had to find a solution. I quickly signed up for my free Aerohive BG and HiveManager account.  Installation was easy as it comes in the form of an OVA. It’s pretty much ‘drop it into VMware’ and you are ready to go.

I had some problems with devices being able to see the AirPrint and AirPlay services across subnets. After some tinkering I decided to email Aerohive at the provided “free_bonjour_support@aerohive.com” address with my issue. That email must have ended up in the bit bucket because I received no reply.  I sent out a tweet about a week later asking @Aerohive how long one could expect to wait for support for the BG.  That too was met with silence. Two weeks later I was rather frustrated and sent out another tweet, this one a little more vitriolic:

“Going nowhere fast with Aerohive’s free bonjour gateway. Anyone have alternative suggestions? (That work)”

Now it should be noted that I’m in Canada and this tweet was sent out on November 22nd, 2012 – US Thanksgiving.

Andrew von Nagy responded via twitter and helped me out with some troubleshooting. I have to throw out a big thanks to him for taking the time on a holiday to offer some support.

On that same day, I received a reply to my original email (unsure if Andrew had anything to do with this) and began working with the online support to get the BG working.

A short 10 weeks later, I had resolved the issue (on my own) and closed the support request with Aerohive. From the original email on November 5th to resolution on January 10th (granted, there are a few holidays in there) is a long time to get an issue with an initial configuration resolved.

Ruckus

Around the same time (January 2013) that I managed to get that first BG working, we received word from our current wireless vendor, Ruckus, that they too were working on a BG solution. This was direct from David Callisch, VP of Marketing for Ruckus Wireless. He even offered to let us beta test the new firmware. This is great news! Being able to implement this solution on infrastructure we already own and manage should be quick and easy, right?

It’s mid-May, and we still haven’t received the beta firmware.

Also, Ruckus recently pulled their latest 9.6 firmware off their support site, so I have a feeling 9.7 and SmartWay™ are going to miss their targeted Q2 release.

“Ruckus Wireless has decided to remove the 9.6.0.0.264 release for ZoneDirector while we investigate an issue that was discovered after the release.”

Aerohive Revisited

In April I received an email from Aerohive that outlined some major bug fixes and enhancements to their free BG.  While I had been able to get it working with AirPlay somewhat in my previous attempt we had never been able to get AirPrint to work properly. I hoped that this news would mean we could get both pieces to function properly.

Having deleted the VM for the original installation of Aerohive’s BG, I attempted to reinstall it, only to be told that my serial # had already been activated and that I could not reactivate it. Ok, easy fix, right? I fired off an email to “free_bonjour_support@aerohive.com”, explained my situation, and asked if I could have a new key or the original key re-enabled.

That email went out April 19th, and I have yet to get any sort of reply.

Free Should Not Mean “free from support”

If these value-added features, or in some cases, fully ‘free’ products are meant to drive potential customers to become paying customers and/or if these products are meant to keep existing customers as loyal, long-term customers with an existing vendor, then I would expect support to be as agile and attentive as it would be for any other product or offering from these same vendors.

I shouldn’t be left waiting for an email that never comes, and I certainly shouldn’t have to resort to social media shaming to get action from a vendor. Sadly it seems to be the most effective method of getting things moving, but it should be a last resort, not the primary method of seeking resolution.

Perhaps I’m expecting too much from a free product or feature, and I may be misinterpreting the purpose of these add-ons as marketing/sales tools. I might be naive in believing that any truly ‘free’ product is going to become a key part of my infrastructure and solve a major technical hurdle for my users. I can only hope there is actually some sort of benevolent, beneficial reason for vendors to offer these solutions, and hope that they are able to provide some better support in the future.

Otherwise, there are truly free and open products like Avahi that are able to quickly and easily deploy mDNS service discovery options across subnets. If you know a little Linux…

Note: While writing this post I was contacted by our local Aerohive rep, who caught wind of a Tweet I sent out yesterday about my BG issue. He’s managed to get me a new serial # for our BG so I can happily reinstall it and give it another go. Social media wins again!