Cisco Catalyst Wifi, Take Two

On November 13th, Cisco announced their next-generation wireless platform with the release of the Catalyst 9800 Series Wireless Controller.

You read that right, the next WLC platform from Cisco is running on Catalyst and expands Cisco’s DNA-Center architecture into the wireless space.

The Catalyst 9800 controllers come in a variety of form factors. The option for a standalone hardware controller is still here with the 9800-40 and 9800-80, or the 9800 series can be run as a VM in a private or public cloud. A third option is now to run embedded wireless on the Catalyst 9k series switches.

Embedded wireless controllers on Catalyst switches…that sounds familiar, doesn’t it?

Cisco made a similar move a few years ago with an architecture called Converged Access. This embedded the wireless controller functionality into IOS XE on the 3650 and 3850 access switches. For various reasons, it did not live up to expectations, and Cisco killed it in IOS XE Everest 16.5.1a in late 2017.

Cisco and Aironet

Cisco acquired Aironet Wireless Communications in 1999 for $799M. Since then, Cisco wireless access points have generally been referred to as “Aironet” products by name. This includes the software that runs on the wireless controllers and access points, AireOS.

AireOS came from Cisco’s acquisition of Airespace in 2005. Airespace were the developers of the AP/Controller model and the Lightweight Access Point Protocol (LWAPP), which was the precursor to CAPWAP.

(Credit to Jake Snyder for correcting me on the origins of AireOS)

Whatever AireOS version is running on your wireless controller is the same that you have on your access points. Cisco has developed the platform to be what it is today, and very little of it remains what was once the original AireOS.

With this iteration, or rather re-invention of the Wireless Controller, Cisco have highlighted three key improvements to their predecessor wireless software.

Always-On

Controller redundancy is always critical to prevent downtime in the event of failure. Here, Cisco are touting stateful switch over with an active standby model in which client state is maintained across the standby controller, offering no downtime for clients in the event of a failure.

Patches and minor software updates now will not change the base image of the controller. Updates can be done without client downtime. Patches for specific AP models can be done without affecting the base image or other access point models with per-AP device packs. These are installed to the controller and then pushed only to the model of AP they are for.

New AP models can also be joined to the controller without impact to the overall base image with the AP device packs, allowing new hardware to join an existing environment without a major upgrade.

Citing “no disruption” base image/version upgrades, the new 9800 controllers can be updated independently of the access points, whereas previously the software version running on the controller and access points was coupled. Upgrades were done to the controller, and then pushed to the access points. More often than not, this resulted in interruption to clients on affected access points, some rebooting of the controller and AP’s was inevitable, and quite often, some orphaned access points that never quite upgraded properly or failed to rejoin the controller.

Cisco have made many improvements to the upgrade process over the years, including staged firmware upgrades, however in large wireless deployments, firmware upgrades would not generally be considered zero-downtime.

With the new controller architecture using an RF-based intelligent rolling upgrade process, Cisco has aimed at eliminating some of these issues. During the upgrade process, the standby or secondary controller is first upgraded to the new image. You can then specify a percentage of access points you would like upgraded at once (5%-25%), and the controller then determines which AP’s should be upgraded using the AP neighbor information and # of clients on each AP. APs with no clients are upgraded first. Clients on access points that are to be upgraded are steered toward neighboring access points in order to prevent interruption in service.

The idea of steering clients to other access points or 5Ghz radios instead of 2.4Ghz radios isn’t new, and because I’m not a wireless expert I won’t comment on exactly how it’s done, but it is my understanding that it is difficult to guarantee that the client will “listen” to the steering mechanism. I feel even with this intelligent RF behind this upgrade process, some clients will inevitably experience a loss of connectivity during the upgrade process.

Once the access point is upgraded, it then joins the already-upgraded controller, and resumes servicing clients.

After all access points are joined to the upgraded controller, the primary controller begins its upgrade process.

Secure

Encrypted Traffic Analytics was first announced as part of the Catalyst 9K switch launch, and uses advanced analytics and Cisco Stealthwatch to detect malware in encrypted flows, without the need for SSL decryption. ETA is now available for wireless traffic on the 9800 platform, if deployed in a centralized model, meaning all wireless traffic is tunneled back to the controller.

This is a great feature considering the only other option for gaining visibility into encrypted traffic is usually some form of sketchy certificate man-in-the-middle voodoo. In many situations this works okay for corporate domain-joined machines as here you can control the certificate trusts, but if you provide wireless to any BYOD devices or to the general public in any way, this often results in people not using your wireless because of certificate issues.

Deploy Anywhere

Cisco is offering a lot of flexibility in deployment options for this new wireless controller.

Branch offices can look at the embedded software controller on Catalyst 9K switches for up to 200 APs, and 4K clients.

Edit: Since the original publication of this post, I’ve clarified that the option to run the 9800 controller on a Catalyst 9K switch is only available as an SD-Access Fabric Mode deployment option. SD-Access requires DNA Center. This is an expensive proposition for what could truly have been a fantastic option for small/medium branch office deployments.

Private or public cloud options are available on KVM, VMware, Cisco ENCS, and will be available on AWS. These options support 1000, 3000, and up to 6000 APs, and 10K, 32K, and 64K clients. The AWS public cloud option only supports FlexConnect deployment models, which makes sense as tunneling all client traffic back to your controller in this case would get expensive quickly.

Physical appliance options include the 9800-40 at 2000 APs, 32K clients and 40Gbps (4x10Gbps interfaces), as well as the 9800-80 at 6000 APs, 64K clients, and 80Gbps (8x10Gbps interfaces). The 9800-80 also has a modular option which allows for GigE, 10GigE, 40GigE, and 100GigE uplinks.

Each of these options have identical setup, configuration, management, and features.

Lessons Learned?

Overall, the presentation of this new wireless platform seems solid. Cisco have acknowledged the problems with Converged Access, and have seem to have checked off all of the missing boxes from that first attempt. Feature parity was a big one, and Cisco insists here that all features will be the same up to the existing controller software version 8.8 (current version is 8.5 at the time if this post), so that would give Cisco and their customers quite a bit of time to flesh out the new architecture.

Now, AireOS isn’t going to disappear suddenly. Cisco have said that they are going to continue to develop and support the existing line of controllers and AireOS software, until they can be sure that this new architecture has been successfully adapted by their customers. Customers who previously bought into Converged Access may not be lining up to be the first customers to try out the new platform, but the popularity of the Catalyst 9K switches should provide a good foundation for the embedded controller to gain a foothold.

You can check out Cisco’s presentation at Networking Field Day 19 here:

 

Forward Thinkers, Forward Networks.

Maintenance windows. Let’s be honest, they suck. If you ask any network admin they will likely tell you the midnight maintenance windows are their least favorite part of the job. They are a necessity due to the very nature of what we do, which is build, operate and maintain large, complex networks, because any changes that are made can have far-reaching, and often unpredictable impact. Impact to production systems that we must avoid whenever possible. So, we schedule downtime and amp up our caffeine intake for an evening of changes and testing whatever we may have broken.

No matter how meticulous you are in your planning, no matter how well you know the subtle intricacies of your environment, something, somewhere is going to go wrong. Even if you are one of the lucky few to have a lab environment in which to test changes, it’s often not even close to the scale of your actual network.

But, what if you had a completely accurate, full-scale model of your network, and could test those changes without having to risk your production network? A break/fix playground that would allow you to vet any changes you needed to make, which would in turn, allow you the peace of mind of shorter, smoother maintenance windows, or perhaps (GASP!) no maintenance windows at all?

Go ahead, break it.

That’s what Forward Networks’ co-founders David Erickson and Brandon Heller want you to do within their Forward Platform, as they bring about a new product category they call Network Assurance:

“Reducing the complexity of networks while eliminating the human error, misconfiguration, and policy violations that lead to outages.”

At Network Field Day 13, only a few days after Forward Networks came out of stealth, we had the privilege of hearing, for the first time, exactly who and what Forward Networks was, and how their product would “accelerate an industry-wide transition toward networks with greater flexibility, agility, and automation, driven by a new generation of network control software.”

David Erickson, CEO and co-founder, spoke to how they have recognized that modern networks are complex, made up of hundreds if not thousands of devices, are often heterogeneous, and can contain millions of lines of configuration, rules, and policy. The tools we have to manage these networks are outdated (ping, traceroute, SNMP, etc.) and the time spent as a network admin going through the configuration of these devices looking for problems is overwhelming at times. As a result, a significant portion of outages in today’s networks are caused by simple human error, which has far-reaching impact to business, and brand.

This is not a simulation or emulated model of your network, but a full-scale replica, in software, that you can use to review, verify and test against, without risk to production systems. The algorithm they use claims to trace through every port in your network to determine where every possible packet could go within the network as it is presently configured. The “all packet”.

Applications

The three applications that were demonstrated for us were Search, Verify, and Predict.

Search – think “Google” for your network. Search devices and behavior within and interactive topology.

Verify – See if your network is doing what you think it should be doing. All policy is applied with some intent, is your intent being met?

Predict – When you identify the need for a change, how can you be sure the change you make will work? How do you know that change won’t break something else? Test your proposed changes against the copy of your network and see exactly what the impacts will be.

Forward Search

Brandon Heller offered an in-depth demo of these tools, beginning with Search. Looking at a visual overview of the demo network, he was able to query in very simple terms for specific traffic. In this case traffic from the Internet, to his web servers. In a split second, Search zoomed in on a subset of the network topology, showing exactly where this traffic would flow. Diving further into the results, each device would then show the rules or configuration that allowed this traffic across the device in an intuitive step-through menu that traced the specified path through the entire network, and highlighted the relevant configuration or code.

This was all done in a few seconds, on a heterogeneous topology of Juniper, Arista, and Cisco devices.

Normally, tracing the path through the network would require a network admin, with knowledge of each of those vendors, to manually test with tools like ping and traceroute, and also comb through each configuration device-by-device along the path he or she thought was the correct one, in order to verify the traffic was flowing properly.

The response time on the queries was snappy,  and Brandon explained this was due to the fact that, like a search engine, everything about the network was indexed ahead of time, making queries almost instantaneous.

Forward Verify

It’s one thing to understand how your network should behave, and another to be able to test and confirm this behavior. Forward Verify has two ways of doing this. The first is a library of predefined checks that identify common configuration errors. Things like duplex consistency, etc. that are fairly common, yet easy to miss configuration errors.

The second is with network-specific policy checks. Here once again, a simple to understand intuitive query verified that bidirectional traffic to and from the Internet could get to the web servers over via http and ssh.

When there is a failure, a link is provided which allows you to drill down into the pertinent devices and their configuration and see where your policy check is failing.

Forward Predict

When a problem is identified or a change to the network configuration is necessary, Forward Predict is the final tool in the suite, and in my opinion, the most important one, as it allows you to test a change against your modeled network to see what impact it will have. This is huge, as typically changes are planned, implemented and then tested in a production environment in a change or maintenance window.

Forward Predict, while it may not eliminate the need for proper planning and implementation, allows you to build and test configuration changes in what is essentially a fully duplicated sandbox model of your exact environment. This is going to make those change windows a lot less painful as you already know what the outcome will be, rather than troubleshooting problems that weren’t anticipated when the changes were planned.

Moving “Forward”

A common sentiment among NFD delegates during this presentation was that Forward Networks’ product did some amazing things, however we wondered if there was an opportunity here to move this product one step further and have it actually implement or make the changes to the network, after the changes have been vetted by Forward Predict.

Forward Adjust, perhaps?

Understandably, this is going to involve a lot of testing, especially in light of the fact that Forward is completely vendor-neutral and touts the ability to work with complex, mixed environments. Making changes in those types of environments adds a lot of responsibility to this platform, and with that comes risk. Risk that most engineers might be a little skeptical to entrust to a single platform.

Time will tell, and I look forward to hearing more about Forward Networks’ development over the upcoming months, and see where the Network Assurance platform takes us.

Check out the entire presentation over at Tech Field Day, including a fantastic demonstration from Behram Mistree on how Forward Verify can help mitigate and diagnose outages in complex, highly resilient networks.