
Meraki MT, MQTT, Node-Red & Home Assistant Auto Discovery

NOTE (6-7-22): Please download the flows.json file again if you grabbed it prior to 10 AM on June 7th. The original one was not correct and could break your Node-RED install. Thanks to Rohan Karamandi for catching it for me.

NOTE (6-9-22): Thanks to the help of Rohan Karamandi, the flows have been updated to now create device entities in Home Assistant! You can now see each MT and all its associated telemetry, as well as a link directly to it in dashboard!

Alex’s summary and musings

To preface all of the following blog, I am NOT a programmer.

Hey folks! I have been absent for quite a while trying to get s*** (stuff) done. One of my never-ending side projects has been home automation and learning how to tie APIs and telemetry together through fun things like Node-RED and Home Assistant. As you may be aware, Meraki released the MT sensor product line a year or two ago and added MQTT support for low-latency telemetry streaming. MQTT is incredibly fun to work with, mostly due to the simplicity of the protocol and how much potential it has to change how you work with devices, services, telemetry, etc. from many different sources. That being said, one thing I’ve always wanted a reason to learn has been Home Assistant’s auto-discovery functionality for MQTT devices.


TL;DR: If you don’t want to read my take on HA, Node-RED, etc., skip ahead a section or so to where I have outlined the required packages, add-ons, and Node-RED flows to get going. >> The goal of this project was to figure out how I can automate the onboarding of MT sensors, and the data they send through MQTT, into Home Assistant without having to do much manual work after the initial configuration.

A Quick-ish Overview of the Technologies Used


Home Assistant

Home Assistant is an open-source, community-driven home automation and IoT platform that lets you bridge completely disparate platforms, systems, protocols, and solutions into a single pane of glass. It can be installed on something as small as a Raspberry Pi, on old PCs, or even as a virtual machine.

I have run Home Assistant on Intel NUCs, ultra-small form factor PCs, VMs, and more as I have upgraded, got bored, or in general wanted to try something new over the last 7 or so years. I have used Home Assistant to integrate Z-Wave, Zigbee, and Wi-Fi devices, as well as cloud APIs, to allow for control, visualization, and automation in my house, and even in a few business settings to provide a simple, functional front-end to interact with.
To get started with HA (Home Assistant), check out this link:
https://www.home-assistant.io/getting-started/


Node-Red

Node-RED is a programming tool for wiring together hardware devices, APIs, and online services in new and interesting ways. It provides a browser-based editor that makes it easy to wire together flows using the wide range of nodes in the palette, which can be deployed to its runtime in a single click. I have used Node-RED to learn how to program things that I never imagined were even possible, including:

  • Creating a busy light based on my Cisco Webex Teams status, cell phone call activity, and computer’s camera and microphone state to light up a panel so my family knows not to barge into my office without knocking (however, news flash, they don’t care).
  • Consolidating various sensors around my house, from DIY ESP devices to SmartThings to Meraki MT and MV cameras, and turning them into entities for Home Assistant to graph things like temperature, humidity, noise levels, car/human detection, air quality, door open/close, wet/dry status, etc., as well as sending me alerts through Telegram, Email, Discord, Slack, and more!
  • Endless other things, like Telegram bots that yell at you when you try to talk to them, which was an oddly interesting exercise in learning and entertainment.

Meraki MT & MQTT

Instead of getting too wordy, I would highly recommend the following link to review how MT and MQTT work together:

https://documentation.meraki.com/MT/MT_General_Articles/MT_MQTT_Setup_Guide

If you can wade through the “marketing” for useful content, there is a summary of MT somewhere in here: https://meraki.cisco.com/products/sensors/

My personal take:

Meraki MT is incredibly interesting. It is effectively the careful combination of the Meraki Dashboard, cameras, and access points with purpose-built BLE IoT sensors, providing an incredibly easy-to-implement, easy-to-manage, and easy-to-consume IoT platform that keeps expanding every year. Since day one of getting to test these little sensors, I have been enamored with the ease of deployment and the data consumption models. Initially the telemetry was only available in the dashboard and API, but it has since expanded to local MQTT streaming from the BLE gateways (MV cameras or MR access points) directly to your MQTT broker. This additional functionality is what spawned this entire project and blog post.


The Project Starts Here

Technology Used:

  • Home Assistant in Home-Assistant OS install mode
  • HA Add-on for Mosquitto MQTT Broker
  • HA Add-on for Node-Red for subscribing, re-writing, and publishing topics and information
  • Meraki MR/MV as an MT Gateway and MQTT publisher
  • Meraki MT Sensors (of all sorts) for measuring all the things

Topology Diagram:

Home Assistant Setup

In my installation of Home Assistant, I am using the completely supervised Home Assistant OS installed on an ultra-small form factor PC. The reason for using Home Assistant OS is for the awesome add-on store that makes integrating things like MQTT Brokers and Node-Red dead simple.


If you have not added Node-RED and the Mosquitto MQTT Broker yet, go to:

Select Configuration
Select Add-ons, Backups & Supervisor
Install the Mosquitto MQTT Broker Add-On:

Once installed, walk through the configuration documentation and then start the container/add-on. Home Assistant creates its own user account; however, make sure you configure user accounts for Node-RED and MT via the configuration page of the add-on like so:


I suggest configuring a simple permissive topic ACL while you are learning, as outlined in the documentation. For simplicity’s sake, I have my development environment’s topic ACL users set to allow all, like so (note: this is not necessarily recommended for your day-to-day / production use):

user homeassistant
topic readwrite #

user mt
topic readwrite #

user node-red
topic readwrite #


Next install the Node-Red Add-On package:

Once installed, go through the configuration and start the container/add-on.

Now in Home Assistant we need to configure the MQTT integration, so go to:

Devices & Services
Search for MQTT
Configure the MQTT settings. Note: they may be prepopulated with the broker name of the add-on, and the integration uses the homeassistant user account to connect to Mosquitto.

Make sure that discovery is enabled.

This is key to making this integration successful. This allows Home Assistant to listen for specific topic formats in MQTT that tell Home Assistant to import the defined sensor/entity into the configuration automatically. For more information on how that works (and what I used to build the node-red flows) please check out: https://www.home-assistant.io/docs/mqtt/discovery/
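To make the format concrete, here is a hypothetical example (the topic, names, and the celsius payload key are made up for illustration; the exact topics our flows generate will differ). Publishing a retained JSON config message like this is all it takes for Home Assistant to create a sensor entity automatically:

Topic: homeassistant/sensor/mt-office-temperature/config
Payload: {"name": "MT-Office-temperature", "state_topic": "meraki/v1/mt/N_1234/ble/aa:bb:cc:dd:ee:ff/temperature", "unit_of_measurement": "°C", "value_template": "{{ value_json.celsius }}"}

Home Assistant watches for config topics under its discovery prefix (homeassistant/ by default), creates the entity, and then follows the state_topic for readings.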


HACS Install (Optional but useful)

There is a third-party store of integrations, themes, and other useful extras you can add to Home Assistant through the Home Assistant Community Store. To use some of the additional things I will call out at the end of the post, install HACS through this guide: https://hacs.xyz/docs/setup/download


Configuring Dashboard to publish to your MQTT broker

Meraki Dashboard has a few places you can configure MQTT servers to support publishing topics to a broker.

Start here and select Environmental
Select Overview
Select MQTT Brokers
Select Edit brokers
Select New MQTT broker
Click Select to start the MQTT process
Here you can see the topics that are going to be sent to the broker

That sums up the process to enable MQTT messaging in Dashboard for Meraki MT. One thing to note: the source of the MQTT updates will be any supported MR (access point) or MV (camera) that is in the same Dashboard network as the MT sensors. This means there won’t be a guaranteed source IP address for every message, but this does not affect publishing to the MQTT broker.
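For reference, the MT topics follow a pattern along the lines of meraki/v1/mt/{network_id}/ble/{sensor_mac}/{metric}, each carrying a small JSON payload. An illustrative (not verbatim; values made up) door reading might look like:

Topic: meraki/v1/mt/N_1234567890/ble/aa:bb:cc:dd:ee:ff/door
Payload: {"ts": "2022-06-07T15:04:05Z", "open": false}

See the MQTT Setup Guide linked earlier for the exact topic and payload formats.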

Configuring the Meraki Dashboard for APIs

Once we have enabled MQTT, we will need to grab your user’s API key so that we can resolve the names of the sensors in the Node-RED flow. Please follow this document to set up your organization for APIs and generate your API key:
https://documentation.meraki.com/General_Administration/Other_Topics/Cisco_Meraki_Dashboard_API


What does MQTT look like on the network?

Since I am a network engineer at heart, let’s take a look at a packet capture of the traffic over the network:

  1. Capture at your broker to see all possible messages being sent.
  2. Filter the traffic based on the TCP port. For example: tcp.port == 1883
After applying the filter we can see the messages going to and from the broker

Wireshark by default does not decode the MQTT payload, so to see the information being published:

Right click the message and go to Protocol Preferences > MQ Telemetry Transport Protocol preferences…
Select Show Message as text

Now you can filter based on message or topic contents. For example: tcp.port == 1883 && mqtt.topic contains "meraki/v1/mt"

As you can see here the filter gets us just the messages about MT

Time for some Node-Red action!

Node-RED in this case will attach to the MQTT broker and subscribe to the MT topics using a wildcard. This could be constrained further to a more specific topic, but for the purpose of the integration we want to capture all topics and data. Below I am providing the Node-RED flows that we will use to pull the sensor names from Dashboard and then convert the topics to discoverable topics in Home Assistant. One thing we will need to do first is add a few npm packages that are part of the flow, including:

  1. node-red-contrib-string (Used to make parsing and cleanup of the topic easy)
  2. node-red-contrib-secret (Used to protect the Meraki API Key)

To add these navigate to:

The sandwich button
Manage Palette lets you add and remove packages
Install Tab

Once in the Install tab, search for the package names above and install them. Once installed, the nodes will be available in the sidebar like so:

Secret node
String node

Importing the flows into Node-Red

Download the ZIP file below. It contains a text file with the flows in JSON format. DO NOT open it in the native macOS text editor. I suggest downloading Atom, Sublime Text, or Visual Studio Code, as those editors will not replace backticks and quotation marks with other characters like TextEdit does.

Now, to import this flow named MT3.0 (you can rename it), go to the sandwich menu in the top right > Import:

This will bring you to an import page where you can upload the file or paste its contents and click Import:

While unlikely, if you have any nodes with the same name and the same config, a window may pop up asking if you still want to import. I suggest reviewing and selecting to import the flows:

In this case I re-used the HTTPS node I created

In the next section we will edit some of the nodes so that we can build the MAC/serial-to-name database and attach to the MQTT instance in Home Assistant.


Editing the Global key store flow in Node-Red

The global key store flow is used to create an in-memory table of Meraki device MAC and serial to name mappings, so that our entities in Home Assistant get their dashboard names instead of their MAC addresses. This is not at all required, but it is an inherent part of both this flow and the MQTT flows I will walk through later in the post. It also means you can name and rename devices in Dashboard and have that reflected in Home Assistant.

First double click on the “Set the Meraki Org_ID” node and put in your Meraki Organization ID in the variable.

If you are unfamiliar with how to gather this please see: https://developer.cisco.com/meraki/api-v1/#!getting-started/find-your-organization-id

Next double-click on the “apiKey” node and input your Meraki API key, which needs at least org read privileges.

Optional Validation Node

Double click this node and in the variable, populate a MAC Address of one of your sensors (or any other device in your Meraki organization with a name).

This will populate a field in the debug node called “Debug the response” of “globalvartest” as you can see here:

If this does not populate, there is something wrong with the MAC address inserted in the flow. If the whole call fails and gives a statusCode other than 200, more than likely the API key or org ID is incorrect.

If you expand the payload portion of the debug message, you will see the list of all devices in the response. If you open the function node titled “Create Global Key Store” and review it, the small function iterates through the payload array, takes the MAC address and the serial, and maps them to global variables holding the name configured in Dashboard.
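For reference, here is a minimal sketch of what such a function node could look like (assuming msg.payload is the device array returned by the Meraki GET /organizations/{organizationId}/devices call; the actual node in the flow may differ slightly):

// Sketch of a "Create Global Key Store" style function node.
// Assumes msg.payload is the array returned by the Meraki
// GET /organizations/{organizationId}/devices API call.
for (const device of msg.payload) {
    const name = device.name || device.mac; // fall back to the MAC if unnamed
    if (device.mac) {
        global.set(device.mac, name);       // resolve by MAC later in the flow
    }
    if (device.serial) {
        global.set(device.serial, name);    // or resolve by serial
    }
}
return msg;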

Now for the Flow to onboard all the things!

First things first, let’s edit the MQTT configurations to point at your Mosquitto instance:

Starting with the left MT Topics node, go in and edit the configuration:

Change the server IP address, then go to the security tab and input your node-red user/pass

This should update the rest of the MQTT nodes; if it does not, double-click each of them and make sure the HASS MQTT server configuration is correct. There are multiple MQTT output nodes so that we can set the retain flag on the specific information that needs to persist in the Mosquitto database, while not retaining the frequent sensor readings, as retained readings consume resources.

From here you SHOULD be good to go. However, please read through the nodes and make sure you understand them. I will more than likely iterate on this blog post as time goes on. For a break-down of the nodes please continue reading!


Explaining the flow in a little bit of detail:

Section 1:

First Part of the Flow

The first node is just the MQTT subscription node, where we import messages from MQTT into the flow based on the topic we match:

The topic we match has a wildcard after mt to match all topics from all networks

The next node is used to configure a couple variables we re-use later in the flow for auto discovery.

The string node is used to grab the sensor_mac from the topic for further discrimination
The “Look up Name” node takes the sensor_mac we isolated, and looks up the name from the global database
The last node in this portion of the flow creates a sensor_mac_clean variable that is used for Home Assistant Discovery topic creation
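Put together, these last few steps are roughly equivalent to this single function node (a sketch assuming the meraki/v1/mt/{network}/ble/{mac}/{metric} topic shape; the actual flow uses separate string and change nodes):

// Rough single-node equivalent of the string, look-up, and clean-up nodes.
// Assumes topics shaped like meraki/v1/mt/<network>/ble/<mac>/<metric>.
const parts = msg.topic.split("/");
msg.sensor_mac = parts[5];                                      // e.g. "aa:bb:cc:dd:ee:ff"
msg.sensor_type = parts[6];                                     // e.g. "door", "humidity"
msg.sensor_name = global.get(msg.sensor_mac) || msg.sensor_mac; // name from the key store
msg.sensor_mac_clean = msg.sensor_mac.replace(/:/g, "-");       // discovery-topic-safe id
return msg;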

Section 2 (the beefy part):

This section is where we sort the topics, then output them to specific nodes to create topics Home Assistant will be able to discover and add to your installation.

1. Sort Topics:

This node just looks at the topic and sorts based on the topic contents for each type of telemetry

2. Rate Limit:

We take each topic and rate limit them in case there is a flood per topic

3. Door node:

Here is where we set up the contents to create the topic. The msg.payload is incredibly important and formulates the discovery data for Home Assistant
{"name": "MT-" & msg.sensor_name & "-" & msg.sensor_type ,"unique_id" : msg.sensor_mac_clean & "-" & msg.sensor_type, "device_class": msg.sensor_type , "payload_on": true, "payload_off": false, "value_template": "{{value_json.open}}"}

If you check the contents here, we are informing Home Assistant that this is a binary sensor and what the sensor values are for on and off, aka open and closed.
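With illustrative values substituted in (sensor name and MAC made up), the rendered discovery payload would look something like:

{"name": "MT-FrontDoor-door", "unique_id": "aa-bb-cc-dd-ee-ff-door", "device_class": "door", "payload_on": true, "payload_off": false, "value_template": "{{value_json.open}}"}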

4. Binary_Sensor node:

This node builds the topic for Home Assistant discovery from the data built into the message earlier in the flow.

5. Humidity Node:

This node is a standard sensor node capturing humidity. It is much simpler in configuration, as the information is less structured than the binary sensors, and it is then piped into a sensor function node like the binary_sensor node.

6. Stop the LOOP:

Use this to drop the already reformatted messages

7. PM25 Node:

This node is a little different, due to an issue I have found with auto discovery topics containing periods.

Due to the default structure of the PM2.5 payload with MTs, Home Assistant tries to parse the payload key as a nested JSON key and does not properly onboard the MT sensor data. To combat this, I built a flow to rewrite the payload data. As seen here, the PM25 node looks fairly close to the others, but as we proceed through the flow, I change the topic to PM2.5MassConcentration-reformat so that we can differentiate whether the source of the msg is direct from the Meraki environment via MQTT or from Node-RED, and then process it through the next section.

8. The PM2.5 fix

Here we shift the values around and reformat the data payload so that it has a payload key of pm25 instead of PM2.5MassConcentration. This allows Home Assistant to write the values to the db correctly, instead of reading it as {"PM2": {"5MassConcentration": "$value"}}.
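A minimal sketch of that rewrite as a function node (assuming the payload arrives as a JSON string carrying a PM2.5MassConcentration key; the flow’s actual nodes may be wired differently):

// Sketch: rewrite the PM2.5 payload key so it contains no period.
const data = JSON.parse(msg.payload);
data.pm25 = data["PM2.5MassConcentration"]; // dot-free key for Home Assistant
delete data["PM2.5MassConcentration"];
msg.payload = JSON.stringify(data);
// Re-tag the topic so the sort node can tell this re-published message
// apart from the original coming straight from the Meraki gateway.
msg.topic = msg.topic.replace("PM2.5MassConcentration", "PM2.5MassConcentration-reformat");
return msg;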

9. Retain MQTT output:

This is just an MQTT output node where we set the message to be retained in the db.

10. MSG debug nodes:

Debug nodes are how you learn, figure out where you messed up, and see what data you receive at different points in your flow. You will notice a fair number of debug nodes configured but disabled in this flow. To enable one, click the button to the right of the green node. You don’t have to deploy the flow to enable it; however, if you want it to stay enabled after Node-RED reloads, deploy the configuration. Example:


OPTIONAL: HACS Add-Ons and Dashboard

HACS allows developers to create themes, new Lovelace cards, and even new integrations (like Wyze) without having them merged directly into the HA code base. There are 2 specific HACS front-end packages I suggest installing so that you can import my HA dashboard, which auto-populates MTs as they are added to the Meraki network.

Here is what it looks like:

To install HACS: https://hacs.xyz/docs/setup/download

Once installed, navigate to the HACS tab in Home Assistant:

Next we head to FrontEnd:

Install the packages:

Auto-Entities:

Then install Mini-Graph-Card:

Once these are installed, you will need to reboot your Home Assistant instance to merge the new front end packages.


Import the Lovelace YAML for MT Auto Graphs

Importing this dashboard is fairly simple. The process is:

  • Download the zip file with the .yaml inside
  • Open and copy the yaml contents
  • Navigate to the top right > Edit Dashboard:
  • Click the 3 dot menu again and select Manage dashboards:
  • On the bottom right then select +Add Dashboard
  • Call the Dashboard whatever you would like:
  • Select the new Dashboard on the side bar:
  • Start with an empty Dashboard and “Take Control”
  • Top Right select Raw Configuration Editor
  • Select all and paste the contents of the file over anything in the window
  • Click Save and close the editor

BOOM! You’re Done!


Conclusion

This has been a lengthy blog post. I wanted to ensure I explained as much of this as I could and hope that if you have any feedback, find any errors, or want more info that you leave a message in the comment area below!


Useful Resources

https://hasspodcast.io/ – Listen to Rohan Karamandi and Phil Hawthorne talk about new things and fun stuff with Home Assistant
https://www.home-assistant.io/docs/mqtt/ – Home Assistant MQTT overview and integration info
https://nodered.org/docs/tutorials/first-flow – Node-Red getting started
https://www.home-assistant.io/ – Home Assistant’s main web page with a plethora of information
https://www.youtube.com/c/NodeREDStudio – Node-Red’s youtube channel with great videos


Whoa.. where has the time gone?

It has been a while since I have added much to my blog. I would like to happily blame my role within Cisco Systems for this. I moved into product management about 2 years ago and have found it difficult to write on my own AND for Cisco at the same time. While I have not been on here adding content, I have been working behind the scenes in Cisco’s Meraki business unit on a number of new features, one of which, called “Adaptive Policy”, has taken a lot of my time.

To see a little bit of the (mostly Adaptive Policy) things I’ve worked on lately please check out the following links:

Cisco Live US 2019 Breakout Session: https://www.ciscolive.com/global/on-demand-library.html?search=brkcrs-2105#/session/1541700263402001nfc7

Meraki Webinar on Adaptive Policy: https://blubrry.com/meraki_unboxed/60508862/episode-25-smarter-security-policies-for-a-dynamic-network/

Meraki Unboxed Podcast: https://blubrry.com/meraki_unboxed/60508862/episode-25-smarter-security-policies-for-a-dynamic-network/

Cisco Champion Radio – Adaptive Policy: https://soundcloud.com/user-327105904/s7e31-meraki-adaptive-policy?_lrsc=7193dbbe-b484-46d9-9daf-cac7f94c292b&dtid=osolin001080

Cisco Live US 2020 Breakout Session: https://www.ciscolive.com/global/on-demand-library.html?search.featured=cloNewReleases&search=2100#/session/1570157033023001TZGP

Adaptive Policy Documentation (co-wrote this with a peer including all follow-up docs): https://documentation.meraki.com/zGeneral_Administration/Cross-Platform_Content/Adaptive_Policy/Adaptive_Policy_Overview

Adaptive Policy Videos on Cisco Meraki Knowledge Youtube Channel: https://www.youtube.com/playlist?list=PL9tG2WCbXnfaGu4uHDofpP_QpzMhkvb9D

I do plan to hopefully continue recording videos and writing, but it may be more just me linking to official documentation and content here. Let me know your thoughts in the comments and thanks for visiting my page!

Meraki Auto RF Explained

Meraki loves to chalk up the secret sauce in their products to “Meraki Magic” and boasts “anyone can do it”. Yet our inner engineering geek wants to open the curtain and see the real show. An example of that is Auto RF, which is a form of Radio Resource Management (RRM) that allows Meraki Wi-Fi access points to dynamically plan WLAN channels and radio transmit (TX) power. The following sections will break down what Auto RF is and how it works.

Auto RF is made up of two major components: Auto Channel and Auto TX Power. The goal is to provide an initial channel plan, and then adjust dynamically over time based on the environment. Both features are enabled by default, reducing the number of steps required to deploy Meraki access points effectively.

All currently shipping Meraki access points are built with a dedicated 2.4GHz/5GHz scanning radio, which constantly scans the entire usable spectrum. This radio, among other things such as location analytics and WIPS, is used to detect neighboring BSS’s and make off-channel scans without consuming airtime on client-serving radios. The scanning radio dwells on every channel to monitor duty cycle and detect levels of non-802.11 interference. It also sends probes on non-DFS channels to detect neighboring BSS’s and listens for beacons on all channels.

AUTO CHANNEL

The current iteration of Auto Channel comes from an algorithm called TurboCA. Auto Channel is designed to react to degrading conditions while balancing client performance against the disruptiveness of changing channels. Fortunately, with the 802.11-2012 standard we have better adoption of 802.11h, which defines standardized Channel Switch Announcements (CSAs) that reduce the impact of moving to a new channel by notifying clients when they will change channels and what channel they are moving to, so that clients can follow. Auto Channel relies on this heavily where possible, but also takes into account that many clients do not support CSAs, so it tries not to change channels frequently unless necessary.

Channel Switch Announcements

How does it work?

You can refer to the above linked TurboCA article for the full mathematical detail, but this section will summarize the process.

The goal of Auto Channel is to build a channel plan that minimizes channel overlap, optimizes cell sizes for better roaming, and maximizes channel efficiency by picking the best channel available for each AP. Then, it regularly rebuilds the plan in search of an optimization. The computation for the channel plan happens in the Meraki Cloud, where all Meraki access points report their logging data.

The key metrics for the algorithm are:

  • Node (AP) Performance
  • Network (AP Set) Performance
  • Channel Quality (noise floor, non-802.11 interference, neighboring BSS’s, etc.)
  • Channel Width
  • AP Load (number of associated clients)
  • Channel Switch Penalty
  • Hop Limit

Node performance is a calculation of how well an access point should perform on a given channel and channel width. Network performance is the product of the performance of all nodes in a Meraki Network, which is important as an individual node score close to zero will bring down the Network performance score, ensuring that a channel plan will not create issues for one area while optimizing another.

One bad node performance can rule out a channel plan
(arbitrary numbers used for examples)

Channel quality measures non-802.11 interference, duty cycle, and channel width. Channel switch penalty is a metric designed to reduce the number of channel changes that offer negligible benefit, limiting negative impact, and it is weighted heavier on 2.4GHz where fewer clients support CSAs.

Hop Limit is used to determine how many neighboring APs we will consider when planning an AP’s channel. This basically determines the “aggressiveness” of the calculation. Meraki runs this calculation at 3 different intervals with different hop limits:

  • Every 15 minutes with a hop limit of “0”.
  • Every 3 hours with a hop limit of “1”, then “0”.
  • Every 24 hours with a hop limit of “2”, then “1”, then “0”.

With a hop limit of “0” an AP only considers itself and directly neighboring APs when planning its channel. By running this more frequently, an AP can react to significant events quickly (such as a jammed channel) but won’t change channels too frequently. The more aggressive plans are run less frequently to balance creating a more globally optimal plan vs. changing channels too frequently.

The Auto RF Process

The process starts by inputting the current channel plan (if one exists) and collecting the scanning results and load information from each AP. It then picks a pseudo-random AP and identifies the channel that will render the best “node performance” for that AP. This selection favors heavily loaded APs first, as more clients actively connected to an AP signal that it is more important. By doing this, more actively used APs have a better chance at picking the best channel available, rather than running last and taking whatever channels are left.

After all APs have been assigned a channel and the cloud has calculated the predicted node performance for each, the network performance of the plan is determined. If the network performance of the new plan is better than the current plan, the new plan becomes the proposed plan. The algorithm is run ten times to compare multiple possible configurations. Once all iterations are run, the proposed plan becomes the current plan, and updated APs will change their channels accordingly.

Note that previous iterations of Auto Channel (before 802.11h) would not switch channels if a client was currently associated. Because Meraki now changes channels while clients are associated, it can lead to disruptions with clients using real-time applications that don’t support CSAs. If this is causing a negative impact, Meraki Support can revert this behavior upon request.

Exceptions

As with all things RF-related, an automatic algorithm will not fit every environment. In challenging RF environments, or high density deployments, Auto Channel can fall short. In these scenarios, manual channel assignment may be a better option, but Auto Channel can still be used as a starting point to reduce the amount of manual configuration required.

Static channel assignment

APs with a static channel assignment will be used in the plan to identify a used channel, but will not be used in the algorithm to generate a plan or calculate network performance.

DFS events will always override a static or auto channel plan and trigger an immediate channel change, as required by the FCC.

If a jammed channel is detected, meaning that levels of non-802.11 interference exceed 65% for longer than one minute, a channel change will occur without waiting for the next run of Auto Channel.

If an AP is being used for wireless mesh, it will not change channels as this will have a significant impact on all APs and clients using that mesh route.

AUTO TX POWER

As with Auto Channel, Auto TX Power calculations are done in the Cloud, and the process is run every twenty minutes. A neighbor report is collected from each AP in the network, which contains the Signal-to-Noise Ratio (SNR) for all neighboring APs in the same Meraki network. The AP also reports its currently connected clients along with their SNR.

Using these lists, the Cloud compiles a list of “direct neighbors” for each AP (defined as any AP in the Meraki network with an SNR of 8dB or greater), and calculates what the ideal TX power should be. For each AP, the Cloud attempts to keep the SNR for its strongest direct neighbor at 30dB and always higher than 17dB for every direct neighbor.

An AP will never reduce its transmit power if a client is connected with SNR <10dB. Generally, if a client is connected with SNR <10dB it is looking for a better AP to roam to. If it hasn’t roamed, it can be assumed that a better AP is not available, so reducing the transmit power will only worsen that client’s performance.

To prevent dramatic changes in TX power, which could have unintended results, at each twenty-minute run an AP can raise its transmit power by 1dB or lower it by 1-3dB. When a new Meraki AP is deployed, it starts at the highest transmit power supported by the AP within the regulatory domain of which it is a member, unless overridden by an RF Profile or otherwise statically configured. This means that it could take several iterations before an AP reaches its optimal transmit power level.

RF Profiles can be used to define operating parameters for Auto RF

EXCEPTIONS

Auto TX Power will never set the transmit power lower than 5dBm on the 2.4GHz radios or 8dBm on the 5GHz radios, to avoid setting a value which is unusably low where there is a high density of APs. There are valid use cases, such as when using directional antennas or in challenging RF environments, where such a low value is warranted. These environments usually require manual tuning anyway, in which case static values can be set in the Dashboard.

Static transmit power assignment

If an AP has an active mesh neighbor, it will not increase or decrease its transmit power. When using mesh, if an AP has no client serving SSIDs enabled it will always use its maximum available transmit power.

Active mesh prevents transmit power changes

If an AP only has one direct neighbor, it’s considered risky to reduce transmit power so it’s not done as often.

Monitoring

The Meraki Dashboard allows for several tools to monitor the current channel plan and any changes that have been made by Auto RF.

The Wireless > Radio Settings page allows you to identify the current channel and transmit power being used by each AP, as well as the target power range that Auto TX Power is using:

Radio Settings

Clicking on any AP takes you to the Status Page, where the RF tab displays a lot of information about client count, channel utilization, and any changes made to the access point by Auto RF. In the below screenshot, we can see that Auto TX power has adjusted the transmit power, and clicking Details will show exactly what was changed:

RF Tab in the Status Page
Transmit power was increased from 8dBm to 9dBm on the 5GHz radio

Summary

As you can see, there’s a lot more to Auto RF than is evident at first glance. Meraki leverages the analytics of the Dashboard and the metadata from millions of access points to create and refine these algorithms so that less time and effort needs to be spent tuning and tweaking configuration during deployment.

Designing Wi-Fi for High Density

In technical interviews, I often ask (and am often asked):
How would you design a Wi-Fi network to support a large room with 1000 devices?

The question is purposely vague to identify how someone thinks through a problem that doesn’t have a single answer, and to observe how thoroughly they respond. Below I’ll take my own stab at a response.

Step 1: Requirements Gathering

Starting off by talking about antenna types or software tuning is the wrong first step, every time. However much information is provided in the question, it’s never enough. Wi-Fi is a fickle beast, and collecting requirements is certainly the most important step. I would start by asking qualifying questions such as:

  • What types of devices will be associating?
  • What types of applications are we expecting to support, and/or how much bandwidth is needed per client?
  • What is the construction and layout of the room?
  • What are the restrictions on AP location, such as cabling, mounting, or aesthetic requirements, etc.?

Step 2: Hardware

The correct hardware choice is usually determined by the answers to the questions in the previous section. Some environments, such as stadiums, allow for access points to be mounted under seats, where integrated omnidirectional antennas are adequate. In other areas, such as conference centers where chairs and tables may be moved, access points need to be mounted on walls or high ceilings.

Above 25ft, omnidirectional antennas lose a lot of their performance, as much of the propagation goes into space where there are no clients. In these cases, downtilt omnidirectional antennas can provide similar horizontal range but better propagation toward the floor. In cases where limiting propagation is desired, semi-directional or directional antennas will limit the horizontal propagation while also improving the vertical reach.

Step 3: Software Tuning

While every environment is different and requires its own exact configuration, a high density environment almost certainly requires a high density of APs, and with that there is a certain set of options that are best practice for almost all such deployments.

Data Rates

In a well-designed Wi-Fi environment, it’s a best practice to increase the minimum data rate above the default. 12-18Mbps is a common setting, as it prevents 802.11b devices from joining the BSSID and dragging other clients down, and it reduces the airtime required for management frames, leaving more space for meaningful traffic. It can also reduce effective cell sizes by not supporting clients whose RSSI is too weak to transmit at the increased minimum rate. However, caution is needed, as setting the minimum data rate too high can lead to high amounts of corruption.

Channel Planning

More APs means more chance for co-channel contention, which negatively impacts all clients on that channel. Where possible, enabling the use of 5GHz UNII-2 extended channels allows for more non-overlapping channels, as long as clients support them. On the 2.4GHz spectrum, with only 3 non-overlapping channels available in the US, disabling the 2.4GHz radio on select APs will reduce the number of APs in an area fighting for the same frequency.

In addition to enabling more 5GHz channels, it’s important to reduce the channel width to allow for more channels to be used concurrently. A high density environment configured for 80MHz-wide channels may only have six non-overlapping channels available, while the same environment configured for 20MHz-wide channels will have 25 non-overlapping channels. There is a tradeoff in throughput by reducing channel width, but that’s usually less important than having more channels available.

With a reduced number of 2.4GHz radios compared to 5GHz, band steering can also be effective at encouraging dual-band clients to connect on the 5GHz channels where there is less congestion.

Power Levels

With high client density, access points are generally placed to cover a chosen number of client devices. Because those clients are in a smaller area than lower density deployments, the AP doesn’t need to cover as large a physical area. Lowering the transmit (TX) power of the APs will reduce the cell size, and thus reduce the amount of co-channel contention.

Advanced Options

Some vendor-specific options, such as Cisco’s RX-SOP, can also impact client connectivity and roaming. While RX-SOP is marketed as helping to “reduce cell size ensuring clients are connected using the highest possible data rate”, this is not what it’s designed for, and improperly configuring these options can negatively impact connectivity. RX-SOP is used to lower the possible contention between APs on the same and adjacent channels by reducing the AP’s “sensitivity” to packets when determining transmit opportunity. When tuned correctly, it can increase the overall available airtime.

Most vendors offer some type of Radio Resource Management (RRM) capability to automatically tune the settings above, providing features such as coverage hole detection and correction, dynamic channel assignment, dynamic transmit power control, and client balancing. However, many RRM solutions don’t do a great job of tuning for high-density environments out of the box, and almost always need tweaking and tuning.


As with any Wi-Fi deployment, there is no “one-size fits all” answer. Site surveys, both pre- and post-installation, are vital in ensuring success.

Multicast over Wireless

Multicast has brought a lot of efficiencies to IP networks. But multicast wasn’t designed for wireless, and especially isn’t well suited for high-bandwidth multicast applications like video. I’ll cover the challenges of multicast over wireless and design considerations.

But first, an overview of multicast:

To level set, I’ll briefly cover IP multicast. For the purposes of this article, I’ll focus specifically on Layer 2. If you’re already familiar with multicast over Ethernet, feel free to skip this section.

What is multicast?

In short, multicast is a means of sending the same data to multiple recipients at the same time without the source having to generate copies for each recipient. Whereas broadcast traffic is sent to every device whether they want it or not, multicast allows recipients to subscribe to the traffic they want. As a result, efficiency is improved (traffic is only sent once) and overhead is reduced (unintended recipients don’t receive the traffic).

How does it work?

With multicast, the sender doesn’t know who the recipients are, or even how many there are. In order for a given set of traffic to reach its intended recipients, we send traffic to multicast groups. IANA has reserved 224.0.0.0 – 239.255.255.255 for multicast groups, with 239.0.0.0/8 commonly used within private organizations. Traffic is sent with the unicast source IP of the sender, and a destination IP of the chosen multicast group.

On the receiving side, recipients subscribe to a multicast group using Internet Group Management Protocol (IGMP). A station that wishes to join a multicast group sends an IGMP Membership Report / Join message for that given group. Most enterprise switches, WLCs, or APs use IGMP snooping to inspect IGMP packets and populate their multicast table, which matches ports/devices to multicast groups. Then, when a multicast packet is received, the network device can forward that packet to the intended receivers. Network devices that don’t support IGMP snooping will forward the packet the same as it would a broadcast, to every port except the port the packet came in on. Here’s an example of an IGMP Join request:

The problems with multicast over WiFi vs wired

In a switched wired network, all traffic is sent at the same data rate (generally 1Gbps today), and with each port being its own collision domain, collisions are rare. In addition, wired traffic uses a bounded medium, so interference and frame corruption are also rare. Because of this, there is no network impact from sending large amounts of wired traffic as multicast. WiFi does not share either of these characteristics, which makes multicast more complicated. Below are some of the issues with multicast over WiFi:

  1. Multicast traffic is sent at a mandatory data rate. As mentioned, WiFi clients share a collision domain. Because multicast is a single transmission that must be received by all intended receivers, access points are forced to send that frame at the lowest-common-denominator settings, to give the receivers the best chance of hearing the transmission uncorrupted. While this is fine for small broadcast traffic like beacons, it’s unsustainable for high-bandwidth applications.
  2. Low data-rate traffic consumes more airtime. Because multicast traffic is sent at a low data rate, it takes longer for each of those transmissions to complete. A 1MB file sent at a data rate of 1Mbps occupies the air for roughly 8 seconds, while the same file at 54Mbps needs well under a quarter of a second (ignoring protocol overhead). This means that all other stations must spend more time waiting for their turn to transmit.
  3. Battery-powered clients have reduced battery life. Multicast and broadcast traffic are sent at the DTIM interval, which all stations keep track of. When a multicast frame is sent, all stations must wake up to listen to the frame, and discard it if they don’t need it. This results in battery-powered devices staying awake for a lot longer than needed. If the DTIM interval is too high, the increased latency can impact real-time applications like video. But the lower the DTIM interval, the more often stations need to wake up.
  4. Multicast senders will not resend corrupt frames. Frame corruption and retransmissions are a standard part of any WiFi transaction. Every unicast frame, even if unacknowledged at upper OSI layers such as when using UDP, is acknowledged at Layer 2 and retransmitted by the sending station if necessary. Multicast frames get no such Layer 2 acknowledgement. This may not seem like a big deal at first, as unacknowledged traffic on a wired network works fine most of the time. But in an area of interference or poor RSSI, it’s not unusual to see 10% of wireless frames retransmitted. 10% loss would be considered extremely high on a wired network, and most applications are unable to handle this level of loss.

So how do we fix it?

There’s no silver bullet to “fixing” multicast over wireless, but there are a few ways to design around the shortcomings.

  1. Increasing the minimum data rate. An increase to the minimum data rate means that broadcast and multicast frames must be sent at the higher rate. As mentioned earlier, higher data rates reduce the time spent transmitting, and increase throughput for the multicast traffic. It also reduces the amount of time a battery-powered device must spend listening to the frames. However, other design and configuration considerations must be made to ensure the wireless network can support this, as changing the minimum data rate can impact roaming, as well as connectivity for low-powered devices.
  2. Multicast-to-Unicast Conversion (M-to-U). Many vendors of wireless APs support multicast-to-unicast conversion, which sends a unicast copy of the frame to each intended receiver, using IGMP snooping to determine those stations. Unicast traffic is acknowledged at Layer 2, reducing loss experienced by the upper layers, and the frame can be sent at the receiving station’s best data rate, which should almost always be above the minimum. Several unicast transmissions at 54Mbps would still use less channel time than the same multicast transmission at 1Mbps. In addition, stations which aren’t the intended receivers don’t need to wake up to listen to the frame, reducing their battery consumption.

The pudding

Let’s take a look at the same multicast frame sent with and without Multicast-to-Unicast Conversion. Using iperf2 (since iperf3 doesn’t support multicast), we’ll generate multicast traffic at a rate of 20Mbps from a wired client and send it to a wireless client, using multicast address 239.255.1.2.

Parameters for this test:
Receiver: MacBook Pro (2015 edition). 3 spatial stream 802.11ac airport card.
Access Point: Cisco Meraki MR42E (802.11ac Wave 2, 3×3:3) with omni-directional dipole antennas.

Wired Multicast Source (10.1.1.216):


Wireless Multicast Recipient (M-to-U enabled): 


Wireless Multicast Recipient (M-to-U disabled):


The first thing to notice is the loss rate. With M-to-U enabled, my 20Mbps stream was successfully transmitted with almost no loss. With M-to-U disabled, throughput was reduced by roughly 95%, with an average of 1Mbps throughput. There are two reasons for this: first, the mandatory data rate used for the multicast transmission was 6Mbps, of which ~40% is attributed to protocol overhead. In addition, with a unicast transmission the AP can buffer frames to a receiver, whereas a multicast transmission is best effort: it has no Layer 2 acknowledgement or communication from the receivers. This can be improved with application-level handling, such as the application deciding to transmit at a lower quality, but there are no guarantees that the application is set up to handle that. iperf has no such throttling/accommodation.

To dive in further, let’s take a look at the differences in the frames transmitted:

Frame Capture (M-to-U enabled):


Frame Capture (M-to-U disabled):


We can verify that the second frame is using multicast by the MAC address in the Destination Address field, since all IPv4 multicast MAC addresses begin with 01-00-5E. Notice also that the source address of the unicast frame is set to the MAC address of the access point, as the AP had to generate that frame, whereas the multicast frame’s source is that of the sending station, since no frame modification was needed.

Next, we’ll look at the data rate. The multicast frame was sent at the basic rate, which was 6Mbps for this BSSID, with a transmission time of 2072μs, compared to the M-to-U frame’s data rate of 540Mbps and transmission time of 46μs. That means the multicast transmission held the channel 45 times longer than the unicast one, and still only sent half as much data.

Also, since multicast must use the lowest-common-denominator parameters, it cannot take advantage of efficiency improvements such as A-MPDU and multiple spatial streams offered by this AP.

So wouldn’t M-to-U be the silver bullet solution?

As is often the case, the answer is “it depends”. In a lab where my 3 spatial stream MacBook Pro can connect at MCS 8, it may appear so. But, if the majority of clients are connected at a low data rate, and the content only consumes a small amount of bandwidth, the overhead caused by retransmitting small frames for a large number of receivers could add delay and consume more aggregate airtime than simply transmitting once at a low data rate.

Deploying Wi-Fi for Location Analytics

Many Wi-Fi vendors on the market now include the capability to leverage access points for location analytics in addition to serving clients. However, deploying location analytics has its own set of requirements, and attempting to simply leverage the same APs for location analytics may have suboptimal results if not planned out correctly. The following sections will detail some of these design considerations to optimize location accuracy and performance.

How do APs determine a device’s location?

Wi-Fi geolocation is done primarily by collecting the RSSI of frames sent from a client seen by multiple access points in an area, then applying trilateration algorithms to that data to approximate the location of a device. This requires careful placement of access points, as well as accurate placement on a floor plan or other location system within the access point controller.

AP Placement Considerations

First and foremost, for trilateration to work properly a client needs to be heard by at least three APs at any given time, and four would be ideal. On the flip side, more than five or six APs could limit the effectiveness by adding unnecessary noise and interference in the environment. A client that is only seen by two APs will be accurate in one dimension (the distance between the APs), but won’t be able to accurately detect the location in the second dimension.

Contrary to designing for coverage, location detection works best when the service area is completely encapsulated by the access points, meaning that APs are placed on the outer edge of the zone where devices will be located.

Because trilateration happens in the latitude and longitude planes and signal strength is used to determine the distance of a client between APs, placing APs in a perfect grid or line actually inhibits the APs from detecting the offset from each other. It’s recommended to place APs in an imperfect shape, which is especially important in long narrow spaces such as corridors or alleys.

Finally, minimize any major line-of-sight obstructions between APs, especially in areas of heavy traffic. Shelves and walls between APs will impact the RSSI received by the AP, which will place the client further away from the AP than it actually is.

Factors Impacting Accuracy

Traffic Frequency

It’s important to note that the accuracy will be limited by how often the access points see frames from a client. For a mobile phone with the screen turned off, such as in someone’s pocket, APs will rely on the periodic probes that a device will send out, which may be as few as a couple of times per minute, meaning our location detection will only be current to the last probe.

Trilateration Frequency

Because it can take a lot of processing power to constantly detect and triangulate a large number of clients, many Wi-Fi vendors will aggregate the received data and perform the trilateration at regular intervals, such as once per minute. It’s important to review the vendor’s documentation and set expectations accordingly.

MAC Randomization

Both iOS and Android support MAC randomization, which masks the device’s true MAC address in many management frames. This can make triangulating a device, or keeping track of subsequent visits, significantly more difficult. iOS has this feature enabled by default, whereas most Android phones default to disabled. There are ways to de-anonymize these devices, but it’s usually more hassle than it’s worth. The easiest way to overcome MAC randomization is to encourage devices to join the Wi-Fi network, as the real MAC address is used for association.

Beyond Wi-Fi

Because most Wi-Fi clients are mobile and probe frequency is sparse, sub-meter accuracy will be difficult-to-impossible to achieve. Other technologies, such as BLE, RFID, and RTLS may be used in place of, or in addition to, relying on Wi-Fi for location analytics. Some vendors, such as Meraki, include BLE scanning radios in their access points. While BLE can be more accurate than Wi-Fi, a larger percentage of devices are either not BLE-enabled, or users are disabling the BLE radio in their client device.

Wifi and Meraki Widgets for Mac and Windows

I recently decided to try to learn a little bit of Python. I’m still not very good at it; however, I did create something recently that I feel should be shared! Meraki local status pages can provide some very useful information for troubleshooting, but having to browse to ap.meraki.com/switch.meraki.com/wired.meraki.com is not always desirable, nor does it update quickly if you are walking around troubleshooting and connecting to different devices. So I figured, hey, let’s create a widget or skin for some common overlay tools out there (Ubersicht for Mac, and Rainmeter for Windows) and try to populate some useful information. So without further droning on, I want to introduce the tools I created!

Meraki Skin for Rainmeter (Windows)

This skin requires the use of Rainmeter for Windows. For those not familiar, Rainmeter is a free tool that allows you to do anything from displaying useful data about your computer to building entire user interfaces that perform just about any function.
My Rainmeter skin combines a number of bits of data: Wi-Fi stats from netsh, IP info from netsh, and a bunch of data points from the first MR, MS, and/or MX that you are connected behind. There is also a hard requirement for Python 3 to be installed in your PATH, so that Rainmeter can execute the associated script without having to derive the correct path from the installation.
For more information please check out the github repo at: Meraki Rainmeter

Meraki Widget for Ubersicht (Mac)

Colorized based on connection quality

This widget requires the use of Ubersicht as a widget overlay tool. For those not familiar, Ubersicht is extremely lightweight and has a number of really cool widgets you can install. 

This widget has a hard requirement for Python 3 to be installed as well. One of the neat features of Ubersicht is that I was able to color code some of the values for RSSI, noise floor, and, if connected to an MR, the SNR from the AP’s perspective. These will change colors based on connectivity, from green to yellow to red. (Special thanks to Nathan Wiens @nwiens for helping me with the HTML)

To install please either browse to the Ubersicht widgets repo 
Or to my GitHub Repo

As always, thanks for reading, and if you have any feedback please leave it in the comments section below. Thanks!

MX Dual VPN Hub OSPF to EIGRP Redistribution

Disclaimer: It is a highly recommended practice to employ a system of peer review for any changes you make that affect data plane traffic. This practice is especially important on systems managed via CLI. CLI is not always consistent between software versions or device types, and reviewing documentation and getting a second set of eyes always helps. CLI configuration had been the de facto method for configuring network equipment up until a few years ago, and the only way to do it accurately and consistently is to ensure multiple experienced engineers sign off on the candidate configurations. Without peer review, configurations tend to be riddled with typos, artifacts from cut-and-paste, and inconsistent conventions. Always have someone check your work.

 

In this blog post I will review how to implement dual hub Cisco Meraki MX’s into an existing Cisco infrastructure that is running EIGRP as the dynamic routing protocol.

As of June 2018, MXs allow for OSPF peering when in VPN concentrator mode, or in NAT mode with a single VLAN (there is also a beta for BGP, but that is for another day). This OSPF peering, however, only injects routes and does not learn them. The intent is to make upstream/downstream devices aware of VPN peer subnets, and to give us redundant dynamic routes to VPN peers in the scenario where we have dual hub MXs in a hub-and-spoke topology.

For a breakdown in understanding MX 1-arm concentrator mode, please refer to the following document:

VPN Concentrator Deployment Guide

and

SD WAN Deployment Guide (CVD)

Now for the fun stuff.

Both of these guides outline in pretty decent detail how to deploy the technologies from a Meraki perspective. The one bit of data lacking is integration into existing topologies running EIGRP.

Summary of the problem

In the graphic below, we have 2x MXs that are operating as Hubs in VPN concentrator mode. They are both peering to an upstream L3 routing appliance via OSPF. The L3 appliances are then using EIGRP for dynamic routing within the organization.

The spoke MX is configured to connect to DC1 as the primary VPN path and DC2 as the secondary VPN path. If we were to just redistribute the 10.10.10.0/24 subnet into EIGRP, both MXs would show a more or less equal cost to the destination network of 10.10.10.0/24. This could potentially cause asymmetric routing: the spoke MX sends traffic to DC1 for a service upstream, and the upstream router then chooses to send the return traffic via DC2. This is typically not the desired behavior, as the return path may be less desirable (higher latency, loss, etc.). When you add dual hubs to a spoke MX, there is no way on the hubs to prune the routes to the spoke, as the hubs will always advertise all connected spoke networks (unless the spoke VPN is down).

To fix this potential problem, we need to deploy some configuration to make sure that a spoke's traffic that goes to a primary hub returns on the same path.

Diagram

As you can see in the diagram, the configuration is not terribly complex; however, we should break down each piece.

EIGRP

Without going into an entire blog post on how EIGRP works, remember that EIGRP does not use a simple cost value like you would use in OSPF to weight a route. Instead EIGRP uses {Bandwidth} {Delay} {Reliability} {Load} {MTU}. We will take advantage of modifying the {Delay} value when redistributing, to make one set of redistributed routes appear more desirable than the other.
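
For reference, with the default K values (K1 = K3 = 1, K2 = K4 = K5 = 0), the composite metric boils down to:

metric = 256 × (10,000,000 ÷ lowest-bandwidth-in-kbps + sum-of-delays-in-tens-of-microseconds)

Delay adds directly into the metric, which is what makes it such a predictable knob to turn during redistribution.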

A little light reading: Introduction to EIGRP

Prefix Lists

When redistributing routes you can take a number of different approaches. My least favorite is crossing your fingers and just redistributing the entire protocol. That can be useful in some circumstances, but it can cause more harm than good if you are not 100% sure of what routes you could be injecting, and it can also cause the asymmetric routing we are trying to avoid, since we want traffic to take the same return path. That is why I recommend using prefix lists to create filters for your redistribution statements. An example of a prefix list would be:

“ip prefix-list {Name} seq 10 permit {subnet in CIDR notation}”

E.g. “ip prefix-list PRIMARY seq 10 permit 10.10.10.0/24”

As you can see here, the sequence number {XX} places a statement before or after the others. If your list is not going to be too large, I recommend skipping a few numbers in between, e.g. seq 10, seq 20, seq 30. This is useful when you are building large prefix lists and need to slide a prefix in between two others.

You can also use the suffix “ge XX” (greater than or equal) or “le XX” (less than or equal) to match on prefix length. This can be useful if you are trying to match a number of smaller prefixes.

Example: “ip prefix-list PRIMARY seq 20 permit 10.0.0.0/8 ge 16 le 30”

This prefix list matches any prefix inside 10.0.0.0/8 with a length from /16 through /30 inclusive. So any prefix from 10.X.X.X/16 to 10.X.X.X/30 would be matched, while 10.X.X.X/31 and 10.X.X.X/12 would not, for being too long and too short respectively.

How could we use prefix lists though?

In this case we want to make the DC1 advertisement look much better than DC2’s, and only for the spoke MXs that use DC1 as their primary. To accomplish this, we use prefix lists to match only those spoke sites, and set a desirable metric on DC1 and a less desirable one on DC2.

This configuration on the DC1 IOS L3 appliance would be:

“ip prefix-list PRIMARY seq 10 permit 10.10.10.0/24”

On the DC2 IOS L3 appliance it would be:

“ip prefix-list SECONDARY seq 10 permit 10.10.10.0/24”

For more on prefix lists please read the following great blog article: PacketLife – Understanding IP Prefix Lists

Route Maps

Now to inject the prefix-lists we created, we need to utilize route maps. Route Maps are extremely versatile in function and can perform anything from route filtering to policy based routing and beyond. In this case we are just going to use route maps for route matching and injection.

On DC1 the route map would be:

route-map HUB-Primary permit 10

 match ip address prefix-list PRIMARY

On DC2 the route map would be:

route-map HUB-Secondary permit 10

 match ip address prefix-list SECONDARY

Route Map light reading: Route-Maps for IP Routing Protocol Redistribution Configuration

OSPF Configuration

One thing I always try to do is avoid making changes to an existing routing process when I am working through a multi-step configuration. This is partly to preserve the existing routing table and partly to make sure we do not push any config that could cause an outage. There are a number of OSPF settings to keep in mind; in this case the OSPF domain is contained between the first-hop L3 device and the MX, so it is not as critical to add all the little configuration tidbits that you would in a larger-scale OSPF deployment. That being said, if you are a big OSPF fan and want to build out your configuration with router-ids and other fun things, have at it. In my example I will not be tuning the OSPF config.

DC1 & 2 OSPF configuration:

router ospf 10

network X.X.X.X X.X.X.X area 0 # where the Xs represent the P2P network and wildcard mask between your MX and L3

E.g. “network 10.255.255.0 0.0.0.3 area 0”

passive-interface default # don’t send OSPF hellos on any interface…

no passive-interface {interface name} # …except this one

E.g. “no passive-interface gigabitethernet 1/0/48”
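
Once deployed, a quick sanity check is to confirm the MX shows up as an OSPF neighbor in the FULL state on each L3 appliance:

“show ip ospf neighbor”

If the neighbor is missing entirely, re-check the network statement and passive-interface lines above; if it hangs in EXSTART/EXCHANGE, an MTU mismatch is the usual suspect.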

Redistribution into EIGRP

Now what we all have been waiting for…. Let’s get some routes into EIGRP!

To redistribute the routes from our VPN topology into EIGRP, we tie our previous configurations together in one nice lengthy statement under the EIGRP process (router eigrp {AS}).

For DC1:

redistribute ospf {process} route-map {route-map we created} metric {bandwidth} {delay} {reliability} {load} {MTU}

E.g. “redistribute ospf 10 route-map HUB-Primary metric 10000 10 255 5 1500”

For DC2:

E.g. “redistribute ospf 10 route-map HUB-Secondary metric 10000 1000 255 5 1500”

Notice in the above example that the second metric value, the delay, is greatly increased compared to the primary hub redistribution. This makes the DC2 advertisement more or less a backup route for reaching the VPN subnet.
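
To confirm the weighting took effect, a couple of quick checks on an upstream EIGRP router (exact syntax varies a bit by platform and software version) are:

“show ip eigrp topology 10.10.10.0 255.255.255.0”

“show ip route 10.10.10.0”

The DC1-sourced external route should show a noticeably lower composite metric than the DC2-sourced one, and only the DC1 path should be installed in the routing table.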

What if I have different spokes using DC2 as primary and DC1 as secondary?

In a lot of deployments, some spokes terminate on DC1 as their primary hub and others on DC2, which is to be expected as deployments grow and bandwidth isn’t always easily upgraded. To handle this we can take advantage of what we have already built and add a little more config. So in the event that we have a spoke site with 10.10.10.0/24 terminating on DC1 as primary, and 10.10.20.0/24 terminating on DC2 as primary, we would do the following:

For DC1:

ip prefix-list PRIMARY seq 10 permit 10.10.10.0/24

ip prefix-list SECONDARY seq 10 permit 10.10.20.0/24

route-map HUB-Primary permit 10

 match ip address prefix-list PRIMARY

route-map HUB-Secondary permit 10

 match ip address prefix-list SECONDARY

router eigrp {AS}

redistribute ospf 10 route-map HUB-Primary metric 10000 10 255 5 1500

redistribute ospf 10 route-map HUB-Secondary metric 10000 1000 255 5 1500

For DC2:

ip prefix-list PRIMARY seq 10 permit 10.10.20.0/24

ip prefix-list SECONDARY seq 10 permit 10.10.10.0/24

route-map HUB-Primary permit 10

 match ip address prefix-list PRIMARY

route-map HUB-Secondary permit 10

 match ip address prefix-list SECONDARY

router eigrp {AS}

redistribute ospf 10 route-map HUB-Primary metric 10000 10 255 5 1500

redistribute ospf 10 route-map HUB-Secondary metric 10000 1000 255 5 1500

In the above examples, each hub advertises its primary spokes with a strong metric, while its secondary spokes are heavily weighted to act as backup routes in the event their primary hub goes down.

In Closing

This entire post came about after a number of situations with customers who needed redistribution and did not have a clear path on how to do it. I included some links throughout the document that I urge you to read as you configure, whether in a lab or in production. If you find any errors in my configurations or recommendations, please let me know either in the comments or via DM on Twitter/LinkedIn. Thank you for reading and I hope this was informative for you!

Deconstructing the RADIUS CoA process

If you need to brush up on the RADIUS process, please read my previous post:
Following the 802.1X AAA process with Packet Captures

Everyone talks about it, yet I rarely meet folks who really understand what CoA (Change of Authorization) means for RADIUS authentication and client access. I recently spent a few hours troubleshooting RADIUS CoA, and figured that since it is fresh in my mind I would share, and hopefully help others out in the field.

So….

In Summary: RADIUS Change of Authorization (RFC 3576, obsoleted by RFC 5176) allows a RADIUS server to send unsolicited messages to the Network Access Server (aka Network Access Device/Authenticator in Cisco terminology, e.g. AP/WLC/switch/firewall) to change the connected client’s authorized state. This can mean anything from disconnecting the client, to sending different attribute-value pairs to the authenticator to change the device’s VLAN/ACL, and more. It is fairly robust in what it can do, so I will not go too deep, as I want this to be consumable.

What RADIUS CoA is NOT: Magic!

I will be walking through CoA use cases, what CoA looks like from a PCAP perspective, and how to gather data for troubleshooting.


RADIUS CoA Typical Use Cases:

Central captive portal (open SSID with MAC filtering) – Especially with Cisco ISE, RADIUS CoA is the core feature required for a centralized captive portal. In the example below, we are redirecting a client to a splash page for either authentication or acceptable-use-policy review. As you can see below, it is a pretty simple process.

  1. The client connects to the network (wired/wireless)
  2. Client MAC address is sent to RADIUS server as a username and password (Access-Request)
  3. RADIUS server responds with an Access-Accept and a URL redirect. (could also include a VLAN assignment)
  4. The client is redirected to the splash portal
  5. User logs in using the credentials required
  6. RADIUS server then sends a CoA with a request to reauthenticate
  7. Authenticator (AP/Switch/WLC) sends a CoA-ACK
  8. Authenticator sends an Access-Request with the existing Session-Id and authentication data.
  9. RADIUS server then responds back with Access-Accept and any extra functions e.g. a Filter-ID for group policy assignment in Meraki Wireless.
  10. YAY INTERNET!

Wireless and Wired CoA-Reauthenticate Process

[Diagram: wireless and wired CoA-Reauthenticate process]

The above process is also used for secure device registration, URL redirects for blacklisting, etc., but those involve a complete client authentication/reauthentication via EAP instead of MAC authentication. For an example, check the shared captures labeled 1-of-2 and 2-of-2; these contain the EAPoL side and the RADIUS side.

Client Posturing – In some cases you may want to perform posturing on the end client. This more often than not requires an agent on the end machine, whether a dissolvable Java agent or a thick client like Cisco AnyConnect. The whole goal of posturing is to make sure the clients that have access to your internal resources are properly secured from threats. A common scenario is a user removing or disabling anti-virus; when this occurs, you may want to limit that client’s access to the network until AV is reinstalled or enabled. This could be done through an ACL or VLAN change.

One of the difficult situations that arises when changing VLANs is that the client may not release its IP address. In 802.11 this is easily handled by sending a disconnect-request instead of a reauthentication, since the client re-runs DHCP when it reassociates. In wired authentication scenarios this is not typically recommended, as it requires a port bounce and can take some tweaking to work well, if at all. Instead of a VLAN change, it is recommended to perform ACL changes on wired clients; on a Catalyst switch this could be a dACL (downloadable ACL), for instance.
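
To make that concrete: a quarantine dACL pushed from ISE amounts to an extended ACL applied to the client’s session. A rough sketch in IOS terms (the name and the remediation server address here are made up) might permit only DNS and the remediation server:

“ip access-list extended QUARANTINE-AV”

“remark allow DNS and the remediation server only”

“permit udp any any eq domain”

“permit tcp any host 10.1.1.50 eq 443”

“deny ip any any”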

Dynamic Network Restrictions – Closely following the use case above, a client’s access may need to be dynamically changed if they are not adhering to the network policy. Using products such as Cisco’s Stealthwatch in tandem with Cisco ISE, we could monitor a client for data dumping thresholds and change the VLAN/ACL applied to them or shut down the port to minimize the impact. This is just one example of the many possibilities.

Wireless Disconnect-Request Flow

[Diagram: wireless Disconnect-Request flow]


Now on to the Fun stuff….

To capture CoA packets:

CoA packets are only seen between the authenticator and the authentication server, so we need to capture somewhere on that path, as depicted below.

[Diagram: capture placement between the authenticator and the authentication server]

In most environments this means using a SPAN/RSPAN port to capture traffic. Some vendors also provide the ability to take tcpdumps/pcaps on-box, which can be a little easier, especially if you are offsite. For a capture application I lean towards Wireshark, as it is free and powerful. To download, please go to Wireshark.org.
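
On a Catalyst switch, a basic local SPAN session that mirrors the authenticator’s uplink to the port where your capture station sits looks something like this (the interface numbers are placeholders):

“monitor session 1 source interface gigabitethernet 1/0/48 both”

“monitor session 1 destination interface gigabitethernet 1/0/1”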

CoA messages are sent on one of two UDP ports depending on the platform: Cisco standardizes on UDP port 1700, while the RFC specifies UDP port 3799. Either way, these messages are all included in the “radius” Wireshark filter.
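
Also keep in mind that the authenticator only accepts CoA messages from servers it has been configured to trust. On a Catalyst switch that is the dynamic-author configuration; a minimal sketch (the server IP and shared key here are hypothetical) looks like:

“aaa server radius dynamic-author”

“client 10.0.0.10 server-key MySharedSecret”

“port 1700”

“auth-type any”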

Just in case you don’t have a test network please feel free to use the pcaps in this share:

CoA PCAP Examples


RADIUS CoA Packet Types

There are two different RADIUS CoA packets that are sent from the RADIUS Server (Authentication Server):

  • Disconnect-Request – Requests that the client’s session be terminated.
  • CoA-Request – Requests a change to the session; anything from a reauthentication to a port bounce, a port shutdown, and more.

And there are four that are sent from the NAS/NAD/Authenticator:

  • Disconnect-ACK – Acknowledgment of successful disconnect
  • Disconnect-NAK – Failed session disconnect
  • CoA-ACK – Acknowledgment of successful CoA action
  • CoA-NAK – Failed CoA action
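
To see all six message types in a single view, you can combine the codes into one display filter:

Wireshark Filter: radius.code >= 40 && radius.code <= 45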

RADIUS Server Sourced Packets

In this section we will review the two CoA messages that are sent from the RADIUS server and the useful material in the packet.

Disconnect-Request Message

Wireshark Filter: radius.code == 40

This packet is sent from the RADIUS server and is used to simply disconnect the client from its current session, which typically triggers an immediate re-authentication by the client. Disconnect-Requests can and should be used in 802.11 situations where a VLAN change needs to occur: if we simply used a CoA-Request (as we’ll see later), the client might be moved to a new VLAN while keeping the IP address it obtained from the former VLAN, clearly causing problems.

[Screenshot: Disconnect-Request packet in Wireshark]

A few useful attributes in this message are:

Acct-Terminate-Cause

The Acct-Terminate-Cause will let you know the reason for the request. This can vary, but from a Cisco ISE perspective it is typically classified as an Admin-Reset.

[Screenshot: Acct-Terminate-Cause attribute]

Audit-Session-ID & Calling-Station-ID

These fields can be used to filter information from your RADIUS server logs by client MAC address (Calling-Station-Id) and session ID. So when you need to hunt down a particular failure in a log, you can correlate entries via these two attributes.

[Screenshot: Audit-Session-Id and Calling-Station-Id attributes]
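
The same correlation works in Wireshark itself: if you need to carve every RADIUS packet for a single client out of a large capture, you can filter on the attribute directly (match the MAC format to however your NAS reports it):

Wireshark Filter: radius.Calling_Station_Id == "aa-bb-cc-dd-ee-ff"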

NAS Response Link

Wireshark helpfully gives a link to the frame containing the NAS’s response to the RADIUS server. This Disconnect-ACK packet is reviewed in the authenticator-sourced packets section later in this post.

[Screenshot: Wireshark link to the NAS response frame]


CoA Request

Wireshark Filter: radius.code == 43

Unlike the Disconnect-Request above, a CoA-Request can contain a number of actions: anything from reauthentication to bouncing or shutting down a port. Many of these are vendor-specific, so in this instance I am going to use the Cisco ISE CoA-Request info. Note that the Audit-Session-Id, Calling-Station-Id, and the response link are still useful here, in addition to the attributes below.

[Screenshot: CoA-Request packet in Wireshark]

Useful Info:

Cisco-AVPair: subscriber:command = XXXXXX

This is where we are able to request that the authenticator perform a function. With Cisco ISE it is rolled into a Cisco-AVPair: subscriber:command.

For instance:

  • subscriber:command=reauthenticate

This request will cause a reauthentication, either by the client via EAP, or, in the event of a MAC-authenticated session, by the authenticator resending the MAC address and session ID.

[Screenshot: subscriber:command=reauthenticate AVPair]


  • subscriber:command=bounce-host-port

This is a wired-only CoA request. A request to bounce the host port results in a link-down/link-up event on the switchport. This can be useful when trying to move a client to a new VLAN. It is not something I recommend defaulting to for guest portals, however, as it takes some tweaking of the core CoA configuration; in ISE this would involve rewriting the network device profile and CoA-Reauth requests to include a port bounce, which I do not believe is a recommended practice.

[Screenshot: subscriber:command=bounce-host-port AVPair]


  • subscriber:command=disable-host-port

This is another wired-only CoA request. It will disable the switchport if the switch supports it; I have seen cases where the end switch does not support a port shutdown and bounces the port instead. This is not a recommended CoA request for most situations, as it takes manual intervention to resolve. A VLAN or ACL change is far more effective, even if the VLAN doesn’t exist (blackhole).

[Screenshot: subscriber:command=disable-host-port AVPair]


Authenticator Sourced Packets

Now we will review the packets that are sent in response to the CoA-Request or Disconnect-Request from the server. These are fairly simple and usually only include an ACK for success or a NAK for failure.

Disconnect-ACK

Wireshark Filter: radius.code == 41

This is an acknowledgment from the authenticator to the RADIUS server of a successful Disconnect-Request. The packet can contain attributes such as the session that was disconnected, the Calling-Station-Id, or simply the Message-Authenticator.

[Screenshot: Disconnect-ACK packet]

Disconnect-NAK

Wireshark Filter: radius.code == 42

This indicates a failed Disconnect-Request. It might happen if the client is already disconnected, or if the session ended prior to the Disconnect-Request. In the example screenshot we can see a useful bit of information in the Error-Cause attribute.

[Screenshot: Disconnect-NAK packet with Error-Cause attribute]

CoA-ACK

Wireshark Filter: radius.code == 44

As with the Disconnect-ACK, the CoA-ACK is simply an acknowledgement that the requested CoA action succeeded. The packet can contain attributes such as the session affected, the Calling-Station-Id, or simply the Message-Authenticator.

[Screenshot: CoA-ACK packet]

CoA-NAK

Wireshark Filter: radius.code == 45

Once again, just like the Disconnect-NAK, the CoA-NAK indicates a failed CoA action. This could be due to lack of support on the authenticator, or because the session ended prior to the CoA-Request. Just like the Disconnect-NAK, we get a nice Error-Cause attribute for further troubleshooting.

[Screenshot: CoA-NAK packet with Error-Cause attribute]


In closing

One thing to remember is that CoA can be used to create some very complex if-this-then-that scenarios. In the end, however, it is not a complex feature, and it is definitely not magic! I hope this post was informative for you. If you find anything incorrect, please let me know. Thanks and good luck!