MX Dual VPN Hub OSPF to EIGRP Redistribution

Disclaimer: It is a highly recommended practice to employ a system of peer review for any changes you make that effect data plane traffic. This practice is especially important on systems managed via CLI. CLI is not always consistent between software versions or device types. Reviewing documentation and getting a second set of eyes always helps. CLI configuration had been the de facto method for configuring network equipment up until a few years ago. The only way to do it accurately and consistently is to ensure multiple experienced engineers sign off on the candidate configurations. Without peer review configurations tend to be riddled with typos, artifacts from cut-and-paste, and inconsistent conventions. Always have someone check your work.

 

In this blog post I will review how to implement dual hub Cisco Meraki MX’s into an existing Cisco infrastructure that is running EIGRP as the dynamic routing protocol.

As of June 2018 – MXs allow for OSPF peering when in vpn concentrator mode or in NAT mode with a single VLAN(there is also a beta for BGP but that is for another day). This OSPF peering however only does route injection and does not learn routes. This is to allow for upstream/downstream devices to be aware of VPN peer subnets and also to allow for us to have redundant dynamic routes to VPN peers in the scenario where we have dual hub MXs in a hub and spoke topology.

For a breakdown in understanding MX 1-arm concentrator mode, please refer to the following document:

VPN Concentrator Deployment Guide

and

SD WAN Deployment Guide (CVD)

Now for the fun stuff.

Both of these guides outline in pretty decent detail how to deploy the technologies from a meraki perspective. The one bit of data lacking is integration into existing topologies with EIGRP.

Summary of the problem

In the graphic below, we have 2x MXs that are operating as Hubs in VPN concentrator mode. They are both peering to an upstream L3 routing appliance via OSPF. The L3 appliances are then using EIGRP for dynamic routing within the organization.

The spoke MX is configured to connect to DC1 as the primary VPN path, and DC2 as the secondary VPN path. If we were to just redistribute the 10.10.10.0/24 subnet into EIGRP both MXs would show a more or less equal cost to the destination network of 10.10.10.0/24. This could potentially cause asynchronous routing if the spoke MX sent traffic to DC1 for a service upstream, and then the upstream router chose to send the return traffic to DC2. This is not the desired behavior typically as that means the return path may be less desirable (higher latency, loss, etc). When you add dual hubs to a spoke MX, there is no way on the hubs to prune the routes to the spoke as the hubs will always advertise all spoke networks connected (unless the spoke VPN is down).

To fix this potential problem we need to deploy some configurations to make sure that a spoke’s traffic that goes to a primary hub, returns on the same path.

Diagram

As you can see in the diagram the configuration is not terribly complex however we should break down each piece.

EIGRP

Without going into an entire blog post of how EIGRP works, remember that EIGRP does not use a simple cost setting like you would use in OSPF to weight a route. Instead EIGRP uses {Bandwidth}{Delay}{Reliability}{Load}{MTU}. We will be taking advantage of modifying the {delay} attribute when redistributing to make one set of redistributed routes appear more desirable than others.

A little light reading: Introduction to EIGRP

Prefix Lists

When redistributing routes you can take a number of different approaches. My least favorite is crossing your fingers and just redistributing the entire protocol. This can be useful in some circumstances but can cause more harm than good if you are not 100% on what routes you could possibly be injecting. That is why I recommend using prefix lists to more or less create filters for your redistribution statements. There is a potential if you redistribute the entire protocol that you could cause asynchronous routing as well which we want to avoid as we want the traffic to take the same return path. An example of a prefix list would be:

“ip prefix-list {Name} seq 10 permit {subnet in CIDR notation}”

E.g. “ip prefix-list PRIMARY seq 10 permit 10.10.10.0/24”

As you can see here we can use the sequence {XX} to place a statement before or after. I recommending if your list is not going to be too large to skip a few in between. E.g. Seq 10, seq 20, seq 30. This can be useful when you are building large prefix-lists and need to slide a prefix in between two others.

You can also use the suffix of “le XX” or “ge XX” as greater than or less than the prefix listed. This can be useful if you are trying to match a number of smaller prefixes.

Example:“ip prefix-list PRIMARY seq 20 permit 10.0.0.0/8 ge 16 le 30”

This prefix list would match any prefixes in the 10.0.0.0/8 that are greater than 16 bits and less than 30 bits. So any prefix that is 10.X.X.X/16 – 10.X.X.X/30 would be matched. However 10.X.X.X/31 would not be matched and 10.X.X.X/12 would not be matched due to being too small and too large respectively.

How could we use prefix lists though?

In this case we want to make the DC1 advertisement look much better than DC2, and only for the spoke MXs that are using DC1 as their primary. To accomplish this we would use prefix-lists to only match those spoke sites, and set a metric that is desirable on DC1 and less desirable on DC2.

This configuration on the DC1 IOS L3 appliance would be:

“ip prefix-list PRIMARY seq 10 permit 10.10.10.0/24”

On the DC2 IOS L3 appliance it would be:

“ip prefix-list SECONDARY seq 10 permit 10.10.10.0/24”

For more on prefix lists please read the following great blog article: PacketLife – Understanding IP Prefix Lists

Route Maps

Now to inject the prefix-lists we created, we need to utilize route maps. Route Maps are extremely versatile in function and can perform anything from route filtering to policy based routing and beyond. In this case we are just going to use route maps for route matching and injection.

On DC1 the route map would be:

Route-map HUB-Primary permit 10

Match IP address prefix-list PRIMARY

On DC2 the route map would be:

Route-map HUB-Secondary permit 10

Match IP address prefix-list SECONDARY

Route Map light reading: Route-Maps for IP Routing Protocol Redistribution Configuration

OSPF Configuration

One thing I always try to do is avoid making changes to a routing process when I have a multi-step configuration I am working on. This is in part to preserve the existing routing table and to make sure we do not make any config changes that would be potentially outage causing. To get OSPF up and running there are a number of configs to keep in mind. In this case the OSPF config is contained between the L3 first hop and the MX so it is not as paramount to add in all the little configuration tidbits that you would in a larger scale OSPF deployment. That being said if you are a big OSPF fan and want to build out your configuration with router-ids and other fun things have at it. In my example I will not be tuning the OSPF config.

DC1 & 2 OSPF configuration:

Router OSPF 10

Network X.X.X.X X.X.X.X area 0 #where the Xs represent the P2P network and wildcard mask between your MX and L3

E.G. “network 10.255.255.0 0.0.0.3 area 0”

Passive-interface default # Don’t advertise OSPF on any interface

No passive-interface {interface name} #ok advertise on this interface

E.g. “no passive-interface gigabitethernet 1/0/48”

Redistribution into EIGRP

Now what we all have been waiting for…. Let’s get some routes into EIGRP!

To redistribute the routes from our VPN topology into EIGRP we are going to tie our previous configurations together in 1 nice lengthy statement.

For DC1:

redistribute ospf {process} route-map {route-map we created} metric {bandwidth | delay | reliability | load | MTU} 

E.G. “redistribute ospf 10 route-map HUB-Primary metric 10000 10 255 5 1500”

For DC2:

E.G. “redistribute ospf 10 route-map HUB-Secondary metric 10000 1000 255 5 1500″

Notice in the above example the underlined delay value is highly increased from the primary hub redistribution. This will make it more or less a backup route for access to the VPN subnet.

What if I have different spokes using DC2 as primary and DC1 as secondary?

In a lot of situations you have some spokes terminating on DC1 or DC2 as their primary hub. Which is to be expected as deployments grow and bandwidth isn’t always easily upgraded. To deploy we can take advantage of what we have already built, and add a little more config. So in the event that we have a spoke site with 10.10.10.0/24 terminating on DC1 as a primary, and 10.10.20.0/24 terminating on DC2 as primary, we would do the following:

For DC1:

ip prefix-list PRIMARY seq 10 permit 10.10.10.0/24

ip prefix-list SECONDARY seq 10 permit 10.10.20.0/24

Route-map HUB-Primary permit 10

Match IP address prefix-list PRIMARY

Route-map HUB-Secondary permit 10

Match IP address prefix-list SECONDARY

router ospf 10

redistribute ospf 10 route-map HUB-Primary metric 10000 10 255 5 1500

redistribute ospf 10 route-map HUB-Secondary metric 10000 1000 255 5 1500

For DC2:

ip prefix-list PRIMARY seq 10 permit 10.10.20.0/24

ip prefix-list SECONDARY seq 10 permit 10.10.10.0/24

Route-map HUB-Primary permit 10

Match IP address prefix-list PRIMARY

Route-map HUB-Secondary permit 10

Match IP address prefix-list SECONDARY

router ospf 10

redistribute ospf 10 route-map HUB-Primary metric 10000 10 255 5 1500

redistribute ospf 10 route-map HUB-Secondary metric 10000 1000 255 5 1500

In the above examples, what we did is make sure the primary spokes have priority from the primary hubs, and the secondary spokes are heavily weighted to be backup routes in the event that their primary hub goes down.

In Closing

This entire post came about after a number of situations I have had with customers needing redistribution and not having a clear path on how to do so. I included some links throughout the document that I urge you to read as you are configuring either in a lab or production. If you find any errors in my configurations or recommendations please let me know either in the comments or via DM on twitter/linkedin. Thank you for reading and I hope this was informative for you!

Advertisements

Deconstructing the RADIUS CoA process

If you need to brush up on the RADIUS process, please read my previous post:
Following the 802.1X AAA process with Packet Captures

Everyone talks about it, yet I rarely meet folks that really understand what CoA (Change of Authorization) means for RADIUS authentication and client access. I recently spent a few hours troubleshooting RADIUS CoA and figure since it is fresh in my mind maybe I can share and hopefully help others out in the field.

So….

In Summary: RADIUS Change of Authorization (RFC 3576 & RFC 5176) Allows a RADIUS server to send unsolicited messages to the Network Access Server (aka Network Access Device/Authenticator in Cisco terminology e.g. AP/WLC/Switch/Firewall) to change the connected client’s authorized state. This could mean anything from disconnecting the client, to sending different attribute value pairs to the Authenticator to change the device’s VLAN/ACL and more. It is fairly robust in what it can do so I may not go too deep as I want this to be consumable.

What RADIUS CoA is NOT: Magic!

I will be walking through CoA Use Cases, what CoA looks like from a PCAP perspective,  and how to gather data for troubleshooting.


RADIUS CoA Typical Use Cases:

Central captive portal (Open SSID with MAC filtering) – Especially with Cisco ISE, RADIUS CoA is the core feature set required for the captive portal. In the example below, we are redirecting a client to a splash page for either Authentication or Acceptable Use Policy review. As you can see below we have a pretty simple process.

  1. The client connects to the network (wired/wireless)
  2. Client MAC address is sent to RADIUS server as a username and password (Access-Request)
  3. RADIUS server responds with an Access-Accept and a URL redirect. (could also include a VLAN assignment)
  4. The client is redirected to the splash portal
  5. User logs in using the credentials required
  6. RADIUS server then sends a CoA with a request to reauthenticate
  7. Authenticator (AP/Switch/WLC) sends a CoA-ACK
  8. Authenticator sends an Access_Request with existing Session-Id and authentication data.
  9. RADIUS server then responds back with Access-Accept and any extra functions e.g. a Filter-ID for group policy assignment in Meraki Wireless.
  10. YAY INTERNET!

Wireless and Wired CoA-Reauthenticate Process

Screen Shot 2018-01-16 at 2.17.07 PM

The above process is also used for secure device registration and URL redirects for blacklisting etc. but would involve a complete client authentication/reauthentication via EAP instead of MAC authentication. For an example check the shared captures labeled 1-of-2 and 2-of-2. These contain the EAPoL side and the RADIUS side.

Client Posturing – In some cases you may want to perform posturing on the end client. This, more often than not, requires a client on the end machine, whether it is a dissolvable agent with java, or a thick client like Cisco AnyConnect. The whole goal of posturing is making sure the clients that have access to your internal resources are properly secured from threats. A common scenario is a user removing or disabling Anti-Virus. When this event occurs it may be desired to limit that client’s access to the network until AV is reinstalled or enabled. This could be done through an ACL or VLAN change.

One of the difficult situations that arises when changing VLANs is the client may not release their IP address. In 802.11 this is easily handled by sending a disconnect-request instead of reauthentication. In wired authentication scenarios this is not typically recommended as it requires a port bounce and can take some tweaking to make work well, if at all. Instead of a VLAN change it is recommended to perform ACL changes to wired clients. On a catalyst switch this could be a dACL (Downloadable ACL) for instance.

Dynamic Network Restrictions – Closely following the use case above, a client’s access may need to be dynamically changed if they are not adhering to the network policy. Using products such as Cisco’s Stealthwatch in tandem with Cisco ISE, we could monitor a client for data dumping thresholds and change the VLAN/ACL applied to them or shut down the port to minimize the impact. This is just one example of the many possibilities.

Wireless Disconnect-Request Flow

Screen Shot 2018-01-17 at 11.28.18 AM


Now on to the Fun stuff….

To capture CoA packets:

The CoA packets are only seen between the authenticator and the authentication server. Therefore we need to capture between the authenticator and the authentication server as depicted below.

Screen Shot 2018-01-16 at 11.15.39 AM

In most environments this consists of using a SPAN/RSPAN port to capture traffic. Some vendors do provide the ability to perform tcpdumps/pcaps which can be a little easier, especially if you are offsite. For capture applications I tend to lean towards using wireshark as it is free and powerful. To download please go to Wireshark.org.

CoA Messages are sent on two different udp ports depending on the platform. Cisco standardizes on UDP port 1700, while the actual RFC calls out using UDP port 3799. These messages are all included in the “radius” wireshark filter.

Just in case you don’t have a test network please feel free to use the pcaps in this share:

CoA PCAP Examples


RADIUS CoA Packet Types

There are two different RADIUS CoA packets that are sent from the RADIUS Server (Authentication Server):

  • Disconnect-Request – Requests to terminate the session of the client.
  • CoA-Request – Requests to do a number of things from reauthenticate to port-bounce, shutdown, and more.

And there are four that are sent from the NAS/NAD/Authenticator:

  • Disconnect-ACK – Acknowledgment of successful disconnect
  • Disconnect-NAK – Failed session disconnect
  • CoA-ACK – Acknowledgment of successful CoA action
  • CoA-NAK – Failed CoA action

RADIUS Server Sourced Packets

In this section we will review the two CoA messages that are sent from the RADIUS server and the useful material in the packet.

Disconnect-Request Message

Wireshark Filter: radius.code == 40

This packet is sent from the RADIUS server and is used to simply disconnect the client from the current session. This also typically involves an immediate re-authentication by the client. Disconnect-Requests can/should be used in 802.11 situations where a VLAN change needs to occur. If we simply used a CoA-Request (as we’ll see later), the client may be changed to a new VLAN while keeping the IP address it obtained from the former VLAN, clearly causing problems.

Screen Shot 2018-01-12 at 10.29.29 AM

A few useful attributes in this message are:

Account-Terminate-Cause 

The account terminate cause will let you know the reason for the request. This can vary but typically is classified as an Admin-Reset from a Cisco ISE Perspective.

Screen Shot 2018-01-12 at 10.31.56 AM

Audit-Session-ID & Calling-Station-ID

These fields can be used to filter information from your RADIUS server regarding the client MAC address (Calling-Station-Id) and session ID. So when you need to hunt down a particular failure in a log, you can correlate the logs via these two attributes.

Screen Shot 2018-01-12 at 10.33.09 AM

NAS Response Link

Wireshark helpfully gives a link to the frame that is the NAS response to the RADIUS server. This Disconnect-Ack packet will be reviewed in the authenticator sourced packets section later in this post.

Screen Shot 2018-01-12 at 11.02.31 AM

 

CoA Request

Wireshark Filter: radius.code == 43

Unlike the Disconnect-Request above, a CoA-Request can contain a number of actions. This can include anything from reauthentication to bouncing or shutting down a port. A lot of these can be vendor specific responses. So in this instance I am going to use the Cisco ISE CoA Request info. One thing to note is useful attributes are also still the Audit-Session-Id, Calling-Station-ID, and the Response Link as well as the attributes below.

Screen Shot 2018-01-12 at 4.49.13 PM

Useful Info:

Cisco-AVPair: subscriber:command = XXXXXX

This is where we are able to request that the authenticator perform a function. With Cisco ISE it is rolled into a Cisco-AVPair: subscriber:command.

For instance:

  • subscriber:command=reauthenticate

This request will cause a reauthentication either for the client via EAP, or the authenticator may send the MAC address and session ID again in the event that it is a MAC authenticated session.

Screen Shot 2018-01-17 at 2.29.52 PM

 

  • subscriber:command=bounce-host-port

This is a wired only CoA Request. A request to bounce the host port will end up with a link-down link-up event on the switchport. This can be useful for trying to move a client to a new VLAN if possible. This is not something I recommend defaulting to for guest portals however as it can take some tweaking to the core CoA configurations. In ISE this would involve rewriting the Network Device Profile and CoA ReAuth requests to include a port-bounce, which I do not believe is a recommended practice.

Screen Shot 2018-01-17 at 2.37.52 PM

 

  • subscriber:command=disable-host-port

This is another wired only CoA request. This will disable the switchport if the switch supports it. I have seen cases where the end switch may not support a port shutdown and will bounce the port instead. This is not a recommended CoA request for most situations as it takes manual intervention to resolve. Instead a VLAN or ACL change is far more effective, even if the VLAN doesn’t exist (blackhole).

Screen Shot 2018-01-17 at 3.41.15 PM


Authenticator Sourced Packets

Now we will review the packets that are sent in response to the CoA or disconnect request from the server. These are fairly simple and usually only include an ACK for pass or NAK for failure.

Disconnect-ACK

Wireshark Filter: radius.code == 41

This is an acknowledgment of a successful disconnect-request instruction from the authenticator to the RADIUS server. This packet can contain attributes such as the session that was disconnected, calling-station-id, or just simply the Message-Authenticator.

Screen Shot 2018-01-17 at 4.18.15 PM

Disconnect-NAK

Wireshark Filter: radius.code == 42

This is an acknowledgement of a failed disconnect-request. This might happen if the client is already disconnected, or if the session has ended prior to the disconnect request. In the example screenshot we can see a bit of useful information in the error cause attribute.

Screen Shot 2018-01-17 at 4.15.00 PM

CoA-ACK

Wireshark Filter: radius.code == 44

As with the Disconnect-ACK, the CoA-ACK is just an acknowledgement of the success of the CoA requested action. This packet can contain attributes such as the session that was disconnected, calling-station-id, or just simply the Message-Authenticator.

Screen Shot 2018-01-17 at 4.17.05 PM

CoA-NAK

Wireshark Filter: radius.code == 45

Once again just like the Disconnect-NAK, the CoA-NAK is an acknowledgement of a failed CoA action. This could be due to lack of support or the session has ended prior to the CoA-Request. Just like the Disconnect-NAK we get a nice Error-Cause for further troubleshooting.

Screen Shot 2018-01-17 at 4.15.38 PM

 

In closing

One thing to remember is CoA can be used to create some very complex if-this-then-that type scenarios. In the end however it is not a complex feature and definitely not magic! I hope this post was informative for you. If you find anything incorrect please let me know. Thanks and good luck!

Single SSID BYOD Onboarding

**This video builds on top of the previous video of BYOD with Device Registration and Native Supplicant Provisioning. So please be sure to watch it for configuring the certificate templates and some of the SSID configuration. **

In this video we configure ISE and wireless with a single SSID for WPA2-Enterprise to perform device registration and EAP-TLS provisioning.

BYOD with Device Registration and Native Supplicant Provisioning

Aside from standard radius authentication and guest access, ISE is also useful for secure BYOD access. In this video I walk through building an onboarding SSID and Secure SSID in dashboard. Then in ISE we configure the guest portal, certificate template, native supplicant provisioning profile, and rule sets to put it all in play. Once that is done we test and verify access.