Natural Networks: 2024

Introduction

Automation of the physical network remains an elusive goal for many. While cloud networking (both private and public clouds of various flavours) has made network automation accessible through SDN (Software Defined Networking), the physical network is still largely managed device by device, either manually through the CLI (Command Line Interface) or through device APIs (Application Programming Interfaces).

Products such as Arista's CloudVision and Cisco's SD-Access offer 'simplified' network automation, but they aren't the same as SDN. They're closer to orchestration tools that create, read, update, and delete configurations on network devices through APIs; they don't directly manage the control plane the way SDN does. They may know the state of the network (e.g. routing, spanning tree), but they can't directly influence it without applying configuration changes to the devices.

Modern network devices offer APIs such as NETCONF (Network Configuration Protocol), RESTCONF (Representational State Transfer Configuration Protocol; think NETCONF-style operations over HTTP), and gRPC (a remote procedure call framework originally developed by Google). However, many older network devices still in service don't support these modern programming interfaces.

For those devices we're left finding imaginative ways to automate the network with CLI-based tools, which means programmatically handling unstructured output that was designed for humans, not machines.

In this brief example I will show how we can use Nornir to run tasks against an inventory of network devices, Netmiko to connect to each device and run commands, and TextFSM to parse the unstructured output of the commands.

The Issue

Many network processes are distributed and stateful by nature, e.g. routing protocols, spanning tree, and multicast. While you can predict how they should behave from the configuration, you won't know for sure until you observe them in operation. In this example I will automate a check that the desired spanning-tree root bridge has been elected.

In this scenario MST (Multiple Spanning Tree) is used, and each switch needs to be checked for its root bridge ID to confirm it's the expected value (in this case, the upstream building distribution switch). This will reveal issues such as a misconfigured switch taking over as the root bridge, or other unexpected behaviour. Note that the script also works with PVST (Per VLAN Spanning Tree).

The Solution

Before doing anything else, ensure the Python virtual environment is set up:

~$ mkdir check_stp_root
~$ cd check_stp_root
~/check_stp_root$ python3 -m venv .venv
~/check_stp_root$ source .venv/bin/activate
(.venv) ~/check_stp_root$ pip install nornir nornir-utils nornir-netmiko textfsm
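Optionally, you can confirm the packages landed in the virtual environment by querying their installed versions. This is a small sketch using only the standard library's importlib.metadata; the package list simply mirrors the pip command above, and the function name is my own.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip install command above
REQUIRED = ["nornir", "nornir-utils", "nornir-netmiko", "textfsm"]


def installed_versions(packages):
    """Return {package: version string, or None if it isn't installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name:15} {ver or 'NOT INSTALLED'}")
```

Run it with the virtual environment activated; anything reported as NOT INSTALLED needs another pip install.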

TextFSM Template

First, I find out the appropriate CLI command to use. For spanning tree on a Cisco device there are a few, but I settled on 'show spanning-tree root'.

Cisco3560-10#show spanning-tree root

                                      Root      Hello Max Fwd
MST Instance     Root ID              Cost      Time  Age Dly  Root Port
---------------- -------------------- --------- ----- --- ---  ------------
MST0                 0 0053.002c.740a         0     2  20  15  Fa0/8

The interesting part is the Root ID, 0053.002c.740a. This is the correct root bridge, but later we'll test against a different expected value to see how the script's output changes.

Now that we have the expected output, we can create a TextFSM template (called show_spanning-tree_root.template) that identifies the interesting parts and assigns them to values we can use in our script.

Here is what I came up with:

Value VLAN (\S+)
Value Priority (\d+)
Value RootID (\S+)
Value RootCost (\d+)
Value HelloTime (\d+)
Value MaxAge (\d+)
Value FwdDly (\d+)
Value RootPort (\S+)
Start
^${VLAN}\s+${Priority}\s+${RootID}\s+${RootCost}\s+${HelloTime}\s+${MaxAge}\s+${FwdDly}\s+${RootPort} -> Continue
^${VLAN}\s+${Priority}\s+${RootID}\s+${RootCost}\s+${HelloTime}\s+${MaxAge}\s+${FwdDly} -> Record

The template makes generous use of regular expressions (regex) such as \s+ and \d+, but still far fewer than we would need if we tried to parse the output directly in the script. There are plenty of regular expression references around, such as https://www.pythoncheatsheet.org/cheatsheet/regular-expressions .

Each desired value is declared with its own regex and then referenced where it appears in the CLI output, e.g. ${RootID} is placed where I expect the root ID to appear. The -> Continue action tells TextFSM to keep the values it has just captured and carry on checking the same line against the next rule, while -> Record tells TextFSM to store the gathered values as a row. The second rule has no ${RootPort}, so it also matches on the root bridge itself, where the Root Port column is empty.
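To see what those two rules actually match, here is a rough standard-library approximation: each ${Name} is expanded into a named regex group using the Value patterns from the template, and the two rules are applied to each line the way Continue and Record behave in this template. This only illustrates the matching logic and is not TextFSM itself (in the script, textfsm does this for you); the function and variable names are my own, and the sample text is the output shown above.

```python
import re

# Value patterns copied from show_spanning-tree_root.template
VALUES = {
    "VLAN": r"\S+", "Priority": r"\d+", "RootID": r"\S+", "RootCost": r"\d+",
    "HelloTime": r"\d+", "MaxAge": r"\d+", "FwdDly": r"\d+", "RootPort": r"\S+",
}

COMMON = r"^${VLAN}\s+${Priority}\s+${RootID}\s+${RootCost}\s+${HelloTime}\s+${MaxAge}\s+${FwdDly}"


def expand(rule):
    """Swap each ${Name} for a named regex group, roughly as TextFSM does."""
    return re.compile(
        re.sub(r"\$\{(\w+)\}", lambda m: f"(?P<{m.group(1)}>{VALUES[m.group(1)]})", rule)
    )


WITH_PORT = expand(COMMON + r"\s+${RootPort}")  # rule 1: '-> Continue'
NO_PORT = expand(COMMON)                        # rule 2: '-> Record'


def parse(output):
    rows = []
    for line in output.splitlines():
        row = {}
        match = WITH_PORT.match(line)
        if match:  # Continue: keep the values, try the next rule on the same line
            row.update(match.groupdict())
        match = NO_PORT.match(line)
        if match:  # Record: emit a row (RootPort stays empty on the root bridge)
            row.update(match.groupdict())
            row.setdefault("RootPort", "")
            rows.append(row)
    return rows


SAMPLE = """\
                                      Root      Hello Max Fwd
MST Instance     Root ID              Cost      Time  Age Dly  Root Port
---------------- -------------------- --------- ----- --- ---  ------------
MST0                 0 0053.002c.740a         0     2  20  15  Fa0/8
"""

for row in parse(SAMPLE):
    print(row["VLAN"], row["RootID"], row["RootPort"])  # MST0 0053.002c.740a Fa0/8
```

The header and dashed lines fall through because neither starts with a word followed by a number, which is why the template needs no explicit rules to skip them.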

Use https://textfsm.nornir.tech/ to develop and test your template interactively, rather than by trial and error inside your script.

It's worth checking https://github.com/networktocode/ntc-templates to see if there is a pre-made template.

Nornir

For this scenario I'll use Nornir's SimpleInventory to provide the details of each switch. I will also use groups to store the expected root bridge ID. To make things a little more interesting, I'll create two separate groups called building1 and building2 to show how you could specify different switches and root bridge IDs for each building.

First, create the config.yaml file (Nornir uses YAML, "YAML Ain't Markup Language", for its configuration files):

config.yaml:

---
inventory:
  plugin: SimpleInventory
  options:
    host_file: "hosts.yaml"         # Path to your hosts file
    group_file: "groups.yaml"       # Path to your groups file (optional)
    defaults_file: "defaults.yaml"  # Path to your defaults file (optional)

runner:
  plugin: threaded                  # Or "serial" for sequential execution
  options:
    num_workers: 10                 # Number of threads to use (adjust as needed)

# Optional: configure logging
logging:
  level: WARNING                    # Or "DEBUG", "INFO", "ERROR", etc.
  enabled: True
  to_console: True
  log_file: "nornir.log"

This tells Nornir to use the SimpleInventory plugin, which reads the inventory and associated details from the hosts.yaml, groups.yaml, and defaults.yaml files.

It also tells Nornir to use the threaded runner with 10 workers, meaning it will work on up to 10 switches at a time.

The remaining settings control how verbosely Nornir logs events and where the logs are written.
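The idea behind the threaded runner can be illustrated with the standard library's ThreadPoolExecutor: a fixed pool of workers takes hosts one at a time, so no more than num_workers hosts are being worked on at once. This is a simplified analogy, not Nornir's runner code; check_switch is a hypothetical stand-in that just sleeps, and the peak-concurrency counter is only there to show the limit being respected.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyTracker:
    """Records how many stand-in tasks are running at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def check_switch(self, name):
        """Hypothetical stand-in for a per-host task such as an SSH session."""
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.05)  # simulate waiting on the device
        with self._lock:
            self.active -= 1
        return f"{name}: checked"


def run_threaded(hosts, num_workers=10):
    tracker = ConcurrencyTracker()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = dict(zip(hosts, pool.map(tracker.check_switch, hosts)))
    return results, tracker.peak


results, peak = run_threaded([f"switch{i}" for i in range(25)], num_workers=10)
print(len(results), "hosts checked, at most", peak, "at a time")
```

Because each host spends most of its time waiting on the network, threads are a good fit here; raising num_workers mainly trades speed against load on the devices and the machine running the script.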

Then create the hosts.yaml:

---
switch1:
  hostname: '192.168.90.10'
  platform: 'ios'
  groups:
    - 'access_switches'
    - 'building1'
switch2:
  hostname: '192.168.90.11'
  platform: 'ios'
  groups:
    - 'access_switches'
    - 'building2'

Here we specify our network switches: their names, their IP addresses, the platform (important for Netmiko; 'ios' is Cisco's IOS), and the groups each device is a member of (see groups.yaml below).

groups.yaml:

---
access_switches:
  platform: 'ios'

building1:
  platform: 'ios'
  data:
    expected_root_id: '0053.002c.740a'

building2:
  platform: 'ios'
  data:
    expected_root_id: '0053.002c.740a'

Here we specify the groups. I've redundantly specified the platform again; you can set it at either the host or the group level. The interesting part is the 'expected_root_id' value: anything placed under data: can be referenced from your scripts.
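As far as I understand Nornir's inventory, a host's data is resolved by checking the host first, then its groups in the order they're listed, then the defaults, and `task.host["expected_root_id"]` performs that lookup for you. Below is a plain-Python model of that lookup order using dicts that mirror the YAML files above. It is a simplification (it ignores groups nested inside other groups), and lookup_data is a hypothetical name.

```python
# Plain-dict model of hosts.yaml, groups.yaml and defaults.yaml (simplified)
HOSTS = {
    "switch1": {"groups": ["access_switches", "building1"], "data": {}},
    "switch2": {"groups": ["access_switches", "building2"], "data": {}},
}
GROUPS = {
    "access_switches": {"data": {}},
    "building1": {"data": {"expected_root_id": "0053.002c.740a"}},
    "building2": {"data": {"expected_root_id": "0053.002c.740a"}},
}
DEFAULTS = {"data": {}}


def lookup_data(host_name, key):
    """Host data first, then each group in listed order, then defaults."""
    host = HOSTS[host_name]
    if key in host["data"]:
        return host["data"][key]
    for group_name in host["groups"]:
        group_data = GROUPS[group_name]["data"]
        if key in group_data:
            return group_data[key]
    if key in DEFAULTS["data"]:
        return DEFAULTS["data"][key]
    raise KeyError(key)


print(lookup_data("switch2", "expected_root_id"))  # 0053.002c.740a
```

If that holds for your Nornir version, check_stp_root could read the expected root ID from the host itself instead of having it passed in per building, and a host-level value would override the group's.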

defaults.yaml:

---
port: 22  # Or the SSH port for your devices

This file is optional, but here we set the port to 22 for SSH access, which is the default anyway. If your devices use a non-default port, specify it here, or in the group or host files.

Now that the TextFSM template and the Nornir configuration are complete, we're ready to write the script.

Python

Create a new script called check_stp_root.py.

Import the Python modules (libraries); their names are self-explanatory:

import textfsm
from getpass import getpass

from nornir import InitNornir
from nornir.core.filter import F
from nornir.core.inventory import Inventory
from nornir.core.task import Result
from nornir_netmiko import netmiko_send_command
from nornir_utils.plugins.functions import print_result  # For nice output

In this example I'm doing something a little different: the script prompts for the credentials used to access the switches. You can set these in the Nornir SimpleInventory files (hosts, groups, or defaults), but that means storing them in the inventory, which isn't very secure, so for now I just prompt for them. To run this script unattended, I suggest using something like HashiCorp Vault to manage the credentials securely, or perhaps environment variables.

# Prompt for credentials
username = input("Enter your username: ")
password = getpass("Enter your password: ")
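If you'd rather not be prompted every time, environment variables (mentioned above) are one option. A minimal sketch, assuming the hypothetical variable names NORNIR_USERNAME and NORNIR_PASSWORD, that falls back to prompting when they're not set:

```python
import os
from getpass import getpass


def get_credentials(user_var="NORNIR_USERNAME", pass_var="NORNIR_PASSWORD"):
    """Read credentials from (hypothetical) environment variables, else prompt."""
    username = os.environ.get(user_var) or input("Enter your username: ")
    password = os.environ.get(pass_var) or getpass("Enter your password: ")
    return username, password
```

In the script you would then replace the two prompt lines with `username, password = get_credentials()`.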

Create a function we can use to inject the newly received credentials into the inventory:

# Custom function to add the prompted credentials to every host in the inventory
def load_inventory_with_credentials(inventory: Inventory):
    for host in inventory.hosts.values():
        host.username = username
        host.password = password
    return inventory

Create a function that Nornir will run as a task. It uses Netmiko to connect to the device and run the command "show spanning-tree root", then passes the output through TextFSM, which returns structured data we can use.

def get_stp_root_info(task):
    """Fetches and parses 'show spanning-tree root' using Netmiko."""
    result = task.run(
        task=netmiko_send_command, command_string="show spanning-tree root"
    )
    if result.result:  # Check that the command returned some output
        with open("show_spanning-tree_root.template") as f:
            table = textfsm.TextFSM(f)
        task.host["stp_data"] = table.ParseText(result.result)  # Store parsed data
    else:
        task.host["stp_data"] = []  # Store an empty list if there was no output
    return Result(host=task.host, result=result.result)  # Return result object

Create a function that checks the resulting data from TextFSM for the expected Root Bridge ID.

def check_stp_root(task, expected_root_id):
    """Checks STP root bridge info against a single expected root ID."""
    discrepancies = []
    stp_data = task.host.get("stp_data", [])  # Retrieve parsed data
    for vlan_data in stp_data:
        vlan = vlan_data[0]     # VLAN value (first Value in the template)
        root_id = vlan_data[2]  # RootID value (third Value in the template)
        if root_id != expected_root_id:
            discrepancies.append(
                f"VLAN/MST {vlan}: Unexpected root bridge: {root_id} (expected: {expected_root_id})"
            )
    return discrepancies
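Two refinements you might consider, sketched in plain Python so they can be tried without a switch. First, TextFSM exposes the Value names through `table.header`, so each parsed row can be turned into a dict and read by name instead of by position. Second, an empty parse (for example, output that didn't match the template) currently reports "No discrepancies found"; the sketch reports it instead. The function names are hypothetical, and treating empty data as a discrepancy is my assumption about the behaviour you'd want.

```python
def rows_to_dicts(header, rows):
    """Pair each TextFSM row with the template's Value names (e.g. table.header)."""
    return [dict(zip(header, row)) for row in rows]


def find_root_discrepancies(stp_rows, expected_root_id):
    """Compare each parsed row's RootID with the expected root bridge ID."""
    if not stp_rows:
        return ["No spanning-tree data parsed (check the output and the template)"]
    discrepancies = []
    for row in stp_rows:
        if row["RootID"].lower() != expected_root_id.lower():
            discrepancies.append(
                f"VLAN/MST {row['VLAN']}: Unexpected root bridge: "
                f"{row['RootID']} (expected: {expected_root_id})"
            )
    return discrepancies


# Header as TextFSM would report it for show_spanning-tree_root.template
HEADER = ["VLAN", "Priority", "RootID", "RootCost", "HelloTime", "MaxAge", "FwdDly", "RootPort"]
rows = rows_to_dicts(HEADER, [["MST0", "0", "0053.002c.740a", "0", "2", "20", "15", "Fa0/8"]])
print(find_root_discrepancies(rows, "0053.002c.741a"))
```

Comparing case-insensitively is a small safety margin in case the expected ID in groups.yaml is typed in upper case.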

Create a function that prints a consolidated report:

def generate_report(agg_result):
    """Generates a consolidated report for all hosts."""
    print("Spanning Tree Root Bridge Check Report:\n")
    for host, result in agg_result.items():
        print(f"Host: {host}")
        discrepancies = result.result  # Get result from check_stp_root
        if discrepancies:
            print("  Discrepancies:")
            for discrepancy in discrepancies:
                print(f"    - {discrepancy}")
        else:
            print("  No discrepancies found.")
        print("-" * 30)  # Separator between hosts

Then create a Nornir instance, using the function we created earlier to inject the credentials into the inventory:

# Nornir Configuration:
nr = InitNornir(config_file="config.yaml")  # Adjust path if needed

# Update inventory with credentials
nr.inventory = load_inventory_with_credentials(nr.inventory)

Then we kick off the tasks that check each building group of switches for root bridge ID discrepancies. Note how the 'expected_root_id' value is read from Nornir's groups:

# Check STP for each building
for building in ["building1", "building2"]:
    print(f"Checking Building: {building}")
    # Run get_stp_root_info task on the building group
    nr.filter(F(has_parent_group=building)).run(task=get_stp_root_info)
    results_checks = nr.filter(F(has_parent_group=building)).run(
        task=check_stp_root,
        expected_root_id=nr.inventory.groups[building].data["expected_root_id"],
    )
    generate_report(results_checks)

Running the script:

(.venv) ~/check_stp_root$ python3 check_stp_root.py

If the Root Bridge ID is matched, the output will be:

Enter your username: <username>
Enter your password:
Checking Building: building1
Spanning Tree Root Bridge Check Report:

Host: switch1
  No discrepancies found.
------------------------------
Checking Building: building2
Spanning Tree Root Bridge Check Report:

Host: switch2
  No discrepancies found.
------------------------------

If I change the expected Root Bridge ID for building2 in the groups.yaml file from 0053.002c.740a to 0053.002c.741a, the output becomes:

Enter your username: admin
Enter your password:
Checking Building: building1
Spanning Tree Root Bridge Check Report:

Host: switch1
  No discrepancies found.
------------------------------
Checking Building: building2
Spanning Tree Root Bridge Check Report:

Host: switch2
  Discrepancies:
    - VLAN/MST MST0: Unexpected root bridge: 0053.002c.740a (expected: 0053.002c.741a)
------------------------------

As you can see, the script highlights the unexpected root bridge value and shows the expected value.

It's easy to build on this script by writing additional functions that combine these tools to check and report on the network. You can also make changes to the devices based on the checks: for example, if the root bridge is wrong, check whether the bridge priority is misconfigured, correct it, then recheck. Perhaps the expected root bridge ID could even be derived automatically instead of relying on a hard-coded value.
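As a starting point for the "is the priority wrong?" idea, recall how the root is elected: the bridge with the lowest bridge ID wins, comparing priority first and then the MAC address as a tie-breaker. A minimal sketch of that comparison; the switch names, priorities, and the second MAC address are made-up illustration values, and it doesn't model details such as the extended system ID that Cisco adds to the configured priority.

```python
def bridge_id(priority, mac):
    """Bridge ID as a sortable tuple: (priority, MAC address as an integer)."""
    digits = mac.replace(".", "").replace(":", "").replace("-", "")
    return priority, int(digits, 16)


def elect_root(bridges):
    """Return the name of the bridge with the lowest bridge ID."""
    return min(bridges, key=lambda name: bridge_id(*bridges[name]))


# Hypothetical values for illustration only
bridges = {
    "distribution": (4096, "0053.002c.740a"),
    "access-switch": (32768, "0011.2233.4455"),
}
print(elect_root(bridges))  # distribution: lower priority beats a lower MAC

bridges["access-switch"] = (0, "0011.2233.4455")  # a misconfigured priority
print(elect_root(bridges))  # access-switch
```

This is why a root bridge check usually starts by comparing priorities: a switch left at the default priority can only win on MAC address if the intended root's priority was never lowered.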

Conclusion

I've demonstrated how combining TextFSM, Nornir, and Netmiko can effectively automate network devices that don't support contemporary APIs. If a device has a CLI and Netmiko supports it, then it can be automated. However, keep an eye out for changes to the CLI and its output formatting, as these will break your templates and scripts. This is where APIs are a far better approach: changes to an API, when they can't be avoided, are generally well documented and communicated.

An alternative to using TextFSM and Netmiko directly is NAPALM, which abstracts the above by handling the connection and data gathering for you. An added benefit is that it presents the same data no matter which device is being accessed, e.g. a Juniper switch port's details are presented the same way as a Cisco switch port's details. Therefore, you can write code that works on any device NAPALM supports. In this particular case, however, NAPALM doesn't provide a way to retrieve spanning-tree information; perhaps it will be supported later.

Quick Summary of a Cisco VRF-Lite Configuration for a Campus Environment

Architecture

Traditional three-tier architecture: core, distribution, access.

Central firewalls (HA Pair, Active/Passive)
Core switches (VSS pair)
Distribution switches (VSS pairs)

Virtual Routing and Forwarding

The campus is divided into six VRFs (VRF-Lite):

Building Management Systems (BMS) - HVAC, FIPS, CCTV, Emergency Lighting, ECPs etc
Information & Communications Technology (ICT) - Systems unique to IT (but don't assume they're secure)
General Staff (STAFF) - General business staff
Affiliates (AFFIL) - Guests, 3rd parties
Edge (EDGE) - Printers, Audio/Visual equipment, Voice
Network Management (NETINF) - Switch management

Each VRF has a Route Distinguisher (RD) and Route Targets (RT import/export).

See https://packetlife.net/blog/2013/jun/10/route-distinguishers-and-route-targets/ on what these are and how they are used.

In this case, the RD and the RT import/export values are all the same for each VRF:

BMS 65535:100
ICT 65535:110
STAFF 65535:120
AFFIL 65535:130
EDGE 65535:140
NETINF 65535:150
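Because every VRF follows the same pattern (RD and both route targets set to 65535:<id>), the ip vrf stanzas can be generated rather than typed. A small sketch; the VRF_IDS table is copied from the list above, the output mirrors the AFFIL core switch example in this post (without the description line), and the function name is my own.

```python
ASN = 65535
# VRF name -> identifier, from the RD/RT list above
VRF_IDS = {"BMS": 100, "ICT": 110, "STAFF": 120, "AFFIL": 130, "EDGE": 140, "NETINF": 150}


def vrf_stanza(name, vrf_id, asn=ASN):
    """Build an IOS 'ip vrf' block with the RD and import/export RTs set to asn:id."""
    value = f"{asn}:{vrf_id}"
    return "\n".join([
        f"ip vrf {name}",
        f" rd {value}",
        f" route-target export {value}",
        f" route-target import {value}",
    ])


print("\n!\n".join(vrf_stanza(name, vrf_id) for name, vrf_id in VRF_IDS.items()))
```

Generating the stanzas from one table keeps the RD and RT values from drifting apart when a VRF is added.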

Each VRF has its own routing table and its own routing protocol process; the examples below use OSPFv2.

Firewall interfaces (Core switch - Firewall)

These are VLAN interfaces trunked over a LAG between the core switches and the firewalls.

BMS VLAN 3000
ICT VLAN 3010
STAFF VLAN 3020
AFFIL VLAN 3030
EDGE VLAN 3040
NETINF VLAN 3050

Core switch interfaces (Per Building: Core switch - Distribution Switch)

These are P2P VLANs on a LAG between the core switches and the distribution switches: one per VRF, per building. So the first building gets VLANs 2010, 2100, 2200, 2300, 2400, and 2500, the second building gets VLANs 2011, 2101, 2201, 2301, 2401, and 2501, and so on.

BMS VLAN 2010 - 2099
ICT VLAN 2100 - 2199
STAFF VLAN 2200 - 2299
AFFIL VLAN 2300 - 2399
EDGE VLAN 2400 - 2499
NETINF VLAN 2500 - 2599
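The per-building numbering above is just "base VLAN for the VRF plus the building's index". A small sketch of that arithmetic, with the bases taken from the ranges above and hypothetical function names; it also shows that BMS, whose range runs 2010 - 2099, has room for 90 buildings while the other VRFs have 100.

```python
# First-building P2P VLAN per VRF, from the ranges above
P2P_BASE = {"BMS": 2010, "ICT": 2100, "STAFF": 2200, "AFFIL": 2300, "EDGE": 2400, "NETINF": 2500}


def p2p_vlan(vrf, building_index):
    """P2P VLAN for a VRF in a building (building_index 0 = first building)."""
    base = P2P_BASE[vrf]
    last = base // 100 * 100 + 99  # end of the VRF's range, e.g. 2010 -> 2099
    vlan = base + building_index
    if building_index < 0 or vlan > last:
        raise ValueError(f"building index {building_index} is outside the {vrf} range")
    return vlan


def building_p2p_vlans(building_index):
    """All six P2P VLANs for one building, keyed by VRF."""
    return {vrf: p2p_vlan(vrf, building_index) for vrf in P2P_BASE}


print(building_p2p_vlans(0))  # first building
print(building_p2p_vlans(1))  # second building
```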

Distribution switch interfaces (one pair per building ie VSS/VLT/MCLAG)

These are the access VLANs used by endpoints and clients. I reuse the same VLAN IDs in every building because the layer 2 boundary is at the distribution switches.

BMS VLAN 1000 - 1009
ICT VLAN 1010 - 1019
STAFF VLAN 1020 - 1029
AFFIL VLAN 1030 - 1039
EDGE VLAN 1040 - 1049
NETINF VLAN 1050 - 1059

IP Schema and Routing

In the examples below I have used the Class A RFC 1918 private address range (10.0.0.0/8) and OSPFv2 routing.

Example Core and Distribution Switch VRF-Lite configuration

The AFFIL VRF is used as the example. To create the other VRFs, simply copy the configuration, changing the identifiers, numbers, and addresses to suit.

Core Switch

A static default route to the firewall's AFFIL VLAN interface is used.

ip vrf AFFIL
 description Affiliates
 rd 65535:130
 route-target export 65535:130
 route-target import 65535:130
!
ip multicast-routing vrf AFFIL
!
vlan 3030
 name AFFIL_P2P_FW
!
vlan 2300
 name AFFIL_P2P_BldA
!
interface Loopback130
 description Loop Back AFFIL
 ip vrf forwarding AFFIL
 ip address 10.30.5.1 255.255.255.255
 no ip proxy-arp
 ip pim sparse-mode
 ip ospf 130 area 0
!
interface Loopback131
 description Multicast RP AFFIL
 ip vrf forwarding AFFIL
 ip address 10.30.5.253 255.255.255.255
 no ip proxy-arp
 ip pim sparse-mode
 ip ospf 130 area 0
!
interface Vlan3030
 description Firewall AFFIL
 ip vrf forwarding AFFIL
 ip address 10.30.2.5 255.255.255.240
 no ip redirects
 no ip proxy-arp
!
interface Vlan2300
 description Building A AFFIL
 ip vrf forwarding AFFIL
 ip address 10.30.3.1 255.255.255.252
 no ip redirects
 no ip proxy-arp
 ip pim sparse-mode
 ip ospf network point-to-point
 ip ospf 130 area 0
!
router ospf 130 vrf AFFIL
 router-id 10.30.5.1
 capability vrf-lite
 passive-interface default
 no passive-interface Loopback130
 no passive-interface Loopback131
 no passive-interface Vlan2300
 default-information originate always
!
ip pim vrf AFFIL rp-address 10.30.5.253 override
ip route vrf AFFIL 0.0.0.0 0.0.0.0 10.30.2.3

Allow the appropriate VLANs on the trunks: VLAN 3030 on the LAG to the firewalls, and VLAN 2300 on the LAG to the Building A distribution switches.

Distribution Switch

Layer 3 runs between the core and distribution switches; layer 2 runs between the distribution and access switches. VLANs 1030 - 1032 are the building's access networks, with their SVIs on the distribution switches. These VLANs will be trunked (tagged) to each access switch or stack and assigned to ports as access (untagged) VLANs as appropriate.

ip vrf AFFIL
 rd 65535:130
 route-target export 65535:130
 route-target import 65535:130
!
ip multicast-routing vrf AFFIL
!
interface Loopback130
 description General Management Loop Back AFFIL
 ip vrf forwarding AFFIL
 ip address 10.30.5.10 255.255.255.255
 no ip proxy-arp
 ip pim sparse-mode
 ip ospf 130 area 0
!
interface Vlan1030
 description AFFIL_VLAN1030
 ip vrf forwarding AFFIL
 ip address 10.30.100.1 255.255.255.0
!
interface Vlan1031
 description AFFIL_VLAN1031
 ip vrf forwarding AFFIL
 ip address 10.30.101.1 255.255.255.0
!
interface Vlan1032
 description AFFIL_VLAN1032
 ip vrf forwarding AFFIL
 ip address 10.30.102.1 255.255.255.0
!
interface Vlan2300
 description AFFIL_P2P_BldA
 ip vrf forwarding AFFIL
 ip address 10.30.3.2 255.255.255.252
 no ip redirects
 no ip proxy-arp
 ip pim sparse-mode
 ip ospf network point-to-point
 ip ospf 130 area 0
!
router ospf 130 vrf AFFIL
 router-id 10.30.5.10
 redistribute connected subnets
 passive-interface default
 no passive-interface Loopback130
 no passive-interface Vlan2300
!
ip pim vrf AFFIL rp-address 10.30.5.253

Allow the appropriate VLANs on the uplink LAG to the core switches and on the trunks to the downstream access switches.
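Before applying configuration like this, it can be worth sanity-checking the addressing. A small sketch with Python's standard ipaddress module, using the addresses from the AFFIL example: both ends of the core-distribution P2P link should sit in the same /30, and the static default route's next hop should be a usable address in the firewall VLAN's subnet. The helper names are my own.

```python
import ipaddress


def same_subnet(a, b):
    """True if two 'address/netmask' interfaces share the same network."""
    return ipaddress.ip_interface(a).network == ipaddress.ip_interface(b).network


def next_hop_on_link(interface, next_hop):
    """True if next_hop is a usable host address in the interface's subnet."""
    network = ipaddress.ip_interface(interface).network
    addr = ipaddress.ip_address(next_hop)
    return addr in network and addr not in (network.network_address, network.broadcast_address)


core_p2p = "10.30.3.1/255.255.255.252"  # core Vlan2300
dist_p2p = "10.30.3.2/255.255.255.252"  # distribution Vlan2300
fw_vlan = "10.30.2.5/255.255.255.240"   # core Vlan3030

print(same_subnet(core_p2p, dist_p2p))          # True
print(next_hop_on_link(fw_vlan, "10.30.2.3"))   # True
print(next_hop_on_link(core_p2p, "10.30.3.5"))  # False
```

The same two checks could be run over every VRF and building if the addressing were kept in a table, such as the VLAN numbering above.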

Tuesday, June 18, 2024

Nornir, Netmiko, TextFSM

Monday, May 06, 2024

Example Campus VRF Configuration
