Tuesday, June 18, 2024

Nornir, Netmiko, TextFSM

 Introduction

Automation of the physical network remains an elusive goal for many. While cloud networking (both private and public clouds of various flavours) has provided accessible network automation through SDN (Software Defined Networking), the physical network is still largely handled manually using the CLI (Command Line Interface) or via APIs (Application Programming Interfaces).

Products such as Arista's CloudVision and Cisco's SD-Access offer 'simplified' network automation, but these aren't the same as SDN. They're more like orchestration tools that can create/read/update/delete configurations on network devices through APIs; they aren't directly managing the control plane like SDN. They may know the state of the network (eg routing, spanning-tree, etc) but they cannot directly influence it without applying configuration changes to the devices.

With modern network devices you will find APIs such as NETCONF (Network Configuration Protocol)/RESTCONF (Representational State Transfer Configuration Protocol, think HTTP NETCONF), and gRPC (Google Remote Procedure Call). However, there are still many older network devices in service that do not support these modern programming interfaces.

Therefore we're left with using imaginative ways to automate the network using CLI based tools - having to programmatically deal with unstructured data that's designed for humans, not machines. 

In this brief example I will show how we can use Nornir to run tasks against an inventory of network devices, Netmiko to connect to each device and run commands, and TextFSM to parse the unstructured output of the commands.

The Issue

Many processes of a network are distributed and stateful in nature, eg routing protocols, spanning-trees, multicast, etc. While you can discern how they will behave based upon the configuration, you won't know for sure until you observe them in operation. In this example I will automate the check to see if the desired spanning-tree root bridge is selected.

In this scenario, MST (Multiple Spanning-Tree) is used. Each switch needs to be checked for its root bridge value to see if it's the expected value (which in this case is the upstream building distribution switch). This will show up any issues such as a misconfigured switch taking over as the root bridge or some other unexpected behaviour. Note the script will also work with PVST (Per VLAN Spanning-Tree).

The Solution


Before doing anything, ensure the Python virtual environment is set up:

~$ mkdir check_stp_root
~$ cd check_stp_root
~/check_stp_root$ python3 -m venv .venv
~/check_stp_root$ source .venv/bin/activate
(.venv) ~/check_stp_root$ pip install nornir nornir-utils nornir-netmiko textfsm


TextFSM Template

First, I find the appropriate CLI command to use. For spanning tree on a Cisco device there are a few, but I settled on 'show spanning-tree root'.

Cisco3560-10#show spanning-tree root

                                        Root    Hello Max Fwd
MST Instance           Root ID          Cost    Time  Age Dly  Root Port
---------------- -------------------- --------- ----- --- ---  ------------
MST0                 0 0053.002c.740a         0    2   20  15  Fa0/8        

The interesting part is the Root ID of 0053.002c.740a, which is correct, but we'll test with a different value to see how the output changes.

Now that we have the expected output, we can create a TextFSM template (called show_spanning-tree_root.template) that identifies the interesting parts and assigns them to values we can use in our script. 

Here is what I came up with:

Value VLAN (\S+)
Value Priority (\d+)
Value RootID (\S+)
Value RootCost (\d+)
Value HelloTime (\d+)
Value MaxAge (\d+)
Value FwdDly (\d+)
Value RootPort (\S+)

Start
  ^${VLAN}\s+${Priority}\s+${RootID}\s+${RootCost}\s+${HelloTime}\s+${MaxAge}\s+${FwdDly}\s+${RootPort} -> Continue
  ^${VLAN}\s+${Priority}\s+${RootID}\s+${RootCost}\s+${HelloTime}\s+${MaxAge}\s+${FwdDly} -> Record

This makes generous use of regular expressions (aka regex) like \s+ and \d+, however it is still less than what would be needed without TextFSM, ie if we tried to parse the output directly in the script. There are plenty of regular expression references around, such as https://www.pythoncheatsheet.org/cheatsheet/regular-expressions .
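For comparison, here is a rough sketch of what parsing the same output directly in the script could look like, using only Python's re module. The names SAMPLE_OUTPUT, ROW_RE and parse_stp_root are my own for illustration, and the pattern is only an assumption based on the single sample shown above, so treat it as a starting point rather than a complete parser.

```python
import re

# Sample output copied from the switch shown earlier in this post.
SAMPLE_OUTPUT = """\
                                        Root    Hello Max Fwd
MST Instance           Root ID          Cost    Time  Age Dly  Root Port
---------------- -------------------- --------- ----- --- ---  ------------
MST0                 0 0053.002c.740a         0    2   20  15  Fa0/8
"""

# One pattern for a data row. The root port is optional because the
# root bridge itself has no root port to display.
ROW_RE = re.compile(
    r"^(?P<vlan>\S+)\s+(?P<priority>\d+)\s+(?P<root_id>[0-9a-f.]+)\s+"
    r"(?P<root_cost>\d+)\s+(?P<hello_time>\d+)\s+(?P<max_age>\d+)\s+"
    r"(?P<fwd_dly>\d+)(?:\s+(?P<root_port>\S+))?\s*$"
)


def parse_stp_root(output):
    """Return one dict per data row; header and separator lines are skipped."""
    rows = []
    for line in output.splitlines():
        match = ROW_RE.match(line)
        if match:
            rows.append(match.groupdict())
    return rows


rows = parse_stp_root(SAMPLE_OUTPUT)
print(rows[0]["vlan"], rows[0]["root_id"], rows[0]["root_port"])
```

Even for a single command the pattern is long and hard to read, which is the main argument for keeping it in a TextFSM template instead of in the script.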

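The desired values are listed with their own regex and are then referenced where they appear in the CLI output. Eg ${RootID} is located where I expect it to appear in the output. The -> Continue tells TextFSM to keep the values it has captured and carry on testing the same line against the next rule instead of moving to the next line of input, and the -> Record tells TextFSM to store the gathered values as a row. The second rule has no ${RootPort}, so it also matches on the root bridge itself, where the Root Port column is empty.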

Use https://textfsm.nornir.tech/ to help develop your template to avoid trial and error while writing your script.



It's worth checking https://github.com/networktocode/ntc-templates to see if there is a pre-made template.

Nornir

For this scenario I'll use Nornir's SimpleInventory to provide the details of each switch. I will also use the groups to store the expected root bridge ID. To make things a little more interesting I'll create separate groups called building1 and building2 to show how you could specify different switches and root bridge IDs for each building.

First, create the config.yaml file (Nornir uses YAML, which stands for YAML Ain't Markup Language, for its configuration files):

config.yaml:
---
inventory:
  plugin: SimpleInventory
  options:
    host_file: "hosts.yaml"  # Path to your hosts file
    group_file: "groups.yaml"  # Path to your groups file (optional)
    defaults_file: "defaults.yaml"  # Path to your defaults file (optional)

runner:
  plugin: threaded  # Or "serial" for sequential execution
  options:
    num_workers: 10  # Number of threads to use (adjust as needed) 

# Optional: Configure logging
logging:
  level: WARNING  # Or "DEBUG", "INFO", etc.
  enabled: True
  to_console: True
  log_file: "nornir.log"

This tells Nornir that we're using the SimpleInventory plugin and it can find the inventory and associated particulars in the hosts.yaml, groups.yaml, and defaults.yaml files.

It also tells Nornir to use the threaded runner with 10 workers, meaning it will work on up to 10 switches at a time.
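To picture what a pool of 10 workers means, here is a rough standard-library sketch. It is not Nornir's actual code, and check_switch is a hypothetical placeholder for whatever is done per device.

```python
from concurrent.futures import ThreadPoolExecutor


def check_switch(name):
    """Hypothetical placeholder for per-device work such as an SSH session."""
    return f"{name}: checked"


def run_on_all(hosts, num_workers=10):
    """Run check_switch for every host, at most num_workers at a time."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() keeps the results in the same order as the hosts list.
        results = pool.map(check_switch, hosts)
        return dict(zip(hosts, results))


print(run_on_all(["switch1", "switch2"]))
```

With more than 10 hosts in the list, the extra hosts simply wait until a worker becomes free.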

There are a few other settings, such as how verbosely we would like it to log events and where.

Then create the hosts.yaml:

--- 
switch1:
  hostname: '192.168.90.10'
  platform: 'ios' 
  groups:
    - 'access_switches'
    - 'building1'
switch2:
  hostname: '192.168.90.11'
  platform: 'ios' 
  groups:
    - 'access_switches'
    - 'building2'

Here we specify our network switches: the names, the IP addresses, the platform (important for Netmiko, in this case Cisco's IOS), and which groups the device is a member of (see groups.yaml below).

groups.yaml:

---
access_switches:
  platform: 'ios'

building1:
  platform: 'ios'
  data:
    expected_root_id: '0053.002c.740a'

building2:
  platform: 'ios'
  data:
    expected_root_id: '0053.002c.740a'

Here we specify the groups. I've redundantly specified the platform again; you can set this at the host or group level. The interesting part is the 'expected_root_id' value. You can place anything under data: for reference in your scripts.
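Values under data: are inherited: when a key is not set on the host, Nornir looks through the host's groups in the order listed and finally the defaults. The sketch below imitates that lookup order with plain dictionaries and a ChainMap. It is a simplified illustration (it ignores nested parent groups), and the resolve function is my own name, not part of Nornir.

```python
from collections import ChainMap

# Data as it would be read from hosts.yaml, groups.yaml and defaults.yaml.
groups = {
    "access_switches": {},
    "building1": {"expected_root_id": "0053.002c.740a"},
    "building2": {"expected_root_id": "0053.002c.740a"},
}
hosts = {
    "switch1": {"groups": ["access_switches", "building1"], "data": {}},
    "switch2": {"groups": ["access_switches", "building2"], "data": {}},
}
defaults = {}


def resolve(host_name, key):
    """Look the key up on the host first, then its groups in order, then defaults."""
    host = hosts[host_name]
    layers = [host["data"], *(groups[g] for g in host["groups"]), defaults]
    return ChainMap(*layers)[key]


print(resolve("switch2", "expected_root_id"))
```

In a Nornir task the equivalent lookup is task.host["expected_root_id"], which means a single switch can override the building-wide value just by setting its own data: entry in hosts.yaml.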

defaults.yaml:

---
port: 22  # Or the SSH port for your devices

This file is optional, but here we set the port to 22 for SSH access, which is the default anyway. If you are using a non-default port, you will need to specify that here, or you can specify it in the group or host files.

Now we have a TextFSM template and the Nornir configuration completed, we're ready to write the script.

Python

Create a new script called check_stp_root.py.

Import the Python modules (libraries), which have self-explanatory names:

import textfsm
from getpass import getpass
from nornir import InitNornir
from nornir.core.inventory import Inventory
from nornir.core.task import Result
from nornir_utils.plugins.functions import print_result  # For nice output
from nornir_netmiko import netmiko_send_command
from nornir.core.filter import F

In this example I'm doing something a little different: I prompt for the credentials used to access the switches. You can set these in the Nornir SimpleInventory files (in the hosts, groups, or defaults files) but that isn't very secure, so for now I just prompt. To make this script fully automated I suggest using something like HashiCorp Vault to securely manage the credentials, or perhaps environment variables.

# Prompt for credentials
username = input("Enter your username: ")
password = getpass("Enter your password: ")
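The environment variable option could look something like the sketch below, which falls back to prompting when a variable is not set. The variable names NET_USERNAME and NET_PASSWORD and the function get_credentials are hypothetical, chosen only for this example.

```python
import os
from getpass import getpass


def get_credentials(env=os.environ, ask_user=input, ask_secret=getpass):
    """Read credentials from the environment, prompting for anything missing."""
    # NET_USERNAME and NET_PASSWORD are hypothetical variable names.
    username = env.get("NET_USERNAME") or ask_user("Enter your username: ")
    password = env.get("NET_PASSWORD") or ask_secret("Enter your password: ")
    return username, password
```

In the script this would replace the two prompt lines with username, password = get_credentials(). The prompt functions are passed in as parameters so the function can be exercised without a terminal.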

Create a function we can use to inject the newly received credentials into the inventory:

# Custom function to load inventory and add credentials
def load_inventory_with_credentials(inventory: Inventory):
    for host in inventory.hosts.values():
        host.username = username
        host.password = password
    return inventory

Create a function that will be run as a task by Nornir. It calls Netmiko to connect to the device, runs the command "show spanning-tree root", and passes the output through TextFSM, which returns structured data we can use.

def get_stp_root_info(task):
    """Fetches and parses 'show spanning-tree root' using Netmiko."""
    result = task.run(
        task=netmiko_send_command, command_string="show spanning-tree root"
    )
    if result.result:  # Check if the command was successful
        with open('show_spanning-tree_root.template') as f:
            table = textfsm.TextFSM(f)
            task.host["stp_data"] = table.ParseText(result.result)  # Store data
    else:
        task.host["stp_data"] = []  # Store empty list on error
    return Result(host=task.host, result=result.result)  # Return result object

Create a function that checks the resulting data from TextFSM for the expected Root Bridge ID.

def check_stp_root(task, expected_root_id):
    """Checks STP root bridge info against a single expected root ID."""
    discrepancies = []
    stp_data = task.host.get("stp_data", [])  # Retrieve parsed data

    for vlan_data in stp_data:
        vlan = vlan_data[0]
        root_id = vlan_data[2]

        if root_id != expected_root_id:
            discrepancies.append(
                f"VLAN/MST {vlan}: Unexpected root bridge: {root_id} (expected: {expected_root_id})"
            )

    return discrepancies
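The check above picks fields out of each row by position (0 for the VLAN, 2 for the root ID), which breaks silently if the template's Value lines are ever reordered. A possible variation, sketched here under that assumption, is to pair each row with the column names first. HEADER is typed out by hand to match the template; in the real script the same list should be available from the TextFSM object's header attribute. The function names are my own.

```python
# Column names in the same order as the Value lines in the template.
HEADER = ["VLAN", "Priority", "RootID", "RootCost",
          "HelloTime", "MaxAge", "FwdDly", "RootPort"]


def rows_to_dicts(header, rows):
    """Turn each list-style row into a dict keyed by column name."""
    return [dict(zip(header, row)) for row in rows]


def find_discrepancies(stp_data, expected_root_id):
    """Same comparison as check_stp_root, but using named fields."""
    return [
        f"VLAN/MST {row['VLAN']}: Unexpected root bridge: {row['RootID']} "
        f"(expected: {expected_root_id})"
        for row in stp_data
        if row["RootID"] != expected_root_id
    ]


# A row shaped like the parsed sample output (TextFSM returns strings).
sample_rows = [["MST0", "0", "0053.002c.740a", "0", "2", "20", "15", "Fa0/8"]]
stp_data = rows_to_dicts(HEADER, sample_rows)
print(find_discrepancies(stp_data, "0053.002c.741a"))
```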

Create a function that prints a report:

def generate_report(agg_result):
    """Generates a consolidated report for all hosts."""
    print("Spanning Tree Root Bridge Check Report:\n")
    for host, result in agg_result.items():
        print(f"Host: {host}")

        discrepancies = result.result  # Get result from check_stp_root
        if discrepancies:
            print("  Discrepancies:")
            for discrepancy in discrepancies:
                print(f"    - {discrepancy}")
        else:
            print("  No discrepancies found.")

    print("-" * 30)  # Separator printed once at the end of each report

Then create a Nornir instance, using the function we created earlier to inject the credentials into the inventory:

# Nornir Configuration:

nr = InitNornir(config_file="config.yaml")  # Adjust path if needed

# Update inventory with credentials
nr.inventory = load_inventory_with_credentials(nr.inventory)

Then we kick off the task that will check each "building" group of switches for any Root Bridge ID discrepancies. Note how we reference the 'expected_root_id' value in Nornir's groups:

# Check STP for each building
for building in ["building1", "building2"]:
    print(f"Checking Building: {building}")
    # Run get_stp_root_info task on the building group
    nr.filter(F(has_parent_group=building)).run(task=get_stp_root_info)
    
    results_checks = nr.filter(F(has_parent_group=building)).run(
        task=check_stp_root,
        expected_root_id=nr.inventory.groups[building].data["expected_root_id"],
    )
    generate_report(results_checks) 

Running the script:

(.venv) ~/check_stp_root$ python3 check_stp_root.py

If the Root Bridge ID is matched, the output will be:

Enter your username: <username>
Enter your password: 
Checking Building: building1
Spanning Tree Root Bridge Check Report:
Host: switch1
  No discrepancies found.
------------------------------
Checking Building: building2
Spanning Tree Root Bridge Check Report:
Host: switch2
  No discrepancies found.
------------------------------

If I change the Root Bridge ID for building2 from 0053.002c.740a to 0053.002c.741a in the groups.yaml file, the output becomes:

Enter your username: admin
Enter your password: 
Checking Building: building1
Spanning Tree Root Bridge Check Report:
Host: switch1
  No discrepancies found.
------------------------------
Checking Building: building2
Spanning Tree Root Bridge Check Report:
Host: switch2
  Discrepancies:
    - VLAN/MST MST0: Unexpected root bridge: 0053.002c.740a (expected: 0053.002c.741a)
------------------------------

As you can see, the script highlights the unexpected root bridge value and shows the expected value.


It's easy to build upon this script by creating additional functions that combine these tools to check and report on the network. You can also make changes to the devices based upon the checks, eg if the root bridge is wrong, see if the priority is incorrect and change it, then recheck. Perhaps the root bridge ID can be automatically derived instead of relying upon a hard-coded value.
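One way the expected value might be derived, purely as an assumption on my part, is a majority vote: collect the root ID each switch reports and treat the most common one as expected. The function name derive_expected_root and the extra switch3 entry are made up for the example.

```python
from collections import Counter


def derive_expected_root(root_ids_by_host):
    """Return the most common root ID and the hosts that disagree with it."""
    if not root_ids_by_host:
        raise ValueError("no root IDs collected")
    counts = Counter(root_ids_by_host.values())
    expected, _ = counts.most_common(1)[0]
    outliers = sorted(
        host for host, root_id in root_ids_by_host.items() if root_id != expected
    )
    return expected, outliers


collected = {
    "switch1": "0053.002c.740a",
    "switch2": "0053.002c.740a",
    "switch3": "0011.2233.4455",  # made-up value for a switch that disagrees
}
print(derive_expected_root(collected))
```

The obvious weakness is that a rogue switch that wins the root election for the whole building is seen by every switch, so the vote would happily agree with it. A hard-coded value, or one derived from the distribution switch's own bridge ID, does not have that problem.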

Conclusion


I've demonstrated how combining TextFSM, Nornir, and Netmiko can effectively automate a network that doesn't support contemporary APIs. If it has a CLI and Netmiko supports it then it can be automated. However, you need to keep an eye out for any changes to the CLI and its output formatting, as those will break your scripts. This is where using APIs is a far better approach, as any changes to an API, if they can't be avoided, are generally well documented and communicated.

An alternative to using TextFSM and Netmiko directly is NAPALM, which abstracts the above by handling the connection and gathering of data for you. An added benefit is that it provides the same data no matter what device is being accessed, eg a Juniper switch port's details will be presented the same way as a Cisco switch port's details. Therefore, you can write code that can be used on any device that NAPALM supports. In this particular case, though, NAPALM doesn't have a way to show the spanning-tree information; perhaps it will be supported later.

Monday, May 06, 2024

Example Campus VRF Configuration

Quick Summary of a Cisco VRF-Lite Configuration for a Campus Environment

Architecture

Traditional 3 tier architecture - core, distribution, access.

  • Central firewalls (HA Pair, Active/Passive)
  • Core switches (VSS pair)
  • Distribution switches (VSS pairs)


Virtual Routing and Forwarding

Campus is divided up into 6 VRFs (VRF-Lite):

  • Building Management Systems (BMS) - HVAC, FIPS, CCTV, Emergency Lighting, ECPs etc
  • Information & Communications Technology (ICT) -  Things unique to IT but don't assume secure
  • General Staff (STAFF) - General business staff
  • Affiliates (AFFIL) - Guests, 3rd parties
  • Edge (EDGE) - Printers, Audio/Visual equipment, Voice
  • Network Management (NETINF) - Switch management

Each VRF has a Route Distinguisher (RD), and Route Targets (RT Import/Export)


In this case, the RD and RT import/export values are all the same per VRF.
  • BMS 65535:100
  • ICT 65535:110
  • STAFF 65535:120
  • AFFIL 65535:130
  • EDGE 65535:140
  • NETINF 65535:150
Each VRF has its own routing process and therefore its own route table. In the example below, OSPFv2 has been used.


Firewall interfaces (Core switch - Firewall)

These are VLAN interfaces trunked over a LAG between the core switches and the firewalls.

  • BMS VLAN 3000
  • ICT VLAN 3010
  • STAFF VLAN 3020
  • AFFIL VLAN 3030
  • EDGE VLAN 3040
  • NETINF VLAN 3050


Core switch interfaces (Per Building: Core switch - Distribution Switch)

These are P2P VLANs on a LAG between the core switches and the distribution switches. One per VRF, per building. So the first building gets VLANs 2010, 2100, 2200, 2300, 2400, 2500, the second building gets VLANs 2011, 2101, 2201, 2301, 2401, 2501 and so on.

  • BMS VLAN 2010 - 2099
  • ICT VLAN 2100 - 2199
  • STAFF VLAN 2200 - 2299
  • AFFIL VLAN 2300 - 2399
  • EDGE VLAN 2400 - 2499
  • NETINF VLAN 2500 - 2599
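The numbering scheme is simple enough to express as a small function: the first VLAN of each VRF's range plus the building's position. This is only a sketch of the scheme as listed above, with p2p_vlan and the two dictionaries being names I made up for it.

```python
# First and last point-to-point VLAN per VRF, taken from the list above.
P2P_VLAN_FIRST = {"BMS": 2010, "ICT": 2100, "STAFF": 2200,
                  "AFFIL": 2300, "EDGE": 2400, "NETINF": 2500}
P2P_VLAN_LAST = {"BMS": 2099, "ICT": 2199, "STAFF": 2299,
                 "AFFIL": 2399, "EDGE": 2499, "NETINF": 2599}


def p2p_vlan(vrf, building_index):
    """Return the core-to-distribution VLAN for a VRF (index 0 = first building)."""
    vlan = P2P_VLAN_FIRST[vrf] + building_index
    if building_index < 0 or vlan > P2P_VLAN_LAST[vrf]:
        raise ValueError(f"building {building_index} is outside the {vrf} range")
    return vlan


# VLANs for the first and second buildings.
print([p2p_vlan(vrf, 0) for vrf in P2P_VLAN_FIRST])
print([p2p_vlan(vrf, 1) for vrf in P2P_VLAN_FIRST])
```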


Distribution switch interfaces (one pair per building ie VSS/VLT/MCLAG)

These are the access VLANs. They are what endpoints/clients will be using. I use the same VLANs in each building because the boundary is the distribution switches.

  • BMS VLAN 1000 - 1009
  • ICT VLAN 1010 - 1019
  • STAFF VLAN 1020 - 1029
  • AFFIL VLAN 1030 - 1039
  • EDGE VLAN 1040 - 1049
  • NETINF VLAN 1050 - 1059


IP Schema and Routing

In the examples below I have used a Class A RFC1918 address range and OSPFv2 routing.


Example Core and Distribution Switch VRF-Lite configuration


Using the AFFIL VRF as an example. To create the other VRFs you simply copy the configuration while changing the identifiers/numbers/addresses to suit.





Core Switch

A static default route to the firewall's AFFIL VLAN interface is used.

ip vrf AFFIL
 description Affiliates
 rd 65535:130
 route-target export 65535:130
 route-target import 65535:130

ip multicast-routing vrf AFFIL 

vlan 3030
 name AFFIL_P2P_FW

vlan 2300
 name AFFIL_P2P_BldA

interface Loopback130
 description Loop Back AFFIL
 ip vrf forwarding AFFIL
 ip address 10.30.5.1 255.255.255.255
 no ip proxy-arp
 ip pim sparse-mode
 ip ospf 130 area 0

interface Loopback131
 description Multicast RP AFFIL
 ip vrf forwarding AFFIL
 ip address 10.30.5.253 255.255.255.255
 no ip proxy-arp
 ip pim sparse-mode
 ip ospf 130 area 0

interface Vlan3030
 description Firewall AFFIL
 ip vrf forwarding AFFIL
 ip address 10.30.2.5 255.255.255.240
 no ip redirects
 no ip proxy-arp
!
interface Vlan2300
 description Building A AFFIL
 ip vrf forwarding AFFIL
 ip address 10.30.3.1 255.255.255.252
 no ip redirects
 no ip proxy-arp
 ip pim sparse-mode
 ip ospf network point-to-point
 ip ospf 130 area 0

router ospf 130 vrf AFFIL
 router-id 10.30.5.1
 capability vrf-lite
 passive-interface default
 no passive-interface Loopback130
 no passive-interface Loopback131
 no passive-interface Vlan2300
 default-information originate always

ip pim vrf AFFIL rp-address 10.30.5.253 override

ip route vrf AFFIL 0.0.0.0 0.0.0.0 10.30.2.3

Associate the appropriate VLANs with the Firewall and the distribution switch interfaces.
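The copy-and-change step for the other VRFs can be scripted. The sketch below renders only the ip vrf stanza and the firewall VLAN for each VRF, using the RD/RT values and firewall VLAN numbers listed earlier in this post. The descriptions other than "Affiliates" are my assumption, taken from the VRF list rather than from a real configuration, and render_vrf is a made-up helper name.

```python
# (name, description, RD/RT value, firewall VLAN) per VRF.
# Descriptions other than AFFIL's are assumed from the VRF list.
VRFS = [
    ("BMS", "Building Management Systems", "65535:100", 3000),
    ("ICT", "Information & Communications Technology", "65535:110", 3010),
    ("STAFF", "General Staff", "65535:120", 3020),
    ("AFFIL", "Affiliates", "65535:130", 3030),
    ("EDGE", "Edge", "65535:140", 3040),
    ("NETINF", "Network Management", "65535:150", 3050),
]


def render_vrf(name, description, rd, fw_vlan):
    """Render the VRF definition and firewall VLAN in the style used above."""
    return "\n".join([
        f"ip vrf {name}",
        f" description {description}",
        f" rd {rd}",
        f" route-target export {rd}",
        f" route-target import {rd}",
        "",
        f"vlan {fw_vlan}",
        f" name {name}_P2P_FW",
    ])


for vrf in VRFS:
    print(render_vrf(*vrf))
    print("!")
```

The interfaces, OSPF process and addressing would need the same treatment, and the output should be reviewed before it goes anywhere near a switch.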


Distribution Switch

Layer 3 between core and distribution. Layer 2 between distribution and access. VLANs 1030 - 1032 are the SVIs for the access networks for the building - these will be trunked/tagged to each access switch/stack and associated on each port as appropriate as an access/untagged VLAN.

ip vrf AFFIL
 rd 65535:130
 route-target export 65535:130
 route-target import 65535:130

ip multicast-routing vrf AFFIL 

interface Loopback130
 description General Management Loop Back AFFIL
 ip vrf forwarding AFFIL
 ip address 10.30.5.10 255.255.255.255
 no ip proxy-arp
 ip pim sparse-mode
 ip ospf 130 area 0

interface Vlan1030
 description AFFIL_VLAN1030
 ip vrf forwarding AFFIL
 ip address 10.30.100.1 255.255.255.0

interface Vlan1031
 description AFFIL_VLAN1031
 ip vrf forwarding AFFIL
 ip address 10.30.101.1 255.255.255.0

interface Vlan1032
 description AFFIL_VLAN1032
 ip vrf forwarding AFFIL
 ip address 10.30.102.1 255.255.255.0

interface Vlan2300
 description AFFIL_P2P_BldA
 ip vrf forwarding AFFIL
 ip address 10.30.3.2 255.255.255.252
 no ip redirects
 no ip proxy-arp
 ip pim sparse-mode
 ip ospf network point-to-point
 ip ospf 130 area 0

router ospf 130 vrf AFFIL
 router-id 10.30.5.10
 redistribute connected subnets
 passive-interface default
 no passive-interface Loopback130
 no passive-interface Vlan2300

ip pim vrf AFFIL rp-address 10.30.5.253

Associate the appropriate VLANs with the core switch interfaces and downstream access switches.
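When copying this configuration for other VRFs and buildings, addressing slips are easy to make. Here is a small sketch using Python's ipaddress module to sanity-check the AFFIL addresses from the two configurations above: both ends of the Vlan2300 link should share a subnet, and the core's default route next hop should sit inside the firewall VLAN's subnet. The helper names are my own.

```python
import ipaddress

# Addresses copied from the AFFIL example configurations.
core_p2p = ipaddress.ip_interface("10.30.3.1/255.255.255.252")  # core Vlan2300
dist_p2p = ipaddress.ip_interface("10.30.3.2/255.255.255.252")  # distribution Vlan2300
core_fw = ipaddress.ip_interface("10.30.2.5/255.255.255.240")   # core Vlan3030
default_next_hop = ipaddress.ip_address("10.30.2.3")            # static default route


def same_subnet(a, b):
    """True when two interfaces are in the same subnet with different addresses."""
    return a.network == b.network and a.ip != b.ip


def next_hop_reachable(interface, next_hop):
    """True when the next hop is directly connected via the given interface."""
    return next_hop in interface.network and next_hop != interface.ip


print(same_subnet(core_p2p, dist_p2p))
print(next_hop_reachable(core_fw, default_next_hop))
```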