Tuesday, June 18, 2024

Nornir, Netmiko, TextFSM

Introduction

Automation of the physical network remains an elusive goal for many. While cloud networking (both private and public clouds of various flavours) has provided accessible network automation through SDN (Software Defined Networking), the physical network is still largely handled manually using the CLI (Command Line Interface) or via APIs (Application Programming Interfaces).

Products such as Arista's CloudVision and Cisco's SD-Access offer 'simplified' network automation, but these aren't the same as SDN. They're more like orchestration tools that can create/read/update/delete configurations on network devices through APIs; they aren't directly managing the control plane like SDN. They may know the state of the network (eg routing, spanning-tree, etc) but they cannot directly influence it without applying configuration changes to the devices.

With modern network devices you will find APIs such as NETCONF (Network Configuration Protocol)/RESTCONF (Representational State Transfer Configuration Protocol, think HTTP NETCONF), and gRPC (Google Remote Procedure Call). However, there are still many older network devices in service that do not support these modern programming interfaces.

Therefore we're left with using imaginative ways to automate the network using CLI based tools - having to programmatically deal with unstructured data that's designed for humans, not machines. 

In this brief example I will show how we can use Nornir to run tasks against an inventory of network devices, Netmiko to connect to each device and run commands, and TextFSM to parse the unstructured output of the commands.

The Issue

Many processes of a network are distributed and stateful in nature, eg routing protocols, spanning-trees, multicast, etc. While you can discern how they will behave based upon the configuration, you won't know for sure until you observe them in operation. In this example I will automate the check to see if the desired spanning-tree root bridge is selected.

In this scenario, MST (Multiple Spanning-Tree) is used. Each switch needs to be checked to see whether its root bridge is the expected one (which in this case is the upstream building distribution switch). This will show up any issues such as a misconfigured switch taking over as the root bridge or some other unexpected behaviour. Note the script will also work with PVST (Per VLAN Spanning-Tree).
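
For context on why a misconfigured switch can take over: spanning tree elects as root the bridge with the numerically lowest bridge ID, comparing the priority first and using the MAC address as the tie-breaker. Below is a minimal sketch of that comparison. The switch names, the priorities, and the second and third MAC addresses are made-up values for illustration only.

```python
def bridge_id(priority, mac):
    """Return a sortable bridge ID: priority first, then the MAC as an integer."""
    return (priority, int(mac.replace(".", ""), 16))


def elect_root(switches):
    """Pick the switch with the lowest bridge ID, as spanning tree does."""
    return min(switches, key=lambda sw: bridge_id(sw["priority"], sw["mac"]))


# Hypothetical switches; only the first MAC address comes from this article.
switches = [
    {"name": "distribution", "priority": 0, "mac": "0053.002c.740a"},
    {"name": "access1", "priority": 32768, "mac": "00aa.bbcc.ddee"},
    {"name": "misconfigured", "priority": 0, "mac": "0011.2233.4455"},
]

root = elect_root(switches)
print(root["name"])  # misconfigured: same priority as distribution, lower MAC
```

With equal priorities the lower MAC address wins, which is how a switch that was accidentally given the same priority as the distribution switch could end up as root.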

The Solution


Before doing anything, ensure the Python virtual environment is set up:

~$ mkdir check_stp_root
~$ cd check_stp_root
~/check_stp_root$ python3 -m venv .venv
~/check_stp_root$ source .venv/bin/activate
(.venv) ~/check_stp_root$ pip install nornir nornir-utils nornir-netmiko textfsm


TextFSM Template

First, I find the appropriate CLI command to use. For spanning tree on a Cisco device there are a few, but I settled on 'show spanning-tree root'.

Cisco3560-10#show spanning-tree root

                                        Root    Hello Max Fwd
MST Instance           Root ID          Cost    Time  Age Dly  Root Port
---------------- -------------------- --------- ----- --- ---  ------------
MST0                 0 0053.002c.740a         0    2   20  15  Fa0/8        

The interesting part is the Root ID of 0053.002c.740a, which is correct, but later we'll test with a different expected value to see how the output changes.

Now that we have the expected output, we can create a TextFSM template (called show_spanning-tree_root.template) that identifies the interesting parts and assigns them to values we can use in our script. 

Here is what I came up with:

Value VLAN (\S+)
Value Priority (\d+)
Value RootID (\S+)
Value RootCost (\d+)
Value HelloTime (\d+)
Value MaxAge (\d+)
Value FwdDly (\d+)
Value RootPort (\S+)

Start
  ^${VLAN}\s+${Priority}\s+${RootID}\s+${RootCost}\s+${HelloTime}\s+${MaxAge}\s+${FwdDly}\s+${RootPort} -> Continue
  ^${VLAN}\s+${Priority}\s+${RootID}\s+${RootCost}\s+${HelloTime}\s+${MaxAge}\s+${FwdDly} -> Record

This makes generous use of regular expressions (aka regex) like \s+ and \d+, but still fewer than would be needed without TextFSM, ie if we tried to parse the output directly in the script. There are plenty of regular expression references around, such as https://www.pythoncheatsheet.org/cheatsheet/regular-expressions .

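Each desired value is declared with its own regex and then referenced where it appears in the CLI output. Eg ${RootID} is placed where I expect the root ID to appear in a line. Note that TextFSM expects a blank line between the last Value line and the Start state. The -> Continue action tells TextFSM to keep the captured values and carry on testing the same line against the following rules instead of moving to the next line, and -> Record tells TextFSM to store the gathered values as a row. The first rule matches lines that include a root port; the second also matches lines without one (as on the root bridge itself) and records the row.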
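
To make the substitution concrete, here is a rough standard-library illustration of what the two rules amount to once each ${Value} is replaced by its regex as a named group. It is only a sketch to show the matching behaviour against the sample output above, not how TextFSM is implemented, and the helper names (build_rule, parse) are my own.

```python
import re

# Value names and patterns copied from the template above.
VALUES = {
    "VLAN": r"\S+",
    "Priority": r"\d+",
    "RootID": r"\S+",
    "RootCost": r"\d+",
    "HelloTime": r"\d+",
    "MaxAge": r"\d+",
    "FwdDly": r"\d+",
    "RootPort": r"\S+",
}


def build_rule(names):
    """Join one named group per value with whitespace, like the template rules."""
    return re.compile("^" + r"\s+".join(f"(?P<{n}>{VALUES[n]})" for n in names))


names = list(VALUES)
WITH_PORT = build_rule(names)         # first rule: line includes a root port
WITHOUT_PORT = build_rule(names[:-1])  # second rule: no root port needed

SAMPLE = """\
                                        Root    Hello Max Fwd
MST Instance           Root ID          Cost    Time  Age Dly  Root Port
---------------- -------------------- --------- ----- --- ---  ------------
MST0                 0 0053.002c.740a         0    2   20  15  Fa0/8
"""


def parse(text):
    """Return one dict per line that matches either rule; headers are skipped."""
    rows = []
    for line in text.splitlines():
        match = WITH_PORT.match(line) or WITHOUT_PORT.match(line)
        if match:
            rows.append(match.groupdict())
    return rows


rows = parse(SAMPLE)
print(rows[0]["VLAN"], rows[0]["RootID"], rows[0]["RootPort"])
```

The header and separator lines fall through because their second column is not a number, so only the MST0 line produces a row.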

Use https://textfsm.nornir.tech/ to help develop your template to avoid trial and error while writing your script.



It's worth checking https://github.com/networktocode/ntc-templates to see if there is a pre-made template.

Nornir

For this scenario I'll use Nornir's SimpleInventory to provide the details of each switch. I will also use groups to store the expected root bridge ID. To make things a little more interesting I'll create separate groups called building1 and building2 to show how you could specify different switches and root bridge IDs for each building.

First, create the config.yaml file (Nornir uses YAML, which stands for YAML Ain't Markup Language, for its configuration files).

config.yaml:
---
inventory:
  plugin: SimpleInventory
  options:
    host_file: "hosts.yaml"  # Path to your hosts file
    group_file: "groups.yaml"  # Path to your groups file (optional)
    defaults_file: "defaults.yaml"  # Path to your defaults file (optional)

runner:
  plugin: threaded  # Or "serial" for sequential execution
  options:
    num_workers: 10  # Number of threads to use (adjust as needed) 

# Optional: Configure logging
logging:
  level: WARNING  # Or "DEBUG", "INFO", "ERROR", etc.
  enabled: True
  to_console: True
  log_file: "nornir.log"

This tells Nornir that we're using the SimpleInventory plugin and it can find the inventory and associated particulars in the hosts.yaml, groups.yaml, and defaults.yaml files.

It also tells Nornir to use the threaded runner with 10 workers, meaning it will work on up to 10 switches at a time.

The remaining settings control how verbosely Nornir logs events and where the log goes.
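
To picture what a threaded runner with 10 workers means, here is a conceptual sketch using Python's own thread pool. It is not Nornir code: the inventory of 25 switches and the check_switch function are hypothetical stand-ins for a real per-device task.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical inventory of 25 switches.
HOSTS = [f"switch{n}" for n in range(1, 26)]


def check_switch(name):
    """Stand-in for a per-device task; a real task would connect over SSH."""
    return f"{name}: checked"


def run_threaded(hosts, num_workers=10):
    """Run the task for every host with at most num_workers in flight at once."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() hands hosts to idle workers and returns results in input order.
        return list(pool.map(check_switch, hosts))


results = run_threaded(HOSTS)
print(len(results), results[0])
```

With 25 hosts and 10 workers, the first 10 start immediately and the rest are picked up as workers become free, so slow devices don't hold up the whole run the way a serial runner would.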

Then create the hosts.yaml:

--- 
switch1:
  hostname: '192.168.90.10'
  platform: 'ios' 
  groups:
    - 'access_switches'
    - 'building1'
switch2:
  hostname: '192.168.90.11'
  platform: 'ios' 
  groups:
    - 'access_switches'
    - 'building2'

Here we specify our network switches: their names, IP addresses, platform (important for Netmiko, in this case Cisco's IOS), and the groups each device is a member of (see groups.yaml below).

groups.yaml:

---
access_switches:
  platform: 'ios'

building1:
  platform: 'ios'
  data:
    expected_root_id: '0053.002c.740a'

building2:
  platform: 'ios'
  data:
    expected_root_id: '0053.002c.740a'

Here we specify the groups. I've redundantly specified the platform again; you can set this at the host or group level. The interesting part is the 'expected_root_id' value. You can place anything under data: for reference in your scripts.
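
As I understand it, when a script asks a host for a data key, Nornir checks the host's own data first, then each of the host's groups in the order listed, then the defaults. The sketch below models that lookup with plain dictionaries mirroring the hosts.yaml and groups.yaml files above. The resolve function is my own simplified illustration (it ignores nested groups), not part of Nornir.

```python
# Plain-dict copies of the inventory files from this article.
HOSTS = {
    "switch1": {"groups": ["access_switches", "building1"], "data": {}},
    "switch2": {"groups": ["access_switches", "building2"], "data": {}},
}
GROUPS = {
    "access_switches": {"data": {}},
    "building1": {"data": {"expected_root_id": "0053.002c.740a"}},
    "building2": {"data": {"expected_root_id": "0053.002c.740a"}},
}
DEFAULTS = {"data": {}}


def resolve(key, host_name):
    """Look up key on the host, then its groups in listed order, then defaults."""
    host = HOSTS[host_name]
    if key in host["data"]:
        return host["data"][key]
    for group_name in host["groups"]:
        group_data = GROUPS[group_name]["data"]
        if key in group_data:
            return group_data[key]
    return DEFAULTS["data"].get(key)


print(resolve("expected_root_id", "switch1"))
```

In practice this means a value set on a building group is visible to every switch in that building, and a single switch could still override it in its own data.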

defaults.yaml:

---
port: 22  # Or the SSH port for your devices

This file is optional but here we set the port to 22 for SSH access, which is the default anyway. If you are using a non-default port, you will need to specify it here, or in the group or host files.

Now that we have a TextFSM template and the Nornir configuration completed, we're ready to write the script.

Python

Create a new script called check_stp_root.py.

Import the Python modules (libraries); the names are largely self-explanatory:

import textfsm
from getpass import getpass
from nornir import InitNornir
from nornir.core.inventory import Inventory
from nornir.core.task import Result
from nornir_utils.plugins.functions import print_result  # For nice output
from nornir_netmiko import netmiko_send_command
from nornir.core.filter import F

In this example I'm doing something a little different: I prompt for the credentials used to access the switches. You can set these in the Nornir SimpleInventory files (in the hosts, groups, or defaults) but that isn't very secure, so for now I just prompt. To make this script fully automated I suggest using something like HashiCorp Vault to securely manage the credentials, or perhaps environment variables.

# Prompt for credentials
username = input("Enter your username: ")
password = getpass("Enter your password: ")
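
As a sketch of the environment variable option, the function below prefers two variables and only falls back to prompting when one is missing. The variable names NET_USERNAME and NET_PASSWORD and the function name get_credentials are assumptions for illustration; use whatever naming suits your environment.

```python
import os
from getpass import getpass


def get_credentials(env=os.environ):
    """Prefer environment variables, falling back to interactive prompts."""
    # NET_USERNAME and NET_PASSWORD are hypothetical names chosen for this sketch.
    username = env.get("NET_USERNAME") or input("Enter your username: ")
    password = env.get("NET_PASSWORD") or getpass("Enter your password: ")
    return username, password


# Usage in the script would be:
#   username, password = get_credentials()
```

Passing the environment in as a parameter keeps the function easy to test, and an unattended run never blocks on a prompt as long as both variables are set.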

Create a function we can use to inject the newly received credentials into the inventory:

# Custom function to load inventory and add credentials
def load_inventory_with_credentials(inventory: Inventory):
    for host in inventory.hosts.values():
        host.username = username
        host.password = password
    return inventory

Create a function that will be run as a task by Nornir. It calls Netmiko to connect to the device, run the command "show spanning-tree root", and run the output through TextFSM which will then return structured data we can use.

def get_stp_root_info(task):
    """Fetches and parses 'show spanning-tree root' using Netmiko."""
    result = task.run(
        task=netmiko_send_command, command_string="show spanning-tree root"
    )
    if result.result:  # Only parse if the command returned some output
        with open('show_spanning-tree_root.template') as f:
            table = textfsm.TextFSM(f)
            task.host["stp_data"] = table.ParseText(result.result)  # Store data
    else:
        task.host["stp_data"] = []  # Store empty list on error
    return Result(host=task.host, result=result.result)  # Return result object

Create a function that checks the resulting data from TextFSM for the expected Root Bridge ID.

def check_stp_root(task, expected_root_id):
    """Checks STP root bridge info against a single expected root ID."""
    discrepancies = []
    stp_data = task.host.get("stp_data", [])  # Retrieve parsed data

    for vlan_data in stp_data:
        vlan = vlan_data[0]
        root_id = vlan_data[2]

        if root_id != expected_root_id:
            discrepancies.append(
                f"VLAN/MST {vlan}: Unexpected root bridge: {root_id} (expected: {expected_root_id})"
            )

    return discrepancies
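
One thing to be aware of: if the template stops matching, the parsed data is an empty list and the loop above finds nothing to complain about, so the report would say "No discrepancies found". Below is a hedged sketch of the same comparison as a plain function that treats empty data as a problem. The function name find_discrepancies and the wording of the extra message are my own; the row layout follows the Value order in the template.

```python
def find_discrepancies(stp_rows, expected_root_id):
    """Compare parsed rows (lists in template Value order) with the expected root ID."""
    if not stp_rows:
        # Nothing was parsed: report it rather than silently passing the check.
        return ["No spanning-tree data was parsed; check the command output and template"]
    problems = []
    for row in stp_rows:
        instance, root_id = row[0], row[2]
        if root_id != expected_root_id:
            problems.append(
                f"VLAN/MST {instance}: Unexpected root bridge: {root_id} "
                f"(expected: {expected_root_id})"
            )
    return problems


# Row as the template would produce it for the sample output in this article.
sample_rows = [["MST0", "0", "0053.002c.740a", "0", "2", "20", "15", "Fa0/8"]]
print(find_discrepancies(sample_rows, "0053.002c.741a"))
```

Because it takes plain lists, this version can be exercised without connecting to any device, which is handy when adjusting the template.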

Create a function that prints a report:

def generate_report(agg_result):
    """Generates a consolidated report for all hosts."""
    print("Spanning Tree Root Bridge Check Report:\n")
    for host, result in agg_result.items():
        print(f"Host: {host}")

        discrepancies = result.result  # Get result from check_stp_root
        if discrepancies:
            print("  Discrepancies:")
            for discrepancy in discrepancies:
                print(f"    - {discrepancy}")
        else:
            print("  No discrepancies found.")

        print("-" * 30)  # Separator between hosts

Then create a Nornir instance, using the function we created earlier to inject the credentials into the inventory:

# Nornir Configuration:

nr = InitNornir(config_file="config.yaml")  # Adjust path if needed

# Update inventory with credentials
nr.inventory = load_inventory_with_credentials(nr.inventory)

Then we kick off the tasks that check each "building" group of switches for any Root Bridge ID discrepancies. Note how we reference the 'expected_root_id' value in Nornir's groups:

# Check STP for each building
for building in ["building1", "building2"]:
    print(f"Checking Building: {building}")
    # Run get_stp_root_info task on the building group
    nr.filter(F(has_parent_group=building)).run(task=get_stp_root_info)
    
    results_checks = nr.filter(F(has_parent_group=building)).run(
        task=check_stp_root,
        expected_root_id=nr.inventory.groups[building].data["expected_root_id"],
    )
    generate_report(results_checks) 

Running the script:

(.venv) ~/check_stp_root$ python3 check_stp_root.py

If the Root Bridge ID is matched, the output will be:

Enter your username: <username>
Enter your password: 
Checking Building: building1
Spanning Tree Root Bridge Check Report:
Host: switch1
  No discrepancies found.
------------------------------
Checking Building: building2
Spanning Tree Root Bridge Check Report:
Host: switch2
  No discrepancies found.
------------------------------

If I change the expected Root Bridge ID for building2 from 0053.002c.740a to 0053.002c.741a in the groups.yaml file, the output becomes:

Enter your username: admin
Enter your password: 
Checking Building: building1
Spanning Tree Root Bridge Check Report:
Host: switch1
  No discrepancies found.
------------------------------
Checking Building: building2
Spanning Tree Root Bridge Check Report:
Host: switch2
  Discrepancies:
    - VLAN/MST MST0: Unexpected root bridge: 0053.002c.740a (expected: 0053.002c.741a)
------------------------------

As you can see, the script highlights the unexpected root bridge value and shows the expected value.


It's easy to build upon this script - creating additional functions that combine these tools to check and report on the network. You can also make changes to the devices based upon the checks - eg if the root bridge is wrong, maybe see if the priority is incorrect and change it, then recheck. Perhaps the root bridge ID can be automatically derived instead of relying upon a hard-coded value.
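
As one possible way to derive the expected root bridge automatically, the sketch below takes the root ID that most switches in a building agree on and lists the switches that disagree. This is only a heuristic of my own (the majority could itself be wrong), and the function name, the switch3 host, and its root ID are hypothetical.

```python
from collections import Counter


def most_common_root(root_ids_by_host):
    """Return the root ID most hosts report, plus the hosts that disagree."""
    counts = Counter(root_ids_by_host.values())
    root_id, _ = counts.most_common(1)[0]
    outliers = sorted(
        host for host, seen in root_ids_by_host.items() if seen != root_id
    )
    return root_id, outliers


# Hypothetical per-host results; switch3 and its root ID are made up.
observed = {
    "switch1": "0053.002c.740a",
    "switch2": "0053.002c.740a",
    "switch3": "0011.2233.4455",
}
print(most_common_root(observed))
```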

Conclusion


I've demonstrated how combining TextFSM, Nornir, and Netmiko can effectively automate a network that doesn't support contemporary APIs. If it has a CLI and Netmiko supports it then it can be automated. You do need to keep an eye out for any changes to the CLI and its output formatting, as they will break your scripts. This is where using APIs is a far better approach, as any changes to an API, if they can't be avoided, are generally well documented and communicated.

An alternative to using TextFSM and Netmiko directly is NAPALM - NAPALM abstracts the above by handling the connection and gathering of data for you. An added benefit is that it provides the same data no matter what device is being accessed - eg a Juniper switch port's details will be presented the same way as a Cisco switch port's details. Therefore, you can write code that can be used on any device that NAPALM supports. In this particular case, though, NAPALM doesn't have a way to show the spanning-tree information - perhaps it will be supported later.
