Automating Network Devices with Ansible

Rudolph
07.07.2017 · 6:19 min

[This article has also been published in German.]

A while ago we told our German customers that we were knee-deep in replacing parts of our network infrastructure. Trying to keep up with today’s challenges, we decided to work with Ansible and Juniper to implement a fully automated network configuration. While looking for information that would help us realize this concept, we encountered a noticeable lack of online resources. This is our attempt to improve the situation for anyone interested in a similar setup.

The Basics – Git and Ansible

The former does not need much consideration – any project even slightly complex should use a version control system, be it Git or something else. But why Ansible? We at sipgate have been using both Puppet and Ansible for quite some years now. While Puppet needs an agent on the target device (which Juniper supports) and ideally a Puppet master setup, Ansible only requires some additional libraries on every workstation (or on a central management server).

While Puppet has been managing our servers’ base configuration for some years (authentication, logging, package sources etc.), Ansible took care of everything ‘above’ that layer: LDAP and DNS servers, load balancers and all the internal services which we deploy multiple times per day. Due to the simpler, agent-less implementation and the easy-to-read roles and playbooks, we decided to go with Ansible.

Ansible – now what?

Ansible’s main advantage is the large number of available modules. Since version 2.0, many modules have been added to the network section in particular, including several for Juniper devices. At the same time, Juniper has published its own set of modules which complement and/or partially replace the official ones. So far we have only been using the official modules, as this might ease the process of updating to a new Ansible release in the future. However, we would also choose the Juniper modules if they offered anything we require that cannot be achieved with the upstream ones. As we mostly work with Debian/Ubuntu-based systems, here are the few steps needed to get the official Ansible modules up and running (using Ansible 2.2):

apt-get install python-pip libxml2-dev libffi-dev python-dev libxslt1-dev libssl-dev
pip install junos-eznc jxmlease

On the Juniper device you need to enable netconf over ssh to get things working:

set system services ssh 
set system services netconf ssh

Password-less authentication via SSH public key is possible. However, since we use RADIUS-based authentication, we rely on username/password. The following example playbook asks for username and password and saves them in a dictionary named netconf, which can then be used to configure the Juniper modules. You will see an example of its usage in the next section.

- hosts: core_switches
  serial: 1
  connection: local
  gather_facts: False
  vars_prompt:
    - name: "netconf_user"
      prompt: "Netconf login user"
      default: "root"
      private: no
    - name: "netconf_password"
      prompt: "Netconf login password"
      private: yes 
  vars:
    netconf:
      host: "{{ inventory_hostname }}"
      username: "{{ netconf_user }}"
      password: "{{ netconf_password }}"
      timeout: 30
  roles:
    - role: base_setup
    - role: core_setup


Templates, Templates everywhere!

One of Ansible’s strengths is the use of powerful templates. The official Ansible modules do not support them directly, so we need a simple workaround:

  - name: generate dns configuration
    template: src=dns.j2 dest=/tmp/junos_config_deploy/{{ inventory_hostname }}/dns.conf
    changed_when: false

  - name: install dns configuration
    junos_config:
      src: /tmp/junos_config_deploy/{{ inventory_hostname }}/dns.conf
      replace: yes 
      src_format: text
      provider: "{{ netconf }}"

First, we render our template locally into a static file. Then we load that file onto the device using the junos_config module. By default, Junos would merge this snippet into the existing configuration, leading to a rather undefined state. Luckily, Junos supports a ‘replace’ syntax which allows us to specify the keyword replace: within the configuration tree. All elements marked that way will replace any existing element with the same name (including all of its sub-elements). To pick up our last example, this is the corresponding template:

system {
    replace:
    name-server {
{% for ip in dns_ips %}
        {{ ip }};
{% endfor %}
    }
    host-name {{ inventory_hostname_short }};
    domain-name some.domain.here;
}

As you can see, this is a rather short template. We have split our configuration into many small parts, which makes it important to ensure that only one template takes ‘ownership’ of a given configuration tree (via replace:). The split brings several advantages over one large template: it is easier to locate the source of an error (‘Did an important interface configuration fail, or did I just mess up the nameserver configuration?’), we can use Ansible tags to selectively deploy only the relevant parts of the configuration and, of course, short templates are easier to read.
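The tag-based selective deployment mentioned above could look like this (a sketch on our part; the dns tag name is made up for illustration):

```yaml
# Sketch: tag the render and deploy tasks so a single config part
# can be rolled out on its own.
- name: generate dns configuration
  template: src=dns.j2 dest=/tmp/junos_config_deploy/{{ inventory_hostname }}/dns.conf
  changed_when: false
  tags: dns

- name: install dns configuration
  junos_config:
    src: /tmp/junos_config_deploy/{{ inventory_hostname }}/dns.conf
    replace: yes
    src_format: text
    provider: "{{ netconf }}"
  tags: dns
```

A run like ansible-playbook core.yml --tags dns then touches only the DNS part of the configuration (the playbook filename is hypothetical).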

One notable disadvantage is the playbook’s runtime: each junos_config run finishes with a commit, which – depending on the device – might take a few seconds or an eternity. And you do not want to wait an eternity * thirty-something devices.
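One conceivable mitigation (our sketch here, not something this setup does) is to render all parts first and then push them in a single junos_config run, and thus a single commit, for example with Ansible’s assemble module:

```yaml
# Sketch: collapse many commits into one by concatenating all rendered
# config parts and loading them in a single junos_config call.
- name: combine all rendered config parts into one file
  assemble:
    src: /tmp/junos_config_deploy/{{ inventory_hostname }}/
    dest: /tmp/junos_config_deploy/{{ inventory_hostname }}.conf
  changed_when: false

- name: install combined configuration (one commit)
  junos_config:
    src: /tmp/junos_config_deploy/{{ inventory_hostname }}.conf
    replace: yes
    src_format: text
    provider: "{{ netconf }}"
```

The trade-off is that a single failing part now fails the whole load, so you lose some of the per-part error localization described above.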

Block to the rescue!

Ansible has been extended to support some sort of try..catch exception handling. They just called it block..rescue because reasons (seriously, we have no clue why they did that). In any case, the following skeleton for an Ansible role makes debugging way easier:

- block:
  - name: remove config preparation folder
    file: path=/tmp/junos_config_deploy/{{ inventory_hostname }} state=absent
    changed_when: False
  - name: generate config preparation folder
    file: path=/tmp/junos_config_deploy/{{ inventory_hostname }} mode=0700 state=directory
    changed_when: False

  # [...]
  # template/junos_config tasks
  # [...]

  - name: remove config preparation folder
    file: path=/tmp/junos_config_deploy/{{ inventory_hostname }} state=absent
    changed_when: False
    tags: syslog

  rescue:
    - debug: msg="configuring the switch failed. you can find the generated configs in /tmp/junos_config_deploy/{{ inventory_hostname }}/*.conf and try yourself"
    - debug: msg="scp the file to the switch and execute 'load replace <filename>' + 'commit' in conf mode"
    - fail: msg="stopping the playbook run"

First, we make sure that our local template folder does not already exist from a previous (failed?) Ansible run. Second, we create a fresh folder for this run and then start generating config parts and applying them with junos_config. These tasks are surrounded by a block statement. When any task inside the block fails, the tasks in the rescue section are triggered, leaving the person running the playbook with some useful hints on how to debug the error. Unfortunately, the error messages produced by the junos_config module are not always that helpful, so it sometimes helps to upload the generated template to the target device and apply it manually.
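The manual fallback suggested by the rescue hints looks roughly like this on the device (hostnames, usernames and filenames are placeholders):

```
$ scp /tmp/junos_config_deploy/switch01/dns.conf admin@switch01:/var/tmp/
$ ssh admin@switch01
admin@switch01> configure
admin@switch01# load replace /var/tmp/dns.conf
admin@switch01# show | compare
admin@switch01# commit
```

show | compare is optional, but handy for reviewing the resulting diff before committing – usually the first place where the actual template error becomes visible.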

We hope this gave you a first impression of automating network configurations with Ansible. Stay tuned, there is more to come!
