15. SR Linux applications

15.1. Overview

SR Linux is a suite of modular, lightweight applications that run like any other application in a Linux environment. Each SR Linux application supports a different protocol or function, such as BGP, LLDP, or AAA. These applications use gRPC and APIs to communicate with each other and with external systems over TCP.

One SR Linux application, the application manager (app_mgr), is responsible for monitoring the health of the processes running each SR Linux application, and for restarting them if they fail. The application manager reads in application-specific YAML configuration and YANG models, and starts each application (or allows an application not to start if no configuration exists for it). One instance of the app_mgr handles applications running on the CPM, and an instance of the app_mgr on each IMM handles applications running on that line card.

In addition to the Nokia-provided SR Linux applications, SR Linux supports installation of user-defined applications, which are managed and configured in the same way as the default SR Linux applications.

This chapter presents examples of installing an application in SR Linux, managing installed SR Linux applications, and configuring settings for an SR Linux application by modifying its YAML configuration.

15.2. Installing an application

To install an application, copy the application files into the appropriate SR Linux directories, then reload the application manager and start the application.

The example in this section installs an application called fib_agent. The application consists of files named fib_agent.yml, fib_agent.sh, fib_agent.py, and fib_agent.yang. The fib_agent.yml file is installed in the /etc/opt/srlinux/appmgr/ directory. The .yml file for a user-defined application must reside in this directory in order for the app_mgr to read its YAML configuration.

The .yml file defines the locations of the other application files. The other application files can reside anywhere in the system other than in the /opt/srlinux/ directory or any tmpfs file system.

In this example, the fib_agent.sh and fib_agent.py files are installed in the directory /user_agents/, and the fib_agent.yang file is installed in the directory /yang/. The locations for these files are defined in the fib_agent.yml file.
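The placement rules above can be checked before copying anything. The following is a minimal Python sketch, not part of SR Linux: the function names and the plan dictionary are hypothetical, and only the two directory rules stated in this section are checked. Whether a target directory sits on a tmpfs file system cannot be determined from the path alone, so that rule is not covered.

```python
from pathlib import PurePosixPath

# Directories named in this section.
APPMGR_DIR = PurePosixPath("/etc/opt/srlinux/appmgr")
FORBIDDEN_DIR = PurePosixPath("/opt/srlinux")


def is_under(path: PurePosixPath, parent: PurePosixPath) -> bool:
    """Return True if path is parent itself or lies below it."""
    return path.parts[:len(parent.parts)] == parent.parts


def check_install_plan(plan: dict) -> list:
    """Return a list of problems found in a {file name: target directory} plan."""
    problems = []
    for name, target in plan.items():
        target_dir = PurePosixPath(target)
        if name.endswith(".yml"):
            # The .yml file must reside in the app_mgr directory.
            if target_dir != APPMGR_DIR:
                problems.append(f"{name}: must be in {APPMGR_DIR}")
        elif is_under(target_dir, FORBIDDEN_DIR):
            # Other files may go anywhere except below /opt/srlinux/.
            problems.append(f"{name}: must not be under {FORBIDDEN_DIR}")
    return problems


# Hypothetical plan mirroring the fib_agent example in this section.
plan = {
    "fib_agent.yml": "/etc/opt/srlinux/appmgr/",
    "fib_agent.sh": "/user_agents/",
    "fib_agent.py": "/user_agents/",
    "fib_agent.yang": "/yang/",
}
print(check_install_plan(plan))  # an empty list means no rule is violated
```

An empty result means the plan satisfies both directory rules; each string in a non-empty result names the offending file.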

  1. Copy the application files into the SR Linux directories.
    cp fib_agent.yml /etc/opt/srlinux/appmgr/.
    cp fib_agent.sh /user_agents/.
    cp fib_agent.py /user_agents/.
    cp fib_agent.yang /yang/.
  2. From the SR Linux CLI, reload the application manager.
    --{ candidate }--[ ]--
    tools system app-management application app_mgr reload
  3. Apply the changes to the configuration.
    --{ candidate }--[  ]--
    fib-agent
    --{ candidate }--[ fib-agent ]--
    commit stay
    All changes have been committed. Starting new transaction.
    --{ candidate }--[ fib-agent ]--
  4. Verify that the application is running.
    show system application fib_agent
      +-----------+-----+---------+-----------------------+--------------------------+
      |  Name     | PID |  State  |        Version        |       Last Change        |
      +===========+=====+=========+=======================+==========================+
      | fib_agent | 227 | running | v19.11.2-162-g678dff8 | 2020-01-13T20:16:45.697Z |
      +-----------+-----+---------+-----------------------+--------------------------+
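If the verification step needs to be scripted, the table printed by show system application can be parsed as plain text. The following Python sketch assumes the output has already been captured as a string (how it is captured is outside the scope of this example); the function name parse_app_table is hypothetical.

```python
def parse_app_table(text: str) -> dict:
    """Map application name to a row dict, given 'show system application' output."""
    rows = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines such as +---+ and +===+
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first data line holds the column names
        else:
            row = dict(zip(header, cells))
            rows[row["Name"]] = row
    return rows


# Output copied from the verification step above.
SAMPLE = """\
+-----------+-----+---------+-----------------------+--------------------------+
|  Name     | PID |  State  |        Version        |       Last Change        |
+===========+=====+=========+=======================+==========================+
| fib_agent | 227 | running | v19.11.2-162-g678dff8 | 2020-01-13T20:16:45.697Z |
+-----------+-----+---------+-----------------------+--------------------------+
"""

apps = parse_app_table(SAMPLE)
print(apps["fib_agent"]["State"])  # prints: running
```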

15.3. Managing applications

15.3.1. Starting an application

To start an SR Linux application instance, use the start option in the tools system app-management command. To terminate a running application instance and restart it, use the restart option.

Examples:

To start an SR Linux application instance:

tools system app-management application mpls_mgr start
/system/app-management/application[name=mpls_mgr]:
    Application 'mpls_mgr' was started

To restart an SR Linux application instance:

tools system app-management application mpls_mgr restart
/system/app-management/application[name=mpls_mgr]:
    Application 'mpls_mgr' was killed with signal 9
/system/app-management/application[name=mpls_mgr]:
    Application 'mpls_mgr' was restarted

15.3.2. Terminating an application

You can use the stop, quit, or kill options in the tools system app-management command to terminate an SR Linux application.

  1. stop: Gracefully terminates the application, allowing it to clean up before exiting.
  2. quit: Terminates the application and generates a core dump. The core dump files are saved in the /var/log/srlinux/cores/ directory.
  3. kill: Terminates the application immediately, without allowing it to clean up before exiting.

Examples:

To terminate an application gracefully:

tools system app-management application mpls_mgr stop
/system/app-management/application[name=mpls_mgr]:
    Application 'mpls_mgr' was killed with signal 15

To terminate an application and generate a core dump:

tools system app-management application mpls_mgr quit
/system/app-management/application[name=mpls_mgr]:
    Application 'mpls_mgr' was killed with signal 3

To terminate an application immediately:

tools system app-management application mpls_mgr kill
/system/app-management/application[name=mpls_mgr]:
    Application 'mpls_mgr' was killed with signal 9
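The signal numbers reported in the examples above correspond to standard Linux signals. The following Python sketch looks up their names with the standard signal module. It assumes a Linux host; the OPERATION_SIGNALS mapping is an illustration built from the CLI output above, not an SR Linux API.

```python
import signal

# Signal numbers taken from the CLI output in the examples above.
OPERATION_SIGNALS = {"stop": 15, "quit": 3, "kill": 9}

for operation, number in OPERATION_SIGNALS.items():
    # signal.Signals(n) resolves a signal number to its symbolic name.
    print(operation, number, signal.Signals(number).name)
```

On Linux, 15 is SIGTERM (which a process can catch in order to clean up), 3 is SIGQUIT (which by default produces a core dump), and 9 is SIGKILL (which cannot be caught). This matches the behavior described for stop, quit, and kill.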

15.3.3. Reloading application configuration

Reloading an application causes the app_mgr to reread the application’s YAML configuration and restart the application using settings in its YAML file.

Example:

To reload the configuration of the app_mgr application:

--{ * running }--[  ]--
tools system app-management application app_mgr reload
--{ * running }--[  ]--

15.3.4. Clearing application statistics

You can display statistics collected for an application with the info from state command. To reset the statistics counters for the application, use the statistics clear option in the tools system app-management command.

Example:

To reset the statistics counters for an application:

tools system app-management application mpls_mgr statistics clear

15.3.5. Restricted operations for applications

An application may have one or more operations that are restricted by default. For example, the linux_mgr application has stop, quit, and kill as restricted operations, meaning that these options are not available when entering the tools system app-management command for the linux_mgr application.

Table 14 lists the restricted operations for each SR Linux application.

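Table 14: Restricted operations for SR Linux applications

+------------------+------------------------------------------+
| Application      | Restricted operations                    |
+==================+==========================================+
| aaa_mgr          | reload                                   |
| acl_mgr          | reload                                   |
| app_mgr          | start, stop, restart, quit, kill         |
| arp_nd_mgr       | reload                                   |
| bfd_mgr          | reload                                   |
| bgp_mgr          | reload                                   |
| chassis_mgr      | stop, quit, kill, reload                 |
| device_mgr       | reload                                   |
| dhcp_client_mgr  | stop, reload                             |
| fib_mgr          | reload                                   |
| gnmi_server      | reload                                   |
| idb_server       | start, stop, restart, quit, kill, reload |
| json_rpc_config  | reload                                   |
| linux_mgr        | stop, quit, kill                         |
| lldp_mgr         | reload                                   |
| log_mgr          | reload                                   |
| mgmt_server      | start, stop, quit, kill, reload          |
| mpls_mgr         | reload                                   |
| net_inst_mgr     | start, stop, quit, kill, reload          |
| oam_mgr          | reload                                   |
| plcy_mgr         | reload                                   |
| qos_mgr          | reload                                   |
| sdk_mgr          | reload                                   |
| static_route_mgr | reload                                   |
| supportd         | reload                                   |
| xdp_cpm          | stop, quit, kill, reload                 |
| xdp_lc           | reload                                   |
+------------------+------------------------------------------+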

Restricted options are specified in the restricted-operations setting in the YAML file for the application.
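A script that issues tools commands can consult the restricted operations before building a command. The following Python sketch is illustrative only: the dictionary holds an excerpt of Table 14, the function name build_tools_command is hypothetical, and treating an application that is missing from the dictionary as unrestricted is an assumption of this sketch.

```python
# Excerpt of Table 14; extend it with the remaining rows as needed.
RESTRICTED_OPERATIONS = {
    "aaa_mgr": {"reload"},
    "app_mgr": {"start", "stop", "restart", "quit", "kill"},
    "dhcp_client_mgr": {"stop", "reload"},
    "linux_mgr": {"stop", "quit", "kill"},
    "mpls_mgr": {"reload"},
}


def build_tools_command(app: str, operation: str) -> str:
    """Return the tools command for an operation, or raise if it is restricted."""
    # Assumption: an application not listed here has no restricted operations.
    if operation in RESTRICTED_OPERATIONS.get(app, set()):
        raise ValueError(f"'{operation}' is a restricted operation for {app}")
    return f"tools system app-management application {app} {operation}"


print(build_tools_command("mpls_mgr", "stop"))
print(build_tools_command("app_mgr", "reload"))
```

Calling build_tools_command("linux_mgr", "kill") raises ValueError, which mirrors the CLI not offering that option for linux_mgr.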

15.4. Configuring an application

To configure an SR Linux application, edit settings in the application’s YAML file, then reload the application manager to activate the changes.

The example in this section shows how to configure an application to specify the action the SR Linux device takes when the application fails. If an SR Linux application fails a specified number of times over a specified time period, the SR Linux device can reboot the system or attempt to restart the application after waiting a specified number of seconds.

For example, if the aaa_mgr application crashes 5 times within a 500-second window, the SR Linux device can be configured to wait 100 seconds, then restart the aaa_mgr application.

The following actions can be taken if an SR Linux application fails:

  1. Reboot the system
  2. Wait a specified number of seconds, then attempt to restart the failed application
  3. Move the failed application to error state without rebooting the system or attempting to restart the application

If you stop or restart an application using the tools system app-management command in the SR Linux CLI, this is not considered an application failure, and the failure action for the application, if one is configured, is not taken. If a failed application is waiting to be restarted (for a specified period or forever), or has been moved into error state, you can restart it manually with the tools system app-management application restart CLI command.
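The interaction between the failure count and the time period can be illustrated with a small model. The following Python sketch is not the app_mgr implementation, which this guide does not describe. It assumes a sliding window in which a failure stops counting once it is at least the window length old, and the class name FailureTracker is hypothetical.

```python
from collections import deque


class FailureTracker:
    """Count failures inside a sliding time window (illustrative model only)."""

    def __init__(self, threshold: int, window: float):
        self.threshold = threshold  # failures needed to trigger the action
        self.window = window        # length of the window, in seconds
        self.failures = deque()     # timestamps of recent failures

    def record_failure(self, now: float) -> bool:
        """Record a failure at time now; return True if the action should be taken."""
        self.failures.append(now)
        # Assumption: failures that are window seconds old or older no longer count.
        while self.failures and now - self.failures[0] >= self.window:
            self.failures.popleft()
        return len(self.failures) >= self.threshold


# The aaa_mgr example: 5 failures within a 500-second window.
tracker = FailureTracker(threshold=5, window=500)
results = [tracker.record_failure(t) for t in (0, 100, 200, 300, 400)]
print(results)  # only the fifth failure reaches the threshold
```

With failures spaced 200 seconds apart instead, older failures drop out of the window before the count reaches 5, so the action is never triggered in this model.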

To configure the failure action for an application:

  1. Check the status of the SR Linux applications:
    show system application
    +-------------+-----+---------+-----------------------+--------------------------+
    |    Name     | PID |  State  |        Version        |       Last Change        |
    +=============+=====+=========+=======================+==========================+
    | aaa_mgr     | 242 | error   | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.529Z |
    | acl_mgr     | 261 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.530Z |
    | app_mgr     | 185 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.585Z |
    | arp_nd_mgr  | 270 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.530Z |
    | bfd_mgr     | 279 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.530Z |
    | bgp_mgr     | 850 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.823Z |
    | chassis_mgr | 288 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.530Z |
    | dev_mgr     | 194 | running |                       | 2019-12-07T00:34:48.803Z |
    | fib_mgr     | 311 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.531Z |
    | gnmi_server | 857 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.826Z |
    | idb_server  | 229 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.033Z |
    | json_rpc    | 864 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.828Z |
    | linux_mgr   | 320 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.531Z |
    | lldp_mgr    | 872 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.838Z |
    | log_mgr     | 330 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.532Z |
    | mgmt_server | 340 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.532Z |
    | mpls_mgr    | 357 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.532Z |
    | net_inst_mgr| 377 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.532Z |
    | oam_mgr     | 392 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.533Z |
    | plcy_mgr    | 401 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.533Z |
    | qos_mgr     | 842 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.750Z |
    | sdk_mgr     | 418 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.533Z |
    | sshd-mgmt   | 107 | running |                       | 2019-12-07T00:34:53.701Z |
    | supportd    | 480 | running |                       | 2019-12-07T00:34:49.534Z |
    | xdp_cpm     | 520 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.534Z |
    | xdp_lc_1    | 539 | running | v19.11.1-291-g4664705 | 2019-12-07T00:34:49.535Z |
    +-------------+-----+---------+-----------------------+--------------------------+
  2. Use the info from state command to check the current failure action settings for the application you want to configure. In the following example, the relevant settings are failure-threshold, failure-window, and failure-action:
    info from state system app-management application aaa_mgr
        system {
            app-management {
                application aaa_mgr {
                    pid 242
                    state error
                    last-change 2019-12-07T00:34:49.529Z
                    author Nokia
                    failure-threshold 3
                    failure-window 300
                    failure-action reboot
                    path /opt/srlinux/bin
                    launch-command ./sr_aaa_mgr
                    search-command ./sr_aaa_mgr
                    version v19.11.1-291-g4664705
                    restricted-operations [
                        reload
                    ]
                    statistics {
                        restart-count 0
                    }
                    yang {
                        modules [
                            srl_nokia-aaa
                            srl_nokia-aaa-types
                        ]
                        source-directories [
                            /opt/srlinux/models/ietf
                            /opt/srlinux/models/srl_nokia/models/common
                            /opt/srlinux/models/srl_nokia/models/network-instance
                            /opt/srlinux/models/srl_nokia/models/system
                        ]
                    }
                }
            }
        }
    The following failure action settings can be configured for an application:
    1. failure-threshold: Number of times that the application must fail during the failure-window period before the failure-action is taken; the default is three times.
    2. failure-window: Number of seconds over which the application must fail the failure-threshold number of times before the failure-action is taken; the default is 300 seconds.
    3. failure-action: Action to take if the application fails failure-threshold times over failure-window seconds. This can be one of the following:
      1. reboot: Reboot the system; this is the default failure-action.
      2. wait=seconds: Wait this number of seconds, then attempt to restart the application.
      3. wait=forever: Move the application to error state and do not reboot the system or attempt to restart the application.
  3. Edit the YAML configuration for the application.
    The YAML configuration files for SR Linux applications are located in the directory /opt/srlinux/appmgr on the SR Linux device. They are named sr_application_name_config.yml, where application_name is the name of the application; for example, sr_aaa_mgr_config.yml.
  4. In the .yml file, add or change the settings for the failure-threshold, failure-window, and failure-action parameters.
    In the following example, the failure action settings in the sr_aaa_mgr_config.yml file are configured so that if the aaa_mgr application fails 5 times over a 500-second period, the SR Linux device waits 100 seconds, then attempts to restart the aaa_mgr application:
    aaa_mgr_setup:
       path: /opt/srlinux/bin
       launch-command: ./aaamgr_set_env.sh
       run-as-user: root
       never-show: Yes
       never-restart: Yes
       start-order: 1
    aaa_mgr:
       path: /opt/srlinux/bin
       launch-command:  ./sr_aaa_mgr
       search-command: ./sr_aaa_mgr
       run-as-user: root
       restricted-operations: ['reload']
       failure-threshold: 5
       failure-window: 500
       failure-action: "wait=100"
       yang-modules:
           names:
               - "srl_nokia-aaa"
               - "srl_nokia-aaa-types"
           source-directories:
               - "/opt/srlinux/models/ietf"
               - "/opt/srlinux/models/srl_nokia/models/common"
               - "/opt/srlinux/models/srl_nokia/models/system"
               - "/opt/srlinux/models/srl_nokia/models/network-instance"
  5. Save and close the .yml configuration file.
  6. In the SR Linux CLI, reload the application manager:
    tools system app-management application app_mgr reload
    This command reloads any application whose .yml configuration file has changed. It does not affect any service.
  7. Use the info from state command to verify that the changes to the failure action settings are now in effect. For example:
    info from state system app-management application aaa_mgr
        system {
            app-management {
                application aaa_mgr {
                    pid 242
                    state running
                    last-change 2019-12-07T00:44:31.403Z
                    author Nokia
                    failure-threshold 5
                    failure-window 500
                    failure-action wait=100
                    path /opt/srlinux/bin
                    launch-command ./sr_aaa_mgr
                    search-command ./sr_aaa_mgr
                    version v19.11.1-291-g4664705
                    restricted-operations [
                        reload
                    ]
                    statistics {
                        restart-count 0
                    }
                    yang {
                        modules [
                            srl_nokia-aaa
                            srl_nokia-aaa-types
                        ]
                        source-directories [
                            /opt/srlinux/models/ietf
                            /opt/srlinux/models/srl_nokia/models/common
                            /opt/srlinux/models/srl_nokia/models/network-instance
                            /opt/srlinux/models/srl_nokia/models/system
                        ]
                    }
                }
            }
        }
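The verification in the last step can also be scripted. The following Python sketch extracts the three failure settings from captured info from state output and interprets the failure-action value. It assumes the output is available as a string; the function names are hypothetical, and the sample text is an excerpt of the output shown above.

```python
FAILURE_KEYS = ("failure-threshold", "failure-window", "failure-action")


def failure_settings(state_text: str) -> dict:
    """Extract the failure settings from 'info from state' output."""
    settings = {}
    for line in state_text.splitlines():
        parts = line.split(None, 1)  # split into key and the rest of the line
        if len(parts) == 2 and parts[0] in FAILURE_KEYS:
            settings[parts[0]] = parts[1].strip()
    return settings


def parse_failure_action(value: str) -> tuple:
    """Return ('reboot', None), ('wait', seconds), or ('wait', None) for forever."""
    if value == "reboot":
        return ("reboot", None)
    if value.startswith("wait="):
        argument = value[len("wait="):]
        if argument == "forever":
            return ("wait", None)  # None stands for: no automatic restart
        return ("wait", int(argument))
    raise ValueError(f"unrecognized failure-action: {value}")


# Excerpt of the output shown in the last step.
SAMPLE = """\
                    failure-threshold 5
                    failure-window 500
                    failure-action wait=100
"""

settings = failure_settings(SAMPLE)
print(settings)
print(parse_failure_action(settings["failure-action"]))  # prints: ('wait', 100)
```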