7. Telemetry

7.1. Telemetry Overview

Telemetry is a network monitoring and fault management framework. Telemetry is driven by the need to use fresh data obtained from the network to make fast networking decisions such as traffic optimization and preventive troubleshooting.

Unlike legacy monitoring platforms such as SNMP, telemetry does not rely only on continuously polling the network devices for data. Instead, network devices push and stream data (such as statistics) continuously to data collectors, based on subscriptions. The data collectors can then filter, analyze, store, and make decisions using the data collected from the network devices. Figure 21 shows a telemetry application.

Figure 21:  Telemetry Application 

7.2. Telemetry in SR OS

Telemetry uses the proprietary Nokia SR OS YANG data models to stream data that is encoded as protocol buffers, a language-neutral, platform-independent, extensible mechanism for serializing structured data. gRPC, a Remote Procedure Call (RPC) framework, is the generic transport used to subscribe to the SR OS device and receive streamed telemetry data.

7.2.1. gRPC in Telemetry

The gRPC transport service uses HTTP/2 bidirectional streaming between the gRPC client (the data collector) and the gRPC server (the SR OS device). A gRPC session is a single connection from the gRPC client to the gRPC server over the TCP/TLS port. A gRPC session can be used by:

  1. a gRPC client to send a telemetry subscription request to the gRPC server
  2. a gRPC server to send asynchronous telemetry data to the gRPC client

Each RPC within a session is a gRPC channel.

The gRPC version supported on the SR OS gRPC server is 1.0.1.

The SR OS gRPC encryption and authentication follows the basic conventions described in the OpenConfig gnmi-authentication.md published on github.com (version 0.1.0 from Oct 5, 2016).

TLS encryption is used for added security. The following summarizes the process of encryption and authentication:

  1. TLS encryption
    1. The gRPC session must be in an encrypted state before it can be used.
    2. If the gRPC client and gRPC server are unable to negotiate an encrypted gRPC session, the gRPC session fails and the gRPC server sends an error.
    3. Fallback from an encrypted to an unencrypted gRPC session is not allowed.
    For information about how to configure TLS with gRPC, see the TLS chapter.
  2. SR OS device authentication
    1. The gRPC clients do not share gRPC sessions. Each gRPC client initially starts a separate gRPC session.
    2. When a gRPC session is established, the gRPC server certificates are verified by the gRPC client to ensure that every gRPC server is authenticated by the gRPC client.
    3. If gRPC is shut down on the gRPC server and a gRPC client tries to establish a gRPC session, the gRPC client receives an error for every RPC that it sends.
    4. If gRPC is shut down on the gRPC server while a gRPC session is established, all active RPCs are gracefully terminated, and an error is returned for every active RPC.
  3. User authentication
    1. Each RPC sent by the gRPC client carries a username and password.
    2. For the first RPC in the gRPC session, the gRPC server tries to authenticate the user via the specified authentication order, such as by the local user database, RADIUS, or TACACS+.
      For example, if TACACS+ is first in the authentication order, the gRPC server sends a request to the TACACS+ server to authenticate the gRPC user.
    3. For the subsequent RPCs on that same authenticated gRPC session, the username and password are reauthenticated only if changed.
    4. When there is no username and password provided with the RPC, the gRPC server returns an error.
    5. If the RPC user is changed, any active subscriber RPCs on that same gRPC session are terminated by the gRPC server.
    6. If the RPC password is changed, the active gRPC session continues to exist until a different username and password are sent in a subsequent RPC, or the gRPC session is terminated.
    7. Each telemetry message is carried over a gRPC session that was previously encrypted; the session is not reencrypted.
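
The user authentication rules above can be read as a small per-session state machine. The following Python sketch is illustrative only: the class name SessionAuth, the authenticate callback, the counters, and the returned status strings are hypothetical and are not part of SR OS or gRPC. The "authentication failed" outcome is an assumption added to make the model complete.

```python
from typing import Callable, Optional, Tuple

Credentials = Tuple[str, str]  # (username, password)


class SessionAuth:
    """Hypothetical model of the user authentication state of one gRPC session."""

    def __init__(self, authenticate: Callable[[str, str], bool]) -> None:
        # 'authenticate' stands in for the configured authentication order
        # (local user database, RADIUS, or TACACS+).
        self._authenticate = authenticate
        self._current: Optional[Credentials] = None
        self.active_subscriptions = 0
        self.auth_calls = 0

    def on_rpc(self, username: Optional[str], password: Optional[str]) -> str:
        # An RPC without a username and password is rejected.
        if not username or not password:
            return "error: missing credentials"
        creds = (username, password)
        # Unchanged credentials on an authenticated session are not rechecked.
        if creds == self._current:
            return "ok"
        self.auth_calls += 1
        if not self._authenticate(username, password):
            return "error: authentication failed"
        # A changed user terminates the active subscribe RPCs on this session.
        if self._current is not None and self._current[0] != username:
            self.active_subscriptions = 0
        self._current = creds
        return "ok"
```

In this model, repeating the same username and password on later RPCs does not trigger another authentication request, while switching to a different user both reauthenticates and clears the active subscriptions.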

Figure 22 shows the telemetry protocol stack.

Figure 22:  Protocol Stack 

The gRPC service runs on port 57400 by default in SR OS. The service is not configurable.

A single gRPC server supports concurrent gRPC sessions and channels.

  1. There is a maximum of eight concurrent gRPC sessions for all of the gRPC clients.
  2. There is a maximum of 225 concurrent gRPC channels for all of the gRPC clients. Since each RPC is a unique channel, the maximum number of subscriptions for all the gRPC clients on a single SR OS device is 225.

Disconnecting a gRPC channel disconnects an active telemetry subscription. A gRPC session that is used by the disconnected subscription is not terminated. Terminating the entire gRPC session disconnects all active telemetry subscriptions on the terminated gRPC session.
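
The session and channel limits, together with the disconnect behavior described above, can be captured in a short bookkeeping model. This is a sketch under stated assumptions: GrpcServerModel and its method names are hypothetical and do not correspond to any SR OS or gRPC API; only the two limits (8 sessions, 225 channels) come from the text.

```python
from typing import Dict, Set

MAX_SESSIONS = 8    # concurrent gRPC sessions across all gRPC clients
MAX_CHANNELS = 225  # concurrent gRPC channels (RPCs) across all gRPC clients


class GrpcServerModel:
    """Hypothetical bookkeeping model of sessions and channels on one server."""

    def __init__(self) -> None:
        # Maps a session ID to the set of channel (subscription) IDs it carries.
        self.sessions: Dict[str, Set[str]] = {}

    def channel_count(self) -> int:
        return sum(len(channels) for channels in self.sessions.values())

    def open_session(self, session_id: str) -> bool:
        if session_id in self.sessions or len(self.sessions) >= MAX_SESSIONS:
            return False
        self.sessions[session_id] = set()
        return True

    def open_channel(self, session_id: str, channel_id: str) -> bool:
        if session_id not in self.sessions:
            return False
        if self.channel_count() >= MAX_CHANNELS:
            return False
        self.sessions[session_id].add(channel_id)
        return True

    def close_channel(self, session_id: str, channel_id: str) -> None:
        # Only the one subscription goes away; the session stays up.
        self.sessions[session_id].discard(channel_id)

    def close_session(self, session_id: str) -> int:
        # Every subscription carried by the session is disconnected with it.
        return len(self.sessions.pop(session_id, set()))
```

Because the channel limit is global, a ninth session is refused even when few channels are open, and a 226th channel is refused regardless of which session requests it.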

A telemetry subscription can be administratively terminated from the CLI. An active gRPC session that is used by the terminated subscription is not terminated. See gRPC Command Reference for command details.

Figure 23 shows a gRPC service using the TLS architecture.

Figure 23:  gRPC Using TLS Architecture 

7.2.2. Operations Layer

This section summarizes support for subscription requests and subscription responses.

SR OS telemetry follows the OpenConfig gnmi.proto published on github.com (version 0.4.0). This model defines the relationship and behavior between the gRPC client and server.

SR OS telemetry follows the basic conventions described in the OpenConfig gnmi-specification.md published on github.com (version 0.2.2 from March 7th, 2017).

A subscription is initiated from the gRPC client by sending a "Subscribe" RPC that contains a "SubscribeRequest" message to the gRPC server. A "prefix" can be specified to be used with all paths specified in the "SubscribeRequest". If a "prefix" is present, it is logically prepended to every "path" to provide a full path.
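
As a minimal illustration of how a prefix combines with a path, the following Python sketch joins the two element lists. The function name full_path is hypothetical; the element values are taken from the examples later in this chapter.

```python
from typing import List, Sequence


def full_path(prefix: Sequence[str], path: Sequence[str]) -> List[str]:
    """Return the prefix elements followed by the path elements."""
    return list(prefix) + list(path)


prefix = ["state", "router[router-instance=Base]", "interface[interface-name=test]"]
path = ["statistics", "ip", "in-packets"]
# Join with "/" only for display; on the wire the elements stay separate strings.
print("/" + "/".join(full_path(prefix, path)))
```

An empty prefix leaves the path unchanged, which matches a request that carries no prefix.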

A subscription contains:

  1. a path list of one or more paths. The following conditions apply.
    1. A path represents the data tree as a series of repeated strings and elements. Each element represents a data tree node name and its associated attributes.
    2. A path must be syntactically valid within the set of schema modules that the gRPC server supports.
    3. The path list cannot be modified throughout the lifetime of the subscription.
    4. If the subscription path points to a container node, all child leafs of that container node are considered to be subscribed to.
    5. Any specified path must be unique within the list (paths cannot be repeated within the list). An error is returned if the same path is used more than one time in a single subscription.
    6. A specified path does not need to preexist within the current data tree on the gRPC server. If a path does not exist, the gRPC server continues to monitor for the existence of the path, and transmits telemetry updates if the path exists in the future.
    7. The gRPC server does not send any data for a nonexistent path; for example, if a path is nonexistent at the time of subscription creation or if the path was deleted after the subscription was established.
    8. The maximum number of paths across all subscriptions on a single SR OS device is 14 400. A path using a wildcard is still considered a single path.
  2. a subscription mode. The following conditions apply.
    1. SAMPLE mode is supported for each path, where the gRPC server sends notifications at the specified sampling interval.
    2. TARGET_DEFINED mode is treated as SAMPLE mode.
  3. a sample interval:
    1. A sample_interval is supported for each path and is specified in nanoseconds. A sample_interval of 0 selects the default of 10 seconds (10 000 000 000 nanoseconds). If a nonzero sample_interval of less than 10 seconds is specified, the gRPC server returns an error.
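
The conditions above can be sketched as a client-side check that runs before a request is sent. The function name normalize_subscription and the use of ValueError are hypothetical, and rejecting any mode other than SAMPLE or TARGET_DEFINED is an assumption, because the text only states which modes are supported.

```python
from typing import List, Sequence, Tuple

MIN_INTERVAL_NS = 10 * 10**9  # 10 seconds, expressed in nanoseconds

Path = Tuple[str, ...]


def normalize_subscription(
    paths: Sequence[Path], mode: str, sample_interval: int
) -> Tuple[List[Path], str, int]:
    """Apply the documented checks; raise ValueError where an error is expected."""
    if not paths:
        raise ValueError("a subscription needs at least one path")
    if len(set(paths)) != len(paths):
        raise ValueError("a path is repeated within the subscription")
    # TARGET_DEFINED is handled as SAMPLE.
    if mode == "TARGET_DEFINED":
        mode = "SAMPLE"
    if mode != "SAMPLE":
        # Assumption: modes not named in the text are treated as unsupported.
        raise ValueError(f"unsupported mode: {mode}")
    # 0 selects the 10-second default; other values below 10 seconds are errors.
    if sample_interval == 0:
        sample_interval = MIN_INTERVAL_NS
    elif sample_interval < MIN_INTERVAL_NS:
        raise ValueError("sample_interval is below 10 seconds")
    return list(paths), mode, sample_interval
```

Checking on the client side is a convenience only; the gRPC server remains the authority and returns its own error for an invalid request.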

Figure 24 shows the SR OS support of a subscription request.

Figure 24:  Subscription Request 

When a subscription is successfully initiated on the gRPC server, SubscribeResponse messages are sent from the gRPC server to the gRPC client. The SubscribeResponse message contains update notifications according to the subscription's path list.

A sync_response notification is sent one time, after the gRPC server sends all of the updates for the subscribed-to paths. The sync_response must be set to “true” for the gRPC client to consider that the stream has synced one time. A sync_response is used to signal the gRPC client that it has a full view of the subscribed-to data.
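
A collector can use sync_response to separate the initial full view from later samples. The following Python sketch assumes, purely for illustration, that each response has already been converted to a plain dictionary with either an "update" key or a "sync_response" key; this representation and the function name split_at_sync are hypothetical and are not the generated protobuf classes.

```python
from typing import Dict, Iterable, List, Tuple


def split_at_sync(responses: Iterable[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split a response stream into updates seen before and after sync_response."""
    initial: List[Dict] = []
    later: List[Dict] = []
    synced = False
    for response in responses:
        if response.get("sync_response") is True:
            # From this point on, the client holds a full view of the data.
            synced = True
        elif "update" in response:
            (later if synced else initial).append(response["update"])
    return initial, later
```

For a path that does not exist yet, the first list is empty because sync_response arrives before any update.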

The gRPC server sends an error if required. The error contains a description of the context of the error.

An update notification contains:

  1. a timestamp of the statistics collection time, represented in nanoseconds
  2. a prefix:
    1. If a prefix is present, it is logically prepended to every path to provide the full path.
    2. The presence of a prefix in the SubscribeResponse message is not related to the presence of a prefix in the original SubscribeRequest message. The prefix in the SubscribeResponse message is optimized by the gRPC server.
  3. a list of update (path and value pairs):
    1. A path represents the data tree path as a series of repeated strings or elements, where each element represents a data tree node name and its associated attributes. See the Schema Paths section for more information.
    2. The TypedValue message represents the data tree node’s value where encoding is always “JSON”.

Figure 25 shows the SR OS support of a subscription response.

Figure 25:  Subscription Response 
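
The fields of an update notification can be turned into collector-friendly values with the standard library alone. In this sketch the notification is modeled as a dictionary with the keys "timestamp", "prefix", and "updates"; that layout and the function name decode_update are hypothetical stand-ins for the decoded protobuf message.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_update(notification: Dict[str, Any]) -> List[Tuple[datetime, str, Any]]:
    """Return (collection time, full path, decoded value) for each update."""
    seconds, nanos = divmod(notification["timestamp"], 10**9)
    # datetime has microsecond resolution, so sub-microsecond digits are dropped.
    when = EPOCH + timedelta(seconds=seconds, microseconds=nanos // 1000)
    prefix = notification.get("prefix", [])
    rows = []
    for update in notification["updates"]:
        full = "/" + "/".join(list(prefix) + list(update["path"]))
        # The value is JSON encoded, so a counter may arrive as a JSON string.
        rows.append((when, full, json.loads(update["json_val"])))
    return rows
```

Integer arithmetic is used for the timestamp on purpose: converting a 19-digit nanosecond value to a float first would lose precision.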

7.2.3. Schema Paths

Telemetry subscriptions include a set of schema paths used to identify which data nodes are of interest to the collector.

The paths in Telemetry Subscribe RPC requests follow the basic conventions described in the OpenConfig gnmi-path-conventions.md published on github.com (version 0.2.0 from February 24th, 2017).

A path consists of a set of path segments often shown with a “/” character as a delimiter; for example, /configure/router[router-instance=Base]/interface[interface-name=my-interface1]/description.

These paths are encoded as a set of individual string segments in gnmi.proto (without any “/” characters); for example, ["configure", "router[router-instance=Base]", "interface[interface-name=my-interface1]", "description"].
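
The conversion from the "/"-delimited form to the list of string segments can be sketched as follows. The function name split_path is hypothetical. The sketch tracks bracket depth so that a "/" inside a key value is not treated as a delimiter, and it returns an empty list for the root path; both choices are assumptions made for illustration.

```python
from typing import List


def split_path(path: str) -> List[str]:
    """Split a "/"-delimited path into segments, ignoring "/" inside [...] keys."""
    segments: List[str] = []
    current = ""
    depth = 0
    for ch in path:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "/" and depth == 0:
            # Empty segments (for example, from the leading "/") are skipped.
            if current:
                segments.append(current)
            current = ""
        else:
            current += ch
    if current:
        segments.append(current)
    return segments


print(split_path(
    "/configure/router[router-instance=Base]"
    "/interface[interface-name=my-interface1]/description"
))
```

A plain path.split("/") would give the same result for this example, but it would break any key value that itself contains a "/" character.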

A path selects an entire subtree of the data model and includes all descendants of the node indicated in the path. Table 95 describes the types of schema paths that are supported in SR OS telemetry.

Table 95:  Schema Paths  

Path example
    Description

/configure/router[router-instance=Base]/interface[interface-name=abc]
    Selects all config leafs of interface abc and all descendants.

/configure/router[router-instance=Base]/interface[interface-name=abc]/description
    Selects only the description leaf of interface abc.

/state/router[router-instance=Base]/interface[interface-name=*]
    Selects all state information for all Base router interfaces using a wildcard in a single segment of a path.

/configure/router[router-instance=Base]/interface[interface-name=*]/description
    Selects the description leaf of all Base router interfaces using a wildcard in a single segment of a path.

/
    The root path. This selects all config and state data from all models (in all namespaces) supported on the router. Encoded as “” in gRPC/gPB.

The following list describes the restrictions and rules that apply to telemetry paths in SR OS:

  1. Wildcards for entire path segments are not supported.
    For example: /state/service/*/oper-status
  2. If a wildcard is used for any key of a list, a wildcard must be used for all the keys of that list. In a single path segment, all the keys must either have specific values or all the keys must have wildcards. A mix of wildcards and specific values for different parts of a list key is not supported.
    For example:
    Supported:
    /a/b[key1=*][key2=*]/c[key1=foo]
    /a/b[key1=foo][key2=bar]/c[key1=*]
    Not supported:
    /a/b[key1=foo][key2=*]
  3. Functions such as “current()” and “last()”, and mathematical operators such as stat<5 or octets>3, are not supported in paths. The “|” operator (OR, used to select multiple paths) is not supported.
  4. Wildcards in multiple segments of a path are supported.
    For example: /state/card[slot-number=*]/mda[mda-slot=*]
  5. The “//” wildcard pattern is not supported.
    For example: /state//oper-status
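
The rules in this list can be approximated with a small checker that a collector might run before subscribing. This is a sketch, not the parser used by SR OS: the function name path_problems and the message strings are hypothetical, and the test for functions and mathematical operators is a coarse heuristic that only looks for the characters "<", ">", "(" and ")".

```python
import re
from typing import List

# Matches one [key=value] pair inside a path segment.
KEY_RE = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def path_problems(path: str) -> List[str]:
    """Return the documented restrictions that a "/"-delimited path violates."""
    problems: List[str] = []
    if "|" in path:
        problems.append('"|" operator')
    if "//" in path:
        problems.append('"//" wildcard pattern')
    if re.search(r"[<>()]", path):
        problems.append("function or mathematical operator")
    # Split on "/" outside of [...] so that key values may contain "/".
    for segment in re.split(r"/(?![^\[]*\])", path):
        if segment == "*":
            problems.append("wildcard for an entire path segment")
        wild = [value == "*" for _key, value in KEY_RE.findall(segment)]
        # Within one segment, keys must be all wildcards or all specific values.
        if any(wild) and not all(wild):
            problems.append(f"mixed wildcard and specific keys in {segment}")
    return problems


print(path_problems("/a/b[key1=foo][key2=*]"))
print(path_problems("/state/card[slot-number=*]/mda[mda-slot=*]"))
```

An empty result means that none of the listed restrictions was detected; it does not guarantee that the path is valid within the schema modules that the gRPC server supports.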

7.3. Telemetry Examples

This section contains examples of Telemetry subscription requests and responses. The following examples are dumps of protobuf messages from a Python API. Formats may vary across different implementations.

Example 1: Subscribe to a single path

2017-06-05 17:06:13,189 - SENT::SubscribeRequest
subscribe {
  subscription {
    path {
      element: "state"
      element: "router[router-instance=Base]"
      element: "interface[interface-name=test]"
      element: "statistics"
      element: "ip"
      element: "in-packets"
    }
    mode: SAMPLE
    sample_interval: 10000000000
  }
}
2017-06-05 17:06:13,190 - RCVD::SubscribeResponse
2017-06-05 17:06:23,492 - RCVD::Subscribe
2017-06-05 17:06:23,492 - update {
  timestamp: 1496675183491595139
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=test]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    val {
      json_val: "\"0\""
    }
  }
}
2017-06-05 17:06:23,494 - RCVD::Subscribe
2017-06-05 17:06:23,494 - sync_response: true
2017-06-05 17:06:33,589 - RCVD::Subscribe
2017-06-05 17:06:33,589 - update {
  timestamp: 1496675213491595139
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=test]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    val {
      json_val: "\"28\""
    }
  }
}
....
....

Example 2: Subscribe to a single path with wildcard

2017-06-05 17:08:29,055 - SENT::SubscribeRequest
subscribe {
  subscription {
    path {
      element: "state"
      element: "router[router-instance=Base]"
      element: "interface[interface-name=*]"
      element: "statistics"
      element: "ip"
      element: "in-packets"
    }
    mode: SAMPLE
    sample_interval: 30000000000
  }
}
2017-06-05 17:08:29,056 - RCVD::SubscribeResponse
2017-06-05 17:08:59,133 - RCVD::Subscribe
2017-06-05 17:08:59,133 - update {
  timestamp: 1496675339132056575
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=system]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    val {
      json_val: "\"0\""
    }
  }
}
2017-06-05 17:08:59,135 - RCVD::Subscribe
2017-06-05 17:08:59,135 - update {
  timestamp: 1496675339133006678
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=to_node_B]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    val {
      json_val: "\"0\""
    }
  }
}
2017-06-05 17:08:59,135 - RCVD::Subscribe
2017-06-05 17:08:59,135 - update {
  timestamp: 1496675339133006678
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=to_node_D]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    val {
      json_val: "\"0\""
    }
  }
}
2017-06-05 17:08:59,136 - RCVD::Subscribe
2017-06-05 17:08:59,136 - sync_response: true
2017-06-05 17:09:29,139 - RCVD::Subscribe
2017-06-05 17:09:29,139 - update {
  timestamp: 1496682569121314
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=system]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    val {
      json_val: "\"0\""
    }
  }
}
2017-06-05 17:09:29,142 - RCVD::Subscribe
2017-06-05 17:09:29,142 - update {
  timestamp: 1496682569124342
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=to_node_B]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    val {
      json_val: "\"0\""
    }
  }
}
2017-06-05 17:09:29,145 - RCVD::Subscribe
2017-06-05 17:09:29,145 - update {
  timestamp: 1496682569127344
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=to_node_D]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    val {
      json_val: "\"0\""
    }
  }
}
....
....

Example 3: Subscribe to more than one path

2017-01-24 12:54:18,228 - SENT::SubscribeRequest
subscribe {
  subscription {
    path {
      element: "state"
      element: "router[router-instance=Base]"
      element: "interface[interface-name=to_node_B]"
    }
    mode: SAMPLE
    sample_interval: 30000000000
  }
  subscription {
    path {
      element: "state"
      element: "router[router-instance=Base]"
      element: "mpls"
      element: "statistics"
      element: "lsp-egress-stats[lsp-name=lsp_to_dest_f]"
    }
    mode: SAMPLE
    sample_interval: 30000000000
  }
}

Example 4: Subscribe to a list with wildcard

2017-01-24 13:45:30,947 - SENT::SubscribeRequest
subscribe {
  subscription {
    path {
      element: "state"
      element: "router[router-instance=Base]"
      element: "interface[interface-name=*]"
    }
    mode: SAMPLE
    sample_interval: 30000000000
  }
}

Example 5: Subscribe to path where the object did not exist before subscription

2017-01-24 13:53:50,165 - SENT::SubscribeRequest
subscribe {
  subscription {
    path {
      element: "state"
      element: "router[router-instance=Base]"
      element: "interface[interface-name=to_node_B]"
    }
    mode: SAMPLE
    sample_interval: 30000000000
  }
}
2017-01-24 13:53:50,166 - RCVD::SubscribeResponse
2017-01-24 13:54:20,169 - RCVD::Subscribe
2017-01-24 13:54:20,169 - sync_response: true
2017-01-24 13:54:50,174 - RCVD::Subscribe
2017-01-24 13:54:50,174 - update {
  timestamp: 1485262490169309451
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=to_node_B]"
  }
  update {
...
...
  }
}

Example 6: Subscribe to a path where the object existed before subscription then was deleted after subscription

2017-01-24 14:00:41,292 - SENT::SubscribeRequest
subscribe {
  subscription {
    path {
      element: "state"
      element: "router[router-instance=Base]"
      element: "interface[interface-name=to_node_B]"
    }
    mode: SAMPLE
    sample_interval: 30000000000
  }
}
2017-01-24 14:00:41,294 - RCVD::SubscribeResponse
2017-01-24 14:01:11,295 - RCVD::Subscribe
2017-01-24 14:01:11,295 - update {
  timestamp: 1485262871290064704
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=to_node_B]"
  }
  update {
...
...
  }
}
2017-01-24 14:01:11,359 - RCVD::Subscribe
2017-01-24 14:01:11,359 - sync_response: true
2017-01-24 14:01:41,293 - RCVD::Subscribe
2017-01-24 14:02:11,296 - RCVD::Subscribe