7. Telemetry

7.1. Telemetry Overview

Telemetry is a network monitoring and fault management framework. It is driven by natural network growth (network volume increases) and the need to use fresh data obtained from the network to make fast networking decisions such as traffic optimization and preventive troubleshooting.

Unlike legacy monitoring platforms such as SNMP, Telemetry does not rely only on collectors that continuously pull data from the network elements. Instead, network devices push and stream data (such as statistics) continuously to collectors based on subscriptions. Collectors can then filter, analyze, store, and make decisions using the data collected from the network devices. Figure 21 illustrates this process.

Figure 21:  Telemetry Application 

7.2. About Telemetry

Telemetry uses the proprietary Nokia SR OS YANG data models to stream data that is encoded as Google Protocol Buffers (gPB) messages. Google Remote Procedure Call (gRPC) is the transport used to subscribe to the SR OS device and to receive streamed telemetry data. SR OS supports gPB version 3.0.0-b2.

7.2.1. gRPC in Telemetry

The gRPC transport method uses HTTP/2 bidirectional streaming between the gRPC client (the collector) and the gRPC server (the SR OS device). A gRPC session is a single connection from the gRPC client to the gRPC server over the TCP/TLS port. A gRPC session can be used by:

  1. a gRPC client to send a telemetry subscription request to the gRPC server
  2. a gRPC server to send asynchronous telemetry data to the gRPC client

A gRPC channel is a single RPC call.

The gRPC version supported on the SR OS gRPC server is 1.0.1.

The SR OS gRPC encryption and authentication follows the basic conventions described in the OpenConfig gnmi-authentication.md published on github.com (version 0.1.0 from Oct 5, 2016).

TLS encryption is used for added security. The following summarizes the process of encryption and authentication:

  1. SR OS device authentication
    1. The gRPC clients do not share gRPC sessions. Each gRPC client should initially start a separate gRPC session.
    2. When a gRPC session is established, the gRPC server certificates are verified by the gRPC client to ensure every gRPC server is authenticated by the gRPC client.
    3. When gRPC is shut down on the gRPC server and a gRPC client tries to establish a gRPC session, the gRPC client receives an error for every RPC it sends.
    4. When a gRPC session is established and gRPC is then shut down on the gRPC server, all active RPCs are gracefully terminated and an error is returned for every active RPC.
  2. TLS encryption
    1. The gRPC session should be in an encrypted state before it can be used.
    2. If the gRPC client and gRPC server are unable to negotiate an encrypted gRPC session, the gRPC session fails and the gRPC server sends an error.
    3. Fallback from an encrypted to an unencrypted gRPC session is not allowed.
    For information on how to configure TLS with gRPC, see the TLS chapter.
  3. User authentication
    1. Each RPC sent by the gRPC client carries a user/password.
    2. For the first RPC on the gRPC session, the gRPC server tries to authenticate the user via the specified authentication order; for instance, local user database, RADIUS, or TACACS+.
      For example, if TACACS+ is first in the authentication order, the gRPC server sends a request to the TACACS+ server to authenticate the gRPC user.
    3. For the subsequent RPCs on that same authenticated gRPC session, the user/password are re-authenticated only if changed.
    4. When there is no user/password provided with the RPC, the gRPC server returns an error.
    5. If the RPC user is changed, then any active subscriber RPCs on that same gRPC session are terminated by the gRPC server.
    6. If the RPC password is changed, then the active gRPC session will continue to exist until a different user/password is sent in a subsequent RPC, or the gRPC session is terminated.
    7. Each telemetry message is carried over the gRPC session that was encrypted when it was established; the session is not re-encrypted for each message.
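
The user authentication rules above can be modeled as a small decision function. This is only a sketch of the described behavior, not SR OS code: the `SessionState` class, the `handle_rpc` function, and the action strings are hypothetical names chosen for illustration. The outcome of the authentication itself and the password-change rule are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SessionState:
    """Hypothetical per-session record of the last credentials seen."""
    user: Optional[str] = None
    password: Optional[str] = None


def handle_rpc(state: SessionState, user: Optional[str],
               password: Optional[str]) -> List[str]:
    """Return the actions implied by the rules above for one incoming RPC."""
    if not user or not password:
        # An RPC without a user/password is rejected.
        return ["error: credentials missing"]
    actions = []
    first_rpc = state.user is None
    if first_rpc or (user, password) != (state.user, state.password):
        # First RPC, or credentials differ from the previous RPC.
        actions.append("authenticate")
        if not first_rpc and user != state.user:
            # A changed user ends the subscriptions on this session.
            actions.append("terminate active subscriptions")
    # Assumption: the result of the authentication is not modeled here.
    state.user, state.password = user, password
    return actions or ["accept without re-authentication"]
```

Calling `handle_rpc` repeatedly with the same `SessionState` walks through the first-RPC, unchanged-credentials, and changed-user cases in turn.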

Figure 22 shows the telemetry protocol stack.

Figure 22:  Telemetry Stack 

The gRPC service runs on port 57400 by default on the SR OS. The service is not configurable.

A single gRPC server supports concurrent gRPC sessions and channels.

  1. There is a maximum of eight concurrent gRPC sessions for all of the gRPC clients.
  2. There is a maximum of 225 concurrent gRPC channels for all of the gRPC clients. Because each RPC is a unique channel, the maximum number of subscriptions for all the gRPC clients on a single SR OS device is 225.
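
A collector that opens many subscriptions may want to track these limits on its own side before it sends another Subscribe RPC. The following is a minimal sketch under that assumption; `can_admit` and its parameters are hypothetical names, and the way the gRPC server reacts when a limit is exceeded is not described here.

```python
MAX_SESSIONS = 8      # concurrent gRPC sessions across all gRPC clients
MAX_CHANNELS = 225    # concurrent gRPC channels (one per RPC/subscription)


def can_admit(active_sessions: int, active_channels: int,
              new_session: bool) -> bool:
    """Return True if one more Subscribe RPC fits within the stated limits.

    new_session is True when the RPC would also open a new gRPC session
    instead of reusing an existing one.
    """
    if new_session and active_sessions >= MAX_SESSIONS:
        return False
    return active_channels < MAX_CHANNELS
```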

Closing a gRPC channel terminates the active Telemetry subscription on that channel. The gRPC session that is used by the disconnected subscription is not terminated. Closing the entire gRPC session terminates all active Telemetry subscriptions on that gRPC session.

A Telemetry subscription can be administratively terminated from the CLI. An active gRPC session that is used by the terminated subscription is not terminated. See gRPC Command Reference for command details.

Figure 23 shows a gRPC service using the TLS architecture.

Figure 23:  gRPC Using TLS Architecture 

7.2.2. Operations Layer

This section summarizes support for subscription requests and subscription responses.

SR OS Telemetry follows the OpenConfig gnmi.proto published on github.com (version 0.3.1, from April 20th, 2017). This model defines the relationship and behavior between the gRPC client and server.

SR OS Telemetry follows the basic conventions described in the OpenConfig gnmi-specification.md published on github.com (version 0.2.2 from March 7th, 2017).

A subscription is initiated from the gRPC client by sending a "Subscribe" RPC that contains a "SubscribeRequest" message to the gRPC server. A "prefix" can be specified for use with all paths in the "SubscribeRequest". If a "prefix" is present, it is logically prepended to every "path" to provide the full "path".

A subscription contains:

  1. a path list of one or more paths. The following conditions apply.
    1. A path represents the data tree as a series of repeated strings/elements. Each element represents a data tree node name and its associated attributes.
    2. A path should be syntactically valid within the set of schema modules that the gRPC server supports.
    3. The path list cannot be modified throughout the lifetime of the subscription.
    4. If the subscription path is to a container node, then all child leafs of that container node are considered to be subscribed to.
    5. Any specified path must be unique within the list (paths cannot be repeated within the list). An error is returned if the same path is used more than one time in a single subscription.
    6. A specified path does not need to pre-exist within the current data tree on the gRPC server. If a particular path does not exist, the gRPC server continues to monitor for the existence of the path, and transmits telemetry updates if the path exists in the future.
    7. The gRPC server does not send any data for a non-existent path; for instance, if the path does not exist when the subscription is created, or if the path is deleted after the subscription is established.
    8. The maximum number of paths across all subscriptions on a single SR OS device is 14400. A path using a wildcard is still considered a single path.
  2. a subscription mode. The following conditions apply.
    1. SAMPLE mode is supported for each path, where the gRPC server sends notifications at the specified sampling interval.
    2. TARGET_DEFINED mode is also accepted and is treated as SAMPLE mode.
  3. a sample interval. The following conditions apply.
    1. A sample_interval is supported for each path and is specified in nanoseconds. A sample_interval of 0 selects the default of 10 seconds. If a nonzero sample_interval of less than 10 seconds is specified, the gRPC server returns an error.
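
The conditions above can be checked on the collector side before a request is sent. The following is a minimal sketch, not the server's implementation: `normalize_subscription` is a hypothetical helper, paths are represented as lists of element strings, and rejecting modes other than SAMPLE and TARGET_DEFINED is an assumption, because only those two modes are described here.

```python
from typing import Sequence, Tuple

NANOS_PER_SECOND = 1_000_000_000
MIN_INTERVAL_NS = 10 * NANOS_PER_SECOND   # 10 seconds, in nanoseconds


def normalize_subscription(paths: Sequence[Sequence[str]], mode: str,
                           sample_interval_ns: int) -> Tuple[str, int]:
    """Check a subscription against the conditions listed above.

    Returns the effective (mode, sample interval in ns) or raises ValueError.
    """
    if not paths:
        raise ValueError("at least one path is required")
    seen = set()
    for path in paths:
        key = tuple(path)
        if key in seen:
            # The same path may appear only once in a subscription.
            raise ValueError("duplicate path: /" + "/".join(path))
        seen.add(key)
    # Assumption: any mode not described in the text is rejected.
    if mode not in ("SAMPLE", "TARGET_DEFINED"):
        raise ValueError("unsupported mode: " + mode)
    if sample_interval_ns == 0:
        sample_interval_ns = MIN_INTERVAL_NS   # 0 selects the 10 s default
    elif sample_interval_ns < MIN_INTERVAL_NS:
        raise ValueError("sample_interval must be at least 10 seconds")
    # TARGET_DEFINED is treated as SAMPLE.
    return "SAMPLE", sample_interval_ns
```

For example, a request with mode `TARGET_DEFINED` and an interval of 0 is normalized to SAMPLE mode with a 10-second interval.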

Figure 24 illustrates the SR OS support of a subscription request.

Figure 24:  Subscription Request 

When a subscription is successfully initiated on the gRPC server, SubscribeResponse messages are sent from the gRPC server to the gRPC client. One set of messages is sent every sample_interval. Each SubscribeResponse message contains update notifications for the subscription's path list.

A sync_response notification is sent one time, after the gRPC server sends all of the updates for the subscribed-to paths. The sync_response must be set to “true” for the gRPC client to consider that the stream has synced one time. A sync_response signals to the gRPC client that it has a full view of the subscribed-to data.

The gRPC server sends an error if required. The error contains a description of the context of the error.

An update notification contains:

  1. a timestamp of the statistics collection time, represented in nanoseconds
  2. a prefix:
    1. If a prefix is present, then it is logically prepended to every path to provide the full path.
    2. The presence of a prefix in the SubscribeResponse message is not related to the presence of a prefix in the original SubscribeRequest message. The prefix in the SubscribeResponse message is optimized by the gRPC server.
  3. a list of update path and value pairs
    1. A path represents the data tree path as a series of repeated strings or elements, where each element represents a data tree node name and its associated attributes. See the Schema Paths section for more information.
    2. The TypedValue message represents the data tree node’s value where encoding is always “JSON”.

Figure 25 illustrates the SR OS support of a subscription response.

Figure 25:  Subscription Response 
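
A collector typically joins the prefix with each update path and decodes the JSON value before storing the data. The following is a minimal sketch of that step. It assumes the notification has already been converted to a plain Python dictionary; the dictionary layout and the `flatten_update` name are hypothetical and are not part of gnmi.proto.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Tuple


def flatten_update(notification: dict) -> Tuple[datetime, Dict[str, object]]:
    """Join the prefix with each update path and decode the JSON values."""
    prefix: List[str] = notification.get("prefix", [])
    values = {}
    for update in notification["update"]:
        full_path = "/" + "/".join(prefix + update["path"])
        values[full_path] = json.loads(update["json_val"])
    # The timestamp is in nanoseconds since the Unix epoch. datetime keeps
    # microseconds only, so the last three digits are dropped.
    seconds, nanos = divmod(notification["timestamp"], 1_000_000_000)
    collected = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=nanos // 1000)
    return collected, values


# Hypothetical dictionary form of a single update notification.
sample = {
    "timestamp": 1496675183491595139,
    "prefix": ["state", "router[router-instance=Base]",
               "interface[interface-name=test]", "statistics", "ip"],
    "update": [{"path": ["in-packets"], "json_val": '"0"'}],
}
collected, values = flatten_update(sample)
print(collected.isoformat(), values)
```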

7.2.3. Schema Paths

Telemetry subscriptions include a set of schema paths used to identify which data nodes are of interest to the collector.

The paths in Telemetry Subscribe RPC requests follow the basic conventions described in the OpenConfig gnmi-path-conventions.md published on github.com (version 0.2.0 from February 24th, 2017).

A path consists of a set of path segments often shown with a ‘/’ character as a delimiter. For example: configure/router[router-instance=Base]/interface[interface-name=my-interface1]/description.

These paths are encoded as a set of individual string segments in gnmi.proto (without any ‘/’ characters). For example, ["configure", "router[router-instance=Base]", "interface[interface-name=my-interface1]", "description"]
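
The conversion from the '/'-delimited form to the list of string segments can be sketched as follows. The `path_to_elements` helper is a hypothetical name, and keeping a '/' that appears inside a key value (as in the port ID used in the second usage example) is an assumption about how a collector might handle such keys.

```python
from typing import List


def path_to_elements(path: str) -> List[str]:
    """Split a '/'-delimited schema path into a list of element strings.

    A '/' inside a [key=value] part is kept as part of the element.
    Empty segments (a leading '/' or the root path) produce no element.
    """
    elements, current, depth = [], [], 0
    for char in path:
        if char == "/" and depth == 0:
            if current:
                elements.append("".join(current))
                current = []
            continue
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        current.append(char)
    if current:
        elements.append("".join(current))
    return elements


print(path_to_elements(
    "configure/router[router-instance=Base]"
    "/interface[interface-name=my-interface1]/description"))
# Hypothetical path with a '/' inside a key value.
print(path_to_elements("/state/port[port-id=1/1/1]/oper-state"))
```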

A path selects an entire subtree of the data model and includes all descendants of the node indicated in the path. The following table summarizes the types of paths that are supported in SR OS telemetry:

Table 94:  Schema Paths

| Path example | Description |
|---|---|
| /configure/router[router-instance=Base]/interface[interface-name=abc] | Selects all config leafs of interface abc and all descendants. |
| /configure/router[router-instance=Base]/interface[interface-name=abc]/description | Selects only the description leaf of interface abc. |
| /state/router[router-instance=Base]/interface[interface-name=*] | Selects all state information for all Base router interfaces. Wildcard in a single segment of a path. |
| /configure/router[router-instance=Base]/interface[interface-name=*]/description | Selects the description leaf for all Base router interfaces. Wildcard in a single segment of a path. |
| / | The root path. This selects all config and state data from all models (in all namespaces) supported on the router. Encoded as “” in gRPC/gPB. |

The following items describe the restrictions that apply to telemetry paths in SR OS, and the wildcard forms that are supported:

  1. Wildcards for entire path segments are not supported.
    For example, /state/service/*/oper-status
  2. If a wildcard is used for any key of a list, then a wildcard must be used for all the keys of that list. In a single path segment, all the keys must either have specific values or all the keys must have wildcards. A mix of wildcards and specific values for different parts of a list key is not supported.
    For example:
    Supported:
    /a/b[key1=*][key2=*]/c[key1=foo]
    /a/b[key1=foo][key2=bar]/c[key1=*]
    Not supported:
    /a/b[key1=foo][key2=*]
  3. Functions such as 'current()' and 'last()', and mathematical operators such as stat<5 or octets>3, are not supported in paths. The '|' (OR operator, used to select multiple paths) is not supported.
  4. Wildcards in multiple segments of a path are supported.
    For example: /state/card[slot-number=*]/mda[mda-slot=*]
  5. The ‘//’ wildcard pattern is not supported.
    For example: /state//oper-status
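
The wildcard rules in this list can be checked before a subscription is sent. The following is a minimal sketch that covers only the wildcard rules (whole-segment wildcards, mixed keys, and the '//' pattern); functions and operators are not checked. The `check_path` helper is a hypothetical name, and it takes the path as a list of elements in which an empty string stands for the gap in a '//' pattern.

```python
import re
from typing import List

# Matches one [key=value] part of a path element.
_KEY = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def check_path(elements: List[str]) -> List[str]:
    """Return a list of wildcard rule violations for the given path."""
    problems = []
    for element in elements:
        if element == "":
            problems.append("the '//' wildcard pattern is not supported")
        elif element == "*":
            problems.append("a wildcard for an entire segment is not supported")
        else:
            values = [value for _, value in _KEY.findall(element)]
            wild = [value == "*" for value in values]
            # Within one segment, keys must be all wildcards or all specific.
            if any(wild) and not all(wild):
                problems.append("mixed wildcard and specific keys: " + element)
    return problems


print(check_path(["a", "b[key1=*][key2=*]", "c[key1=foo]"]))  # no violations
print(check_path(["a", "b[key1=foo][key2=*]"]))               # one violation
```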

7.3. Telemetry Examples

This section contains examples of Telemetry subscription requests and responses. The following examples are dumps of protobuf messages from a Python API. Formats may vary across different implementations.

Example 1: Subscribe to a single path

2017-06-05 17:06:13,189 - SENT::SubscribeRequest
subscribe {
  subscription {
    path {
      element: "state"
      element: "router[router-instance=Base]"
      element: "interface[interface-name=test]"
      element: "statistics"
      element: "ip"
      element: "in-packets"
    }
    mode: SAMPLE
    sample_interval: 10000000000
  }
}
2017-06-05 17:06:13,190 - RCVD::SubsribeResponse
2017-06-05 17:06:23,492 - RCVD::Subscribe
2017-06-05 17:06:23,492 - update {
  timestamp: 1496675183491595139
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=test]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    val {
      json_val: ""0""
    }
  }
}
2017-06-05 17:06:23,494 - RCVD::Subscribe
2017-06-05 17:06:23,494 - sync_response: true
2017-06-05 17:06:33,589 - RCVD::Subscribe
2017-06-05 17:06:33,589 - update {
  timestamp: 1496675213491595139
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=test]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    val {
      json_val: ""28""
    }
  }
}
....
....

Example 2: Subscribe to a single path with a wildcard

2017-06-05 17:08:29,055 - SENT::SubscribeRequest
subscribe {
  subscription {
    path {
      element: "state"
      element: "router[router-instance=Base]"
      element: "interface[interface-name=*]"
      element: "statistics"
      element: "ip"
      element: "in-packets"
    }
    mode: SAMPLE
    sample_interval: 30000000000
  }
}
2017-06-05 17:08:29,056 - RCVD::SubsribeResponse
2017-06-05 17:08:59,133 - RCVD::Subscribe
2017-06-05 17:08:59,133 - update {
  timestamp: 1496675339132056575
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=system]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    val {
      json_val: ""0""
    }
  }
}
2017-06-05 17:08:59,135 - RCVD::Subscribe
2017-06-05 17:08:59,135 - update {
  timestamp: 1496675339133006678
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=to_node_B]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    val {
      json_val: ""0""
    }
  }
}
2017-06-05 17:08:59,135 - RCVD::Subscribe
2017-06-05 17:08:59,135 - update {
  timestamp: 1496675339133006678
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=to_node_D]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    val {
      json_val: ""0""
    }
  }
}
2017-06-05 17:08:59,136 - RCVD::Subscribe
2017-06-05 17:08:59,136 - sync_response: true
2017-06-05 17:09:29,139 - RCVD::Subscribe
2017-06-05 17:09:29,139 - update {
  timestamp: 1496682569121314
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=system]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    val {
      json_val: ""0""
    }
  }
}
2017-06-05 17:09:29,142 - RCVD::Subscribe
2017-06-05 17:09:29,142 - update {
  timestamp: 1496682569124342
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=to_node_B]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    val {
      json_val: ""0""
    }
  }
}
2017-06-05 17:09:29,145 - RCVD::Subscribe
2017-06-05 17:09:29,145 - update {
  timestamp: 1496682569127344
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=to_node_D]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    val {
      json_val: ""0""
    }
  }
}
....
....

Example 3: Subscribe to more than one path

2017-01-24 12:54:18,228 - SENT::SubscribeRequest
subscribe {
  subscription {
    path {
      element: "state"
      element: "router[router-instance=Base]"
      element: "interface[interface-name=to_node_B]"
    }
    mode: SAMPLE
    sample_interval: 30000000000
  }
  subscription {
    path {
      element: "state"
      element: "router[router-instance=Base]"
      element: "mpls"
      element: "statistics"
      element: "lsp-egress-stats[lsp-name=lsp_to_dest_f]"
    }
    mode: SAMPLE
    sample_interval: 30000000000
  }
}

Example 4: Subscribe to a list with a wildcard

2017-01-24 13:45:30,947 - SENT::SubscribeRequest
subscribe {
  subscription {
    path {
      element: "state"
      element: "router[router-instance=Base]"
      element: "interface[interface-name=*]"
    }
    mode: SAMPLE
    sample_interval: 30000000000
  }
}

Example 5: Subscribe to a path where the object did not exist before the subscription

2017-01-24 13:53:50,165 - SENT::SubscribeRequest
subscribe {
  subscription {
    path {
      element: "state"
      element: "router[router-instance=Base]"
      element: "interface[interface-name=to_node_B]"
    }
    mode: SAMPLE
    sample_interval: 30000000000
  }
}
2017-01-24 13:53:50,166 - RCVD::SubsribeResponse
2017-01-24 13:54:20,169 - RCVD::Subscribe
2017-01-24 13:54:20,169 - sync_response: true
2017-01-24 13:54:50,174 - RCVD::Subscribe
2017-01-24 13:54:50,174 - update {
  timestamp: 1485262490169309451
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=to_node_B]"
  }
  update {
...
...
  }
}

Example 6: Subscribe to a path where the object existed before the subscription and was deleted after the subscription was established

2017-01-24 14:00:41,292 - SENT::SubscribeRequest
subscribe {
  subscription {
    path {
      element: "state"
      element: "router[router-instance=Base]"
      element: "interface[interface-name=to_node_B]"
    }
    mode: SAMPLE
    sample_interval: 30000000000
  }
}
2017-01-24 14:00:41,294 - RCVD::SubsribeResponse
2017-01-24 14:01:11,295 - RCVD::Subscribe
2017-01-24 14:01:11,295 - update {
  timestamp: 1485262871290064704
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=to_node_B]"
  }
  update {
...
...
  }
}
2017-01-24 14:01:11,359 - RCVD::Subscribe
2017-01-24 14:01:11,359 - sync_response: true
2017-01-24 14:01:41,293 - RCVD::Subscribe
2017-01-24 14:02:11,296 - RCVD::Subscribe