7. Telemetry

7.1. In This Chapter

This chapter provides information to configure Telemetry.

7.2. Telemetry Overview

Telemetry is a network monitoring and fault management framework. It is driven by natural network growth (network volume increases) and the need to use fresh data obtained from the network to make fast networking decisions such as traffic optimization and preventive troubleshooting.

Unlike legacy monitoring platforms such as SNMP, Telemetry does not rely only on collectors continuously pulling data from the network elements. Instead, network devices continuously push and stream data (such as statistics) to collectors based on subscriptions. Collectors can then filter, analyze, store, and make decisions using the data collected from the network devices. Figure 21 illustrates this process.

Figure 21:  Telemetry Application 

7.3. About Telemetry

Telemetry uses the proprietary Nokia SR OS YANG data models to stream data that is encoded as Google Protocol Buffers (gPB) messages. Google Remote Procedure Call (gRPC) is the transport used to subscribe to the SR OS device and receive streamed telemetry data. SR OS supports gPB version 3.0.0-b2.

7.3.1. gRPC in Telemetry

The gRPC transport method uses HTTP/2 bidirectional streaming between the gRPC client (the collector) and the gRPC server (the SR OS device). A gRPC session is a single connection from the gRPC client to the gRPC server over the TCP/TLS port. A gRPC session can be used by:

  1. a gRPC client to send a telemetry subscription request to the gRPC server
  2. a gRPC server to send asynchronous telemetry data to the gRPC client (the collector)

A gRPC channel is a single RPC call.

The gRPC version supported on the SR OS gRPC server is 1.0.1.

The SR OS gRPC encryption and authentication follows the basic conventions described in the OpenConfig gnmi-authentication.md published on github.com (version 0.1.0 from Oct 5, 2016).

TLS encryption is used for added security. The following summarizes the process of encryption and authentication:

  1. SR OS device authentication
    1. The gRPC clients do not share gRPC sessions. Each gRPC client should initially start a separate gRPC session.
    2. When a gRPC session is established, the gRPC server certificates are verified by the gRPC client to ensure every gRPC server is authenticated by the gRPC client.
    3. When gRPC is shut down on the gRPC server and a gRPC client tries to establish a gRPC session, the gRPC client gets an error for every RPC.
    4. When a gRPC session is established and gRPC is then shut down on the gRPC server, all active RPCs are gracefully terminated and an error is returned for every active RPC.
  2. TLS encryption
    1. The gRPC session should be in an encrypted state before it can be used.
    2. If the gRPC client and gRPC server are unable to negotiate an encrypted gRPC session, the gRPC session fails and the gRPC server sends an error.
    3. Fallback from an encrypted to an unencrypted gRPC session is not allowed.
    For information on how to configure TLS with gRPC, see the TLS chapter.
  3. User authentication
    1. Each RPC sent by the gRPC client carries a user/password.
    2. For the first RPC on the gRPC session, the gRPC server tries to authenticate the user via the specified authentication order; for instance, local user database, RADIUS, or TACACS+.
      For example, if TACACS+ is first in the authentication order, the gRPC server sends a request to the TACACS+ server to authenticate the gRPC user.
    3. For the subsequent RPCs on that same authenticated gRPC session, the user/password are re-authenticated only if changed.
    4. When there is no user/password provided with the RPC, the gRPC server returns an error.
    5. If the RPC user is changed, then any active subscriber RPCs on that same gRPC session are terminated by the gRPC server.
    6. If the RPC password is changed, then the active gRPC session will continue to exist until a different user/password is sent in a subsequent RPC, or the gRPC session is terminated.
    7. Each telemetry message is carried over the gRPC session that was encrypted when the session was established; the session is not re-encrypted for each message.
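
The user authentication rules above can be sketched as a small decision function. This is an illustrative model only, not the SR OS implementation: the names `SessionAuthState`, `check_rpc_credentials`, and the `authenticate` callback (which stands in for the configured authentication order) are hypothetical, and the behavior on a failed re-authentication is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Set, Tuple


@dataclass
class SessionAuthState:
    """Credentials last accepted on one gRPC session, plus its open subscriptions."""
    user: Optional[str] = None
    password: Optional[str] = None
    active_subscriptions: Set[int] = field(default_factory=set)


def check_rpc_credentials(
    state: SessionAuthState,
    user: Optional[str],
    password: Optional[str],
    authenticate: Callable[[str, str], bool],
) -> Tuple[bool, str]:
    """Return (accepted, reason) for the credentials carried by one RPC.

    `authenticate` is a hypothetical stand-in for the authentication order
    (for instance local user database, RADIUS, or TACACS+).
    """
    if not user or not password:
        # No user/password provided with the RPC: the server returns an error.
        return False, "missing user/password"
    if (user, password) == (state.user, state.password):
        # Unchanged credentials on an authenticated session are not re-checked.
        return True, "already authenticated"
    if not authenticate(user, password):
        # Assumption: a failed attempt leaves the session state untouched.
        return False, "authentication failed"
    if state.user is not None and user != state.user:
        # The RPC user changed: active subscribe RPCs on the session end.
        state.active_subscriptions.clear()
    state.user, state.password = user, password
    return True, "authenticated"
```

A collector-side test harness could call `check_rpc_credentials` once per RPC to predict whether the server would re-run authentication.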

Figure 22 shows the telemetry protocol stack.

Figure 22:  Telemetry Stack 

The gRPC service runs on port 57400 by default on the SR OS. The service is not configurable.

A single gRPC server supports concurrent gRPC sessions and channels.

  1. There are a maximum of eight concurrent gRPC sessions for all of the gRPC clients.
  2. There are a maximum of 225 concurrent gRPC channels for all of the gRPC clients. Since each RPC is a unique channel, the maximum number of subscriptions for all the gRPC clients on a single SR OS device is 225.

Closing a gRPC channel terminates an active Telemetry subscription. A gRPC session that is used by the disconnected subscription is not terminated. Closing the entire gRPC session terminates all active Telemetry subscriptions on the disconnected gRPC session.

A Telemetry subscription can be administratively terminated from the CLI. An active gRPC session that is used by the terminated subscription is not terminated. See the CLI section for command details.
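
The relationship between sessions, channels, and subscriptions described above can be modeled with a few lines of bookkeeping. The sketch below is a toy model under stated assumptions: the class `GrpcServerModel` and its method names are hypothetical and do not correspond to any SR OS or gRPC API; only the two limits (8 sessions, 225 channels) come from the text.

```python
MAX_SESSIONS = 8      # concurrent gRPC sessions across all gRPC clients
MAX_CHANNELS = 225    # concurrent gRPC channels (one per subscription RPC)


class GrpcServerModel:
    """Toy bookkeeping of gRPC sessions and the channels opened on them."""

    def __init__(self):
        self.sessions = {}  # session id -> set of channel ids

    def open_session(self, session_id):
        if session_id in self.sessions:
            raise ValueError("session already exists")
        if len(self.sessions) >= MAX_SESSIONS:
            raise RuntimeError("session limit reached")
        self.sessions[session_id] = set()

    def open_channel(self, session_id, channel_id):
        # The channel limit is shared by all sessions on the device.
        total = sum(len(channels) for channels in self.sessions.values())
        if total >= MAX_CHANNELS:
            raise RuntimeError("channel limit reached")
        self.sessions[session_id].add(channel_id)

    def close_channel(self, session_id, channel_id):
        # Ends one subscription; the session itself stays up.
        self.sessions[session_id].discard(channel_id)

    def close_session(self, session_id):
        # Ends every subscription carried by the session; returns how many.
        return len(self.sessions.pop(session_id))
```

In this model, closing a channel leaves the session entry in place, while closing a session removes all of its channels at once.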

Figure 23 shows a gRPC service using the TLS architecture.

Figure 23:  gRPC Using TLS Architecture 

7.3.2. Operations Layer

This section summarizes support for subscription requests and subscription responses.

SR OS Telemetry follows the OpenConfig gnmi.proto published on github.com (version 0.2.0, from Nov 8, 2016). This model defines the relationship and behavior between the gRPC client and server.

SR OS Telemetry follows the basic conventions described in the OpenConfig gnmi-specification.pdf published on github.com (version 0.2.1 from Nov 10, 2016).

A subscription is initiated from the gRPC client by sending a "Subscribe" RPC that contains a "SubscribeRequest" message to the gRPC server. A "prefix" can be specified to be used with all paths specified in the "SubscribeRequest". If a "prefix" is present, it is logically prepended to every "path" to provide the full path.

A "subscription" contains:

  1. A "path" list of one or more paths:
    1. A path represents the data tree as a series of repeated strings/elements. Each element represents a data tree node name and its associated attributes.
    2. A path should be syntactically valid within the set of schema modules that the gRPC server supports.
    3. That list cannot be modified throughout the lifetime of the subscription.
    4. If the subscription path points to a container node, then all child leafs of that container node are considered to be subscribed to.
    5. Any specified path must be unique within the list (paths cannot be repeated within the list). An error is returned upon using the same path more than once in a single subscription.
    6. A specified path does not need to pre-exist within the current data tree on the gRPC server. If a particular path does not exist, the gRPC server continues to monitor for the existence of the path and transmits telemetry updates if the path exists in the future.
    7. The gRPC server does not send any data for a non-existent path; for instance, if the path does not exist at the time of subscription creation, or if the path is deleted after the subscription is established.
    8. The maximum number of explicit paths that can be specified in a single subscription is 64. This means that the maximum number of explicit paths across all subscriptions on a single SR OS device is 14400 (225 subscriptions multiplied by 64 paths). When a SubscribeRequest message tries to subscribe to more than 64 explicit paths, the SR OS device returns an error. A path using a wildcard is still considered a single explicit path.
  2. A subscription mode:
    1. “SAMPLE” mode is supported for each path, where the gRPC server sends notifications at the specified sampling interval.
    2. Using “TARGET_DEFINED” mode still means “SAMPLE” mode.
  3. A sample interval:
    1. A "sample_interval" is supported for each path and is specified in nanoseconds. A sample interval of 0 selects the default interval of 10 seconds. If a nonzero "sample_interval" of less than 10 seconds is specified, the gRPC server returns an error.
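
The constraints listed above can be checked on the collector side before a request is sent. The following is a minimal sketch, assuming each path entry is represented as a plain dictionary with `path`, `mode`, and `sample_interval` keys; that representation and the function names are hypothetical, and the sketch assumes an omitted mode is treated as TARGET_DEFINED.

```python
MAX_PATHS_PER_SUBSCRIPTION = 64
MAX_SUBSCRIPTIONS = 225
MIN_SAMPLE_INTERVAL_NS = 10 * 1_000_000_000  # 10 seconds in nanoseconds


def effective_sample_interval(sample_interval_ns):
    """Map a requested interval to the one the server would use."""
    if sample_interval_ns == 0:
        return MIN_SAMPLE_INTERVAL_NS  # 0 selects the 10-second default
    if sample_interval_ns < MIN_SAMPLE_INTERVAL_NS:
        raise ValueError("sample_interval below 10 seconds")
    return sample_interval_ns


def validate_subscription(entries):
    """Validate path entries; return their effective intervals in nanoseconds.

    `entries` is a list of dicts such as
    {"path": ["state", "..."], "mode": "SAMPLE", "sample_interval": 30000000000}.
    """
    if not entries:
        raise ValueError("at least one path is required")
    if len(entries) > MAX_PATHS_PER_SUBSCRIPTION:
        raise ValueError("more than 64 explicit paths")
    seen = set()
    intervals = []
    for entry in entries:
        path = tuple(entry["path"])
        if path in seen:
            raise ValueError("path repeated within the subscription")
        seen.add(path)
        # TARGET_DEFINED is accepted but still behaves as SAMPLE.
        if entry.get("mode", "TARGET_DEFINED") not in ("SAMPLE", "TARGET_DEFINED"):
            raise ValueError("unsupported subscription mode")
        intervals.append(effective_sample_interval(entry.get("sample_interval", 0)))
    return intervals
```

The device-wide ceiling quoted above follows directly from the two constants: `MAX_SUBSCRIPTIONS * MAX_PATHS_PER_SUBSCRIPTION` evaluates to 14400.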

Figure 24 illustrates the SR OS support of a subscription request.

Figure 24:  Subscription Request 

When a subscription is successfully initiated on the gRPC server, "SubscribeResponse" messages are sent from the gRPC server to the gRPC client. One set of messages is sent at every "sample_interval". Each "SubscribeResponse" message contains "update" notifications as per the subscription's path list.

A "sync_response" notification is sent every time the gRPC server has sent all of the updates for the subscribed-to paths. The "sync_response" must be set to true for the gRPC client to consider the stream synced. A "sync_response" signals to the gRPC client that it has a full view of the subscribed-to data.
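
A collector can use "sync_response" as a delimiter between sample rounds. The sketch below assumes the responses have already been decoded into plain dictionaries, either `{"update": ...}` or `{"sync_response": True}`; that shape and the function name `group_by_sync` are illustrative assumptions, not part of any gRPC library.

```python
def group_by_sync(responses):
    """Yield one list of update notifications per completed sample round.

    A round is complete when a response with sync_response set to true
    arrives. Updates that are not yet followed by a sync_response are
    held back, because the view of the data is not known to be full.
    """
    pending = []
    for response in responses:
        if response.get("sync_response"):
            yield pending
            pending = []
        elif "update" in response:
            pending.append(response["update"])
```

A round can legitimately be empty, for example when the subscribed-to path does not currently exist on the device.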

The gRPC server sends an error if required. The error contains a description of the context of the error.

An "update" notification contains:

  1. A "timestamp" of the statistics collection time. Timestamps are always expressed in nanoseconds.
  2. A "prefix":
    1. If a "prefix" is present, it is logically prepended to every "path" to provide the full path.
    2. The presence of a "prefix" in the "SubscribeResponse" message is not related to the presence of a "prefix" in the original "SubscribeRequest" message. The "prefix" in the "SubscribeResponse" message is optimized by the gRPC server.
  3. A list of "update" path and value pairs.
    1. A path represents the data tree path as a series of repeated strings/elements, where each element represents a data tree node name and its associated attributes. See the Schema Paths section for more information.
    2. The “Value” message represents the data tree node’s “value” and “encoding” which is always “JSON”.

Figure 25 illustrates the SR OS support of a subscription response.

Figure 25:  Subscription Response 
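
As a minimal sketch of how a collector might consume an "update" notification, the function below joins the "prefix" with each update "path" and decodes the JSON value. It assumes the notification has been converted to a plain dictionary whose `prefix` and `path` fields are lists of element strings and whose timestamp counts nanoseconds since the Unix epoch; the dictionary layout and the name `flatten_update` are hypothetical.

```python
import json
from datetime import datetime, timezone


def flatten_update(notification):
    """Return (collection_time_utc, [(full_path, value), ...]).

    The collection time is truncated to whole seconds; the sub-second
    part of the nanosecond timestamp is dropped for readability.
    """
    seconds, _nanos = divmod(notification["timestamp"], 1_000_000_000)
    collected = datetime.fromtimestamp(seconds, tz=timezone.utc)
    prefix = notification.get("prefix", [])
    pairs = []
    for item in notification["update"]:
        # The prefix is logically prepended to every path.
        full_path = "/" + "/".join(prefix + item["path"])
        # Values are JSON encoded.
        pairs.append((full_path, json.loads(item["value"])))
    return collected, pairs
```

Because the server chooses the prefix independently of the request, a collector should always rebuild the full path this way rather than assume the prefix it sent.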

7.3.3. Schema Paths

Telemetry subscriptions include a set of schema paths used to identify which data nodes are of interest to the collector.

The paths in Telemetry "Subscribe" RPC requests follow the basic conventions described in the OpenConfig gnmi-path-conventions.md published on github.com (version 0.1.0 from Nov 11, 2016).

A path consists of a set of path segments often shown with a ‘/’ character as a delimiter. For example: configure/router[router-instance=Base]/interface[interface-name=my-interface1]/description.

These paths are encoded as a set of individual string segments in gnmi.proto (without any ‘/’ characters). For example, ["configure", "router[router-instance=Base]", "interface[interface-name=my-interface1]", "description"]
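
The conversion from the '/'-delimited form to individual string segments can be sketched as follows. The helper name `split_path` is hypothetical, and the sketch assumes that a '/' inside a bracketed key value belongs to that value rather than acting as a delimiter.

```python
def split_path(path):
    """Split a '/'-delimited path into segments, ignoring '/' inside [...]."""
    segments, current, depth = [], [], 0
    for ch in path:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "/" and depth == 0:
            # Delimiter outside any key: close the current segment.
            if current:
                segments.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        segments.append("".join(current))
    return segments
```

With this helper, the root path "/" yields an empty list of segments.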

A path selects an entire subtree of the data model and includes all descendants of the node indicated in the path. The following table summarizes the types of paths that are supported in SR OS telemetry:

Table 91:  Schema Paths

| Path example | Description |
|---|---|
| /configure/router[router-instance=Base]/interface[interface-name=abc] | Selects all config leafs of interface abc and all descendants. |
| /configure/router[router-instance=Base]/interface[interface-name=abc]/description | Selects only the description leaf of interface abc. |
| /state/router[router-instance=Base]/interface[interface-name=*] | Selects all state information for all Base router interfaces. Wildcard in a single segment of a path. |
| /configure/router[router-instance=Base]/interface[interface-name=*]/description | Selects the description leaf for all Base router interfaces. Wildcard in a single segment of a path. |
| / | The root path. This selects all config and state data from all models (in all namespaces) supported on the router. Encoded as “” in gRPC/gPB. |

The following items describe types of telemetry paths that are not supported in SR OS:

  1. Wildcards for entire path segments are not supported.
    For example, /state/service/*/oper-status
  2. If a wildcard is used for any key of a list, then a wildcard must be used for all the keys of that list. In a single path segment, all the keys must either have specific values or all the keys must have wildcards. A mix of wildcards and specific values for different parts of a list key is not supported.
    For example: /state/cflowd/collector[ip-address=138.120.44.45][port=*]/oper-status
  3. Functions such as 'current()' and 'last()', and mathematical operators such as stat<5 or octets>3, are not supported in paths. The '|' operator (OR, used to select multiple paths) is also not supported.
  4. Wildcards in multiple segments of a path are not supported.
    For example: /state/card[slot-number=*]/mda[mda-slot=*]
  5. The ‘//’ wildcard pattern is not supported.
    For example: /state//oper-status
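
The restrictions above lend themselves to a simple pre-check before a path is placed in a subscription. The following sketch is an approximation under stated assumptions: the function name `unsupported_reason` and its regular expressions are hypothetical, key values are assumed not to contain '[', '|', '<', '>' or '//', and passing the check does not guarantee that the path exists in the SR OS data models.

```python
import re

# Split on '/' only when it is not inside a [...] key expression.
SEGMENT_SPLIT = re.compile(r"/(?![^\[]*\])")
# Capture each [key=value] pair of a path segment.
KEY_PATTERN = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def unsupported_reason(path):
    """Return why `path` is unsupported, or None if no rule is violated."""
    if "|" in path:
        return "'|' operator"
    if "//" in path:
        return "'//' wildcard pattern"
    if re.search(r"\w+\(\)|[<>]", path):
        return "function or mathematical operator"
    wildcard_segments = 0
    for segment in SEGMENT_SPLIT.split(path.strip("/")):
        if segment == "*":
            return "wildcard for an entire path segment"
        wild = [value == "*" for _key, value in KEY_PATTERN.findall(segment)]
        if any(wild) and not all(wild):
            return "mix of wildcard and specific key values"
        if any(wild):
            wildcard_segments += 1
    if wildcard_segments > 1:
        return "wildcards in multiple path segments"
    return None
```

The function returns only the first violation it finds, which is usually enough for a collector to reject the path and report why.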

7.4. Telemetry Examples

This section contains examples of Telemetry subscription requests and responses. The following examples are dumps of protobuf messages from a Python API. The format may vary across different implementations.

Example 1: Subscribe to a single path

2017-01-20 12:34:51,594 - SENT::SubscribeRequest
subscribe {
  subscription {
    path {
      element: "state"
      element: "router[router-instance=Base]"
      element: "interface[interface-name=test]"
      element: "statistics"
      element: "ip"
      element: "in-packets"
    }
    mode: SAMPLE
    sample_interval: 30000000000
  }
}
2017-01-20 12:34:51,605 - RCVD::SubsribeResponse
2017-01-20 12:35:21,611 - RCVD::Subscribe
update {
  timestamp: 1484912121607764002
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=test]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    value {
      value: "0"
      type: JSON
    }
  }
}
2017-01-20 12:35:21,650 - RCVD::Subscribe
sync_response: true
2017-01-20 12:35:51,612 - RCVD::Subscribe
update {
  timestamp: 1484912151608586530
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=test]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    value {
      value: "16"
      type: JSON
    }
  }
}
2017-01-20 12:35:51,614 - RCVD::Subscribe
sync_response: true
....
....
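
Because each update carries a nanosecond timestamp, a collector can turn two consecutive counter samples into a rate. The sketch below uses the two in-packets samples from Example 1; the function name `rate_per_second` is hypothetical, and the sketch does not handle counter wrap or reset.

```python
def rate_per_second(earlier, later):
    """Return the counter increase per second between two samples.

    Each sample is a (timestamp_ns, counter_value) tuple taken from the
    "timestamp" and "value" fields of an update notification.
    """
    (t0, v0), (t1, v1) = earlier, later
    elapsed_s = (t1 - t0) / 1_000_000_000
    if elapsed_s <= 0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / elapsed_s


# Samples copied from the two update notifications in Example 1.
first = (1484912121607764002, 0)
second = (1484912151608586530, 16)
packets_per_second = rate_per_second(first, second)
```

Using the collection timestamps rather than the configured "sample_interval" keeps the rate accurate when the actual spacing between samples drifts slightly.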

Example 2: Subscribe to a single path with a wildcard

2017-01-24 08:58:06,175 - SENT::SubscribeRequest
subscribe {
  subscription {
    path {
      element: "state"
      element: "router[router-instance=Base]"
      element: "interface[interface-name=*]"
      element: "statistics"
      element: "ip"
      element: "in-packets"
    }
    mode: SAMPLE
    sample_interval: 30000000000
  }
}
2017-01-24 08:58:06,181 - RCVD::SubsribeResponse
2017-01-24 08:58:36,191 - RCVD::Subscribe
update {
  timestamp: 1485244716188240643
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=system]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    value {
      value: "0"
      type: JSON
    }
  }
}
2017-01-24 08:58:36,231 - RCVD::Subscribe
update {
  timestamp: 1485244716192259548
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=to_node_B]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    value {
      value: "0"
      type: JSON
    }
  }
}
2017-01-24 08:58:36,233 - RCVD::Subscribe
update {
  timestamp: 1485244716194644789
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=to_node_D]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    value {
      value: "0"
      type: JSON
    }
  }
}
2017-01-24 08:58:36,235 - RCVD::Subscribe
sync_response: true
2017-01-24 08:59:06,192 - RCVD::Subscribe
update {
  timestamp: 1485244746189318112
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=system]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    value {
      value: "0"
      type: JSON
    }
  }
}
2017-01-24 08:59:06,196 - RCVD::Subscribe
update {
  timestamp: 1485244746193708158
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=to_node_B]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    value {
      value: "0"
      type: JSON
    }
  }
}
2017-01-24 08:59:06,199 - RCVD::Subscribe
update {
  timestamp: 1485244746196077911
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=to_node_D]"
    element: "statistics"
    element: "ip"
  }
  update {
    path {
      element: "in-packets"
    }
    value {
      value: "0"
      type: JSON
    }
  }
}
2017-01-24 08:59:06,200 - RCVD::Subscribe
sync_response: true
....
....

Example 3: Subscribe to more than one path

2017-01-24 12:54:18,228 - SENT::SubscribeRequest
subscribe {
  subscription {
    path {
      element: "state"
      element: "router[router-instance=Base]"
      element: "interface[interface-name=to_node_B]"
    }
    mode: SAMPLE
    sample_interval: 30000000000
  }
  subscription {
    path {
      element: "state"
      element: "router[router-instance=Base]"
      element: "mpls"
      element: "statistics"
      element: "lsp-egress-stats[lsp-name=lsp_to_dest_f]"
    }
    mode: SAMPLE
    sample_interval: 30000000000
  }
}

Example 4: Subscribe to a list with a wildcard

2017-01-24 13:45:30,947 - SENT::SubscribeRequest
subscribe {
  subscription {
    path {
      element: "state"
      element: "router[router-instance=Base]"
      element: "interface[interface-name=*]"
    }
    mode: SAMPLE
    sample_interval: 30000000000
  }
}

Example 5: Subscribe to a path where the object did not exist before subscription

2017-01-24 13:53:50,165 - SENT::SubscribeRequest
subscribe {
  subscription {
    path {
      element: "state"
      element: "router[router-instance=Base]"
      element: "interface[interface-name=to_node_B]"
    }
    mode: SAMPLE
    sample_interval: 30000000000
  }
}
2017-01-24 13:53:50,166 - RCVD::SubsribeResponse
2017-01-24 13:54:20,169 - RCVD::Subscribe
sync_response: true
2017-01-24 13:54:50,174 - RCVD::Subscribe
update {
  timestamp: 1485262490169309451
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=to_node_B]"
  }
  update {
...
...

Example 6: Subscribe to a path where the object existed before subscription and was deleted after subscription

2017-01-24 14:00:41,292 - SENT::SubscribeRequest
subscribe {
  subscription {
    path {
      element: "state"
      element: "router[router-instance=Base]"
      element: "interface[interface-name=to_node_B]"
    }
    mode: SAMPLE
    sample_interval: 30000000000
  }
}
2017-01-24 14:00:41,294 - RCVD::SubsribeResponse
2017-01-24 14:01:11,295 - RCVD::Subscribe
update {
  timestamp: 1485262871290064704
  prefix {
    element: "state"
    element: "router[router-instance=Base]"
    element: "interface[interface-name=to_node_B]"
  }
  update {
...
...
  }
}
2017-01-24 14:01:11,359 - RCVD::Subscribe
sync_response: true
2017-01-24 14:01:41,293 - RCVD::Subscribe
sync_response: true
2017-01-24 14:02:11,296 - RCVD::Subscribe
sync_response: true