ICMP ping check connectivity checking using ICMP echo request and response

In specific network configurations, it is not always possible to deploy preferred standards-based purpose-built robust connectivity verification tools such as Bidirectional Forwarding Detection (BFD), or Ethernet Connectivity Fault Management Continuity Check Message (ETH-CCM), or other more suited tools. When circumstances prevent the preferred connectivity validation methods, ICMP echo request and response ping connectivity check (icmp ping check) using ping templates can be used as an alternate connectivity checking method. Before deploying this approach an understanding of the treatment of these types of packets on the involved network elements is required. The ping check affects the operational state of the VRPN or IES service IPv4 interface (the service IP interface) being verified.

Deployment of this feature requires the following:

configuring the ping template
the assignment of the ping template to a service IP interface
optionally, configuring the distributed CPU protection for the icmp-ping-check protocol

The ping template defines timers and thresholds that determine the basis for connectivity verification and influence the service IP interface operational state. The configuration of the ping template is located in the config>test-oam>icmp context. The configuration options are separated to allow different failure detection and recovery behaviors.

The transmission frequency (interval), loss detection (timeout) and threshold (failure-threshold) are used to check connectivity when the service IP interface is steady and operationally up, or steady and operationally down. When these values are monitoring connectivity and the service IP interface is operationally up, consecutive failures that reach the failure threshold transitions the interface to operationally down. When these values are monitoring connectivity and the service IP interface is operationally down, a first success triggers the recovery values to complete the validation. For example, if the interval 10 (seconds), timeout 5 (seconds) and failure-threshold 3 (count) the failure detection takes 30 seconds.

When a service IP interface transitions from operationally up to operationally down because of icmp ping check the log event ‟UTC WARNING: SNMP #2004 vprn1000999 int-PE-CE-999. Interface int-PE-CE-999 is not operational” is generated.

When a service IP interface has transitioned from operationally up to the operationally down state because of icmp ping check, the transmission continues at the specified interval until there is a successful ICMP echo response related to the ICMP echo request. When the first success is received, there is a possible transition from operationally down to operationally up and the function moves to the recovering phase. The icmp ping check packets for the affected service IP interface starts to transmit at frequency (reactivation-interval), invoking loss detection (reactivation-timeout) and consecutive success count (reactivation-threshold). If the reactivation threshold is reached, the service IP interface transitions from operationally down to operationally up. The transmission frequency (interval), loss detection (timeout) and threshold (failure-threshold) are used to monitor the service IP interface.

When a service IP interface transitions from operationally down to operationally up because of icmp ping check the log event ‟UTC WARNING: SNMP #2005 vprn1000999 int-PE-CE-999. Interface int-PE-CE-999 is operational” is generated.

If a failure occurs in the recovering phase, the reactivation-failure-threshold is consulted to determine the number of retries that should be attempted in this phase. This option allows a service IP interface a specified number of retries in this phase before returning to transmitting at interval and those associated values. The reactivation-failure-threshold parameter is bypassed if there was a previous success for the service IP interface in the recovery phase for the latest transition. This parameter determines the number of consecutive failures, without a previous success, before declaring the recovering is not proceeding and returns to the interval values. In larger scale environments this value may need to be increased.

Only packets related to the icmp ping check, ICMP echo request and ARP packets specifically associated with the assigned local ping template, can be sent when the interface is operationally down because of an icmp ping check failure. Only packets related to the icmp ping check, ICMP echo response and ARP packets specifically associated with the assigned local ping template, can be received when the interface is operationally down because of the ping check failure.

A ping check function should never be configured on both peers. This leads to deadlock conditions that can only be resolved by manually disabling the ping template under the interface. As previously stated, only packets associated with the local ping template can be transmitted and received on a service IP interface when the interface is operationally down because of icmp check.

The configured ping template values can be updated without having to change the administrative state or existing references. However, the service IP interfaces that reference a specific ping template configuration imports the values when the ping-template is administratively enabled under the service IP interface. There is no automatic updating of modified ping template values on service IP interfaces referencing a ping-template. To push the changes to the referencing service IP interface the command tools>perform>test-oam>icmp>ping-template-sync template-name is available. This command updates all interfaces that reference the specified ping-template. Executing this command updates all the referencing service IP interfaces in the background after the command is accepted. If there is an HA event and the tools command has not completed updating, all the interfaces that had were not updated at the time of the HA event do not receive the new values. If an HA event occurs and there is a concern that all interfaces may not have received the update the command should be executed again on the newly active. The command does not survive an HA event.

For a service IP interface to import and start using the icmp ping check, the ping-template template-name must be enabled and the destination-address ip-address, must be configured. When the ping-template is added to the service IP interface the values associated with that ping-template are imported. When the ping-template’s administrative state under the service IP interface is enabled, the values are checked again to ensure the latest values associated with the ping-template are being used. The source ip address of the packet is the primary IPv4 address of the service IP interface. This is not a configurable parameter.

When the ping-template command is administratively enabled under a service IP interface that is operationally up, the interface is assumed to have connectivity until proven otherwise. This means the interface state is not affected unless the ping template determines that there are connectivity issues based on the interval, timeout, and failure-threshold commands. If the wanted behavior is for the ping-template to validate service IP interface connectivity before allowing the service IP interface to become operational, the service IP interface can be administratively disabled, the ping-template enabled under that interface, and then the interface administratively enabled. This is considered to be operationally down because of underlying conditions.

When the ping-template command is administratively enabled under a service IP interface that is operationally down because of an underlying condition unrelated to icmp ping check, when the underlying condition is cleared, the icmp ping check prevents the interface from entering the operationally up state until it can verify the connectivity. When the underlying condition is cleared the icmp ping check function enters the recovering phase using the reactivation-interval, reactivation-timeout, reactivation-threshold, and the reactivation-failure-threshold values.

When a node is rebooted, service IP interfaces, with administratively enabled ping templates, must verify the interface connectivity before allowing it to progress to an operationally up state. This ensures that the interface does not bounce from operationally up to operationally down after a reboot and the service IP interface state is properly reflected when the reboot is complete. Service IP interfaces that have an administratively enabled ping-template enter the recovering phase using the reactivation-interval, reactivation-timeout, reactivation-threshold and the reactivation-failure-threshold values following a reboot.

When a soft reset condition is raised icmp ping check state for the service IP interface is held in the same state it entered the process until the soft reset is complete. The interfaces exit the soft reset in the same phase they entered but all counters are cleared. The service IP interfaces that have an administratively enabled ping template enter this held state if they are in any way related to any hardware that is undergoing a soft reset. Two examples to demonstrate the expected behavior are shown below. When a service IP interface is related to a LAG, if a single port member in that LAG is affected by the soft reset, the interface enters this held state. Similarly, if the service IP interface is connected using an R-VPLS configuration it enters the held state.

The protocol used to determine the icmp ping check function has been added to the distributed CPU protection list of protocols, icmp-ping-check. The distributed CPU protection function can be used to limit the amount of icmp ping check packets received on a service IP interface with an enabled ping template. This is an optional configuration that would prevent crossover impact on unrelated service IP interfaces using icmp ping check because of a rogue interface.

The show>service>id>interface ip-int-name detail command has been updated with the ping-template values and operational information. The most effective way to view the output is to use a match criterion for ‟Ping Template Values in Use”. The ‟Ping Template Values in Use” section of the output reports the current values that were imported from the referenced config>test-oam>icmp>ping-template. The ‟Operational Data” section of the output includes the administrative state (Up or Down) and destination address being tested (IP address or notConfigured). It also includes the current interval in use (interval or reactivation-interval) and the current state being reported, (operational, notRunning, failed). There are also pass and fail counters reporting, while in the current state, the number of consecutive passes or fails that have occurred. This provides a stability indicator. If these values are low, it may indicate that even though no operational state transitions have occurred there are intermittent but frequent failures. If neither of these counter are incrementing it is likely an underlying condition has been detected and the icmp ping check is not attempting to send and cannot receive connectivity packets. These counters are cleared when moving between different intervals, and for a soft reset.

show service id <service-id> interface <ip-int-name> detail | match "Ping Template Values in Use" post-lines 29
Ping Template Values in Use
Name             : customer-access-basic
Description      : basic service detection and recovery
Dscp             : nc1
Dot1p            : 7
Interval         : 10
Timeout          : 1
Failure Threshold: 3
React Fail Thresh: 9
React Interval   : 1
React Timeout    : 1
React Threshold  : 3
Size             : 56
TTL              : 1
Ping Template Operational Data
Admin State      : Up
Destination      : 192.9.99.2
Current Interval : Interval
Current State    : Operational
Ping Template Counters
Fail Counter     : 0
Pass Counter     : 107

The show>test-oam>icmp>ping-template and show>test-oam>icmp>ping-template-using have been added to display the various config>test-oam>icmp>ping-template configurations and services referencing the ping templates.

Using icmp ping check enabled on service IP interfaces incur longer recovery delays on failure and reboot because of the additional validations required to validate those interfaces.

The icmp ping check function supports IPv4 interfaces created on SAPs in VRPN and IES services and R-VPLS services, as well as Ethernet satellite (esat) connections. When the service IP interface is making use of an R-VPLS configuration, the interface between the VRPN or IES service and the VPLS service is a virtual connection. In order for the icmp ping check to function properly in R-VPLS environments, the connection being used to validate the peer must be reachable over a SAP.

The icmp ping check should only be used when other purpose-built connectivity checking is not a deployable solution. Interaction with contending protocols may be unexpended.

The interaction between icmp ping check and service IP interface hold-time, in general, the hold-time up option delays the deactivation of the associated IP interface by the specified number of seconds. The hold-time down option delays the activation of the associated IP interface by the specified number of seconds.

With the hold-time up option, if a service IP interface is about to transition from operational up to down because the port transitioned from operational up to down, loss of signal, administrative down, and so on, then hold-time up timer is started. The interface remains operationally up until the timer expires. The icmp ping check runs in parallel because the underlying operational state has been delayed. If it lasts longer than the detection for the icmp ping check it could fail while the interval is counting down. If the hold-time up counter expires the interface transitions to operationally down and the icmp ping check now recognizes the underlying issue and stops trying to transmit. Normal underlying condition recovery noted earlier in this section follow.

If however, the hold-time up is short circuited because the port returns to an operationally up state before the expiration of the hold-time up, the following interactions are noted:

If the icmp ping check has not failed before the port returns to operational up, the service IP interface stays operational and the icmp ping check continues at interval without ever have affecting the operational state of the interface.
If the icmp ping check has registered a failure during this time the service IP interface transitions to operational down because of the icmp ping check and the icmp ping check must recover the interface using the reactivation-interval.

With the hold-time down option, if a service IP interface is about to transition from operationally down to up because the port transitioned from operationally down to up, the interface remains down until the expiration of the down timer. When the timer expires, the icmp ping check follows the normal underlying condition recovery noted earlier in this section follows.

These validations do not support or impact IPv6 interfaces.

There is no support for config>system>enable-icmp-vse Nokia-specific ICMP packets on interfaces that are using ping templates.

ICMP ping check connectivity is only supported on FP3-based and above platforms and should not be configured on any service IP interfaces that are configured over hardware that does not meet this requirement.