CRC (Cyclic Redundancy Check) Monitoring

CRC errors typically occur when Ethernet links are compromised due to optical fiber degradation, weak optical signals, bad optical connections, or problems on a third-party networking element. As well, higher-layer OAM options such as EFM and BFD may not detect errors and trigger appropriate alarms and switchovers if the errors are intermittent, since this does not affect the continuous operation of other OAM functions.

CRC error monitoring on Ethernet ports allows degraded links to be alarmed or failed in order to detect network infrastructure issues, trigger necessary maintenance, or switch to redundant paths. This is achieved through monitoring ingress error counts and comparing them to the configured error thresholds. The rate at which CRC errors are detected on a port can trigger two alarm states. Crossing the configured signal degrade (SD) threshold (sd-threshold) causes an event to be logged and an alarm to be raised, which alerts the operator to a potential issue on a link. Crossing the configured signal failure (SF) threshold (sf-threshold) causes the affected port to enter the operationally down state, and causes an event to be logged and an alarm to be raised.

The CRC error rates are calculated as M✕10E-N, which is the ratio of errored frames allowed for total frames received. The operator can configure both the threshold (N) and a multiplier (M). If the multiplier is not configured, the default multiplier (1) is used.

For example, setting the SD threshold to 3 results in a signal degrade error rate threshold of 1✕10E-3 (1 errored frame per 1000 frames). Changing the configuration to an SD threshold of 3 and a multiplier of 5 results in a signal degrade error rate threshold of 5✕10E-3 (5 errored frames per 1000 frames). The signal degrade error rate threshold must be lower than the signal failure error rate threshold because it is used to notify the operator that the port is operating in a degraded but not failed condition.

A sliding window (window-size) is used to calculate a statistical average of CRC error statistics collected every second. Each second, the oldest statistics are dropped from the calculation. For example, if the default 10-s sliding window is configured, at the 11th second the oldest second of statistical data is dropped and the 11th second is included. This sliding average is compared against the configured SD and SF thresholds to determine if the error rate over the window exceeds one or both of the thresholds, which will generate an alarm and log event.

When a port enters the failed condition as a result of crossing an SF threshold, the port is not automatically returned to service. Because the port is operationally down without a physical link, error monitoring stops. The operator can enable the port by using the shutdown and no shutdown port commands or by using other port transition functions such as clearing the MDA (clear mda command) or removing the cable. A port that is down due to crossing an SF threshold can also be re-enabled by changing or disabling the SD threshold. The SD state is self-clearing, and it clears if the error rate drops below 1/10th of the configured SD rate.