Allowing ETH-CFM to understand the state of MC-LAG and adjust the behavior of the MP (MEP and MIP) according to that state has benefits.
MC-LAG represents the two upstream nodes as a single system to the node terminating a standard LAG. Linking the ETH-CFM MPs to the state of the MC-LAG allows the operator to configure MPs across the two nodes that appear the same. Under the default configuration, identically configured MPs would cause various defect and event conditions to be raised. However, when ETH-CFM is tracking the state of the MC-LAG, the MPs perform a role that represents the state of the resiliency mechanism. To enable this behavior, configure the system-wide command standby-mep-shutdown under the config>eth-cfm>redundancy>mc-lag hierarchy.
When an MP is part of the active MC-LAG system, it performs as a normal MP: terminating, generating, responding to, and processing all appropriate ETH-CFM packets. An MP that is on the standby MC-LAG node enters a pseudo-shutdown state. These MPs terminate all ETH-CFM packets that are part of the regular interception process, but do not process them; the packets are silently discarded. Also, an MP that exists on a standby MC-LAG system does not generate any ETH-CFM packets. All proactive and on-demand functions are blocked on the standby MC-LAG node. Scheduled tests that are executed through SAA still attempt to run on the standby node and record failures as a result of the MEP state. These failures are not representative of the network.
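The active and standby behavior described above can be summarized as a small state model. The following Python sketch is illustrative only: the class and attribute names (MaintenancePoint, McLagRole, standby_mep_shutdown as a boolean) are assumptions made for this example and do not correspond to any router API.

```python
from dataclasses import dataclass
from enum import Enum


class McLagRole(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


@dataclass
class MaintenancePoint:
    """Hypothetical model of an MP (MEP or MIP) whose behavior tracks MC-LAG state."""
    name: str
    role: McLagRole
    standby_mep_shutdown: bool = True  # models the system-wide command being configured
    processed: int = 0
    discarded: int = 0

    def in_pseudo_shutdown(self) -> bool:
        # Only a standby MP on a system with the feature enabled is affected.
        return self.standby_mep_shutdown and self.role is McLagRole.STANDBY

    def receive(self, pdu: str) -> bool:
        """Terminate an intercepted ETH-CFM PDU; return True only if it is processed."""
        if self.in_pseudo_shutdown():
            self.discarded += 1  # terminated, then silently discarded
            return False
        self.processed += 1
        return True

    def can_generate(self) -> bool:
        """Proactive and on-demand generation is blocked while in pseudo-shutdown."""
        return not self.in_pseudo_shutdown()


standby_mp = MaintenancePoint("mep-101", McLagRole.STANDBY)
standby_mp.receive("CCM")        # counted as discarded, not processed
standby_mp.role = McLagRole.ACTIVE
standby_mp.receive("CCM")        # now processed normally
```

The model deliberately keeps terminating PDUs in both roles; only processing and generation change with the MC-LAG role.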
This feature relies on the correct configuration, design, and deployment of the MC-LAG protocol. There are numerous optimizations and configuration parameters that are available as part of the MC-LAG functions. For example, by default, when a currently active MC-LAG port transitions to standby, by any means including manual operator intervention, the remote node terminating the standard LAG sees the LAG transition because all ports in the LAG are down for an instant in time. This standard LAG behavior does not change as a result of the linkage of MP state to MC-LAG state. This transition causes the propagation of faults for MEPs configured on that node. Normal architectural LAG design must take these types of events into consideration. MC-LAG provides numerous tuning parameters that need to be considered before deploying in the field. These include a hold-time down option on the node terminating the standard LAG, as well as other parameters for revertive behavior such as the hold-time up option. It is important that the operator's specific environment be taken into consideration when tuning the MC-LAG parameters to avoid the propagation of error conditions during normal recovery events. If the resumption of data forwarding exceeds the timeout value of a MEP (3.5 times the CCM-Interval), the appropriate defect conditions are raised.
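The timeout relationship stated above (3.5 times the CCM-Interval) lends itself to a quick worked check when choosing hold-time values. The following Python sketch is a planning aid under that single stated rule; the function names and the example gap values are assumptions for illustration.

```python
CCM_TIMEOUT_MULTIPLIER = 3.5  # from the text: MEP timeout is 3.5 times the CCM-Interval


def mep_timeout_seconds(ccm_interval_s: float) -> float:
    """Return the MEP timeout for a CCM interval given in seconds."""
    return CCM_TIMEOUT_MULTIPLIER * ccm_interval_s


def switchover_raises_defect(forwarding_gap_s: float, ccm_interval_s: float) -> bool:
    """True if a gap in data forwarding outlasts the MEP timeout."""
    return forwarding_gap_s > mep_timeout_seconds(ccm_interval_s)


# With a 1 s CCM interval, the timeout is 3.5 s.
print(mep_timeout_seconds(1.0))
# Hypothetical forwarding gaps of 2 s and 4 s during an MC-LAG transition.
print(switchover_raises_defect(2.0, 1.0))
print(switchover_raises_defect(4.0, 1.0))
```

With the 1 s CCM interval used in the configuration example, a forwarding interruption longer than 3.5 s would be expected to raise defect conditions, so any LAG hold-time tuning should be evaluated against that budget.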
ETH-CFM registers a fault propagation delay timer equal to propagate-hold-time under the config>eth-cfm>redundancy>mc-lag hierarchy (default of 1s) to delay notification of an event that may be a result of MC-LAG failover. This allows the system time to coordinate events and triggers that together represent the MC-LAG transition from active to standby.
A fixed timer value of 1s delays an UP MEP from announcing a SAP down condition (isDown) through the CCM Interface-Status-TLV bits. ETH-CFM maintains a record of the status last sent to the UP MEP's peer. When the SAP transitions to either up or down, that change is held for the fixed 1s interval, and the Interface-Status-TLV bits continue to be set based on the previous transmission. If the condition still differs from the one previously sent when the fixed 1s timer expires, the representative value of the SAP is sent in the Interface-Status-TLV when the next CCM interval expires. These two timers help to smooth out network transitions at the cost of delaying the propagation and clearing of faults.
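The interaction between the fixed 1s timer and the CCM interval can be sketched as follows. This Python model is an interpretation of the description above, not the actual implementation: the class name, the method names, and the choice to restart the hold timer on every SAP change are all assumptions.

```python
from typing import Optional

FIXED_HOLD_S = 1.0  # fixed 1 s timer described in the text


class UpMepInterfaceStatus:
    """Hypothetical model of the Interface-Status-TLV value an UP MEP sends."""

    def __init__(self, sap_up: bool = True) -> None:
        self.sap_up = sap_up          # current SAP state
        self.last_sent_up = sap_up    # status last sent to the peer
        self.changed_at: Optional[float] = None  # start of the hold timer, if running

    def sap_change(self, now: float, up: bool) -> None:
        """Record a SAP transition and (re)start the hold timer (assumption)."""
        if up != self.sap_up:
            self.sap_up = up
            self.changed_at = now

    def next_ccm(self, now: float) -> str:
        """Return the Interface-Status-TLV value carried by the CCM sent at 'now'."""
        holding = self.changed_at is not None and (now - self.changed_at) < FIXED_HOLD_S
        if not holding:
            # Hold timer expired: advertise the real SAP state from now on.
            self.last_sent_up = self.sap_up
            self.changed_at = None
        return "isUp" if self.last_sent_up else "isDown"


mep = UpMepInterfaceStatus()
mep.sap_change(0.0, up=False)
print(mep.next_ccm(0.5))  # still within the hold: previous value is repeated
print(mep.next_ccm(1.5))  # hold expired: SAP state is advertised
```

In this model, a SAP that goes down and recovers within the hold interval never causes isDown to be advertised, which is the smoothing effect the timer is intended to provide.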
When a node with ETH-CFM linked to MC-LAG is transitioning from standby to active, ETH-CFM assumes there are no underlying conditions for any of the SAPs that are now part of the newly activating MC-LAG. The initial notification to an UP MEP's peer does not include any faults. It assumes that the transitioning SAPs are stabilizing as the switchover proceeds. The fixed 1s timer starts and a second CCM PDU based on the UP MEP's interval is sent without any recognition of a potential fault on the SAP. However, after the expiration of the fixed timer and on the next CCM-Interval, the Interface-Status-TLV represents the state of the SAP.
In scaled environments, it is important to configure the propagate-hold-time and the CCM intervals to achieve the needed goals. If these timers are set too aggressively, fault and defect conditions may be generated during times of network stabilization. The use of fault propagation and AIS transmission needs to be carefully considered in environments where MC-LAG protection mechanisms are deployed. Timer values do not guarantee that transitional state is not propagated to the peer. The propagation of such state may be more taxing and disruptive than allowing the transitional states to complete. For example, if AIS generation is being used in this type of solution, the operator should use a 60s AIS interval to prevent transitional state from being advertised.
AIS generation is paced on a first-come, first-served basis so that it does not exceed the system capability; the scale depends on the type of system. If AIS is configured in an MC-LAG solution, the operator must configure both nodes with the same set of AIS-generating MEPs and ensure that this number does not exceed the system maximum. If the nodes are configured differently or exceed the system scale, there is a high likelihood that, after a transition, a different set of MEPs paces out AIS than the original set. There is no synchronization of AIS state across nodes.
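Because AIS state is not synchronized across nodes, it can help to audit the two configurations offline. The following Python sketch shows one possible check. How the MEP lists are obtained is outside its scope, and the MEP key format and the max_ais_meps limit are hypothetical values that must be replaced with the figures for the actual platform.

```python
from typing import List, Set, Tuple

# A MEP is identified here by (domain, association, mep-id); this key is an assumption.
MepKey = Tuple[int, int, int]


def audit_ais_meps(node_a: Set[MepKey], node_b: Set[MepKey], max_ais_meps: int) -> List[str]:
    """Return a list of problems; an empty list means the two nodes are consistent."""
    problems = []
    for mep in sorted(node_a - node_b):
        problems.append(f"AIS MEP {mep} is configured only on the first node")
    for mep in sorted(node_b - node_a):
        problems.append(f"AIS MEP {mep} is configured only on the second node")
    for label, meps in (("first", node_a), ("second", node_b)):
        if len(meps) > max_ais_meps:
            problems.append(
                f"{label} node has {len(meps)} AIS MEPs, above the limit of {max_ais_meps}"
            )
    return problems


top = {(3, 1, 101), (3, 2, 201)}
bot = {(3, 1, 101)}
for line in audit_ais_meps(top, bot, max_ais_meps=100):  # 100 is a placeholder limit
    print(line)
```

An empty result only shows that the two sets match and fit within the stated limit; it does not verify that AIS intervals or other MEP attributes are the same.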
Administrative functions, like admin down, are special cases. When the administrative state changes from up to down, the timer is bypassed and communication from ETH-CFM is immediate.
When an MP is configured in an MC-LAG environment, Nokia recommends that each aspect of the MP be configured the same on both nodes, including the MAC address. Also, both nodes participating in the MC-LAG that require this functionality should include the global standby-mep-shutdown command in the config>eth-cfm>redundancy>mc-lag context to avoid unpredictable behavior.
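The recommendation to configure each aspect of the MP identically on both nodes can be checked with a simple attribute comparison. The following Python sketch assumes the MEP attributes have already been extracted into dictionaries; the attribute names used as keys are chosen for this example only.

```python
from typing import Dict, Tuple


def mep_mismatches(mep_a: Dict[str, object], mep_b: Dict[str, object]) -> Dict[str, Tuple[object, object]]:
    """Return {attribute: (value_on_a, value_on_b)} for every attribute that differs."""
    keys = sorted(set(mep_a) | set(mep_b))
    return {
        key: (mep_a.get(key), mep_b.get(key))
        for key in keys
        if mep_a.get(key) != mep_b.get(key)
    }


# Hypothetical extracted attributes for the same MEP on the two MC-LAG nodes.
top_mep = {"mep-id": 101, "mac-address": "d0:0d:1e:00:01:01", "direction": "down", "ccm-enable": True}
bot_mep = {"mep-id": 101, "mac-address": "d0:0d:1e:00:01:01", "direction": "down", "ccm-enable": True}

print(mep_mismatches(top_mep, bot_mep))  # empty when the two MEPs match
```

A missing attribute on one side is reported as a mismatch against None, so an attribute that is configured on only one node is also flagged.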
In summary, a SAP with ETH-CFM tracking the state of the MC-LAG represents the state of the MC-LAG. MPs configured on the standby MC-LAG ports enter a state similar to shutdown. MPs on the active MC-LAG ports perform all normal processing.
The following illustration shows how MEPs can be linked to MC-LAG state. In this example, a service MEP is created on the LAG SAP on NODE1 within service VPLS 100. The MEPs configured on the MC-LAG nodes within service 100 are both configured the same. Both MEPs use the same MEP-ID and the same MAC address.
Only one of the MEPs on the MC-LAG nodes is active for VPLS service 100. The other MEP is in a pseudo-shutdown state, even when the port state on the standby MC-LAG node is Link Up.
The following configuration example is not meant to provide all possible MC-LAG configuration statements to tune each provider's network. It provides a base configuration to demonstrate the ETH-CFM feature.
NODE1
config>port# info (both ports)
----------------------------------------------
    ethernet
        mode access
        encap-type qinq
        autonegotiate limited
    exit
    no shutdown
----------------------------------------------
config>lag# info
----------------------------------------------
    mode access
    encap-type qinq
    access
        adapt-qos link
    exit
    port 1/1/5
    port 1/1/6
    lacp active administrative-key 32768
    hold-time down 10
    no shutdown
----------------------------------------------
config>eth-cfm# info
----------------------------------------------
    domain 3 format none level 3
        association 1 format icc-based name "03-0000000100"
            bridge-identifier 100
            exit
            ccm-interval 1
            remote-mepid 101
        exit
    exit
----------------------------------------------
config>service>vpls# info
----------------------------------------------
    stp
        shutdown
    exit
    sap 1/1/3:100.100 create
    exit
    sap lag-1:100.100 create
        eth-cfm
            mep 100 domain 3 association 1 direction down
                ccm-enable
                mac-address d0:0d:1e:00:01:00
                no shutdown
            exit
        exit
    exit
    no shutdown
----------------------------------------------
TOP (MC-LAG Standby)
config>port# info
----------------------------------------------
    ethernet
        mode access
        encap-type qinq
        autonegotiate limited
    exit
    no shutdown
----------------------------------------------
config>lag# info
----------------------------------------------
    mode access
    encap-type qinq
    access
        adapt-qos link
    exit
    port 1/1/2
    lacp active administrative-key 32768
    no shutdown
----------------------------------------------
config>router# info
----------------------------------------------
#--------------------------------------------------
echo "IP Configuration"
#--------------------------------------------------
    interface "Core2"
        address 192.168.1.2/30
        port 1/2/2
    exit
    interface "system"
    exit
----------------------------------------------
config>redundancy# info
----------------------------------------------
    multi-chassis
        peer 192.168.1.1 create
            source-address 192.168.1.2
            mc-lag
                lag 1 lacp-key 1 system-id 00:00:00:00:00:01 system-priority 100
                no shutdown
            exit
            no shutdown
        exit
    exit
    synchronize boot-env
----------------------------------------------
config>eth-cfm# info
----------------------------------------------
    domain 3 format none level 3
        association 1 format icc-based name "03-0000000100"
            bridge-identifier 100
            exit
            ccm-interval 1
            remote-mepid 100
        exit
    exit
    redundancy
        mc-lag
            standby-mep-shutdown
        exit
    exit
----------------------------------------------
config>service>vpls# info
----------------------------------------------
    stp
        shutdown
    exit
    sap lag-1:100.100 create
        eth-cfm
            mep 101 domain 3 association 1 direction down
                ccm-enable
                mac-address d0:0d:1e:00:01:01
                no shutdown
            exit
        exit
    exit
    no shutdown
----------------------------------------------
# show lag 1
===============================================================================
Lag Data
===============================================================================
Lag-id         Adm     Opr     Port-Threshold   Up-Link-Count   MC Act/Stdby
-------------------------------------------------------------------------------
1              up      down    0                0               standby
===============================================================================
# show port
===============================================================================
Ports on Slot 1
===============================================================================
Port        Admin Link Port    Cfg  Oper LAG/ Port Port Port   C/QS/S/XFP/
Id          State      State   MTU  MTU  Bndl Mode Encp Type   MDIMDX
-------------------------------------------------------------------------------
…snip…
1/1/2       Up    Yes  Link Up 1522 1522    1 accs qinq xcme
…snip…
===============================================================================
BOT (MC-LAG Active)
config>port# info
----------------------------------------------
    ethernet
        mode access
        encap-type qinq
        autonegotiate limited
    exit
    no shutdown
----------------------------------------------
config>lag# info
----------------------------------------------
    mode access
    encap-type qinq
    access
        adapt-qos link
    exit
    port 1/1/2
    lacp active administrative-key 32768
    no shutdown
----------------------------------------------
config>router# info
----------------------------------------------
#--------------------------------------------------
echo "IP Configuration"
#--------------------------------------------------
    interface "Core1"
        address 192.168.1.1/30
        port 1/2/1
    exit
    interface "system"
    exit
----------------------------------------------
config>redundancy# info
----------------------------------------------
    multi-chassis
        peer 192.168.1.2 create
            source-address 192.168.1.1
            mc-lag
                lag 1 lacp-key 1 system-id 00:00:00:00:00:01 system-priority 100
                no shutdown
            exit
            no shutdown
        exit
    exit
    synchronize boot-env
----------------------------------------------
config>eth-cfm# info
----------------------------------------------
    domain 3 format none level 3
        association 1 format icc-based name "03-0000000100"
            bridge-identifier 100
            exit
            ccm-interval 1
            remote-mepid 100
        exit
    exit
    redundancy
        mc-lag
            standby-mep-shutdown
        exit
    exit
----------------------------------------------
config>service>vpls# info
----------------------------------------------
    stp
        shutdown
    exit
    sap lag-1:100.100 create
        eth-cfm
            mep 101 domain 3 association 1 direction down
                ccm-enable
                mac-address d0:0d:1e:00:01:01
                no shutdown
            exit
        exit
    exit
    no shutdown
----------------------------------------------
# show lag 1
===============================================================================
Lag Data
===============================================================================
Lag-id         Adm     Opr     Port-Threshold   Up-Link-Count   MC Act/Stdby
-------------------------------------------------------------------------------
1              up      up      0                1               active
===============================================================================
# show port
===============================================================================
Ports on Slot 1
===============================================================================
Port        Admin Link Port    Cfg  Oper LAG/ Port Port Port   C/QS/S/XFP/
Id          State      State   MTU  MTU  Bndl Mode Encp Type   MDIMDX
-------------------------------------------------------------------------------
…snip…
1/1/2       Up    Yes  Up      1522 1522    1 accs qinq xcme
…snip…
===============================================================================