Intercommunication link failure detection

The DHCP server on the 7750 SR and 7450 ESS is a client of the Multi-chassis Synchronization (MCS) application. When MCS transitions into an out-of-sync state, DHCP server redundancy assumes that there is a failure in the network. Failure detection for the DHCP server in a dual-chassis configuration therefore relies on the failure detection mechanism of MCS.

MCS runs over TCP port 45067 and uses either data traffic or keepalives to detect a failure on the communication link between the two nodes. If there is no MCS data traffic for more than 0.5 seconds, MCS sends its own keepalive to the peer. If a reply is not received within 3 seconds, MCS declares its operational state down and the database sync state out-of-sync. MCS then notifies its clients (the DHCP server being one of them) of this condition.
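The timer logic described above can be modeled as a small state machine. The following Python sketch is illustrative only: it is not router code, and the class name `McsLinkMonitor`, its methods, and the notification string `"mcs-out-of-sync"` are hypothetical names chosen for this example. Only the 0.5-second idle threshold and the 3-second reply timeout come from the text. The sketch assumes that the 3-second timeout is measured from the moment the keepalive is sent, and it uses an explicit millisecond clock instead of real timers so that the behavior is deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Values taken from the text above, expressed in milliseconds.
IDLE_BEFORE_KEEPALIVE_MS = 500     # silence that triggers a keepalive
KEEPALIVE_REPLY_TIMEOUT_MS = 3000  # wait for the keepalive reply


@dataclass
class McsLinkMonitor:
    """Toy model of MCS link supervision (hypothetical, for illustration)."""
    last_rx_ms: int = 0                         # last data or reply from the peer
    keepalive_sent_ms: Optional[int] = None     # pending keepalive, if any
    oper_up: bool = True                        # MCS operational state
    in_sync: bool = True                        # database sync state
    clients: List[Callable[[str], None]] = field(default_factory=list)

    def on_receive(self, now_ms: int) -> None:
        """Data traffic or a keepalive reply proves the link is alive."""
        if not self.oper_up:
            return  # recovery is outside the scope of this sketch
        self.last_rx_ms = now_ms
        self.keepalive_sent_ms = None

    def tick(self, now_ms: int) -> None:
        """Advance the model to time now_ms and evaluate both timers."""
        if not self.oper_up:
            return
        if self.keepalive_sent_ms is None:
            # No traffic for more than 0.5 s: send a keepalive.
            if now_ms - self.last_rx_ms > IDLE_BEFORE_KEEPALIVE_MS:
                self.keepalive_sent_ms = now_ms
        elif now_ms - self.keepalive_sent_ms >= KEEPALIVE_REPLY_TIMEOUT_MS:
            # No reply within 3 s: declare down and out-of-sync,
            # then notify every registered client exactly once.
            self.oper_up = False
            self.in_sync = False
            for notify in self.clients:
                notify("mcs-out-of-sync")


def simulate_silent_peer(duration_ms: int = 5000, step_ms: int = 100) -> List[str]:
    """Run the monitor against a peer that never answers."""
    notifications: List[str] = []
    # The list's append method stands in for a client such as the DHCP server.
    monitor = McsLinkMonitor(clients=[notifications.append])
    for now_ms in range(0, duration_ms, step_ms):
        monitor.tick(now_ms)
    return notifications
```

In this model a client, such as the DHCP server, only registers a callback and takes no part in the detection itself. Calling `simulate_silent_peer()` exercises the failure path, and calling `on_receive()` on every step keeps the monitor in the up and in-sync state.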

It can take up to 3 seconds before the DHCP server realizes that the inter-chassis communication link has failed.

MCS clients (applications) can optionally send their own proprietary keepalive messages to their peer over MCS to detect a failure. The DHCP server does not use this method; it relies strictly on the failure notifications from MCS.

Note that a failure of the intercommunication link does not necessarily mean that the access links have failed as well. In other words, it is possible (although unlikely) that both access links are operational while the inter-chassis communication link is broken.

The detection of an intercommunication link failure triggers specific failover state transitions on the DHCP server. DHCP lease handling in the local-remote model depends on the failover state of the DHCP server, and the duration of each failover state is determined by preconfigured timers.