Overview

7750 SR and 7450 ESS DHCP server multi-homing ensures continuity of the IP address and prefix assignment and renewal processes when an entire 7750 SR or 7450 ESS DHCP server fails, or when the active link that connects clients to one of the DHCP servers in the access part of the network fails. DHCP server multi-homing is an integral part of the overall subscriber management multi-chassis protection scheme.

DHCP server multi-homing can be implemented outside of the BNG, without subscriber management enabled. However, the following text assumes that subscriber management multi-homing (SRRP, MC-LAG, subscriber synchronization) is deployed along with DHCP server multi-homing.

Although the subscriber synchronization process and the DHCP lease state synchronization process use the same synchronization infrastructure within the 7750 SR and 7450 ESS (the Multi-Chassis Redundancy protocol), they are two separate processes that are not aware of each other. As such, the mechanisms that drive their switchover are different. Subscriber switchover from one node to the other is driven by the access protection mechanism (SRRP/MC-LAG), while the switchover (or takeover) of the IP address-ranges and prefixes in a DHCP pool is driven by the state of the intercommunication link over which the leases are synchronized. The failure of an entire node makes those differences irrelevant, because the access-link failure coincides with the intercommunication link failure, and vice versa. However, link-only failures become critical in how they are interpreted by the protection mechanisms (SRRP, MC-LAG, DHCP server multi-homing). Regardless of the nature of the failure, the overall DHCP server multi-chassis protection scheme must be designed so that the two 7750 SR and 7450 ESS DHCP servers never allocate the same IP address or prefix to two different clients; otherwise, IP address or prefix duplication ensues. Unique IP address and prefix allocation is achieved by making only one 7750 SR or 7450 ESS DHCP server responsible for IP address assignment and prefix delegation out of a shared IP address-range or prefix.

There are two basic models for DHCP server dual-homing:

Shared IP address-ranges and prefixes are designated as local on one 7750 SR and 7450 ESS DHCP server and as remote on the other.

In this case, the DHCP relays must point to both DHCP servers: the one configured with the local IP address-range and prefix, and the one configured with the remote IP address-range and prefix.

Under normal circumstances, new IP addresses and prefixes can be allocated only by the DHCP server configured with the local IP address-range and prefix.

The DHCP server configured with the remote IP address-range and prefix starts delegating new leases from it only after it declares that the redundant peer with the local IP address-range and prefix has become unavailable.

Detection of peer unavailability is triggered by the failure of the intercommunication link, which can be caused either by a nodal failure or simply by the loss of connectivity between the two nodes protecting each other. Thus, the loss of the intercommunication link does not necessarily mean that the peering node is truly gone; the two nodes may simply have become isolated and unable to synchronize their DHCP leases with each other. In such an environment, both nodes could allocate the same IP address at the same time. To prevent this, additional intercommunication link states and associated timers are introduced to give the operator ample time to fix the problem.

For example, the DHCP server takes over the remote IP address-range and prefix only after the MCLT period expires while the intercommunication link is in the PARTNER-DOWN state, and the PARTNER-DOWN state itself is entered only after a preconfigured timer (partner-down-delay) expires. As a consequence of these two additional timers (partner-down-delay and MCLT), new IP address delegation from the remote (shared) IP address-range and prefix is not possible until both timers expire. This delay is necessary and justified when the intercommunication link is interrupted, the nodes become isolated, and DHCP lease state synchronization is consequently impaired. On the other hand, if the DHCP server with the local IP address-range and prefix becomes truly unavailable, these additional restoration times cause a service interruption, because new IP addresses from the remote IP address-range and prefix are not immediately available for delegation.
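The timer sequence described above can be modeled as a simple timeline. The following Python sketch assumes that the two timers run back to back (partner-down-delay starts when the intercommunication link fails, and MCLT starts when the PARTNER-DOWN state is entered). The class, function, and field names are hypothetical and are not part of any 7750 SR or 7450 ESS interface; the `bypass` flag models the configuration-based bypass of both timers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailoverTimers:
    """Hypothetical timer settings in seconds; names mirror the text."""
    partner_down_delay: float
    mclt: float
    bypass: bool = False  # models the operator-configured bypass of both timers


def remote_range_available_at(link_down_at: float, timers: FailoverTimers) -> float:
    """Earliest time at which new leases may be delegated from the remote range.

    Assumption: partner-down-delay starts when the intercommunication link
    fails, and MCLT starts when the PARTNER-DOWN state is entered.
    """
    if timers.bypass:
        return link_down_at  # takeover is immediate when the timers are bypassed
    partner_down_at = link_down_at + timers.partner_down_delay
    return partner_down_at + timers.mclt


def can_allocate_from_remote(now: float, link_down_at: Optional[float],
                             timers: FailoverTimers) -> bool:
    if link_down_at is None:
        return False  # link is up: the peer still owns the remote range
    return now >= remote_range_available_at(link_down_at, timers)


timers = FailoverTimers(partner_down_delay=300.0, mclt=600.0)  # hypothetical values
print(remote_range_available_at(100.0, timers))  # 1000.0
```

With these illustrative values, a link failure at t=100 s delays new delegation from the remote range until t=1000 s, which is the service gap the text describes for a true nodal failure.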

Only new IP address delegation from the remote IP address-range and prefix is affected by this behavior. The existing IP leases can be extended on both nodes at any time irrespective of whether the configured address-range and prefix is designated as local or remote.
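The distinction between new delegation and lease extension can be expressed as a small permission check. In this hypothetical sketch, `remote_takeover_done` stands for the condition that the partner-down-delay and MCLT timers have expired (or were bypassed), and the request types `"renew"` and `"new"` are illustrative labels rather than DHCP message names.

```python
from enum import Enum


class Designation(Enum):
    LOCAL = "local"
    REMOTE = "remote"


def may_serve(request: str, designation: Designation, remote_takeover_done: bool) -> bool:
    """Hypothetical check: may this node serve the request from this range?"""
    if request == "renew":
        # Existing leases can be extended on both nodes at any time.
        return True
    if request == "new":
        # New delegation: always from a local range, from a remote range
        # only after the takeover conditions are met.
        return designation is Designation.LOCAL or remote_takeover_done
    raise ValueError(f"unknown request type: {request!r}")
```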

To ensure uninterrupted service even for new lease delegation in this model (local-remote), two approaches can be adopted:
- Segment the IP address and prefix space so that each node has an IP address-range and prefix designated as local. For example, instead of designating IP address-range 10.10.10.0/24 as local on DHCP server A and as remote on DHCP server B, split the 10.10.10.0/24 IP address-range into two: 10.10.10.0/25 and 10.10.10.128/25. 10.10.10.0/25 is designated as local on DHCP server A and as remote on DHCP server B, while 10.10.10.128/25 is designated as remote on DHCP server A and as local on DHCP server B. In this fashion, one node is always available to assign new leases without any overlap.
- If only one shared IP address-range/prefix is deployed, the operator can bypass, through configuration, the timers (partner-down-delay and MCLT) that are in place in case the DHCP server nodes become isolated. In this case, safe operation is guaranteed only if the operator is confident that an intercommunication link failure is caused by a nodal failure, and not by a failure of the physical link between the two nodes.
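The address split from the first approach can be checked with Python's standard `ipaddress` module. The server names and the `designations` mapping are illustrative only; the sketch verifies that each /25 half is local on exactly one server and that the two halves do not overlap.

```python
import ipaddress

shared = ipaddress.ip_network("10.10.10.0/24")
# Split the /24 into two /25 halves: 10.10.10.0/25 and 10.10.10.128/25.
lower, upper = shared.subnets(prefixlen_diff=1)

# Hypothetical designation table mirroring the example in the text.
designations = {
    "server-A": {lower: "local", upper: "remote"},
    "server-B": {lower: "remote", upper: "local"},
}

# Each half must be local on exactly one server, and the halves must not overlap.
for half in (lower, upper):
    owners = [srv for srv, d in designations.items() if d[half] == "local"]
    assert len(owners) == 1, (half, owners)
assert not lower.overlaps(upper)

for server, d in designations.items():
    for net, role in d.items():
        print(server, net, role)
```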

Shared IP address-range and prefix is designated as access-driven on both 7750 SR and 7450 ESS DHCP servers.

In this scenario, the shared IP address-range and prefix is owned by both nodes and the ownership is not driven by the state of the intercommunication link.

To avoid IP address duplication, only one DHCP server at any time must be responsible for IP address assignment from this shared IP address-range and prefix.

This is ensured by the access protection mechanism (SRRP/MC-LAG), which provides a single active path from the clients to one of the DHCP servers.
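The single-active-path rule can be illustrated as follows. This sketch assumes that each node's access state (for example, SRRP master or active MC-LAG member) is reduced to a boolean; `allocating_node` is a hypothetical helper, not a 7750 SR or 7450 ESS function.

```python
def allocating_node(access_active: dict[str, bool]) -> str:
    """Return the single node allowed to allocate from the shared access-driven range.

    Hypothetical helper: more than one active access path would let both
    servers allocate from the same range, which risks duplication.
    """
    active = [node for node, is_active in access_active.items() if is_active]
    if len(active) != 1:
        raise RuntimeError(f"expected exactly one active access path, got {active}")
    return active[0]
```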

If clients have access to the same IP address-range and prefix on both DHCP servers at the same time, IP address duplication may occur.

Consider the following case:

Two DHCP clients send DHCP Discovers in the following fashion:
- DHCP client A sends DHCP Discover to the DHCP server A
- DHCP client B sends DHCP Discover to the DHCP server B
- DHCP server A assigns IP address 10.10.10.10 to the DHCP client A
- DHCP server B assigns IP address 10.10.10.10 to the DHCP client B
- This is a legitimate scenario because DHCP lease states are not synchronized until the DHCP lease assignment is completed.
- Just before the DHCP ACK is sent to the respective clients from both nodes, the DHCP lease sync messages are exchanged between the peers.
- DHCP servers do not wait for the reply to the sync message before they send the DHCP ACK to the client.
- After the DHCP lease sync message is received from the peer, the DHCP server realizes that the IP lease already exists. In this case, the newer IP lease overrides the older one.
- The result is that clients A and B use the same IP address and consequently the forwarding of the traffic is impaired.
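The race in the case above can be reproduced with a toy model of two servers sharing one pool. This Python sketch is illustrative only: the `Server` and `Lease` classes, the logical timestamp `t`, and the lowest-free-address allocation policy are assumptions, not the actual DHCP server implementation. It shows that both clients are acknowledged with the same address, and that the sync exchange only reconciles the lease tables, not what the clients already received.

```python
from dataclasses import dataclass, field


@dataclass
class Lease:
    client: str
    ip: str
    t: int  # hypothetical logical timestamp of the assignment


@dataclass
class Server:
    name: str
    pool: list[str]
    leases: dict[str, Lease] = field(default_factory=dict)  # keyed by IP

    def discover(self, client: str, t: int) -> Lease:
        # Picks the first address missing from the local lease table; the
        # peer's in-flight assignment is invisible because sync has not happened.
        ip = next(a for a in self.pool if a not in self.leases)
        lease = Lease(client, ip, t)
        self.leases[ip] = lease
        return lease  # ACK is sent without waiting for the peer's sync reply

    def on_sync(self, lease: Lease) -> None:
        current = self.leases.get(lease.ip)
        if current is None or lease.t > current.t:
            self.leases[lease.ip] = lease  # the newer lease overrides the older one


pool = ["10.10.10.10", "10.10.10.11"]
a, b = Server("A", pool), Server("B", pool)
lease_a = a.discover("client-A", t=1)
lease_b = b.discover("client-B", t=2)
a.on_sync(lease_b)
b.on_sync(lease_a)

print(lease_a.ip, lease_b.ip)  # prints: 10.10.10.10 10.10.10.10
```

After the sync, both lease tables record client B for 10.10.10.10, but client A still holds an ACK for the same address, which is the duplication described above.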

In the access-driven model, the ESM subscriber host must be colocated with the DHCP server. In other words, the DHCP server must be instantiated in the redundant BNGs. The DHCP relays must point to their respective local DHCP servers, and there must be no cross-referencing of DHCP servers in this model. In addition, the IP address that the DHCP servers are associated with must be the same on both DHCP servers. This is necessary to ensure uninterrupted service when a switchover occurs in the access network.

For example:
- DHCP server A on BNG A is associated with the IP address 1.1.1.1 (for example loopback interface A on BNG A)
- DHCP server B on the peering BNG B is associated with the IP address 1.1.1.1 (configured in BNG B under loopback interface B)
- DHCP relay on BNG A points to the IP address 1.1.1.1 (DHCP server in BNG A)
- DHCP relay on BNG B points to the IP address 1.1.1.1 (DHCP server in BNG B)
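The deployment rules of the access-driven model can be expressed as a configuration check. The sketch below is hypothetical: `Bng`, `local_server_ips`, and `relay_server_ips` are illustrative names for data an operator might extract from each BNG's configuration, and the check covers only the rules stated above (colocated DHCP server, relay pointing only to the local server, identical server address on both BNGs).

```python
from dataclasses import dataclass


@dataclass
class Bng:
    name: str
    local_server_ips: set[str]  # addresses the colocated DHCP server is associated with
    relay_server_ips: set[str]  # DHCP server addresses the local relay points to


def check_access_driven(a: Bng, b: Bng) -> list[str]:
    """Return a list of rule violations for a hypothetical access-driven pair."""
    problems = []
    for bng in (a, b):
        if not bng.local_server_ips:
            problems.append(f"{bng.name}: no colocated DHCP server")
        if not bng.relay_server_ips:
            problems.append(f"{bng.name}: relay has no DHCP server configured")
        # The relay must point only at the local server (no cross-referencing).
        foreign = bng.relay_server_ips - bng.local_server_ips
        if foreign:
            problems.append(f"{bng.name}: relay points to non-local server(s) {sorted(foreign)}")
    if a.local_server_ips != b.local_server_ips:
        problems.append("DHCP servers are not associated with the same IP address on both BNGs")
    return problems


bng_a = Bng("BNG-A", {"1.1.1.1"}, {"1.1.1.1"})
bng_b = Bng("BNG-B", {"1.1.1.1"}, {"1.1.1.1"})
print(check_access_driven(bng_a, bng_b))  # []
```

Because both servers share 1.1.1.1, each relay reaches its own colocated server, so a switchover in the access network does not change the server address clients talk to.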

Consider the following when contemplating deployment of the two described models:

The local-remote model is agnostic of the access protection mechanism. In fact, an access protection mechanism is not needed at all for safe operation.

Fast takeover of a single shared (remote) IP address-range and prefix can be provided only where the operator can guarantee that an intercommunication link failure is caused by a nodal failure (the entire DHCP server node becomes unavailable). Fast takeover is provided by bypassing the partner-down-delay and MCLT timers.

If multiple IP address-ranges and prefixes are deployed, bypass of the timers is not needed because the local IP address-ranges and prefixes are available on both nodes.

The access-driven model allows a single IP address-range and prefix to be shared across the redundant DHCP server nodes. Client access to only one DHCP server node at a time is ensured by the protection mechanism deployed in the access part of the network (SRRP or MC-LAG).