Each configured Diameter node in SR OS can support several peers that are open simultaneously. Only one of those peers is used to forward application messages for a specific user session. If there are multiple peers for the same realm, the next-hop peer with the numerically lowest preference value is selected. If all peers have the same preference for a specific realm, the peer index is used to break the tie.
Peer failover is performed by the Diameter base protocol and is supported only for application request messages (for example, peer failover does not apply to a CER message). Events that trigger peer failover can be categorized as:
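The selection rule above can be sketched as a sort on a two-part key. This is a minimal illustration, not SR OS code: the `Peer` class, its field names, and the `select_peer` function are hypothetical, and the sketch assumes that the lowest peer index wins a tie, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    # Hypothetical model of a configured peer; field names are illustrative.
    index: int
    realm: str
    preference: int
    is_open: bool = True


def select_peer(peers, realm):
    """Return the next-hop peer for a realm, or None if no open peer serves it."""
    candidates = [p for p in peers if p.is_open and p.realm == realm]
    if not candidates:
        return None
    # Lowest preference value first; peer index breaks ties
    # (assumption: the lower index wins).
    return min(candidates, key=lambda p: (p.preference, p.index))
```

For example, given peers with (index, preference) of (3, 10), (1, 20), and (2, 10) in the same realm, the sketch selects the peer with index 2.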
explicit notifications from a peer through an error message, informing the recipient of the error message that the peer cannot deliver the request message to the destination
The receipt of a DIAMETER_UNABLE_TO_DELIVER (3002) or DIAMETER_TOO_BUSY (3004) protocol error message triggers a retransmission of the original request message to the next best peer. The retransmission bit (T-bit) is set in the retransmitted message. At the same time, the Tx timer (a configurable parameter in SR OS) is started. Further error replies (3002/3004) to the retransmitted message from newly selected peers continue to trigger the selection of the next best peer until:
a valid peer is found
all eligible peers are attempted without success
the Tx timer expires
If no viable peer is available to deliver the request message (3002/3004 errors received from all attempted peers), or the Tx timer expires, the Diameter base notifies the application layer (NASREQ, Gx, Gy), which may invoke a server failover procedure (if enabled at the application level by configuration).
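The retry behavior described above can be sketched as a loop over the eligible peers. This is a simplified illustration under stated assumptions, not the SR OS implementation: `send_with_failover`, the `send` callback, the dictionary-based request, and the `t_bit` key are all hypothetical, and the Tx timer is checked only between attempts because the sketch sends synchronously.

```python
import time

DIAMETER_UNABLE_TO_DELIVER = 3002
DIAMETER_TOO_BUSY = 3004
FAILOVER_CODES = {DIAMETER_UNABLE_TO_DELIVER, DIAMETER_TOO_BUSY}


def send_with_failover(request, ranked_peers, send, tx_timeout, clock=time.monotonic):
    """Try peers in order (best first) until one accepts the request.

    send(peer, request) is a hypothetical callback returning a result code.
    Returns (peer, code) on success, or (None, reason) when the caller
    should notify the application layer.
    """
    deadline = None
    for attempt, peer in enumerate(ranked_peers):
        if attempt > 0:
            # Retransmission: set the T-bit on a copy of the request.
            request = dict(request, t_bit=True)
            if deadline is None:
                # The Tx timer starts with the first retransmission.
                deadline = clock() + tx_timeout
        if deadline is not None and clock() >= deadline:
            return None, "tx-timer-expired"
        code = send(peer, request)
        if code not in FAILOVER_CODES:
            return peer, code  # a valid peer was found
    return None, "no-viable-peer"
```

A `(None, reason)` result corresponds to the point at which the Diameter base would hand the failure to the application layer.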
termination of an active peering connection
A peering connection can be closed explicitly by either side through a Disconnect-Peer-Request message, or it can be terminated because of a timeout of the watchdog messages or any other error at the TCP transport level. Termination of an active peer triggers a search, through forwarding/routing, for the next best peer for all messages in the transmit queue toward that peer. The T-bit is set on all retransmitted messages.
If no eligible peers are left, the Diameter base notifies the application layer (NASREQ, Gx, Gy), which may invoke a server failover procedure (if enabled at the application level through configuration).
The Destination-Host AVP in messages that are retransmitted because of a peer failover remains the same as in the original request message.
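The handling of a terminated peer can be sketched as a re-queue operation. This is an illustrative model only: `on_peer_down`, the `tx_queues` dictionary (open peers mapped to their pending requests), and the `notify_application` callback are hypothetical names, not SR OS interfaces. Each request is copied with only the T-bit changed, so other fields such as the destination host stay as they were.

```python
def on_peer_down(failed_peer, ranked_peers, tx_queues, notify_application):
    """Move pending requests from a terminated peer to the next best open peer.

    ranked_peers is ordered best first; tx_queues holds only open peers.
    Returns the peer that took over, or None if no eligible peer is left.
    """
    pending = tx_queues.pop(failed_peer, [])
    remaining = [p for p in ranked_peers if p != failed_peer and p in tx_queues]
    if not remaining:
        # No eligible peer left: hand each request to the application layer,
        # which may start its own server failover procedure.
        for request in pending:
            notify_application(request)
        return None
    next_peer = remaining[0]
    for request in pending:
        # Retransmit with the T-bit set; all other fields are unchanged.
        tx_queues[next_peer].append(dict(request, t_bit=True))
    return next_peer
```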