AAA RADIUS server operation status

The different operating states of a RADIUS server are shown in Figure: RADIUS server operating states. When a RADIUS server is first provisioned into the AAA using the radius-server-policy command, the operating state is ‟unknown”. This state indicates that the RADIUS server has yet to receive a RADIUS request message.

To send a request message, the radius-server-policy command provides three different access algorithms: direct, round-robin, and hash. With the direct algorithm, request messages are always sent to the in-service RADIUS server with the lowest configured server index. With the round-robin algorithm, the RADIUS requests are load-balanced across the servers in a round-robin manner. The hash algorithm offers a load-balanced alternative: the 7750 SR generates a hash key based on the subscriber information, and the RADIUS request is then sent to a server based on that hash key. The hash method differs from the round-robin method in that, under normal working conditions, RADIUS requests from a particular subscriber are always forwarded to the same RADIUS server.

When a server replies to a RADIUS request, it transitions from the operational state of ‟unknown” to ‟in-service”. A server may instead transition from ‟unknown” to ‟out-of-service” if it fails to respond to the initial RADIUS message.
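The three access algorithms can be illustrated with a minimal Python sketch. This is not the router's implementation: the Server class, the function names, the use of CRC32 as the hash, and the choice to treat both ‟unknown” and ‟in-service” servers as eligible are all assumptions made for illustration.

```python
import zlib
from dataclasses import dataclass

# Assumption: servers that are "unknown" or "in-service" may receive requests.
ELIGIBLE_STATES = {"unknown", "in-service"}


@dataclass
class Server:
    index: int                 # configured server index
    state: str = "unknown"     # operating state of the server


def eligible(servers):
    """Return the eligible servers, ordered by configured index."""
    return sorted(
        (s for s in servers if s.state in ELIGIBLE_STATES),
        key=lambda s: s.index,
    )


def pick_direct(servers):
    """Direct: always use the eligible server with the lowest index."""
    candidates = eligible(servers)
    return candidates[0] if candidates else None


class RoundRobin:
    """Round-robin: rotate through the eligible servers on each request."""

    def __init__(self):
        self._next = 0

    def pick(self, servers):
        candidates = eligible(servers)
        if not candidates:
            return None
        server = candidates[self._next % len(candidates)]
        self._next += 1
        return server


def pick_hash(servers, subscriber_key):
    """Hash: map a subscriber key to a server (CRC32 is a stand-in hash)."""
    candidates = eligible(servers)
    if not candidates:
        return None
    digest = zlib.crc32(subscriber_key.encode("utf-8"))
    return candidates[digest % len(candidates)]
```

In this sketch, pick_hash returns the same server for the same subscriber key as long as the set of eligible servers does not change, which mirrors the "under normal working conditions" behavior described above.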

Figure: RADIUS server operating states

A RADIUS server is declared ‟out-of-service” when the down-timeout timer expires. The router starts the down-timeout timer when an access-request is sent, and resets it to ‟0” only when a reply is received from the RADIUS server. The reply does not have to match the request that started the timer; a reply to a request from another subscriber also resets it. For example, the RADIUS server may miss a message but stay ‟in-service” if it responds to an access request from a different subscriber, or to a retry for the same subscriber, provided that the reply is received within the down-timeout interval.
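The timer behavior can be sketched as follows. The DownTimeoutTracker class and its method names are hypothetical, and the sketch assumes that a reply stops the timer until the next request is sent; the text above only states that a reply resets the timer to ‟0”.

```python
class DownTimeoutTracker:
    """Tracks the down-timeout timer for one RADIUS server (illustrative)."""

    def __init__(self, down_timeout):
        self.down_timeout = down_timeout   # seconds
        self._started_at = None            # None means the timer is at 0

    def on_request_sent(self, now):
        # The timer starts when a request is sent and is not restarted
        # by later requests while it is already running.
        if self._started_at is None:
            self._started_at = now

    def on_reply_received(self):
        # Any reply from the server resets the timer, regardless of
        # which subscriber's request it answers.
        self._started_at = None

    def is_out_of_service(self, now):
        # The server is declared out-of-service once the timer reaches
        # the down-timeout value without an intervening reply.
        return (
            self._started_at is not None
            and now - self._started_at >= self.down_timeout
        )
```

For example, with a hypothetical down-timeout of 15 seconds, a request sent at t=0 that goes unanswered does not take the server out of service if a reply to a different subscriber's request arrives at t=9.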

Note: It is highly recommended that the down-timeout command be left at its default value, which is derived from the timeout and retry values.

The down-timeout default value is the timeout value multiplied by the number of retry attempts. The timeout value is the time that the router waits for the RADIUS server to reply, and the retry value is the number of attempts the 7750 SR makes to contact the RADIUS server. If the RADIUS server remains unresponsive, the timer continues to increment until it reaches the configured down-timeout value and the server is declared ‟out-of-service”.
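As a worked example of the default calculation, the following sketch multiplies the two values. The function name is hypothetical, and the values 5 and 3 are sample inputs chosen for illustration, not documented defaults.

```python
def default_down_timeout(timeout_s, retry):
    """Return the default down-timeout in seconds: timeout x retry."""
    if timeout_s <= 0 or retry <= 0:
        raise ValueError("timeout and retry must be positive")
    return timeout_s * retry


# Sample values only: a 5-second timeout with 3 attempts gives 15 seconds.
example = default_down_timeout(5, 3)
```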

For RADIUS servers that do not respond to all RADIUS requests, a test user account can optionally be set up to send periodic RADIUS request messages that keep the server in service. Typically, however, a RADIUS server should respond to all access requests, and creating a test user account for periodic keepalives may place an unnecessary load on the processor and may lower the overall scale of the router.

At the start of the out-of-service state, the hold-down-time timer starts. The timer holds down the RADIUS server and prevents it from being used; no RADIUS messages are sent to an out-of-service server. This is beneficial for the following reasons.

After the hold-down-time timer expires, the server enters the ‟probing” state. There must be multiple RADIUS servers and at least one healthy server for the server to enter the probing state. Probing is performed by the test user account. If no test user account exists, an actual subscriber request is used to perform the probe. There are no retry attempts; only a single RADIUS message is used to probe a RADIUS server. If the RADIUS server responds, it is declared ‟in-service” immediately. If the RADIUS server fails to respond within the timeout value, it is declared ‟out-of-service” again and the hold-down-time timer restarts. Subscriber RADIUS messages used for probing are not cached; if the server fails to respond, the subscriber must trigger the RADIUS message again, either by sending a new address request (for example, DHCP, PPP, or Stateless Address Auto-Configuration (SLAAC)) or by performing a data-trigger.
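The state transitions described in this section can be summarized as a small transition table. This is an illustrative sketch only: the State and Event names are hypothetical, and leaving the server ‟out-of-service” when no healthy peer exists is an assumption, because the text does not say what happens in that case.

```python
from enum import Enum


class State(Enum):
    UNKNOWN = "unknown"
    IN_SERVICE = "in-service"
    OUT_OF_SERVICE = "out-of-service"
    PROBING = "probing"


class Event(Enum):
    REPLY = "reply received"
    DOWN_TIMEOUT = "down-timeout timer expired"
    HOLD_DOWN_EXPIRED = "hold-down-time timer expired"
    PROBE_TIMEOUT = "single probe message timed out"


TRANSITIONS = {
    (State.UNKNOWN, Event.REPLY): State.IN_SERVICE,
    (State.UNKNOWN, Event.DOWN_TIMEOUT): State.OUT_OF_SERVICE,
    (State.IN_SERVICE, Event.DOWN_TIMEOUT): State.OUT_OF_SERVICE,
    (State.OUT_OF_SERVICE, Event.HOLD_DOWN_EXPIRED): State.PROBING,
    (State.PROBING, Event.REPLY): State.IN_SERVICE,
    (State.PROBING, Event.PROBE_TIMEOUT): State.OUT_OF_SERVICE,
}


def next_state(state, event, healthy_peer_exists=True):
    """Return the next operating state for a server.

    healthy_peer_exists models the requirement that multiple servers
    are configured and at least one of them is healthy.
    """
    entering_probe = (state, event) == (
        State.OUT_OF_SERVICE,
        Event.HOLD_DOWN_EXPIRED,
    )
    if entering_probe and not healthy_peer_exists:
        return state  # assumption: stay out-of-service
    # Events that are not listed for a state leave it unchanged.
    return TRANSITIONS.get((state, event), state)
```

A server that fails its probe returns to ‟out-of-service” and waits for the hold-down-time timer again, so repeated probe failures cycle between the ‟out-of-service” and ‟probing” states.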