Update message error handling

The approach to handling update message errors has evolved over time. The original BGP specification required all update message errors to be handled the same way: send a notification to the peer and immediately close the BGP session. This approach was motivated by the goal of ensuring protocol “correctness” above all else, but it ignored several important points.

In recognition of these points and the general trend toward more flexible BGP error handling, SR OS supports a BGP configuration option, update-fault-tolerance, that lets the operator choose whether the router applies the new or the legacy error handling procedures to update message errors. If update-fault-tolerance is configured, non-critical errors, as described above, are handled using the “treat-as-withdraw” or “attribute-discard” approach; neither approach causes a session reset. If update-fault-tolerance is not configured, the legacy procedures continue to apply and all errors (critical and non-critical) trigger a session reset.
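The decision described above can be sketched in a few lines of Python. This is only an illustrative model, not SR OS code: the names Severity, Action, and handle_update_error are hypothetical, and the sketch assumes that critical errors still reset the session when update-fault-tolerance is configured, which the text implies but does not state outright.

```python
from enum import Enum


class Severity(Enum):
    """Hypothetical classification of an update message error."""
    CRITICAL = "critical"
    NON_CRITICAL = "non-critical"


class Action(Enum):
    """Hypothetical outcomes of error handling."""
    SESSION_RESET = "send notification and reset session"
    TOLERATE = "treat-as-withdraw or attribute-discard"


def handle_update_error(severity: Severity, update_fault_tolerance: bool) -> Action:
    """Pick the handling for one update message error."""
    # The newer procedures apply only to non-critical errors, and only
    # when the option is configured.
    if update_fault_tolerance and severity is Severity.NON_CRITICAL:
        return Action.TOLERATE
    # Legacy procedures (and, by assumption, critical errors in either
    # mode) reset the session.
    return Action.SESSION_RESET
```

In this model, the only combination that avoids a session reset is a non-critical error received while update-fault-tolerance is configured.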

If update-fault-tolerance was previously configured and a non-critical error has already occurred on a session, configuring no update-fault-tolerance still resets that BGP session.
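The interaction between an earlier tolerated error and a later configuration change can be modeled as a small state machine. The sketch below is a simplified illustration under stated assumptions: the class BgpSessionModel and its attributes are hypothetical, and clearing the error flag after a reset is an assumption that the text does not confirm.

```python
class BgpSessionModel:
    """Hypothetical model of one BGP session's fault-tolerance state."""

    def __init__(self, update_fault_tolerance: bool = False) -> None:
        self.update_fault_tolerance = update_fault_tolerance
        self.tolerated_error_seen = False  # a non-critical error was tolerated
        self.reset_count = 0

    def on_non_critical_error(self) -> None:
        if self.update_fault_tolerance:
            # Tolerated now, but remembered for a later configuration change.
            self.tolerated_error_seen = True
        else:
            # Legacy procedures: reset immediately.
            self.reset()

    def set_update_fault_tolerance(self, enabled: bool) -> None:
        was_enabled = self.update_fault_tolerance
        self.update_fault_tolerance = enabled
        # Removing the option after a tolerated error still resets the session.
        if was_enabled and not enabled and self.tolerated_error_seen:
            self.reset()

    def reset(self) -> None:
        self.reset_count += 1
        # Assumption: a reset session starts again with a clean error state.
        self.tolerated_error_seen = False
```

In this model, a session that never saw a non-critical error is unaffected when the option is removed, whereas a session that tolerated one is reset at that moment.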