Each BGP RIB with IP routes (unlabeled IPv4, labeled-unicast IPv4, unlabeled IPv6, and labeled-unicast IPv6) submits its best path for each prefix to the common IP route table, unless the disable-route-table-install command is configured or the selective-label-ipv4-install command has prevented the installation. The best path is selected by the BGP decision process. The default preference for BGP routes submitted by the label-IPv4 and label-IPv6 RIBs (these appear in the route table and FIB as having a BGP-LABEL protocol type) can be modified by using the label-preference command. The default preference for BGP routes submitted by the unlabeled IPv4 and IPv6 RIBs can be modified by using the preference command.
If a BGP RIB has multiple BGP paths for the same IPv4 or IPv6 prefix that qualify as the best path up to a specific point in the comparison process, then a specified number of these multi-paths can be submitted to the common IP route table. This is called BGP multi-path and must be explicitly enabled using one or more commands in the multi-path context. These commands specify the maximum number of BGP paths, including the overall best path, that each BGP RIB can submit to the route table for any particular IPv4 or IPv6 prefix. If ECMP, with a limit of n, is enabled in the base router instance, then up to n paths are selected for installation in the IP FIB. In the data-path, traffic matching the IP route is load-shared across the ECMP next hops based on a per-packet hash calculation.
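As a rough illustration of hash-based load sharing, the following sketch picks one of the installed ECMP next hops from a hash of a packet's flow fields. The hash function (CRC-32), the set of hashed fields, and the names `FlowKey` and `select_next_hop` are assumptions made for this example; the text above does not specify the hash inputs or the algorithm that the data path uses.

```python
import zlib
from typing import NamedTuple, Sequence


class FlowKey(NamedTuple):
    """Hypothetical hash inputs; the real field set is not given in the text."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def select_next_hop(flow: FlowKey, next_hops: Sequence[str]) -> str:
    """Map a packet's flow fields to one ECMP next hop (non-sticky)."""
    if not next_hops:
        raise ValueError("route has no ECMP next hops")
    # Hash the flow fields, then index into the next-hop list.
    digest = zlib.crc32("|".join(map(str, flow)).encode())
    return next_hops[digest % len(next_hops)]


ecmp_next_hops = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
flow = FlowKey("192.0.2.10", "1.1.1.1", 6, 49152, 443)
chosen = select_next_hop(flow, ecmp_next_hops)
```

In this sketch, packets of the same flow always map to the same next hop as long as the next-hop list does not change.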
By default, the hashing is not sticky, meaning that when one or more of the ECMP BGP next hops fail, all traffic flows matching the route are potentially moved to new BGP next hops. If required, a BGP route can be marked (using the sticky-ecmp action in route policies) for sticky ECMP behavior so that BGP next hop failures are handled by moving only the affected traffic flows to the remaining next hops as evenly as possible. If new ECMP BGP next hops become available for a marked BGP route, then flows are moved as evenly as possible onto the resultant set of next hops.
Each sticky ECMP route uses 64 distribution buckets to apportion flows onto the available next hops. Figure: Sticky ECMP flow distribution as next hops are removed part 1, Figure: Sticky ECMP flow distribution as next hops are removed part 2, and Figure: Sticky ECMP flow distribution as next hops are removed part 3 provide an example of the distribution of flows over multiple BGP next hops as next hops are removed.
Table: Sticky ECMP flow distribution as next hops are removed for 1.1.1.1/32 lists the sticky ECMP flow distribution as next hops are removed for 1.1.1.1/32.
Initial sticky ECMP distribution for 1.1.1.1/32 in Figure: Sticky ECMP flow distribution as next hops are removed part 1 | | ECMP distribution for 1.1.1.1/32 if next hop 3 fails in Figure: Sticky ECMP flow distribution as next hops are removed part 2 | | ECMP distribution for 1.1.1.1/32 if next hop 2 subsequently fails in Figure: Sticky ECMP flow distribution as next hops are removed part 3 | |
---|---|---|---|---|---|
Bucket | NH | Bucket | NH | Bucket | NH |
00 | 1 | 00 | 1 | 00 | 1 |
01 | 2 | 01 | 2 | 01 | 1 |
02 | 3 | 02 | 1 | 02 | 1 |
03 | 1 | 03 | 1 | 03 | 1 |
04 | 2 | 04 | 2 | 04 | 1 |
05 | 3 | 05 | 2 | 05 | 1 |
06 | 1 | 06 | 1 | 06 | 1 |
07 | 2 | 07 | 2 | 07 | 1 |
08 | 3 | 08 | 1 | 08 | 1 |
09 | 1 | 09 | 1 | 09 | 1 |
10 | 2 | 10 | 2 | 10 | 1 |
11 | 3 | 11 | 2 | 11 | 1 |
12 | 1 | 12 | 1 | 12 | 1 |
13 | 2 | 13 | 2 | 13 | 1 |
14 | 3 | 14 | 1 | 14 | 1 |
15 | 1 | 15 | 1 | 15 | 1 |
16 | 2 | 16 | 2 | 16 | 1 |
17 | 3 | 17 | 2 | 17 | 1 |
18 | 1 | 18 | 1 | 18 | 1 |
19 | 2 | 19 | 2 | 19 | 1 |
20 | 3 | 20 | 1 | 20 | 1 |
21 | 1 | 21 | 1 | 21 | 1 |
22 | 2 | 22 | 2 | 22 | 1 |
23 | 3 | 23 | 2 | 23 | 1 |
24 | 1 | 24 | 1 | 24 | 1 |
25 | 2 | 25 | 2 | 25 | 1 |
26 | 3 | 26 | 1 | 26 | 1 |
27 | 1 | 27 | 1 | 27 | 1 |
28 | 2 | 28 | 2 | 28 | 1 |
29 | 3 | 29 | 2 | 29 | 1 |
30 | 1 | 30 | 1 | 30 | 1 |
31 | 2 | 31 | 2 | 31 | 1 |
32 | 3 | 32 | 1 | 32 | 1 |
33 | 1 | 33 | 1 | 33 | 1 |
34 | 2 | 34 | 2 | 34 | 1 |
35 | 3 | 35 | 2 | 35 | 1 |
36 | 1 | 36 | 1 | 36 | 1 |
37 | 2 | 37 | 2 | 37 | 1 |
38 | 3 | 38 | 1 | 38 | 1 |
39 | 1 | 39 | 1 | 39 | 1 |
40 | 2 | 40 | 2 | 40 | 1 |
41 | 3 | 41 | 2 | 41 | 1 |
42 | 1 | 42 | 1 | 42 | 1 |
43 | 2 | 43 | 2 | 43 | 1 |
44 | 3 | 44 | 1 | 44 | 1 |
45 | 1 | 45 | 1 | 45 | 1 |
46 | 2 | 46 | 2 | 46 | 1 |
47 | 3 | 47 | 2 | 47 | 1 |
48 | 1 | 48 | 1 | 48 | 1 |
49 | 2 | 49 | 2 | 49 | 1 |
50 | 3 | 50 | 1 | 50 | 1 |
51 | 1 | 51 | 1 | 51 | 1 |
52 | 2 | 52 | 2 | 52 | 1 |
53 | 3 | 53 | 2 | 53 | 1 |
54 | 1 | 54 | 1 | 54 | 1 |
55 | 2 | 55 | 2 | 55 | 1 |
56 | 3 | 56 | 1 | 56 | 1 |
57 | 1 | 57 | 1 | 57 | 1 |
58 | 2 | 58 | 2 | 58 | 1 |
59 | 3 | 59 | 2 | 59 | 1 |
60 | 1 | 60 | 1 | 60 | 1 |
61 | 2 | 61 | 2 | 61 | 1 |
62 | 3 | 62 | 1 | 62 | 1 |
63 | 1 | 63 | 1 | 63 | 1 |
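The removal example in the preceding table can be reproduced with a short sketch. It assumes that the initial assignment is round-robin across the next hops and that the buckets of a failed next hop are handed to the surviving next hops in round-robin order. Both rules are inferred from the table values and are not a statement of the actual forwarding-plane algorithm; the function names are hypothetical.

```python
from typing import Dict, List

NUM_BUCKETS = 64  # per sticky ECMP route, as stated in the text


def initial_buckets(next_hops: List[int]) -> Dict[int, int]:
    """Assign buckets to next hops round-robin (matches the first NH column)."""
    return {b: next_hops[b % len(next_hops)] for b in range(NUM_BUCKETS)}


def remove_next_hop(buckets: Dict[int, int], failed: int) -> Dict[int, int]:
    """Reassign only the failed next hop's buckets, round-robin over survivors."""
    survivors = sorted(set(buckets.values()) - {failed})
    if not survivors:
        raise ValueError("no next hops remain")
    result = dict(buckets)
    # Buckets that pointed at the failed next hop, in ascending bucket order.
    orphaned = [b for b in sorted(buckets) if buckets[b] == failed]
    for i, b in enumerate(orphaned):
        result[b] = survivors[i % len(survivors)]
    return result


start = initial_buckets([1, 2, 3])
after_nh3 = remove_next_hop(start, failed=3)
after_nh2 = remove_next_hop(after_nh3, failed=2)
```

In this sketch, buckets that pointed to next hop 1 or 2 before the first failure keep their next hop, so only the flows that used next hop 3 are moved. After the first failure, next hop 1 holds 33 buckets and next hop 2 holds 31.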
Figure: Sticky ECMP flow distribution as next hops are added part 1, Figure: Sticky ECMP flow distribution as next hops are added part 2, and Figure: Sticky ECMP flow distribution as next hops are added part 3 provide an example of the distribution of flows over multiple BGP next hops as next hops are added.
Table: Sticky ECMP flow distribution as next hops are added for 1.1.1.1/32 lists the sticky ECMP flow distribution as next hops are added for 1.1.1.1/32.
Initial sticky ECMP distribution for 1.1.1.1/32 in Figure: Sticky ECMP flow distribution as next hops are added part 1 | | ECMP distribution for 1.1.1.1/32 if next hop 2 becomes available in Figure: Sticky ECMP flow distribution as next hops are added part 2 | | ECMP distribution for 1.1.1.1/32 if next hop 3 additionally becomes available in Figure: Sticky ECMP flow distribution as next hops are added part 3 | |
---|---|---|---|---|---|
Bucket | NH | Bucket | NH | Bucket | NH |
00 | 1 | 00 | 1 | 00 | 1 |
01 | 1 | 01 | 2 | 01 | 2 |
02 | 1 | 02 | 1 | 02 | 3 |
03 | 1 | 03 | 2 | 03 | 2 |
04 | 1 | 04 | 1 | 04 | 1 |
05 | 1 | 05 | 2 | 05 | 3 |
06 | 1 | 06 | 1 | 06 | 1 |
07 | 1 | 07 | 2 | 07 | 2 |
08 | 1 | 08 | 1 | 08 | 3 |
09 | 1 | 09 | 2 | 09 | 2 |
10 | 1 | 10 | 1 | 10 | 1 |
11 | 1 | 11 | 2 | 11 | 3 |
12 | 1 | 12 | 1 | 12 | 1 |
13 | 1 | 13 | 2 | 13 | 2 |
14 | 1 | 14 | 1 | 14 | 3 |
15 | 1 | 15 | 2 | 15 | 2 |
16 | 1 | 16 | 1 | 16 | 1 |
17 | 1 | 17 | 2 | 17 | 3 |
18 | 1 | 18 | 1 | 18 | 1 |
19 | 1 | 19 | 2 | 19 | 2 |
20 | 1 | 20 | 1 | 20 | 3 |
21 | 1 | 21 | 2 | 21 | 2 |
22 | 1 | 22 | 1 | 22 | 1 |
23 | 1 | 23 | 2 | 23 | 3 |
24 | 1 | 24 | 1 | 24 | 1 |
25 | 1 | 25 | 2 | 25 | 2 |
26 | 1 | 26 | 1 | 26 | 3 |
27 | 1 | 27 | 2 | 27 | 2 |
28 | 1 | 28 | 1 | 28 | 1 |
29 | 1 | 29 | 2 | 29 | 3 |
30 | 1 | 30 | 1 | 30 | 1 |
31 | 1 | 31 | 2 | 31 | 2 |
32 | 1 | 32 | 1 | 32 | 3 |
33 | 1 | 33 | 2 | 33 | 2 |
34 | 1 | 34 | 1 | 34 | 1 |
35 | 1 | 35 | 2 | 35 | 3 |
36 | 1 | 36 | 1 | 36 | 1 |
37 | 1 | 37 | 2 | 37 | 2 |
38 | 1 | 38 | 1 | 38 | 3 |
39 | 1 | 39 | 2 | 39 | 2 |
40 | 1 | 40 | 1 | 40 | 1 |
41 | 1 | 41 | 2 | 41 | 3 |
42 | 1 | 42 | 1 | 42 | 1 |
43 | 1 | 43 | 2 | 43 | 2 |
44 | 1 | 44 | 1 | 44 | 3 |
45 | 1 | 45 | 2 | 45 | 2 |
46 | 1 | 46 | 1 | 46 | 1 |
47 | 1 | 47 | 2 | 47 | 3 |
48 | 1 | 48 | 1 | 48 | 1 |
49 | 1 | 49 | 2 | 49 | 2 |
50 | 1 | 50 | 1 | 50 | 3 |
51 | 1 | 51 | 2 | 51 | 2 |
52 | 1 | 52 | 1 | 52 | 1 |
53 | 1 | 53 | 2 | 53 | 3 |
54 | 1 | 54 | 1 | 54 | 1 |
55 | 1 | 55 | 2 | 55 | 2 |
56 | 1 | 56 | 1 | 56 | 3 |
57 | 1 | 57 | 2 | 57 | 2 |
58 | 1 | 58 | 1 | 58 | 1 |
59 | 1 | 59 | 2 | 59 | 3 |
60 | 1 | 60 | 1 | 60 | 1 |
61 | 1 | 61 | 2 | 61 | 2 |
62 | 1 | 62 | 1 | 62 | 3 |
63 | 1 | 63 | 2 | 63 | 2 |
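The addition example in the preceding table can also be reproduced with a small sketch. The rule used here is inferred from the table values only: when a new next hop brings the total to n, bucket b moves to the new next hop if b mod n equals n - 1, and every other bucket keeps its current next hop. This is an assumption that fits the example, not a description of the actual implementation, and it is not claimed to stay balanced for other next-hop counts. The function name `add_next_hop` is hypothetical.

```python
from typing import Dict

NUM_BUCKETS = 64  # per sticky ECMP route, as stated in the text


def add_next_hop(buckets: Dict[int, int], new_nh: int) -> Dict[int, int]:
    """Move every n-th bucket to the new next hop, where n is the new total.

    Assumed rule, inferred from the table: with n next hops after the
    addition, bucket b moves to the new next hop when b % n == n - 1.
    """
    if new_nh in buckets.values():
        raise ValueError("next hop is already in use")
    n = len(set(buckets.values())) + 1
    return {b: (new_nh if b % n == n - 1 else nh) for b, nh in buckets.items()}


only_nh1 = {b: 1 for b in range(NUM_BUCKETS)}
with_nh2 = add_next_hop(only_nh1, new_nh=2)
with_nh3 = add_next_hop(with_nh2, new_nh=3)

# Count buckets per next hop after both additions.
counts = {nh: list(with_nh3.values()).count(nh) for nh in (1, 2, 3)}
```

In this sketch the final split is 21, 22, and 21 buckets for next hops 1, 2, and 3, and no bucket moves between the two next hops that were already in use.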
A BGP route to an IPv4 or IPv6 prefix is a candidate for installation as an ECMP next hop only if it meets all of the following criteria:
The multi-path route must be the same type of route as the best path (same AFI/SAFI and, in some cases, same next-hop resolution method).
The multi-path route must be tied with the best path for all criteria of greater significance than next-hop cost, except for criteria that are configured to be ignored.
If the best path selection reaches the next-hop cost comparison, the multi-path route must have the same next-hop cost as the best route unless the unequal-cost option is configured.
The multi-path route must not have the same BGP next hop as the best path or any other multi-path route.
The multi-path route must not cause the ECMP limit of the routing instance to be exceeded (configured using the ecmp command with a value in the range 1 to 64).
The multi-path route must not cause the applicable max-paths limit to be exceeded. If the best path is an EBGP-learned route and the ebgp option is used, the ebgp-max-paths limit overrides the max-paths limit. If the best path is an IBGP-learned route and the ibgp option is used, the ibgp-max-paths limit overrides the max-paths limit. All path limits are configurable up to a maximum of 64. Multi-path is disabled if the value is set to 1.
The multi-path route must have the same neighbor AS in its AS path as the best path if the restrict same-neighbor-as option is configured. By default, any path with the same AS path length as the best path (regardless of neighbor AS) is eligible for multi-path.
The multi-path route must have the same AS path as the best path if the restrict exact-as-path option is configured. By default, any path with the same AS path length as the best path (regardless of the actual AS numbers) is eligible for multi-path.
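The criteria above can be read as a filter applied to each candidate path in turn. The following sketch shows one way to express that filter. The `Path` and `Limits` types, their field names, and the order in which candidates are considered are hypothetical simplifications: `pre_cost_key` stands in for every attribute compared before next-hop cost, the next-hop resolution method is not modeled, and the cost check is applied unconditionally.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Path:
    """Hypothetical, simplified view of a BGP path for one prefix."""
    family: str                # AFI/SAFI label, for example "ipv4-unicast"
    pre_cost_key: Tuple        # attributes compared before next-hop cost
    next_hop_cost: int
    bgp_next_hop: str
    as_path: Tuple[int, ...]   # leftmost AS is the neighbor AS
    ebgp: bool


@dataclass(frozen=True)
class Limits:
    """Hypothetical container for the limits and options named in the text."""
    ecmp: int = 1
    max_paths: int = 1
    ebgp_max_paths: Optional[int] = None
    ibgp_max_paths: Optional[int] = None
    unequal_cost: bool = False
    same_neighbor_as: bool = False
    exact_as_path: bool = False


def select_multipaths(best: Path, others: List[Path], lim: Limits) -> List[Path]:
    """Return the best path plus the candidates that pass every check."""
    limit = lim.max_paths
    if best.ebgp and lim.ebgp_max_paths is not None:
        limit = lim.ebgp_max_paths
    elif not best.ebgp and lim.ibgp_max_paths is not None:
        limit = lim.ibgp_max_paths
    limit = min(limit, lim.ecmp)  # neither limit may be exceeded

    chosen = [best]
    for p in others:
        if len(chosen) >= limit:
            break
        if p.family != best.family or p.pre_cost_key != best.pre_cost_key:
            continue
        if p.next_hop_cost != best.next_hop_cost and not lim.unequal_cost:
            continue
        if any(p.bgp_next_hop == c.bgp_next_hop for c in chosen):
            continue  # BGP next hop already used by a selected path
        if lim.same_neighbor_as and p.as_path[:1] != best.as_path[:1]:
            continue
        if lim.exact_as_path and p.as_path != best.as_path:
            continue
        chosen.append(p)
    return chosen


best = Path("ipv4-unicast", (100, 2), 10, "10.0.0.1", (65001, 65010), True)
candidates = [
    Path("ipv4-unicast", (100, 2), 10, "10.0.0.2", (65002, 65010), True),
    Path("ipv4-unicast", (100, 2), 10, "10.0.0.1", (65001, 65010), True),  # duplicate next hop
    Path("ipv4-unicast", (100, 2), 20, "10.0.0.3", (65001, 65010), True),  # higher cost
]
selected = select_multipaths(best, candidates, Limits(ecmp=4, max_paths=4))
```

With these example values, the best path and the candidate through 10.0.0.2 are selected; the duplicate next hop and the higher-cost path are rejected unless the corresponding option is relaxed.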
SR OS also supports IBGP multi-path. In some topologies, a BGP next hop is resolved by an IP route that has multiple ECMP next hops. When ibgp-multipath is not configured, only one of the ECMP next hops is programmed as the next hop of the BGP route in the IOM. When ibgp-multipath is configured, the IOM attempts to use all the ECMP next hops of the resolving route in the forwarding state. Although the name of the ibgp-multipath command implies that it is specific to IBGP-learned routes, this is not the case. It also applies to routes learned from any multi-hop BGP session including routes learned from multi-hop EBGP peers.
The multi-path and ibgp-multipath commands are not mutually exclusive and work together. The multi-path context enables ECMP load-sharing across different BGP next hops (corresponding to different BGP routes), while ibgp-multipath enables ECMP load-sharing across the next hops of the IP routes that resolve the BGP next hops.
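The interaction of the two features can be pictured as a two-level expansion: multi-path chooses the BGP next hops, and ibgp-multipath decides how many next hops of each resolving IP route are used. The following sketch models that under simplifying assumptions; the function name, the dictionary layout, and the choice of the first resolving next hop when ibgp-multipath is off are all hypothetical.

```python
from typing import Dict, List


def programmed_next_hops(
    bgp_next_hops: List[str],
    resolving_routes: Dict[str, List[str]],
    ibgp_multipath: bool,
) -> Dict[str, List[str]]:
    """Map each BGP next hop to the IP next hops used for forwarding."""
    result: Dict[str, List[str]] = {}
    for bgp_nh in bgp_next_hops:
        ip_next_hops = resolving_routes[bgp_nh]
        # Without ibgp-multipath only one resolving next hop is programmed;
        # taking the first one is an arbitrary choice made for this sketch.
        result[bgp_nh] = list(ip_next_hops if ibgp_multipath else ip_next_hops[:1])
    return result


# BGP next hops selected by multi-path, and the ECMP next hops of the
# IP routes that resolve them (example values only).
bgp_nhs = ["10.0.0.1", "10.0.0.2"]
resolving = {
    "10.0.0.1": ["192.168.1.1", "192.168.2.1"],
    "10.0.0.2": ["192.168.3.1"],
}
without = programmed_next_hops(bgp_nhs, resolving, ibgp_multipath=False)
with_all = programmed_next_hops(bgp_nhs, resolving, ibgp_multipath=True)
```

In this example, traffic toward 10.0.0.1 uses one resolving next hop when ibgp-multipath is off and both when it is on, while the number of BGP next hops is unchanged in either case.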
Finally, ibgp-multipath does not control traffic load sharing toward a BGP next hop that is resolved by a tunnel, as when dealing with BGP shortcuts or labeled routes (VPN-IP, label-IPv4, or label-IPv6). When a BGP next hop is resolved by a tunnel that supports ECMP, the load-sharing of traffic across the ECMP next hops of the tunnel is automatic.
SR OS supports direct resolution of a BGP next hop to multiple RSVP-TE or SR-TE tunnels. In addition, a BGP next hop can be resolved by multiple LDP ECMP next hops that each correspond to a separate LDP-over-RSVP or LDP-over-SRTE tunnel. It is also possible for a BGP next hop to be resolved by an IGP shortcut route that has multiple RSVP-TE or SR-TE tunnels as its ECMP next hops.