2. Using BGP for underlay routing

A routing protocol is needed to dynamically discover the shortest loop-free path through the underlay of a DC fabric to reach every destination IP subnet. The Border Gateway Protocol (BGP) is one of the leading technologies for this purpose due to its simplicity, scalability, and ease of multi-vendor interoperability.

BGP also provides policy mechanisms to perform hop-by-hop traffic engineering, leveraging functionality originally designed for this same purpose in the public Internet.

2.1. Applicability

The information and configuration in this chapter are based on SR Linux Release 19.11 and later.

2.2. Overview

Figure 1 shows a 3-stage CLOS fabric design using only BGP for underlay routing.

Figure 1:  BGP underlay routing example 

The design example in Figure 1 shows the following:

  1. Each Top-of-Rack (TOR) switch is a BGP router assigned with its own unique Autonomous System Number (ASN).
  2. Each TOR switch is dual-homed to the two leaf switches in its same POD/container. More leaf switches could be added later to achieve capacity scale-out.
  3. Each TOR forms one single-hop External Border Gateway Protocol (EBGP) session to each of its upstream leaf switches. From a TOR perspective, these are single-hop because each leaf switch is a BGP neighbor in the same IP subnet as its interface address towards the leaf switch.
  4. Each leaf switch is a BGP router. All of the leaf switches in one POD/container belong to the same ASN, but this ASN is unique in the data center.
  5. Each leaf switch has two uplinks into the spine layer. More uplinks could be added later to achieve capacity scale-out. Each leaf switch forms one single-hop EBGP session with each of its upstream spine switches.
  6. Each spine switch is a BGP router. All of the spine switches in one data center belong to the same ASN but this ASN is unique in the network.

2.2.1. Advantages of BGP for underlay routing

Using BGP as shown in the Figure 1 example has these advantages:

  1. Standard operation of the BGP best path selection algorithm chooses the route to each destination with the AS_PATH length. This equates to the lowest hop count when each device prepends one AS number to the AS_PATH.
  2. Standard operation of the BGP multipath algorithm sprays traffic across all paths with the same shortest AS_PATH length.
  3. When a link goes down in the topology, the BGP session is taken down immediately if fast-failover is enabled. This may cause a new BGP best path to be advertised by the routers at each end of the failed session. Other routers may also advertise their own new best paths, but typically the failure does not propagate beyond routers that do not change their best path.
  4. Traffic can be rerouted around any node in the topology by having it prepend extra AS numbers to the AS_PATH.
  5. The best path or set of multipaths available to reach a destination TOR are visible in any device by looking at the AS_PATH attribute. This can be helpful with troubleshooting.

2.3. Configuring BGP for underlay routing

The following example defines how to bring up a static, pre-configured EBGP session between Router 3 and Router 5 as shown in Figure 1. Use the following two procedures to define the minimum configuration required for each router:

2.3.1. Example: Configure router 3 for static EBGP session

To configure Router 3:

  1. In candidate mode, create a network-instance that owns the IP subinterface toward the Router 5. Ensure:
    1. The network-instance is operationally enabled.
    2. The subinterface is operationally enabled.
    3. The subinterface has at least one IPv4 or IPv6 address assigned.
    Example configuration details:
    --{ candidate shared default}--[ network-instance default ]--
    # info detail
        type default
        admin-state enable
        ip-load-balancing {
        }
        interface ethernet-1/1.0 {
        }
        protocols {
        }
  2. Add the BGP protocol to the network-instance. By default it is administratively enabled when the configuration is committed.
    Example configuration details:
    --{ candidate shared default}--[ network-instance default ]--
    # protocols bgp
  3. Assign a global autonomous system number to the BGP instance. This is the AS number reported to peers when this network-instance opens a BGP session toward another router (unless it is overridden by a local-as configuration).
    Router 3 (R3 in Figure 1) has a global autonomous system number of 65201.
    Example configuration details:
    --{ candidate shared default}--[ network-instance default protocols bgp ]--
    # autonomous-system 65201
  4. Assign a router-ID to the BGP instance. This is the BGP identifier reported to peers when this network-instance opens a BGP session toward another router. This overrides the router-ID configuration at the network-instance level.
    Router 3 (R3 in Figure 1) has a router-ID of 192.0.3.1.
    Example configuration details:
    --{ candidate shared default}--[ network-instance default protocols bgp ]--
    # router-id 192.0.3.1
  5. Enable all address families that should be enabled globally as a default for all peers of the BGP instance.
    When you later configure individual neighbors or groups, you can override the enabled families at those levels.
    Example configuration details:
    --{ candidate shared default}--[ network-instance default protocols bgp ]--
    # ipv4-unicast admin-state enable
    --{ candidate shared default}--[ network-instance default protocols bgp ]--
    # ipv6-unicast admin-state enable
  6. Create a peer-group to contain the neighbor session with Router 5 (R5 in Figure 1). A peer-group should include sessions that have a similar or almost identical configuration.
    In this example, the peer-group is named "spine" since it will be used to contain all spine layer peers. New groups are administratively enabled by default.
    Example configuration details:
    --{ candidate shared default}--[ network-instance default protocols bgp ]--
    # group spine
  7. All of the configuration that is common to all peers in the group must be configured at the group level. In this example, this includes:
    1. peer-as (of the spine peers)
    2. export-policy
    The export policy (named ‘pass-all’ in the example) is shown in the configuration output below and was previously created in this work flow (if it did not exist, the commit would fail). The export policy is required to advertise any routes to R5. This is because R5 is an EBGP peer and by default no routes are advertised to EBGP peers without an export policy. Note that this can be controlled by a setting in the network-instance protocols ‘bgp ebgp-default-policy’ container.
    The "pass-all" is a simple policy that matches all BGP routes and accepts them, while rejecting all non-BGP routes.
    Example configuration details:
    --{ candidate shared default}--[ network-
    instance default protocols bgp group spine ]
    --
    # info
        peer-as 65301
        export-policy pass-all
    --{ candidate shared }--[  ]
    # info from running routing-policy
        routing-policy {
            policy pass-all {
                default-action {
                    reject {
                    }
                }
                statement 10 {
                    match {
                        protocol bgp
                    }
                    action {
                        accept
                    }
                }
            }
        }
  8. Configure the BGP session with router R5. In this example, router R5 is reachable to R3 through the ethernet-1/1.0 subinterface. On this subnet, router R5 has the global-unicast IPv6 address 2001:db8::c11.
    In this minimal configuration example, the only required configuration for the neighbor is its association with the group "spine" that was previously created. New neighbors are administratively enabled by default.
    Example configuration details:
    --{ candidate shared default}--[ network-instance default protocols bgp ]--
    # neighbor 2001:db8::c11 peer-group spine
  9. Review all changes and if everything looks correct, commit the changes.
    # commit stay

2.3.2. Example: Configure router 5 for static EBGP session

To configure Router 5:

  1. In candidate mode, create a network-instance that owns the IP subinterface toward the Router 3. Ensure:
    1. The network-instance is operationally enabled.
    2. The subinterface is operationally enabled.
    3. The subinterface has at least one IPv4 or IPv6 address assigned.
    Example configuration details:
    --{ candidate shared default}--[ network-instance default ]--
    # info detail
        type default
        admin-state enable
        ip-forwarding {
            receive-ipv4-check true
            receive-ipv6-check true
        }
        ip-load-balancing {
        }
        interface ethernet-3/1.1 {
        }
        protocols {
        }
        mtu {
            path-mtu-discovery true
            min-path-mtu 552
        }
  2. Next add the BGP protocol to the network-instance. By default it will be administratively enabled when the configuration is committed.
    Example configuration details:
    --{ candidate shared default}--[ network-instance default ]--
    # protocols bgp
  3. Assign a global autonomous system number to the BGP instance.
    Router 5 (R5 in Figure 1) has a global autonomous system number of 65301.
    Example configuration details:
    --{ candidate shared default}--[ network-instance default protocols bgp ]--
    # autonomous-system 65301
  4. Assign a router-ID to the BGP instance. This is the BGP identifier reported to peers when this network-instance opens a BGP session toward another router. This overrides the router-ID configuration at the network-instance level.
    Router 5 (R5 in Figure 1) has a router-ID of 192.0.5.1.
    Example configuration details:
    --{ candidate shared default}--[ network-instance default protocols bgp ]--
    # router-id 192.0.5.1
  5. Enable all address families that should be enabled globally as a default for all peers of the BGP instance.
    When you later configure individual neighbors or groups, you can override the enabled families at those levels.
    Example configuration details:
    --{ candidate shared default}--[ network-instance default protocols bgp ]--
    # ipv4-unicast admin-state enable
    --{ candidate shared default}--[ network-instance default protocols bgp ]--
    # ipv6-unicast admin-state enable
  6. Create a peer-group to contain the neighbor session with Router 3 (R3 in Figure 1). A peer-group should include sessions that have a similar or almost identical configuration.
    In this example, the peer-group is named "leaf-pod1" since it is used to contain all leaf peers in POD1. New groups are administratively enabled by default.
    Example configuration details:
    --{ candidate shared default}--[ network-instance default protocols bgp ]--
    # group leaf-pod1
  7. All of the configuration that is common to all peers in the group must be configured at the group level. In this example, this includes:
    1. peer-as (of the leaf peers in POD1)
    2. export-policy
    The export policy (named ‘pass-all’ in the example) is shown in the running configuration output below, and is required to advertise any routes to R3. This is because R3 is an EBGP peer and by default no routes are advertised to EBGP peers without an export policy. Note that this can be controlled by a setting in the network-instance protocols ‘bgp ebgp-default-policy’ container.
    The "pass-all" is a simple policy that matches all BGP routes and accepts them, while rejecting all non-BGP routes.
    Example configuration details:
    --{ candidate Shared default}--[ network-instance default protocols bgp group leaf-
    pod1 ]--
    # info
        peer-as 65201
        export-policy pass-all
    --{ candidate }--[ network-instance default protocols bgp group leaf-pod1 ]--
    # exit all
    --{ candidate shared default}--[  ]
    # info from running routing-policy
        routing-policy {
            policy pass-all {
                default-action {
                    reject {
                    }
                }
                statement 10 {
                    match {
                        protocol bgp
                    }
                    action {
                        accept
                    }
                }
  8. Configure the BGP session with router R3. In this example, router R3 is reachable to R5 through the ethernet-3/1.1 subinterface. On this subnet, router R5 has the global-unicast IPv6 address 2001:db8::c12.
    In this minimal configuration example, the only required configuration for the neighbor is its association with the group ‘leaf-pod1’ that was previously created. New neighbors are administratively enabled by default.
    Example configuration details:
    --{ candidate shared default}--[ network-instance default protocols bgp ]--
    # neighbor 2001:db8::c12 peer-group leaf-pod1
  9. Review all changes and if everything looks correct, commit the changes.
  10. From Router 3 (R3), verify that the session is up (State is established) using the show neighbor report under the network-instance protocols bgp hierarchy
    Example:
    --{running}--{ network-instance default protocols bgp }--
    srlinux# show neighbor
    ------------------------------------------------------------------------------------------------
    BGP neighbor summary for network-instance "default"
    Flags: S static, D dynamic, L discovered by LLDP, B BFD enabled, - disabled, * slow
    ------------------------------------------------------------------------------------------------
    +----------+---------------+-------+-------+-------+-------------+----------+--------------+---------+
    | Net-Inst |  Peer         | Group | Flags | Peer- | State       | Uptime   | AFI/SAFI     |  RX/    |
    |          |               |       |       |  AS   |             |          |              | Active  |
    |          |               |       |       |       |             |          |              |  /TX    |
    +==========+===============+=======+=======+=======+=============+==========+==============+=========+
    | default  | 2001:db8::cli | spine | S     | 65301 | established | 0d:0h:   | ipv4-unicast | [4/3/1] |
    |          |               |       |       |       |             | 34min 7s | ipv6-unicast | [1/1/1] |
    +----------+---------------+-------+-------+-------+-------------+----------+--------------+---------+
    ------------------------------------------------------------------------------------------------
    Summary:
    1 configured neighbors, 1 configured sessions are established, 0 disabled peers
    None dynamic sessions are established

2.4. Advanced configuration: BGP timers

When two BGP routers form a session, they each propose a value for the session hold-time in their OPEN messages. The lowest of the two proposed values becomes the operational hold-time for the lifetime of the session. If the operational hold-time is greater than zero, then both routers are agreeing to send KEEPALIVE messages to each other. This ensures that any loss of connectivity between them can be detected.

Each router restarts its hold-timer every time it receives a message from the other peer. If the operational hold-timer reaches zero without receiving any KEEPALIVE or related message from the peer, then the session is torn down (returned to the Idle state). Each router sends a KEEPALIVE message to its peer no more than 1 message every keepalive interval. The default value for the keepalive interval is one third of the operational hold-time, but it possible to configure a different interval.

In a data center environment, an EBGP session failure is usually caused by an interface going down. Interface events are propagated to BGP if fast-failover is enabled. The hold-timer expiry is not the usual mechanism for detecting connectivity problems. However, there may be some circumstances where some adjustment of the hold-time and/or the keepalive-interval is desired.

2.4.1. Timer-related defaults and how to modify

With the SR Linux, the default hold-time is 90 seconds and the default keepalive interval is 30 seconds. To change the hold-time on a session to 24 seconds with a keepalive interval of 8 seconds (1/3 of 24) it would be sufficient to change the hold-time value to 24, as shown in the following example:

--{ candidate shared default}--[ network-instance default protocols bgp neighbor 
 2001:db8::c11 ]--
# timers hold-time 24

After this change is committed, the affected session will flap and the new operational timer values are seen in the output of the show network-instance protocols bgp neighbor detail report. For example:

srlinux# show network-instance default protocols bgp neighbor 2001:db8::c11 detail
----------------------------------------------------------------------------------
Peer : 2001:db8::c11, remote AS: 65301, local AS: 65201
Type : static
Description : None
Group : spine
Export policies : pass-all
Import policies: pass-all
---------------------------------------------------------------------------------
Admin-state is enable, session-state is established, up for 0d:0h:6m:37s
TCP connection is 2001:db8::c12 [45492] -> 2001:db8::c11 [179]
0 messages in input queue, 0 messages in output queue
--------------------------------------------------------------------------------
Last-state was active, last-event was recvOpen, 24 peer-flaps
Last received Notification was Error:Message Header Error SubError: Bad Message Type
Failure detection: BFD is False, fast-failover is False
------------------------------------------------------------------------------------
Graceful Restart
   Restarts by the peer : 0
   Last restart : N/A
   Peer requested restart-time : 300
   Stale routes time : 360
+-----------------------------------+---------------------------------+------------+
|                Timer              |     Configured/Operational      |     Next   |
+===================================+=================================+============+
| connect-retry                     | 120                             | -          |
| keepalive-interval                | 30/8                            | -          |
| hold-time                         | 24/24                           | -          |
| minimum-advertisement-interval    | 5                               | -          |
+-----------------------------------+---------------------------------+------------+
------------------------------------------------------------------------------------
Cap Sent:  MP_BGP ROUTE_REFRESH EXT_NH_ENCODING 4-OCTET_ASN
Cap Recv:  MP_BGP ROUTE_REFRESH EXT_NH_ENCODING 4-OCTET_ASN
------------------------------------------------------------------------------------
+----------------------------------+--------------------------------+--------------+
|              Messages            |              Sent              |  Received    |
+==================================+================================+==============+
| Non Updates                      | 55                             | 52           |
| Updates                          | 424                            | 2            |
| Malformed updates                | 0                              | 0            |
+----------------------------------+--------------------------------+--------------+
------------------------------------------------------------------------------------
Ipv4-unicast AFI/SAFI
    End of RIB                     : sent, not received
    Received routes                : 4
    Rejected routes                : None
    Active routes                  : 3
    Advertised routes              : 1
    Prefix-Limit                   : None
    Default originate              : disabled
    Advertise with IPv6 next-hops  : False
    Peer requested GR helper       : None
    Peer preserved forwarding state: None
-----------------------------------------------------------------------------------
Ipv6-unicast AFI/SAFI
    End of RIB                     : sent, not received
    Received routes                : 1
    Rejected routes                : None
    Active routes                  : 1
    Advertised routes              : 1
    Prefix-Limit                   : None
    Default originate              : disabled
    Advertise with IPv6 next-hops  : N/A
    Peer requested GR helper       : None
    Peer preserved forwarding state: None
-----------------------------------------------------------------------------------
--{ candidate shared }--[  ]--

2.5. Advanced configuration: BGP convergence optimization

By default, the SR Linux BGP process (running the BGP control plane) does not advertise a route for an IPv4 or IPv6 prefix until it has positive confirmation from the FIB manager process that the route is installed into the FIB of all installed line cards. This ensures that the router does not attract traffic destined for an IP prefix until all line cards have the ability to forward the traffic. Note that the BGP process does not delay route withdrawals until it knows that all line cards have removed the FIB state as this is not needed.

It is recommended that the wait-for-fib-install feature remain enabled on routers that are in the datapath (that is, routers that set BGP next-hop-self). However, this does cause the rate of RIB-OUT route advertisements to slow to the rate of FIB programming. If the objective of a BGP performance test is to reach the highest possible route advertisement rate, the wait-for-fib-install configuration leaf can be set to false. For example:

--{ candidate shared default}--[network-instance default protocols bgp route-
advertisement]--
# wait-for-fib-install false

2.5.1. Optimizing the convergence process after restarts

The BGP protocol and its state machine must attempt to re-converge whenever the following occurs:

  1. the router starts up
  2. the BGP manager (control plane) application restarts
  3. all peers of a network-instance are hard-reset by a tools reset-peer command

When any of these conditions are met, the router re-synchronizes its BGP RIB with the BGP RIB of other routers in the network. Once re-synchronization completes, BGP has "converged". During convergence, the following occurs to the restarting router:

  1. It must re-establish its sessions with configured (and discovered) BGP neighbors.
  2. It must relearn all BGP routes advertised by its direct BGP neighbors (their best paths, plus potentially some additional paths).
  3. It must advertise to its direct neighbors, its own locally originated BGP routes plus the received routes that it considers its own set of best paths.

The default behavior of SR Linux BGP is to execute all of the above steps in parallel. As soon as the first BGP session has re-established, the restarting router begins to advertise its own best paths to that BGP neighbor (even though it is still in the early stages of rebuilding its RIB-IN database).

As more sessions come up and more routes are learned, it is likely that routes previously considered best are no longer best, leading to multiple route advertisements for the same prefix with each incrementally better than the previous one. The best route is not determined until the last advertisement. The intermediate route advertisements can substantially increase the processing workload on the restarting router as well as its BGP neighbors. This can lengthen the overall convergence time and cause short term inefficiencies in traffic forwarding.

Instead of re-converging as described above, SR Linux BGP can also be configured to delay the advertisement of BGP routes in a particular address family until convergence has occurred for that address family or until a configured time limit has expired. This behavior is activated by configuring a non-zero value for the min-wait-to-advertise configuration leaf. For example:

--{ candidate shared default}--[ network-
instance default protocols bgp convergence ]--
# min-wait-to-advertise 600

The max-wait-to-advertise leaf value for the IPv4-unicast and IPv6-unicast address families can be configured, or you can accept their default values (3x the min-wait-to-advertise value). If configuring a max-wait-to-advertise leaf with a non-default value, the value must be greater than the configured min-wait-to-advertise timer. In the example below, the max-wait-to-advertise timer is set to 900 seconds for IPv4-unicast and set to 800 seconds for IPv6-unicast.

--{ candidate shared default}--[ network-instance default protocols bgp ]--
# ipv4-unicast convergence max-wait-to-advertise 900
# ipv6-unicast convergence max-wait-to-advertise 800

The min-wait-to-advertise timer begins after one of the following triggers occurs and the first BGP session becomes established.

  1. BGP instance admin state set to enable or disable
  2. Running ‘tools clear network-instance protocols bgp reset-peer’
  3. BGP application restart
  4. Node reboot

When the first session that supports the exchange of IPv4-unicast routes is established, the max-wait-to-advertise timer of the IPv4 address family starts. Likewise, when the first session that supports the exchange of IPv6-unicast routes is established, the max-wait-to-advertise timer of the IPv6 address family starts.

While the min-wait-to-advertise timer is running, BGP sessions come up, and routes are learned and sorted according to preference by the BGP decision process. However, no routes are advertised to any of the peers.

When the min-wait-to-advertise expires, BGP makes a list of IPv4 and IPv6 peers (that is, peers that support the exchange of IPv4-unicast routes and IPv6-unicast routes). It expects to receive the IPv4-unicast End of RIB (EOR) marker from each neighbor in the list of IPv4 peers, and it expects to receive the IPv6-unicast EOR from each neighbor in the list of IPv6 peers.

When BGP in the restarting router receives the last expected IPv4-unicast EOR, it declares that address family as converged and starts to advertise its best IPv4-unicast routes. Likewise, when BGP receives the last expected IPv6-unicast EOR it i declares that address family as converged and starts to advertise its best IPv6-unicast routes.

If the max-wait-to-advertise timer expires before the last expected EOR is received for an address family, the convergence state for the address family moves to 'timeout' and RIB-OUT advertisement is triggered. This occurs even though convergence is not complete. The max-wait-to-advertise timers are a fail-safe. They handle the scenario when one or more peers come up within the min-wait-to-advertise window, but their EORs are not sent.

In the example that follows, the BGP convergence process is triggered by a hard reset of all peers of the BGP instance:

--{ * candidate shared default}--[  ]--
dut1# tools network-instance default protocols bgp reset-peer
/network-instance[name=default]/protocols/bgp:
    Successfully executed the tools clear command.
--{ candidate shared }--[  ]--

If the show network-instance protocols bgp summary command is issued a few minutes after the session restarts, a snapshot of the convergence process can be viewed. For example, in the sample output below, ten IPv4-unicast sessions are established when the min-wait-to-advertise timer expires and IPv4-unicast convergence takes 517 seconds.

dut1# show network-instance default protocols bgp summary
-----------------------------------------------------------------------------
BGP is enabled and up in network-instance "default"
Global AS number  : 65201
BGP identifier    : 192.0.3.1
-----------------------------------------------------------------------------
  Total paths               : 27
  Received routes           : 200000
  Received and active routes: 200000
  Total UP peers            : 20
  Configured peers          : 20, 0 are disabled
  Dynamic peers             : None
-----------------------------------------------------------------------------
Default preferences
  BGP Local Preference attribute: 100
  EBGP route-table preference   : 170
  IBGP route-table preference   : 170
-----------------------------------------------------------------------------
Wait for FIB install to advertise: True
Send rapid withdrawals           : False
-----------------------------------------------------------------------------
Ipv4-unicast AFI/SAFI
    Received routes               : 100000
    Received and active routes    : 100000
    Max number of multipaths      : 8, 1
    Multipath can transit multi AS: True
-----------------------------------------------------------------------------
    Min adv delay after restart(slow peer thresh): 600s
    Currently established sessions               : 10
    Sessions established at slow peer thresh     : 10
    First session establishment after restart    : 5s
    Last session established after restart       : 252s
-----------------------------------------------------------------------------
    Max advertisement delay after first peer UP: 900s
    Max adv delay exceeded after last restart  : None
    Current convergence state                  : converged
    Converged peers                            : 10
    Convergence time after last restart        : 517s
-----------------------------------------------------------------------------
Ipv6-unicast AFI/SAFI
    Received routes               : 100000
    Received and active routes    : 100000
    Max number of multipaths      : 1,1
    Multipath can transit multi AS: True
-----------------------------------------------------------------------------
    Min adv delay after restart(slow peer thresh): 600s
    Currently established sessions               : 10
    Sessions established at slow peer thresh     : 10
    First session establishment after restart    : 8s
    Last session established after restart       : 312s
-----------------------------------------------------------------------------
    Max advertisement delay after first peer UP: 800s
    Max adv delay exceeded after last restart  : None
    Current convergence state                  : converged
    Converged peers                            : 10
    Convergence time after last restart        : 705s
-----------------------------------------------------------------------------
--{ candidate shared }--[  ]--

2.6. Advanced configuration: advertising IPv4 routes with IPv6 next-hops

Some data centers are migrating away from an IPv4/IPv6 dual-stack infrastructure and moving towards an IPv6-only infrastructure. In an IPv6-only design, each interface in the fabric (such as the leaf-spine, leaf-TOR) is assigned one or more IPv6 addresses, but no IPv4 addresses.

In order to route and forward IPv4 packets over an IPv6-only fabric, the leaf and spine switches must support the following:

  1. The ability to advertise a BGP route for an IPv4 Network Level Reachability Information (NLRI) with an IPv6 BGP next-hop address.
  2. The ability to receive a BGP route for an IPv4 NLRI with an IPv6 BGP next-hop address.
  3. The ability to accept IPv4 packets on an IPv6-only interface.

2.6.1. Advertising a BGP route for an IPv4 NLRI with an IPv6 BGP next-hop address

In the SR Linux, the ability to advertise a BGP route for an IPv4 NLRI with an IPv6 BGP next-hop address is not enabled by default. To enable, use the advertise-ipv6-next-hops command, which is available on a per session basis. The following is a sample configuration:

--{ candidate shared default}--[ network-instance default protocols bgp ]--
# ipv4-unicast advertise-ipv6-next-hops true

2.6.2. Receiving a BGP route for an IPv4 NLRI with an IPv6 BGP next-hop address

In the SR Linux, the ability to receive a BGP route for an IPv4 NLRI with an IPv6 BGP next-hop address is not enabled by default. To enable, use the receive-ipv6-next-hops command, which is available on a per session basis.

This command allows SR Linux to advertise the extended-next-hop-encoding BGP capability, defined in RFC 5549, to the peers included in the scope of the command. This BGP capability encodes NLRI AFI 1, NLRI SAFI 1, and next-hop AFI 2. It informs peers that they can advertise MP-BGP encoded IPv4 routes with IPv6 next-hops. When the routes are received, the router will then attempt to resolve them using IPv6 routes.

If the router receives an IPv4 route with an IPv6 next-hop that is resolved by a static or direct IPv6 route (and an IPv6 neighbor entry for the next-hop host address), the IPv4 route is programmed in the FIB so that matching IPv4 packets are sent without additional encapsulation. Packets are sent through the indicated interface with a MAC destination address provided by the IPv6 neighbor entry.

The following is a sample configuration:

--{ candidate shared }--[ network-instance default protocols bgp ]--
# ipv4-unicast receive-ipv6-next-hops true

2.6.3. Accepting IPv4 packets on an IPv6-only interface

The datapath of the SR Linux discards all IPv4 packets that are received on an IPv6-only subinterface (that is, a subinterface with no configured IPv4 addresses). This is done for security reasons. However, if the router has advertised IPv4 routes with IPv6 next-hops to a peer, the check should be disabled on all subinterfaces that could be used by the peer when it installs the IPv4 route.

To disable this check on all subinterfaces bound to a particular network-instance, set the ipv4-receive-check leaf to false.

The following is a sample configuration:

--{ candidate shared default}--[ network-instance default ]--
# ip-forwarding receive-ipv4-check false

2.7. References

All BGP-related CLI commands can be found in the SR Linux Data Model Reference.