Ethernet Virtual Private Network (EVPN) is a standard technology in multi-tenant Data Centers (DCs). EVPN provides a control frame framework for many functions. This chapter details the configuration and operation of an EVPN and VXLAN (EVPN-VXLAN) solution for the following components:
The information and configuration in this chapter are based on SR Linux Release 21.3.
EVPN-VXLAN provides Layer-2 connectivity in multi-tenant DCs. EVPN-VXLAN Broadcast Domains (BD) can span several leaf routers connected to the same IP fabric, allowing hosts attached to the same BD to communicate as though they were connected to the same layer-2 switch. VXLAN tunnels bridge the layer-2 frames between leaf routers with EVPN providing the control plane to automatically setup tunnels and use them efficiently.
Figure 5 shows this concept. In this example, four leaf routers are attached to the same BD that is instantiated by a MAC-VRF on each leaf. SR Linux leaf routers support standard-based EVPN-VXLAN [RFC8365]; therefore third-party leaf routers (LEAF-4 in Figure 5) can be attached to the same BD as the SR Linux leaf routers as long as they follow standard [RFC8365].
Figure 6 shows a configuration example of an EVPN-VXLAN BD that is distributed in multiple leaf nodes in the same DC. The BD is instantiated by MAC-VRF-1 in each of the three leaf nodes. The sections that follow describe how to configure and operate MAC-VRF-1 in each node.
Prior to configuring the overlay BD, the underlay connectivity must be configured. In Figure 6, the leaf routers are connected to the spines using routed links. A routing protocol is enabled in the default network-instance of each leaf and spine node, so that reachability of all the leaf VXLAN Termination End Point (VTEP) addresses is distributed throughout the IP fabric. In SR Linux, you can use the following for the underlay routing protocol:
The EVPN family must also be enabled for the distribution of EVPN routes among leaf routers of the same tenant. EVPN is enabled using iBGP and typically a Route Reflector (RR), or eBGP.
As an example, the following configuration on LEAF-3 shows an eBGP-underlay BGP group to enable the IPv4 and IPv6 unicast families, and an iBGP-evpn group for the distribution of the EVPN routes. A full mesh of iBGP EVPN sessions is established among the three leaf routers, but a pair of RRs is typical.
Example: Underlay Configuration
In the example above, eBGP is used for underlay reachability, and iBGP for overlay EVPN route distribution. The command local-as overrides the configuration of the bgp>autonomous-system so that the overlay BGP sessions are established using the same autonomous system in the three leaf routers.
The system0.0 interface hosts the loopback address used to originate and typically terminate VXLAN packets. This address is also used by default as the next-hop of all EVPN routes.
The following example shows the configuration of the system0.0 interface in LEAF-3.
Example: system0.0 interface configuration
Once LEAF-3 is configured as defined in Configuring the underlay network, use the following steps to enable EVPN-VXLAN on LEAF-3.
In this example, Ethernet-1/2 connects HOST-3 to LEAF-3. Although this interface could be defined untagged, this example configures the interface as tagged and using vlan-id (vlan-tagging true).
A sub-interface with index 1 is created under the interface. The sub-interface must be configured as type bridged. Bridged sub-interfaces can be associated to MAC-VRF instances so that MAC learning and layer-2 forwarding can be enabled on each.
Once all leaf routers attached to the same BD are configured, the state of the MAC-VRF and connectivity to LEAF-2 and LEAF-3 should be checked.
Example: Check vxlan-interface configuration
The following command checks that the vxlan-interface is properly configured and associated to the network-instance. If the network-instance vxlan-interface is oper-down, a reason is shown. The egress source-ip shown in the command should match the IPv4 address configured in the sub-interface system0.0.
Example: Check mac-vrf, bgp-vpn, and bgp-evpn parameters
The following command checks that the mac-vrf, bgp-vpn, and bgp-evpn parameters are properly configured. A manual or auto-derived RD/RT must exist, or the bgp-evpn bgp-instance will be oper-down.
Example: Check VXLAN tunnels
This example checks for the creation of VXLAN tunnels to the remote VTEPs. After receiving EVPN routes from the remote leaf routers with VXLAN encapsulation, the vxlan_mgr creates VTEPs from the EVPN routes next-hops. Each VTEP gets an index allocated by fib_mgr (per source and destination tunnel IP addresses) if the next-hop is resolved in the default network-instance. The state of the two remote VTEPs is shown with their own indexes. EVPN routes are received with Next Hops 2.2.2.2 and 4.4.4.4 respectively.
Example: Check tunnel-table entries
The following command displays all tunnel-table entries. Once a VTEP is created in the vxlan-tunnel table and a non-zero index is allocated, a tunnel-table entry is created in the tunnel-table of the default network-instance.
If the next hop is not resolved to a route in the default network-instance route-table, the index in the vxlan-tunnel table shows as “0” for the VTEP and no tunnel-table is created. If the tunnel prefix in the tunnel-table is resolved, but the system runs out of hardware index resources, the tunnel will show in the tunnel-table, but will not be programmed. A non-programmed-reason will display.
Example: Check statistics
Once the three leaf routers exchange packets over the VXLAN, LEAF-3 displays statistics for all individual VTEPs. Statistics include:
Statistics can be cleared using the command: tools tunnel vxlan-tunnel vtep 2.2.2.2 statistics clear
Example: Check for received IMET routes and multicast destination creation
IMET routes are used for auto-discovery and the creation of the default flood list for vxlan in the MAC-VRF. When LEAF-3 receives and imports the IMET routes from LEAF-2 and LEAF-4, it creates a VXLAN default flood list. BUM frames received on a bridged sub-interface are ingress-replicated to the VTEPs on the list.
The following command checks that the IMET routes for the BD from LEAF-2 and LEAF-4 have been received and they have created multicast destinations in the MAC-VRF. Note that the VNI is received in the PMSI tunnel attribute and not in the route’s Network Layer Reachability Information (NLRI).
Example: Check for multicast-destinations
If the IMET routes from LEAF-2 and LEAF-4 are imported for MAC-VRF-1, the corresponding multicast VXLAN destinations are added and can be checked with the following command:
Example: Check mac-table and MAC/IP routes
When traffic is exchanged between HOST-3 and HOST-12, the MACs are learned on the access bridged sub-interfaces and advertised in MAC/IP routes. The MAC/IP routes are imported, and the MACs programmed in the mac-table. The following command can check the MAC/IP routes and the programmed MACs.
Example: Check unicast destinations
The reception of MAC/IP routes also creates unicast destinations in the vxlan-interface. In some cases, the unicast destinations are Ethernet Segment (ES) destinations if the MAC/IP routes are advertised from an ES. See Configuring multi-homing for EVPN broadcast domains for details. The following command displays the unicast destinations.
MAC mobility and MAC protection are implemented following [RFC7432]. MAC loop protection follows [draft-ietf-bess-rfc7432bis].
MAC Mobility is an event that triggers the fast move and re-learn of a MAC in a different leaf router. Mobility is common in DCs with some workloads moving between racks in the same DC. EVPN provides tools for fast mobility since MAC/IP routes are advertised with a sequence number that indicates the latest location of a MAC. This sequence number is used by the leaf routers to program the MAC with the correct VXLAN destination.
Example: MAC mobility (1 of 2)
Using Figure 6 as reference, if HOST-2 moves from LEAF-2 to LEAF-3, and you review the programming of the MAC in LEAF-4, MAC 00:00:00:00:00:02 is learned against VXLAN destination vtep:2.2.2.2 vni:1:
Example: MAC mobility (2 of 2)
After the mobility event, LEAF-4 receives the MAC with a higher sequence number. This makes LEAF-4 reprogram the MAC against LEAF-3 as shown below:
MAC protection refers to the property of a MAC that does not move between leaf routers. It is always learned against a bridged sub-interface. You can configure this MAC as static, but should observe the following:
Example: MAC protection (1 of 2)
Using Figure 6 as reference, the following commands show MAC 00:ca:fe:ca:fe:04 is configured as static in LEAF-4.
Example: MAC protection (2 of 2
On the remote leaf routers, the MAC is received as evpn-static and programmed this way. For example, LEAF-4 receives the route and programs it as follows:
Note that the static MACs state depends on the state of the sub-interface which they are configured against. If the sub-interface goes oper-down, the static MAC and EVPN route are removed. Static Blackhole MACs (where the configured destination is "blackhole") also behave as static MACs and are advertised as "evpn-static".
MAC loop protection in EVPN BDs is based on the SR Linux MAC Duplication feature. This feature detects MAC duplication for MACs moving:
It does not detect MAC duplication for MACs moving from one VTEP to a different VTEP in the same MAC-VRF. In addition, when a MAC is declared as a duplicate:
SR Linux supports all-active multi-homing for multi-homed peers connected using VXLAN, as per [RFC8365].
Using Figure 6 as a reference, LEAF-2 and LEAF-3 are multi-homed to server-1 using all-active multi-homing. The representation of the multi-homed device in the EVPN control plane is referred to as an Ethernet Segment (ES). It is considered "all-active" and not "active-active", because SR Linux supports up to four leaf routers multi-homed to the same CE or server with all links being active (not just two).
The all-active multi-homing function relies on three different procedures to handle multi-homing in the ES:
The DF is the leaf that forwards BUM traffic received from the VXLAN into the ES (to the server). Only one DF can exist per ES at a time, and it is elected based on the exchange of ES routes (EVPN routes type 4) and the subsequent DF election algorithm. All leaf routers, DF and non-DF, will forward known-unicast traffic to the multi-homed server.
The split-horizon or local bias is the procedure that avoids looped packets on the server. If server-1 hashes the BUM traffic to the non-DF leaf (for example, LEAF-2), without any split-horizon technique, the flooded BUM packets move to the DF (LEAF-4) and back to server-1. Split-horizon prevents these flooded packets from forwarding back to server-1. When the data plane is VXLAN, the split-horizon mechanism is based on a "local-bias" forwarding mode as defined in RFC8365. This implies that:
Aliasing is the procedure that allows ecmp (load-balancing) from remote leaf routers (LEAF-3) to all leaf routers attached to the same ES, even though the MAC is only advertised by one of the leaf routers in the ES. For example, in Figure 6, flows from 00:00:00:00:00:01 can only be hashed to LEAF-2. LEAF-2 is the only router in the ES advertising the MAC. However, since LEAF-2 and LEAF-4 advertise the association to the same Ethernet Segment Identifier (ESI), and the MAC/IP route for 00:00:00:00:00:01 is tagged with ESI-1, LEAF-3 can "alias" the unicast traffic to both leaf routers.
Note that a leaf advertises its association to an ES via AD per ES routes, and its association to an ES for a given MAC-VRF via AD per EVI routes. Both are EVPN routes type 1.
Configuration of all-active multi-homing involves four major steps:
With SR Linux, ESs are control plane entities that reside in the system network-instance. The system network-instance contains a BGP-VPN instance similar to the one in mac-vrfs. This instance hosts the bgp information used by EVPN for multi-homing routes, and the ES configuration and state.
The following example provides ES configuration details.
The ES model uses a BGP-VPN instance where the route-distinguisher and export/import route-targets are taken by BGP and used for the ES routes. Only one instance is allowed, and all ESs live under this BGP instance. The default route-distinguisher for the instance is automatically derived from system-ip:0 and used in ES routes.
The default import/export route-target is automatically derived from the ESI (bytes 1 to 6; from the second highest order byte up to the seventh byte). This route-target is of type ESI-import route-target (as per [RFC7432]) and is used in ES routes to ensure they are imported on Leaf routers attached to the ES.
ESs have an admin-state this is disabled by default. It must be toggled to change any of the parameters affecting the EVPN control plane.
The following timers can be configures for ESs: general boot, per ES boot, and activation:
The ES requires a manual 10-byte ESI configuration. Reserved ESI values such as ESI-0 or MAX-ESI (0xFF..FF) are not allowed. ESI values with 00-00-00-00-00-00 in bytes 1-6 are not allowed. This prevents the auto-derivation of ESI-import route-targets as all 0s.
SR Linux supports the default DF election algorithm, as per [RFC8584]. No configuration is required.
The ES association to an interface must be configured. The interface type can be Ethernet or LAG. If LACP is used on the CE, as shown in Figure 6, only a LAG can be associated.
LEAF-2 and LEAF-4 behave as a single system to server-1. Therefore, they must be configured with the same LACP parameters for the LAG on server-1 to come up. The admin-key, system-id-mac, and system-priority must match on both leaf routers so that the LAG comes up.
Note: This step is only required if Multi-Homing is used in an EVPN-VXLAN BD distributed among multiple Leaf routers. If the Multi-Homing ES is used locally as a Layer-2 MLAG (Multi-chassis Link Aggregation Group) technique, this step can be skipped. |
An example of ESs in a non-EVPN layer-2 BD is shown in Figure 7. In this scenario, EVPN only runs locally between the leaf routers of the two pairs, but not globally in the network. The two leaf tiers (LEAF-4/5 and LEAF-2/3) are connected via layer-2 sub-interfaces, and not VXLAN. Each leaf is configured with two ESs. LEAF-4 is configured with ES-6 for multi-homing to server-6, and ES-45 for multi-homing to the LEAF-2/3 tier. The configuration of the ES previously described also apply to these topologies, with the exception that each ES would use a different name, ESI, and lag interface.
As noted in the multi-homing configuration procedure, configuring the ecmp with a value greater than 1 in all the leaf routers attached to the same BD is not required If the multi-homing ES is used locally as a layer-2 MLAG.
After the multi-homed leaf routers and the remote leaf are configured, the ES operation must be checked.
Example: Check the ES status
Use the following to check the status of the ES on LEAF-2 and LEAF-4. The output from this command must show the same DF for the same mac-vrf on both leaf routers, and the same candidate list for the ES on both leaf routers. The detail form of this command also provides information about timers and the DF election.
Example: Check ES and EVI routes
The exchange of ES routes (used for DF election and ES discovery) and AD per ES/EVI routes (used to indicate the association of Leaf services to ES) can be checked with the following commands.
Note that LEAF-2 should have a type 4 route for the ES originating from LEAF-4. There should also be one AD per EVI and one AD per ES route from LEAF-4 for MAC-VRF-1. AD routes are advertised for each MAC-VRF that are part of the ES.
Example: Check the MAC/IP route
MAC/IP routes advertised for MACs learned on the ES sub-interfaces are advertised with the ESI of ES-1. The RIB state for the MAC/IP routes can be used to check this. The following command checks the MAC/IP route for MAC 00:00:00:00:00:01 on LEAF-3:
Example: Check the ES destination
Once AD routes are received from LEAF-2 and LEAF-4, and the MAC/IP route is tagged with the ESI, LEAF-3 creates an ES destination resolved to as many VTEPs as leafs advertising the AD routes (up to a maximum of the ecmp setting on LEAF-3). Use the following to check the ES destination for ES-1 created on LEAF-3.
Example: Check MAC programming
The following command shows how LEAF-3 programs the MACs received for the ES as associated to an ES destination
Example: Check sub-interface ES association LEAF-2)
On the non-DF (in this example, LEAF-2), the MAC-VRF interface included in ES-1 (lag1.1) shows the "multicast-forwarding" flag as none. This means that the interface will not forward BUM traffic to the CE/server when it is received on VXLAN. The following shows how the state for the sub-interface itself will show the association with ES-1 and the DF state:
Example: Check sub-interface ES association LEAF-4)
On the DF (in this example, LEAF-4), the MAC-VRF interface in ES-1 (lag1.1) shows the "multicast-forwarding" flag as BUM. The following shows how the state for the sub-interface itself will show the association with ES-1 and the DF state.
Example: EVPN related logs
The evpn_mgr application has logs that are triggered when the DF state changes on an ES as shown in the following: