5. PIM

5.1. PIM Overview

PIM-SM leverages the unicast routing protocols that are used to create the unicast routing table: OSPF, IS-IS, BGP, and static routes. Because PIM uses this unicast routing information to perform the multicast forwarding function, it is effectively IP protocol independent. Unlike DVMRP, PIM does not send multicast routing table updates to its neighbors.

PIM-SM uses the unicast routing table to perform the Reverse Path Forwarding (RPF) check function instead of building up a completely independent multicast routing table.

PIM-SM only forwards data to network segments with active receivers that have explicitly requested the multicast group. PIM-SM in the ASM model initially uses a shared tree to distribute information about active sources. Depending on the configuration options, the traffic can remain on the shared tree or switch over to an optimized source distribution tree. As multicast traffic starts to flow down the shared tree, routers along the path determine if there is a better path to the source. If a more direct path exists, then the router closest to the receiver sends a join message toward the source and then reroutes the traffic along this path.

As stated above, PIM-SM relies on an underlying topology-gathering protocol to populate a routing table with routes. This routing table is called the Multicast Routing Information Base (MRIB). The routes in this table can be taken directly from the unicast routing table, or they can be different and provided by a separate routing protocol such as MBGP. Regardless of how it is created, the primary role of the MRIB in the PIM-SM protocol is to provide the next hop router along a multicast-capable path to each destination subnet. The MRIB is used to determine the next hop neighbor to whom any PIM join/prune message is sent. Data flows along the reverse path of the join messages. Thus, in contrast to the unicast RIB that specifies the next hop that a data packet would take to get to some subnet, the MRIB gives reverse-path information, and indicates the path that a multicast data packet would take from its origin subnet to the router that has the MRIB.

Note:

For proper functioning of the PIM protocol, multicast data packets need to be received by the CPM CPU. Therefore CPM Filters and Management Access Filters must be configured to allow forwarding of multicast data packets.

5.1.1. PIM-SM Functions

PIM-SM functions in three phases:

5.1.1.1. Phase One

In this phase, a multicast receiver expresses its interest in receiving traffic destined for a multicast group. Typically it does this using IGMP or MLD, but other mechanisms might also serve this purpose. One of the receiver’s local routers is elected as the DR for that subnet. When the expression of interest is received, the DR sends a PIM join message towards the RP for that multicast group. This join message is known as a (*,G) join because it joins group G for all sources to that group. The (*,G) join travels hop-by-hop towards the RP for the group, and in each router it passes through, multicast tree state for group G is instantiated. Eventually the (*,G) join either reaches the RP or reaches a router that already has (*,G) join state for that group. When many receivers join the group, their join messages converge on the RP and form a distribution tree for group G that is rooted at the RP. This is known as the RP tree and is also known as the shared tree because it is shared by all sources sending to that group. Join messages are resent periodically as long as the receiver remains in the group. When all receivers on a leaf network leave the group, the DR sends a PIM (*,G) prune message towards the RP for that multicast group. However, if the prune message is not sent for any reason, the state eventually times out.

A multicast data sender starts sending data destined for a multicast group. The sender’s local router (the DR) takes those data packets, unicast-encapsulates them, and sends them directly to the RP. The RP receives these encapsulated data packets, removes the encapsulation, and forwards them onto the shared tree. The packets then follow the (*,G) multicast tree state in the routers on the RP tree, being replicated wherever the RP tree branches, and eventually reaching all the receivers for that multicast group. The process of encapsulating data packets to the RP is called registering, and the encapsulation packets are known as PIM register packets.

At the end of phase one, multicast traffic is flowing encapsulated to the RP, and then natively over the RP tree to the multicast receivers.
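The hop-by-hop (*,G) join propagation described above can be illustrated with a minimal Python sketch. The router names and the `next_hop_to_rp` table are hypothetical; a real PIM-SM router derives its upstream neighbor from the MRIB and also maintains join timers, which are omitted here.

```python
# Hypothetical topology: each router's next hop toward the RP (None at the RP itself).
next_hop_to_rp = {
    "DR1": "R1",
    "DR2": "R1",
    "R1": "RP",
    "RP": None,
}

def send_star_g_join(dr, group, state):
    """Propagate a (*,G) join from a DR toward the RP.

    `state` maps a router name to the set of groups for which it holds (*,G) state.
    Returns the routers where new (*,G) state was instantiated.
    """
    created = []
    router = dr
    while router is not None:
        groups = state.setdefault(router, set())
        if group in groups:
            # This router already has (*,G) state, so the join stops here.
            break
        groups.add(group)
        created.append(router)
        router = next_hop_to_rp[router]
    return created

state = {}
first = send_star_g_join("DR1", "239.1.1.1", state)
second = send_star_g_join("DR2", "239.1.1.1", state)
print(first)   # ['DR1', 'R1', 'RP']
print(second)  # ['DR2']
```

The second join stops at R1 because R1 is already on the shared tree, which is how joins from many receivers converge into a single tree rooted at the RP.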

5.1.1.2. Phase Two

In this phase, register-encapsulation of data packets is replaced by native forwarding. Register-encapsulation is unsuitable as a long-term mechanism for the following reasons:

  1. Encapsulation and de-encapsulation can be resource intensive operations for a router to perform depending on whether or not the router has appropriate hardware for the tasks.
  2. Traveling to the RP and then back down the shared tree can cause the packets to travel a relatively long distance to reach receivers that are close to the sender. For some applications, increased latency is unwanted.

Although register-encapsulation can continue indefinitely, for these reasons, the RP will normally switch to native forwarding. To do this, when the RP receives a register-encapsulated data packet from source S on group G, it will normally initiate an (S,G) source-specific join towards S. This join message travels hop-by-hop towards S, instantiating (S,G) multicast tree state in the routers along the path. (S,G) multicast tree state is used only to forward packets for group G if those packets come from source S. Eventually the join message reaches S’s subnet or a router that already has (S,G) multicast tree state, and then packets from S start to flow following the (S,G) tree state towards the RP. These data packets can also reach routers with (*,G) state along the path towards the RP - if so, they can short-cut onto the RP tree at this point.

While the RP is in the process of joining the source-specific tree for S, the data packets continue to be encapsulated to the RP. When packets from S also start to arrive natively at the RP, the RP receives two copies of each of these packets. At this point, the RP starts to discard the encapsulated copy of these packets and sends a register-stop message back to S’s DR to prevent the DR from unnecessarily encapsulating the packets. At the end of phase two, traffic flows natively from S along a source-specific tree to the RP and from there along the shared tree to the receivers. Where the two trees intersect, traffic can transfer from the source-specific tree to the shared RP tree and thus avoid a detour via the RP.
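The RP behavior in this phase can be sketched as a small state model. This is an illustrative Python sketch only: the class name, method names, and action labels are hypothetical, and timers and the actual packet formats are not modeled.

```python
class RendezvousPoint:
    """Toy model of RP behavior for (S,G) traffic during phase two."""

    def __init__(self):
        self.joined = set()       # (S,G) pairs for which a join was sent toward S
        self.native_seen = set()  # (S,G) pairs whose traffic now arrives natively
        self.actions = []         # control messages sent, recorded for illustration

    def on_register(self, source, group):
        """Handle a register-encapsulated data packet from the DR of `source`."""
        key = (source, group)
        if key in self.native_seen:
            # A native copy already arrives: discard the encapsulated copy
            # and tell the DR to stop encapsulating.
            self.actions.append(("register-stop", source, group))
            return "discard"
        if key not in self.joined:
            # First register for this (S,G): initiate a source-specific join.
            self.joined.add(key)
            self.actions.append(("join", source, group))
        # Until native traffic arrives, decapsulate and forward on the shared tree.
        return "forward"

    def on_native_packet(self, source, group):
        """Handle a data packet that arrived natively along the (S,G) tree."""
        self.native_seen.add((source, group))
        return "forward"

rp = RendezvousPoint()
r1 = rp.on_register("172.0.100.33", "239.1.1.1")  # "forward", join sent
r2 = rp.on_register("172.0.100.33", "239.1.1.1")  # "forward", no second join
rp.on_native_packet("172.0.100.33", "239.1.1.1")
r3 = rp.on_register("172.0.100.33", "239.1.1.1")  # "discard", register-stop sent
```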

Note:

A sender can start sending before or after a receiver joins the group, and thus, phase two may occur before the shared tree to the receiver is built.

5.1.1.3. Phase Three

In this phase, the RP joins back towards the source using the shortest path tree. Although having the RP join back towards the source removes the encapsulation overhead, it does not completely optimize the forwarding paths. For many receivers the route via the RP can involve a significant detour when compared with the shortest path from the source to the receiver.

To obtain lower latencies, a router on the receiver’s LAN, typically the DR, may optionally initiate a transfer from the shared tree to a source-specific shortest-path tree (SPT). To do this, it issues an (S,G) Join towards S. This instantiates state in the routers along the path to S. Eventually this join either reaches S’s subnet or reaches a router that already has (S,G) state. When this happens, data packets from S start to flow following the (S,G) state until they reach the receiver.

At this point the receiver (or a router upstream of the receiver) will be receiving two copies of the data - one from the SPT and one from the RPT. When the first traffic starts to arrive from the SPT, the DR or upstream router starts to drop the packets for G from S that arrive via the RP tree. In addition, it sends an (S,G) prune message towards the RP. The prune message travels hop-by-hop instantiating state along the path towards the RP indicating that traffic from S for G should not be forwarded in this direction. The prune message is propagated until it reaches the RP or a router that still needs the traffic from S for other receivers.

By now, the receiver will be receiving traffic from S along the shortest-path tree between the receiver and S. In addition, the RP is receiving the traffic from S, but this traffic is no longer reaching the receiver along the RP tree. As far as the receiver is concerned, this is the final distribution tree.
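The detour that motivates the switch to the shortest-path tree can be shown with a small hop-count comparison. The topology below is hypothetical, and hop count is used as a stand-in for the unicast routing metric.

```python
from collections import deque

# Hypothetical topology as an adjacency list.
links = {
    "S_DR": ["A", "B"],
    "A": ["S_DR", "RP"],
    "B": ["S_DR", "R_DR"],
    "RP": ["A", "C"],
    "C": ["RP", "R_DR"],
    "R_DR": ["B", "C"],
}

def hops(src, dst):
    """Return the hop count of a shortest path from src to dst (breadth-first search)."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for neighbor in links[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    raise ValueError("no path between %s and %s" % (src, dst))

via_rp = hops("S_DR", "RP") + hops("RP", "R_DR")  # source tree to RP, then shared tree
direct = hops("S_DR", "R_DR")                     # shortest-path tree
print(via_rp, direct)  # 4 2
```

In this example the path through the RP is twice as long as the direct path, which is the case in which the receiver's DR benefits from issuing an (S,G) join towards S.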

5.1.2. Encapsulating Data Packets in the Register Tunnel

Conceptually, the register tunnel is an interface with a smaller MTU than the underlying IP interface towards the RP. IP fragmentation on packets forwarded on the register tunnel is performed based upon this smaller MTU. The encapsulating DR can perform path-MTU discovery to the RP to determine the effective MTU of the tunnel. This smaller MTU takes both the outer IP header and the PIM register header overhead into consideration.
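As a worked example of this overhead, the following Python sketch computes the effective register tunnel MTU. It assumes an outer IPv4 header without options (20 bytes) and an 8-byte PIM register header; these sizes are assumptions of the sketch, and IPv6 or IP options would change the result.

```python
IPV4_HEADER_BYTES = 20         # outer IPv4 header without options (assumption)
PIM_REGISTER_HEADER_BYTES = 8  # PIM common header plus the register flags word

def register_tunnel_mtu(path_mtu):
    """Return the largest inner packet that fits in one register packet."""
    return path_mtu - IPV4_HEADER_BYTES - PIM_REGISTER_HEADER_BYTES

print(register_tunnel_mtu(1500))  # 1472
```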

5.1.3. PIM Bootstrap Router Mechanism

For proper operation, every PIM-SM router within a PIM domain must be able to map a particular global-scope multicast group address to the same RP. If this is not possible, then black holes can appear (this is where some receivers in the domain cannot receive some groups). A domain in this context is a contiguous set of routers that all implement PIM and are configured to operate within a common boundary.

The bootstrap router (BSR) mechanism provides a way in which viable group-to-RP mappings can be created and distributed to all the PIM-SM routers in a domain. Each candidate BSR originates bootstrap messages (BSMs). Every BSM contains a BSR priority field. Routers within the domain flood the BSMs throughout the domain. A candidate BSR that hears about a higher-priority candidate BSR suppresses its sending of further BSMs for a period of time. The single remaining candidate BSR becomes the elected BSR and its BSMs inform the other routers in the domain that it is the elected BSR.

The BSR mechanism is adaptive: if an RP becomes unreachable, this is detected, the mapping tables are modified so that the unreachable RP is no longer used, and the new tables are rapidly distributed throughout the domain.
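The BSR election outcome can be sketched in a few lines of Python. The function name and the candidate list are hypothetical. The tie-break on the highest address follows RFC 5059 and is not stated in the text above.

```python
import ipaddress

def elect_bsr(candidates):
    """Pick the elected BSR from (priority, address) tuples.

    The highest priority wins; equal priorities are resolved by the
    numerically highest address (tie-break per RFC 5059).
    """
    return max(candidates, key=lambda c: (c[0], ipaddress.ip_address(c[1])))

candidates = [(10, "10.0.0.1"), (20, "10.0.0.9"), (20, "10.0.0.10")]
print(elect_bsr(candidates))  # (20, '10.0.0.10')
```

Addresses are compared numerically rather than as strings, so 10.0.0.10 correctly ranks above 10.0.0.9.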

5.1.4. PIM-SM Routing Policies

Multicast traffic can be restricted from certain source addresses by creating routing policies. Join messages can be filtered using import filters. PIM join policies can be used to reduce denial of service attacks and subsequent PIM state explosion in the router and to remove unwanted multicast streams at the edge of the network before they are carried across the core. Route policies are created in the config>router>policy-options context. Join and register route policy match criteria for PIM-SM can specify the following:

  1. Router interface or interfaces specified by name or IP address.
  2. Neighbor address (the source address in the IP header of the join and prune message).
  3. Multicast group address embedded in the join and prune message.
  4. Multicast source address embedded in the join and prune message.

Join policies can be used to filter PIM join messages so that no (*,G) or (S,G) state is created on the router.

Table 15:  Join Filter Policy Match Conditions

Match Condition     Matches the:
---------------     --------------------------------------------------
Interface           RTR interface by name
Neighbor            The neighbor's source address in the IP header
Group Address       Multicast group address in the join/prune message
Source Address      Source address in the join/prune message
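The way these match conditions combine can be sketched in Python. This is not SR OS policy syntax: the `JoinFilterEntry` type, the first-match evaluation order, and the default action of "accept" are all assumptions made for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

@dataclass
class JoinFilterEntry:
    action: str                      # "accept" or "reject"
    interface: Optional[str] = None  # None means "match any"
    neighbor: Optional[str] = None   # exact neighbor address
    group: Optional[str] = None      # group prefix, for example "239.255.0.0/16"
    source: Optional[str] = None     # source prefix

def _in_prefix(address, prefix):
    """Return True if no prefix is configured or the address falls inside it."""
    if prefix is None:
        return True
    return ipaddress.ip_address(address) in ipaddress.ip_network(prefix)

def evaluate_join(entries, interface, neighbor, group, source, default="accept"):
    """Return the action of the first entry whose conditions all match."""
    for entry in entries:
        if entry.interface is not None and entry.interface != interface:
            continue
        if entry.neighbor is not None and entry.neighbor != neighbor:
            continue
        if not _in_prefix(group, entry.group):
            continue
        if not _in_prefix(source, entry.source):
            continue
        return entry.action
    return default

entries = [JoinFilterEntry(action="reject", interface="to_edge", group="239.255.0.0/16")]
blocked = evaluate_join(entries, "to_edge", "10.1.1.2", "239.255.1.1", "172.16.0.5")
allowed = evaluate_join(entries, "to_edge", "10.1.1.2", "239.1.1.1", "172.16.0.5")
print(blocked, allowed)  # reject accept
```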

PIM register messages are sent by the first-hop designated router that has a direct connection to the source. This serves a dual purpose:

  1. Notifies the RP that a source has active data for the group.
  2. Delivers the multicast stream in register encapsulation to the RP and its potential receivers.

If no receiver has joined the group at the RP, the RP ignores the registers.

In an environment where the sources to particular multicast groups are always known, it is possible to apply register filters at the RP to prevent any unwanted sources from transmitting multicast streams. You can also apply these filters at the edge so that register data does not travel unnecessarily over the network towards the RP.

Table 16:  Register Filter Policy Match Conditions

Match Condition     Matches the:
---------------     --------------------------------------------------
Interface           RTR interface by name
Group Address       Multicast Group address in the join/prune message
Source Address      Source address in the join/prune message

5.1.5. Reverse Path Forwarding Checks

Multicast implements a reverse path forwarding (RPF) check. The RPF check verifies the path that multicast packets take between their sources and destinations to prevent loops. Multicast requires that the incoming interface is the outgoing interface used by unicast routing to reach the source of the multicast packet. RPF forwards a multicast packet only if it is received on the interface that the router uses to route to the source.
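A minimal Python sketch of the RPF check follows. The `routes` table and interface names are hypothetical; the lookup is a plain longest-prefix match over that table, standing in for whatever route table the router uses for RPF.

```python
import ipaddress

# Hypothetical unicast routes: prefix -> outgoing interface.
routes = {
    "0.0.0.0/0": "to_default",
    "10.0.0.0/8": "to_core",
    "10.1.0.0/16": "to_A",
}

def rpf_interface(source):
    """Return the interface unicast routing would use to reach `source`."""
    address = ipaddress.ip_address(source)
    matching = (ipaddress.ip_network(p) for p in routes)
    best = max(
        (net for net in matching if address in net),
        key=lambda net: net.prefixlen,
        default=None,
    )
    return None if best is None else routes[str(best)]

def rpf_check(source, incoming_interface):
    """Accept a multicast packet only if it arrived on the RPF interface."""
    return rpf_interface(source) == incoming_interface

print(rpf_check("10.1.2.3", "to_A"))     # True
print(rpf_check("10.1.2.3", "to_core"))  # False
```

A packet from 10.1.2.3 that arrives on to_core fails the check and is dropped, which is what prevents forwarding loops.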

If the forwarding paths are modified due to routing topology changes then any dynamic filters that may have been applied must be re-evaluated. If filters are removed then the associated alarms are also cleared.

5.1.6. Anycast RP for PIM-SM

The implementation of Anycast RP for PIM-SM environments enables fast convergence when a PIM rendezvous point (RP) router fails by allowing receivers and sources to rendezvous at the closest RP. It allows an arbitrary number of RPs per group in a single shared-tree Protocol Independent Multicast-Sparse Mode (PIM-SM) domain. This is particularly important for triple play configurations that distribute multicast traffic using PIM-SM rather than SSM. In this case, RP convergence must be fast enough to avoid the loss of multicast streams, which could cause loss of TV delivery to the end customer.

Anycast RP for PIM-SM environments is supported in the base routing/PIM-SM instance of the service router. This feature is supported in Layer 3-VPRN instances that are configured with PIM.

5.1.6.1. Implementation

The Anycast RP for PIM-SM implementation is defined in RFC 4610, Anycast-RP Using Protocol Independent Multicast (PIM). It is similar to the mechanism described in RFC 3446, Anycast Rendezvous Point (RP) mechanism using Protocol Independent Multicast (PIM) and Multicast Source Discovery Protocol (MSDP), but extends the register mechanism in PIM so that Anycast RP functionality can be retained without using MSDP (see Multicast in Virtual Private Networks).

The mechanism works as follows:

  1. An IP address is chosen to use as the RP address. This address is statically configured, or distributed using a dynamic protocol, to all PIM routers throughout the domain.
  2. A set of routers in the domain are chosen to act as RPs for this RP address. These routers are called the Anycast-RP set.
  3. Each router in the Anycast-RP set is configured with a loopback interface using the RP address.
  4. Each router in the Anycast-RP set also needs a separate IP address to be used for communication between the RPs.
  5. The RP address, or a prefix that covers the RP address, is injected into the unicast routing system inside of the domain.
  6. Each router in the Anycast-RP set is configured with the addresses of all other routers in the Anycast-RP set. This must be consistently configured in all RPs in the set.
Figure 1:  Anycast RP for PIM-SM Implementation Example 

Assume the scenario in Figure 1 is completely connected where R1A, R1B, and R2 are receivers for a group, and S1 and S2 send to that group. Assume RP1, RP2, and RP3 are all assigned the same IP address which is used as the Anycast-RP address (for example, the IP address is RPA).

Note:

The address used for the RP address in the domain (the Anycast-RP address) must be different than the addresses used by the Anycast-RP routers to communicate with each other.

The following procedure is used when S1 starts sourcing traffic:

  1. S1 sends a multicast packet.
  2. The DR directly attached to S1 forms a PIM register message to send to the Anycast-RP address (RPA). The unicast routing system delivers the PIM register message to the nearest RP, in this case RP1.
  3. RP1 receives the PIM register message, de-encapsulates it, and sends the packet down the shared-tree to get the packet to receivers R1A and R1B.
  4. RP1 is configured with the IP addresses of RP2 and RP3. Because the register message did not come from one of the RPs in the Anycast-RP set, RP1 assumes the packet came from a DR. If the register message is not addressed to the Anycast-RP address, an error has occurred and it should be logged in a rate-limited manner.
  5. RP1 sends a copy of the register message from S1’s DR to both RP2 and RP3. RP1 uses its own IP address as the source address for the PIM register message.
  6. RP1 may join back to the source-tree by triggering a (S1,G) Join message toward S1; however, RP1 must create (S1,G) state.
  7. RP2 receives the register message from RP1, de-encapsulates it, and also sends the packet down the shared-tree to get the packet to receiver R2.
  8. RP2 sends a register-stop message back to RP1. RP2 may wait to send the register-stop message if it decides to join the source-tree. RP2 should wait until it has received data from the source on the source-tree before sending the register-stop message. If RP2 decides to wait, the register-stop message will be sent when the next register is received. If RP2 decides not to wait, the register-stop message is sent now.
  9. RP2 may join back to the source-tree by triggering a (S1,G) Join message toward S1; however, RP2 must create (S1,G) state.
  10. RP3 receives the register message from RP1, de-encapsulates it, but since there are no receivers joined for the group, it can discard the packet.
  11. RP3 sends a register-stop message back to RP1.
  12. RP3 creates (S1,G) state so when a receiver joins after S1 starts sending, RP3 can join quickly to the source-tree for S1.
  13. RP1 processes the register-stop message from each of RP2 and RP3. RP1 may cache on a per-RP/per-(S,G) basis the receipt of register-stop messages from the RPs in the Anycast-RP set. This option is performed to increase the reliability of register message delivery to each RP. When this option is used, subsequent register messages received by RP1 are sent only to the RPs in the Anycast-RP set which have not previously sent register-stop messages for the (S,G) entry.
  14. RP1 sends a register-stop message back to the DR the next time a register message is received from the DR and (when the option in the previous step is in use) if all RPs in the Anycast-RP set have returned register-stop messages for a particular (S,G) route.

The procedure for S2 sending follows the same steps as above, but it is RP3 which sends a copy of the register originated by S2’s DR to RP1 and RP2. Therefore, this example shows how sources anywhere in the domain, associated with different RPs, can reach all receivers, also associated with different RPs, in the same domain.
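The core decision each RP makes when a register message arrives can be summarized in a short Python sketch. The function name, action labels, and addresses are hypothetical. The sketch sends register-stop to a peer RP immediately and omits the optional wait, the optional register-stop cache, and the register-stop sent back to the DR.

```python
def handle_register(local_rp, anycast_set, register_source, has_receivers):
    """Return the actions an Anycast-RP router takes for a received register.

    local_rp:        this router's own (non-anycast) address
    anycast_set:     set of addresses of all routers in the Anycast-RP set
    register_source: source address of the received register message
    has_receivers:   True if receivers have joined the group at this RP
    """
    actions = []
    # De-encapsulate; forward only if someone is joined on the shared tree.
    actions.append("forward-on-shared-tree" if has_receivers else "discard-data")
    # (S,G) state is created in either case so a later join is fast.
    actions.append("create-(S,G)-state")
    if register_source in anycast_set:
        # Relayed by a peer RP: acknowledge it and do not relay again.
        actions.append(("register-stop", register_source))
    else:
        # Came from a DR: relay a copy to every other RP in the set.
        for peer in sorted(anycast_set - {local_rp}):
            actions.append(("copy-register", peer))
    return actions

rp_set = {"192.0.2.1", "192.0.2.2", "192.0.2.3"}
at_rp1 = handle_register("192.0.2.1", rp_set, "198.51.100.7", True)
at_rp3 = handle_register("192.0.2.3", rp_set, "192.0.2.1", False)
```

Because a register relayed by a peer RP is never relayed again, the copies cannot loop between the routers in the Anycast-RP set.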

5.1.7. Distributing PIM Joins over Multiple ECMP Paths

A commonly used multicast load-balancing method is per-bandwidth or round-robin distribution. Alternatively, the interface used in an ECMP set for a particular channel can be made predictable, without knowing anything about the other channels that use the ECMP set.

The mc-ecmp-hashing-enabled command enables PIM joins to be distributed over the multiple ECMP paths based on a hash of S and G. When a link in the ECMP set is removed, the multicast streams that were using that link are re-distributed over the remaining ECMP links using the same hash algorithm. When a link is added to the ECMP set, new joins may be allocated to the new link based on the hash algorithm. Existing multicast streams using the other ECMP links stay on those links until they are pruned, unless the rebalance option is specified.

The default is no mc-ecmp-hashing-enabled, which means that the use of multiple ECMP paths (if enabled at the config>service>vprn context) is controlled by the existing implementation and CLI commands, that is, mc-ecmp-balance.

The mc-ecmp-hashing-enabled command and the mc-ecmp-balance command cannot be used together in the same context.

To distribute streams across the ECMP links, the hashing steps are as follows:

  1. For a given (S,G), get all possible next hops.
  2. Sort these next hops by next-hop address.
  3. XOR the S and G addresses.
  4. Hash the XOR result over the number of PIM next hops.
  5. Use the hash value obtained in step 4 as an index into the sorted list obtained in step 2; the element at that position is the preferred next hop.
  6. If this element is not available or is not a PIM next hop (PIM neighbor), the next available next hop is chosen.
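The steps above can be sketched in Python as follows. The hash function itself is not specified in this section, so a simple modulo over the number of next hops is used here as a stand-in, and "next available" is interpreted as walking forward through the sorted list with wraparound. Both choices are assumptions of the sketch, and the addresses are hypothetical.

```python
import ipaddress

def preferred_next_hop(source, group, next_hops, pim_neighbors):
    """Pick a next hop for (S,G) from an ECMP set, following steps 1 to 6."""
    if not next_hops:
        return None
    # Step 2: sort the candidate next hops by address.
    ordered = sorted(next_hops, key=ipaddress.ip_address)
    # Step 3: XOR the source and group addresses.
    xored = int(ipaddress.ip_address(source)) ^ int(ipaddress.ip_address(group))
    # Step 4: hash over the number of next hops (modulo is a stand-in).
    index = xored % len(ordered)
    # Steps 5 and 6: take the element at that index, or the next one
    # that is a PIM neighbor.
    for offset in range(len(ordered)):
        candidate = ordered[(index + offset) % len(ordered)]
        if candidate in pim_neighbors:
            return candidate
    return None

hops_set = ["10.0.0.3", "10.0.0.1", "10.0.0.4", "10.0.0.2"]
neighbors = set(hops_set)
print(preferred_next_hop("172.0.100.33", "239.1.1.1", hops_set, neighbors))  # 10.0.0.1
print(preferred_next_hop("172.0.100.33", "239.1.1.2", hops_set, neighbors))  # 10.0.0.4
```

Because the result depends only on S, G, and the sorted next-hop list, every router with the same ECMP set chooses the same link for a given channel, independent of the other channels in use.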

The following example displays PIM status indicating ECMP Hashing is disabled:

*B:BB# show router 100 pim status
 
===============================================================================
PIM Status ipv4
===============================================================================
Admin State                       : Up
Oper State                        : Up
 
IPv4 Admin State                  : Up
IPv4 Oper State                   : Up
 
BSR State                         : Accept Any
 
Elected BSR
    Address                       : None
    Expiry Time                   : N/A
    Priority                      : N/A
    Hash Mask Length              : 30
    Up Time                       : N/A
    RPF Intf towards E-BSR        : N/A
 
Candidate BSR
    Admin State                   : Down
    Oper State                    : Down
    Address                       : None
    Priority                      : 0
    Hash Mask Length              : 30
 
Candidate RP
    Admin State                   : Down
    Oper State                    : Down
    Address                       : 0.0.0.0
    Priority                      : 192
    Holdtime                      : 150
 
SSM-Default-Range                 : Enabled
SSM-Group-Range
    None
 
MC-ECMP-Hashing                   : Disabled
 
Policy                            : None
 
RPF Table                         : rtable-u
 
Non-DR-Attract-Traffic            : Disabled
===============================================================================
 
----------------------------------------------
*B:BB>config>service>vprn>pim# no mc-ecmp-balance mc-ecmp-balance mc-ecmp-balance-hold
*B:BB>config>service>vprn>pim# no mc-ecmp-balance 
*B:BB>config>service>vprn>pim# mc-ecmp-mc-ecmp-balance mc-ecmp-balance-hold mc-ecmp-hashing-enabled
*B:BB>config>service>vprn>pim# mc-ecmp-hashing-enabled
*B:BB>config>service>vprn>pim# info
----------------------------------------------
                apply-to all
                rp
                    static
                        address 10.3.3.3
                            group-prefix 224.0.0.0/4
                        exit
                    exit
                    bsr-candidate
                        shutdown
                    exit
                    rp-candidate
                        shutdown
                    exit
                exit
                no mc-ecmp-balance
                mc-ecmp-hashing-enabled
----------------------------------------------
*B:BB>config>service>vprn>pim#
      apply-to        - Create/remove interfaces in PIM
 [no] import          - Configure import policies
 [no] interface       + Configure PIM interface
 [no] mc-ecmp-balance - Enable/Disable multicast balancing of traffic over ECMP links
 [no] mc-ecmp-balanc* - Configure hold time for multicast balancing over ECMP links
 [no] mc-ecmp-hashin* - Enable/Disable hash based multicast balancing of traffic over ECMP links
 [no] non-dr-attract* - Enable/disable attracting traffic when not DR
      rp              + Configure the router as static or Candidate-RP
 [no] shutdown        - Administratively enable or disable the operation of PIM
 [no] spt-switchover* - Configure shortest path tree (spt tree) switchover threshold for a group prefix
 [no] ssm-default-ra* - Enable the disabling of SSM Default Range
 [no] ssm-groups      + Configure the SSM group ranges
 
 

The following example shows distribution of PIM joins over multiple ECMP paths.

*A:BA# show router 100 pim group
 
===============================================================================
PIM Groups ipv4
===============================================================================
Group Address                           Type     Spt Bit Inc Intf       No.Oifs
   Source Address                          RP
-------------------------------------------------------------------------------
239.1.1.1                               (S,G)    spt     to_C0          1
   172.0.100.33                            10.20.1.6
239.1.1.2                               (S,G)    spt     to_C3          1
   172.0.100.33                            10.20.1.6
239.1.1.3                               (S,G)    spt     to_C2          1
   172.0.100.33                            10.20.1.6
239.1.1.4                               (S,G)    spt     to_C1          1
   172.0.100.33                            10.20.1.6
239.1.1.5                               (S,G)    spt     to_C0          1
   172.0.100.33                            10.20.1.6
239.1.1.6                               (S,G)    spt     to_C3          1
   172.0.100.33                            10.20.1.6
 
239.2.1.1                               (S,G)    spt     to_C0          1
   172.0.100.33                            10.20.1.6
239.2.1.2                               (S,G)    spt     to_C3          1
   172.0.100.33                            10.20.1.6
239.2.1.3                               (S,G)    spt     to_C2          1
   172.0.100.33                            10.20.1.6
239.2.1.4                               (S,G)    spt     to_C1          1
   172.0.100.33                            10.20.1.6
239.2.1.5                               (S,G)    spt     to_C0          1
   172.0.100.33                            10.20.1.6
239.2.1.6                               (S,G)    spt     to_C3          1
   172.0.100.33                            10.20.1.6
 
239.3.1.1                               (S,G)    spt     to_C0          1
   172.0.100.33                            10.20.1.6
239.3.1.2                               (S,G)    spt     to_C3          1
   172.0.100.33                            10.20.1.6
239.3.1.3                               (S,G)    spt     to_C2          1
   172.0.100.33                            10.20.1.6
239.3.1.4                               (S,G)    spt     to_C1          1
   172.0.100.33                            10.20.1.6
239.3.1.5                               (S,G)    spt     to_C0          1
   172.0.100.33                            10.20.1.6
239.3.1.6                               (S,G)    spt     to_C3          1
   172.0.100.33                            10.20.1.6
 
239.4.1.1                               (S,G)    spt     to_C0          1
   172.0.100.33                            10.20.1.6
239.4.1.2                               (S,G)    spt     to_C3          1
   172.0.100.33                            10.20.1.6
239.4.1.3                               (S,G)    spt     to_C2          1
   172.0.100.33                            10.20.1.6
239.4.1.4                               (S,G)    spt     to_C1          1
   172.0.100.33                            10.20.1.6
239.4.1.5                               (S,G)    spt     to_C0          1
   172.0.100.33                            10.20.1.6
239.4.1.6                               (S,G)    spt     to_C3          1
   172.0.100.33                            10.20.1.6
-------------------------------------------------------------------------------
Groups : 24
===============================================================================
 

5.1.8. PIM Interface on IES Subscriber Group Interfaces

PIM on a subscriber group interface allows for SAP-level replication over an ESM Group interface by establishing PIM adjacency to a downstream router. Figure 2 depicts the model:

Figure 2:  PIM Interface on IES Subscriber Group Interface 

On an IES subscriber-interface, an Ethernet SAP is configured (LAG or physical port). On the SAP, a static-host is configured for connectivity to downstream Layer 3 aggregation devices (including PIM adjacency) while multiple default-hosts can be configured for subscriber traffic. A single SAP with a single static-host per group interface is supported to establish PIM adjacency on a given subscriber group interface. Both IPv4 PIM ASM and SSM are supported.

Feature caveats:

  1. Only IPv4 PIM is supported with a single static host used to form a PIM interface under a group interface. Using multiple hosts or non-static hosts is not supported. Configuring IPv6-related parameters in config>router>pim>interface group-ift is not blocked, but has no effect.
  2. config>router>pim>apply-to configuration does not apply to PIM interfaces on IES subscriber group interfaces.
  3. PIM on group interfaces is not supported in VPRN context.
  4. Extranet is not supported.
  5. Locally attached receivers are not supported (no IGMP/MLD and PIM mix in OIF list).
  6. Default anti-spoofing must be configured (IP+MAC).
  7. A subscriber profile with pim-policy enabled cannot combine with the following policies (config>subscr-mgmt>sub-prof):
    1. [no] host-tracking — Apply a host tracking policy
    2. [no] igmp-policy — Apply an IGMP policy
    3. [no] mld-policy — Apply an MLD policy
    4. [no] nat-policy — Apply a NAT policy
    5. [no] sub-mcac-policy — Apply a subscriber MCAC policy (MCAC policy can be used when configured in PIM interface context)
  8. The feature is supported on IOM3-XP or newer line cards. When enabling the feature on older hardware, joins may be accepted and an outgoing interface may be created for the group, but traffic will not be sent out on egress because no OIF is created in forwarding.

5.1.9. Multicast-Only Fast Reroute (MoFRR)

In large-scale multicast deployments, a link or node failure can impact multiple subscribers or a complete region or segment of receivers, interrupting the receiver client experience. Although multicast client applications may buffer streams for a short period of time, in certain middleware implementations the loss of stream data may trigger unicast requests to the source for the missing data. If traffic loss persists for a prolonged period, those requests can overload network resources.

To minimize service interruption to end-users and protect the network from a sudden surge of unicast requests, SR OS implements a fast failover scheme for native IP networks. The SR OS MoFRR implementation is based on RFC 7431, Multicast-Only Fast Reroute, and relies on:

  1. sending a join to a primary and a single standby upstream node over disjoint paths
  2. fast failover to a standby stream upon detection of a failure

The functionality relies on failure detection on the primary path to switch to forwarding the traffic from the standby path. The traffic failure can happen with or without physical links or nodes going down. Various mechanisms for link or node failure detection are supported; however, to achieve best performance and resilience, it is recommended to enable MoFRR on every node in the network and use hop-by-hop BFD for fast link failure or data plane failure detection on each upstream link. Without BFD, the PIM adjacency loss or route change could be used to detect traffic failure. Figure 3 and Figure 4 depict MoFRR behavior.

Figure 3:  MoFRR Steady State No Failure 
Figure 4:  MoFRR Switch to Standby Stream on a Link Failure 
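The switch from the primary to the standby stream can be modeled with a very small Python sketch. The class and interface names are hypothetical, and failure detection itself (BFD, PIM adjacency loss, or route change) is represented only by a call to `on_upstream_failure`.

```python
class MofrrStream:
    """Toy model of one multicast stream joined over a primary and a standby path."""

    def __init__(self, primary, standby):
        self.active = primary    # upstream whose traffic is currently forwarded
        self.standby = standby   # upstream whose traffic is received but dropped

    def on_upstream_failure(self, failed):
        """React to a detected failure and return the upstream now in use."""
        if failed == self.standby:
            # Losing the standby does not affect forwarding.
            self.standby = None
        elif failed == self.active:
            # Fail over: start forwarding the stream from the standby path.
            self.active = self.standby
            self.standby = None
        return self.active

stream = MofrrStream(primary="to_C0", standby="to_C1")
in_use = stream.on_upstream_failure("to_C0")
print(in_use)  # to_C1
```

Because the standby join is already in place before the failure, the failover does not have to wait for a new join to propagate upstream.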

MoFRR functionality supports the following:

  1. IPv4 or IPv6 link or node failure protection in global routing instance
  2. Rosen PIM-SSM with MDT SAFI
  3. active streams and a single standby stream over disjoint ECMP paths
  4. active streams and a single standby stream joins over IS-IS or OSPF Loop-Free Alternate paths
  5. all regular PIM interfaces supporting MoFRR for all multicast streams (tunnel interfaces are ignored)

Note: MoFRR (config>router>pim>multicast-fast-failover or config>router>pim>multicast6-fast-failover) cannot be configured when GTM auto-discovery (config>router>pim>gtm>auto-discovery) is enabled.

5.1.10. Automatic Discovery of Group-to-RP Mappings (Auto-RP)

Auto-RP is a proprietary group discovery and mapping mechanism for IPv4 PIM that is described in cisco-ipmulticast/pim-autorp-spec, Auto-RP: Automatic discovery of Group-to-RP mappings for IP multicast. The functionality is similar to the IETF standard bootstrap router (BSR) mechanism that is described in RFC 5059, Bootstrap Router (BSR) Mechanism for Protocol Independent Multicast (PIM), to dynamically learn about the availability of Rendezvous Points (RPs) in a network. When a router is configured as an RP-mapping agent with the pim>rp>auto-rp-discovery command, it listens to the CISCO-RP-ANNOUNCE (224.0.1.39) group and caches the announced mappings. The RP-mapping agent then periodically sends out RP-mapping packets to the CISCO-RP-DISCOVERY (224.0.1.40) group. PIM dense-mode (PIM-DM) as described in RFC 3973, Protocol Independent Multicast - Dense Mode (PIM-DM): Protocol Specification (Revised), is used for the auto-RP groups to support multihoming and redundancy. The RP-mapping agent supports announcing, mapping, and discovery functions; candidate RP functionality is not supported. SR OS supports version 1 of the Auto-RP specification; the ability to deny RP-mappings by advertising negative group prefixes is not supported.

Auto-RP is supported for IPv4 in multicast VPNs and in the global routing instance. Either BSR or auto-RP for IPv4 can be configured; the two mechanisms cannot be enabled together. BSR for IPv6 and auto-RP for IPv4 can be enabled together. In a multicast VPN, auto-RP cannot be enabled together with sender-only or receiver-only multicast distribution trees (MDTs), or wildcard S-PMSI configurations that could block flooding.
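
The coexistence rules in the preceding paragraph can be summarized as a small check. The following Python sketch only restates those rules; the function name and its boolean parameters are hypothetical and do not correspond to SR OS commands or to any validation the router itself performs.

```python
def rp_discovery_conflicts(*, bsr_ipv4=False, bsr_ipv6=False, auto_rp_ipv4=False,
                           in_mvpn=False, sender_or_receiver_only_mdt=False,
                           wildcard_spmsi=False):
    """Return a list of rule violations for a proposed RP discovery setup."""
    problems = []
    # BSR and auto-RP are mutually exclusive for IPv4.
    if auto_rp_ipv4 and bsr_ipv4:
        problems.append("BSR and auto-RP cannot both be enabled for IPv4")
    # bsr_ipv6 is accepted on purpose: IPv6 BSR may coexist with IPv4 auto-RP.
    if auto_rp_ipv4 and in_mvpn:
        if sender_or_receiver_only_mdt:
            problems.append("auto-RP cannot be combined with sender-only "
                            "or receiver-only MDTs")
        if wildcard_spmsi:
            problems.append("auto-RP cannot be combined with wildcard S-PMSI "
                            "configurations that could block flooding")
    return problems
```

For example, `rp_discovery_conflicts(bsr_ipv6=True, auto_rp_ipv4=True)` returns an empty list, while enabling `bsr_ipv4` together with `auto_rp_ipv4` reports one conflict.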

5.1.11. VRRP Aware PIM

The Virtual Router Redundancy Protocol (VRRP) eliminates the single point of failure inherent in the static default-routed environment. VRRP describes a method of implementing a redundant IP interface that provides dynamic failover if the VRRP master router (MR) becomes unavailable.

VRRP provides information on the state of a router. However, PIM operates independently of VRRP group states. The PIM DR and the VRRP MR may not be the same router and IP multicast traffic may not necessarily follow the same path as elected by VRRP.

In order to leverage the redundancy capabilities of VRRP that are lacking in PIM, the VRRP Aware PIM mechanism allows PIM to monitor and react to changes in the VRRP MR. This ensures that the multicast traffic follows the unicast traffic through the same gateway as the VRRP MR, providing consistent IP multicast forwarding in a redundant network.

5.1.11.1. Configuring VRRP Aware PIM

The VRRP Aware PIM feature enables PIM to track the state of a VRRP instance and to identify whether the associated VRRP interface is the master. PIM uses an operational group parameter (oper-group group-name) to monitor the state of VRRP. One operational group can be created for IPv4, and another for IPv6. When the VRRP instance is the MR, the operational group is up; for all other VRRP states, the operational group is down. A VRRP instance can be associated with only one operational group, and an operational group can have one or more associated VRRP instances. This feature is supported on base router, IES, and VPRN interfaces.

If the monitored interface is the VRRP MR, PIM becomes the DR by setting its priority to the configured oper-group-active-priority value. In order for the router to become the DR, the proper priorities must be configured so that the oper-group-active-priority is the highest priority on the IP interface.

If a PIM router is the DR and then receives an indication from VRRP that the interface is no longer the VRRP MR, PIM relinquishes the DR role by setting its priority back to the default or configured priority value.

If the VRRP instance or oper-group is not configured, PIM operates as normal with the default or configured priority value, and does not set its priority to oper-group-active-priority. A change in the operational group status is independent of the address family; IPv4 and IPv6 priorities are configured independently of each other. Two operational groups are supported per PIM interface, one for IPv4 and one for IPv6.
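
The priority handling described above can be sketched in Python as follows. This is an illustrative model, not SR OS code. The function names are hypothetical, and the interpretation of the set and add keywords of monitor-oper-group (set replaces the priority, add increases the base priority by the given value) is an assumption of this sketch. The election rule itself (highest DR priority wins, with the highest IP address as the tie-breaker) follows the PIM-SM specification.

```python
import ipaddress
from typing import Optional


def effective_dr_priority(base: int, oper_group_up: bool,
                          mode: Optional[str] = None, value: int = 0) -> int:
    """Priority that PIM advertises on an interface.

    base  : default or configured DR priority
    mode  : None when no oper-group is monitored, else "set" or "add"
            (assumed semantics, see the lead-in)
    value : the oper-group-active-priority operand
    """
    if mode is None or not oper_group_up:
        return base                    # normal PIM behavior
    if mode == "set":
        return value
    if mode == "add":
        return base + value
    raise ValueError(f"unknown mode: {mode!r}")


def elect_dr(candidates: dict) -> str:
    """candidates maps interface address -> advertised DR priority."""
    return max(candidates,
               key=lambda addr: (candidates[addr], ipaddress.ip_address(addr)))
```

With a VRRP MR at 10.1.1.5 that monitors an operational group with `set 90`, and a backup at 10.1.1.4 with the default priority of 1, `elect_dr` returns 10.1.1.5. When the MR role moves, the priorities swap and so does the DR.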

5.1.11.2. Configuration Recommendations

When configuring VRRP Aware PIM, consider the following recommendations.

  1. VRRP could be configured to use BFD to speed up failure detection in addition to the functionality provided by VRRP Aware PIM.
  2. To optimize failover, the config>router>pim>non-dr-attract-traffic command can be enabled on the primary and secondary routers to make them a hot-standby redundant pair. This configuration ignores the DR state and attracts traffic to populate the router’s PIM database. This setting should not be used if multicast traffic must only follow the VRRP MR.
  3. The config>service>oper-group>hold-time>group>up time on the primary router and the config>service>oper-group>hold-time>group>down time on the secondary router should both be set to the time needed to repopulate the PIM database; for example, 10 seconds. After a failover from the primary to the secondary router, this allows the primary router to repopulate its PIM database before it becomes the DR again when the role reverts from the secondary back to the primary router.
  4. The config>service>oper-group>hold-time>group>up time should be set to 0 on the secondary router so that it assumes the DR role immediately if the primary router fails. The up hold time is set to 4 seconds by default, which delays the DR change unnecessarily.
  5. The sticky DR setting should be disabled if it is configured with the config>router>pim>if>sticky-dr command. Sticky DR enables the secondary router to continue to act as the DR after the primary router comes back up. Sticky DR is incompatible with the VRRP Aware PIM mechanism that tracks the VRRP MR.
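
The effect of the hold-time recommendations above can be explored with a simplified timer model. The Python sketch below is an assumption about how a hold timer behaves in general (a state change is reported only after the new state has persisted for the hold time); it is not taken from the SR OS implementation, and the class and method names are hypothetical.

```python
class HeldOperGroup:
    """Operational-group state as seen by PIM, delayed by up/down hold times."""

    def __init__(self, up_hold: float, down_hold: float, reported_up: bool = False):
        self.up_hold = up_hold          # seconds to wait before reporting up
        self.down_hold = down_hold      # seconds to wait before reporting down
        self.reported_up = reported_up  # state currently visible to PIM
        self._pending_since = None      # time the raw state first diverged

    def update(self, raw_up: bool, now: float) -> bool:
        """Feed the raw VRRP-derived state at time `now`; return the reported state."""
        if raw_up == self.reported_up:
            self._pending_since = None  # no change pending, cancel any timer
            return self.reported_up
        if self._pending_since is None:
            self._pending_since = now   # start the hold timer
        hold = self.up_hold if raw_up else self.down_hold
        if now - self._pending_since >= hold:
            self.reported_up = raw_up   # hold time elapsed, report the change
            self._pending_since = None
        return self.reported_up
```

With `up_hold=0` (the secondary router recommendation) the group is reported up as soon as VRRP becomes the MR, so the DR role moves immediately. With `up_hold=10` (the primary router recommendation) the group is reported up only after 10 seconds, which leaves time to repopulate the PIM database.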

The following is a basic configuration example for VRRP Aware PIM.

service
    oper-group "VAwP1" create
    exit
    vprn 1 customer 1 create
        interface to-LAN
            vrrp 1 create
                oper-group "VAwP1"
            exit
        exit
        pim
            interface to-LAN
                monitor-oper-group "VAwP1" family ipv4 add 90
                monitor-oper-group "VAwP1" family ipv6 add 90
            exit
            interface to-LAN2
                monitor-oper-group "VAwP1" family ipv4 add 90
                monitor-oper-group "VAwP2" family ipv6 set 90
            exit
        exit
    exit
exit

5.1.11.2.1. Primary Router Example

*B:Dut-C>config>service# info
----------------------------------------------
        oper-group "vrrp1_1" create
            hold-time
                group up 10
            exit
        exit
        oper-group "vrrp1_1_ipv6" create
            hold-time
                group up 10
            exit
        exit
        customer 1 create
            description "Default customer"
        exit
        vprn 1 customer 1 create
            interface "toRemoteSite_1001" create
                address 10.1.1.5/24
                bfd 500 receive 500 multiplier 3
                vrrp 1
                    backup 10.1.1.100
                    priority 200
                    ping-reply
                    message-interval 5
                    oper-group "vrrp1_1"
                    bfd-enable 1 interface "toRemoteSite_1001" dst-ip 10.1.1.4
                exit
                ipv6
                    address 2001:db8:1:1:1:5/112
                    link-local-address ff00:db8:1:1:1:5 preferred
                    bfd 500 receive 500 multiplier 3
                    vrrp 1
                        backup ff00:db8:1:1:1:100
                        priority 200
                        ping-reply
                        message-interval 5
                        oper-group "vrrp1_1_ipv6"
                        bfd-enable 1 interface "toRemoteSite_1001" dst-ip 2001:db8:1:1:1:4
                    exit
                exit
            exit
            interface "toDC" create
                address 10.1.30.5/24
                bfd 500 receive 500 multiplier 3
                vrrp 255
                    backup 10.1.30.100
                    priority 200
                    policy 1
                    ping-reply
                    message-interval 5
                    bfd-enable 1 interface "toDC" dst-ip 10.1.30.4
                exit
                ipv6
                    address 2001::db8:1:30:5/112
                    link-local-address ff00:db8:30:1:30:5 preferred
                    bfd 500 receive 500 multiplier 3
                    vrrp 255
                        backup ff00:db8:30:1:30:100
                        priority 200
                        policy 1001
                        ping-reply
                        message-interval 5
                        bfd-enable 1 interface "toDC" dst-ip 2001::db8:1:30:4
                    exit
                exit
                sap 2/1/2:1 create
                exit
            exit
            router-advertisement
                interface "toRemoteSite_1001"
                    use-virtual-mac
                    no shutdown
                exit
                interface "toDC"
                    use-virtual-mac
                    no shutdown
                exit
            exit
            igmp
                interface "toDC"
                    no shutdown
                exit
                no shutdown
            exit
            mld
                interface "toDC"
                    no shutdown
                exit
                no shutdown
            exit
            pim
                no ipv6-multicast-disable
                interface "toRemoteSite_1001"
                    monitor-oper-group "vrrp1_1" family ipv4 set 5
                    monitor-oper-group "vrrp1_1_ipv6" family ipv6 set 5
                exit
                interface "toDC"
                    monitor-oper-group "vrrp1_1" family ipv4 set 5
                    monitor-oper-group "vrrp1_1_ipv6" family ipv6 set 5
                exit
                rp
                    static
                        address 10.1.10.245
                            group-prefix 224.0.0.0/4
                        exit
                    exit
                    bsr-candidate
                        shutdown
                    exit
                    rp-candidate
                        shutdown
                    exit
                    ipv6
                        static
                            address 2001:db8:1:10:245
                                group-prefix ff00:db8::/8
                            exit
                        exit
                    exit
                exit
                non-dr-attract-traffic
                no shutdown
            exit
            no shutdown
        exit

5.1.11.2.2. Secondary Router Example

*B:Dut-E>config>service# info
----------------------------------------------
        oper-group "vrrp1_1" create
            hold-time
                group down 10
                group up 0
            exit
        exit
        oper-group "vrrp1_1_ipv6" create
            hold-time
                group down 10
                group up 0
            exit
        exit
        customer 1 create
            description "Default customer"
        exit
        vprn 1 customer 1 create
            snmp
                community "XldhYQtqb7c" hash2 rw version both
            exit
            route-distinguisher 10.1.10.244:1
            interface "system" create
                address 10.1.10.244/32
                ipv6
                    address 2001:db8:1:10:244/128
                exit
                loopback
            exit
            interface "toRemoteSite_1001" create
                address 10.1.1.4/24
                ip-mtu 1454
                bfd 500 receive 500 multiplier 3
                vrrp 1
                    backup 10.1.1.100
                    ping-reply
                    standby-forwarding
                    message-interval 5
                    oper-group "vrrp1_1"
                    bfd-enable 1 interface "toRemoteSite_1001" dst-ip 10.1.1.5
                exit
                ipv6
                    address 2001:db8:1:1:4/112
                    link-local-address ff00:db8:1:1:1:4 preferred
                    bfd 500 receive 500 multiplier 3
                    vrrp 1
                        backup ff00:db8:1:1:1:100
                        ping-reply
                        standby-forwarding
                        message-interval 5
                        oper-group "vrrp1_1_ipv6"
                        bfd-enable 1 interface "toRemoteSite_1001" dst-ip 2001:db8:1:1:5
                    exit
                exit
            exit
            interface "toDC" create
                address 10.1.30.4/24
                bfd 500 receive 500 multiplier 3
                vrrp 255
                    backup 10.1.30.100
                    ping-reply
                    standby-forwarding
                    message-interval 5
                    bfd-enable 1 interface "toDC" dst-ip 10.1.30.5
                exit
                ipv6
                    address 2001:db8:1:30:4/112
                    link-local-address ff00:db8:30:1:30:4 preferred
                    bfd 500 receive 500 multiplier 3
                    vrrp 255
                        backup ff00:db8:30:1:30:100
                        ping-reply
                        standby-forwarding
                        message-interval 5
                        bfd-enable 1 interface "toDC" dst-ip 2001::db8:1:30:5
                    exit
                exit
                sap 1/1/5:1 create
                exit
            exit
            static-route-entry 10.1.10.245/32
                next-hop 10.1.30.5
                    no shutdown
                exit
            exit
            static-route-entry 2001:db8:1:10:245/128
                next-hop 2001:db8:1:30:5
                    no shutdown
                exit
            exit
            router-advertisement
                interface "toRemoteSite_1001"
                    use-virtual-mac
                    no shutdown
                exit
                interface "toDC"
                    use-virtual-mac
                    no shutdown
                exit
            exit
            igmp
                interface "toDC"
                    no shutdown
                exit
                no shutdown
            exit
            mld
                interface "toDC"
                    no shutdown
                exit
                no shutdown
            exit
            pim
                no ipv6-multicast-disable
                interface "toRemoteSite_1001"
                    monitor-oper-group "vrrp1_1" family ipv4 set 255
                    monitor-oper-group "vrrp1_1_ipv6" family ipv6 set 255
                exit
                interface "toDC"
                    monitor-oper-group "vrrp1_1" family ipv4 set 255
                    monitor-oper-group "vrrp1_1_ipv6" family ipv6 set 255
                exit
                rp
                    static
                        address 10.1.10.245
                            group-prefix 224.0.0.0/4
                        exit
                    exit
                    bsr-candidate
                        shutdown
                    exit
                    rp-candidate
                        shutdown
                    exit
                    ipv6
                        static
                            address 2001:db8:1:10:245
                                group-prefix ff00:db8::/8
                            exit
                        exit
                    exit
                exit
                non-dr-attract-traffic
                no shutdown
            exit
            no shutdown
        exit

5.2. IPv6 PIM Models

IPv6 multicast enables multicast applications over native IPv6 networks. There are two service models: Any Source Multicast (ASM) and Source Specific Multicast (SSM), which includes PIM-SSM and MLD (see MLD Overview). SSM does not require source discovery and supports only a single source for a specific multicast stream. As a result, SSM is easier to operate in a large-scale deployment that uses the one-to-many service model.

5.2.1. PIM-SSM

The IPv6 address family is supported for the SSM model. This includes the ability to choose which RTM table to use (unicast RTM, multicast RTM, or both). OSPFv3, IS-IS, and static routes have extensions to support submission of routes into the IPv6 multicast RTM.

5.2.2. PIM ASM

IPv6 PIM ASM is supported. All PIM ASM related functions such as bootstrap router, RP, and so on, support both IPv4 and IPv6 address-families. IPv6 specific parameters are configured under config>router>pim>rp>ipv6.

5.2.3. Embedded RP

The detailed protocol specification is defined in RFC 3956, Embedding the Rendezvous Point (RP) Address in an IPv6 Multicast Address. This RFC describes a multicast address allocation policy in which the address of the RP is encoded in the IPv6 multicast group address, and specifies a PIM-SM group-to-RP mapping that uses this encoding, leveraging and extending unicast-prefix-based addressing. This mechanism not only provides a simple solution for IPv6 inter-domain ASM but can also be used as a simple solution for IPv6 intra-domain ASM with scoped multicast addresses. It can also be used as an automatic RP discovery mechanism in deployment scenarios that would previously have used the Bootstrap Router (BSR) protocol.
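
To make the encoding concrete, the following Python sketch derives the RP address from an embedded-RP group address using the field layout of RFC 3956 (flags, RIID, prefix length, and a 64-bit network prefix field). It is an illustration only: the function name is hypothetical, and rejecting an RIID of zero is a conservative choice made for this sketch.

```python
import ipaddress


def embedded_rp(group: str) -> ipaddress.IPv6Address:
    """Derive the RP address encoded in an RFC 3956 embedded-RP group address."""
    raw = ipaddress.IPv6Address(group).packed
    flags = raw[1] >> 4                  # upper nibble of the second byte
    if raw[0] != 0xFF or (flags & 0x7) != 0x7:
        raise ValueError("not an embedded-RP group (R, P and T flags must be set)")
    riid = raw[2] & 0x0F                 # RP interface ID, low 4 bits
    plen = raw[3]                        # number of significant prefix bits
    if riid == 0 or not 1 <= plen <= 64:
        raise ValueError("invalid RIID or prefix length")
    prefix = int.from_bytes(raw[4:12], "big")
    prefix &= ((1 << plen) - 1) << (64 - plen)   # keep only the first plen bits
    # RP address = network prefix, zero fill, RIID in the last 4 bits.
    return ipaddress.IPv6Address((prefix << 64) | riid)
```

For example, the group ff7e:140:2001:db8:beef:feed::1 carries RIID 1 and prefix length 64 (0x40), so the derived RP address is 2001:db8:beef:feed::1. A router that receives a join for such a group needs no separate group-to-RP configuration.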