Quality of Service (QoS) provides an appropriate level of service for packets as they flow inside the switch and between switches in the network. The required level of service depends on the application that generates the flow of packets, and can be defined by the application’s sensitivity to packet loss, delay, and jitter.
Packets that require a similar treatment (per-hop behavior) are grouped into a Forwarding Class (FC), also known as a behavior aggregate. Up to eight forwarding classes can be specified. Traffic is scheduled and can optionally be marked based on its forwarding class.
A configurable drop probability expresses the sensitivity of packets to packet loss. Packets should be assigned a low drop probability when they are sensitive to loss. Proper congestion management relies on a balance of traffic that is classified as low, medium, and high drop probability so that discard decisions can be made intelligently when there is congestion.
QoS functionality is supported on the 7250 IXR, 7220 IXR-D2 and D3, and the 7220 IXR-H2 and H3.
This section describes how QoS applies to transit packets on the SR Linux.
Note: If no entry of this policy matches the received DSCP, the assigned forwarding class is fc0 and the assigned drop probability is low. This FC and drop probability classification corresponds to best-effort treatment.
DSCP values | Included DSCP names | Forwarding class | Drop probability |
0, 2 to 7 | CS0/BE | fc0 | Low |
1 | LE | fc0 | High |
8 to 11 | CS1, AF11 | fc1 | Low |
12 to 13 | AF12 | fc1 | Medium |
14 to 15 | AF13 | fc1 | High |
16 to 19 | CS2, AF21 | fc2 | Low |
20 to 21 | AF22 | fc2 | Medium |
22 to 23 | AF23 | fc2 | High |
24 to 27 | CS3, AF31 | fc3 | Low |
28 to 29 | AF32 | fc3 | Medium |
30 to 31 | AF33 | fc3 | High |
32 to 35 | CS4, AF41 | fc4 | Low |
36 to 37 | AF42 | fc4 | Medium |
38 to 39 | AF43 | fc4 | High |
40 to 47 | CS5, EF | fc5 | Low |
48 to 55 | CS6/NC1 | fc6 | Low |
56 to 63 | CS7/NC2 | fc7 | Low |
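The default mapping in the table above can be expressed as a range lookup. The following Python sketch is only an illustration of the table, not SR Linux code; the names `DEFAULT_DSCP_RANGES` and `classify_default` are hypothetical.

```python
# Default DSCP classifier from the table above, as
# (first DSCP, last DSCP, forwarding class, drop probability) rows.
DEFAULT_DSCP_RANGES = [
    (0, 0, "fc0", "low"),
    (1, 1, "fc0", "high"),
    (2, 7, "fc0", "low"),
    (8, 11, "fc1", "low"),
    (12, 13, "fc1", "medium"),
    (14, 15, "fc1", "high"),
    (16, 19, "fc2", "low"),
    (20, 21, "fc2", "medium"),
    (22, 23, "fc2", "high"),
    (24, 27, "fc3", "low"),
    (28, 29, "fc3", "medium"),
    (30, 31, "fc3", "high"),
    (32, 35, "fc4", "low"),
    (36, 37, "fc4", "medium"),
    (38, 39, "fc4", "high"),
    (40, 47, "fc5", "low"),
    (48, 55, "fc6", "low"),
    (56, 63, "fc7", "low"),
]


def classify_default(dscp):
    """Return (forwarding class, drop probability) for a 6-bit DSCP value."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0 to 63")
    for first, last, fc, drop in DEFAULT_DSCP_RANGES:
        if first <= dscp <= last:
            return fc, drop
    # Not reachable with the full table; mirrors the documented
    # best-effort fallback for an unmatched DSCP.
    return "fc0", "low"
```

For example, `classify_default(46)` (EF) returns `("fc5", "low")`, and `classify_default(1)` (LE) returns `("fc0", "high")`.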
Note: On all VLAN-based subinterfaces, the 802.1p bits are currently ignored for purposes of forwarding class and drop-probability classification.
Note: The operational values of the max-probability may be significantly different from the configured values due to internal hardware calculations. You can check the hardware configured values for any slope calculations.
Note: The operational values of the max-probability may be significantly different from the configured values due to internal hardware calculations. You can check the hardware configured values for any WRED slope calculations.
When the 7220 IXR-D2 and D3 receives a terminating VXLAN packet on a subinterface, it classifies the packet to one of eight forwarding classes and one of three drop probabilities (low, medium, or high). The classification is based on the following considerations.
When the 7220 IXR-D2 and D3 adds VXLAN encapsulation to a packet and forwards it out a subinterface, the inner header IP DSCP value is not modified if the payload packet is IP, even if the egress routed subinterface has a DSCP rewrite rule policy bound to it that matches the packet FC and drop probability. The outer header IP DSCP is modified by the DSCP rewrite rule policy that is bound to the egress routed subinterface, if such a policy exists.
This section describes how QoS applies to traffic that terminates on the SR Linux.
This section describes how QoS applies to traffic that originates on the SR Linux.
Protocol / message type | Forwarding class | Drop probability | DSCP marking |
IPv4 ARP request/reply | 6 | Low | N/A |
ICMPv4 including echo-request 1, echo-reply 2, dest-unreachable, redirect, time-exceeded, parameter-problem | 0 | Medium | 0 |
ICMPv4 echo-request with ToS/DSCP override = x | look up x in system-default DSCP classifier | look up x in system-default DSCP classifier | x |
ICMPv4 echo-reply to echo-request with non-zero DSCP x | look up x in system-default DSCP classifier | look up x in system-default DSCP classifier | x |
UDP traceroute | 0 | Low | 0 |
IPv6 neighbor solicitation | 6 | Low | 48 (CS6/NC1) |
IPv6 neighbor advertisement | 6 | Low | 48 (CS6/NC1) |
All other ICMPv6 including dest unreachable, packet-too-big, time-exceeded, parameter-problem, echo-request, echo-reply, router-solicitation, redirect | 0 | Medium | 0 |
ICMPv6 echo-request with DSCP override = x | look up x in system-default DSCP classifier | look up x in system-default DSCP classifier | x |
ICMPv6 echo-reply to echo-request with non-zero DSCP x | look up x in system-default DSCP classifier | look up x in system-default DSCP classifier | x |
BFD | 6 | Low | 48 (CS6/NC1) |
BGP | 6 | Low | 48 (CS6/NC1) |
DNS query | 4 | Low | 34 (AF41) |
FTP/TFTP | 4 | Low | 34 (AF41) |
gNMI | 4 | Low | 34 (AF41) |
JSON RPC | 4 | Low | 34 (AF41) |
LLDP | N/A | Low | N/A |
NTP | 4 | Low | 34 (AF41) |
sFlow | 0 | Low | 0 |
SNMP | 4 | Low | 34 (AF41) |
SSH | 4 | Low | 34 (AF41) |
Syslog | 4 | Low | 34 (AF41) |
TACACS+ | 4 | Low | 34 (AF41) |
Notes:
QoS configuration on the SR Linux involves tasks such as creating classifier policies for incoming packets, creating rewrite-rule policies for outgoing packets, creating queue-templates, and applying these different constructs to the appropriate objects.
When a DSCP classifier policy is applied to a subinterface, the policy attempts to match the 6-bit DSCP value in the IP header of incoming packets to one of its entries. If there is a match, the incoming packet is assigned to the specified forwarding class and drop probability; otherwise, the assigned forwarding class is fc0 and the assigned drop probability is low.
Packets that require a similar treatment (per-hop behavior) are grouped into an FC, also known as a behavior aggregate. The SR Linux differentiates up to eight forwarding classes.
The drop probability can be high, medium, or low. If a queue-template with different WRED slopes is bound to a queue, then packets in that queue with a high drop probability are the first to be dropped when the queue experiences congestion, followed by packets with a medium drop probability, then by packets with a low drop probability. The default is low.
Example:
The following example creates a DSCP classifier policy:
Note: To create a new DSCP classification policy based on the default policy, you can copy the default policy from state in candidate mode.
Example:
# copy from state /qos classifiers dscp-policy default to /qos classifiers dscp-policy test
For information about how to apply a classifier policy, see section 15.6.3.1.
When a DSCP rewrite-rule policy is applied to a subinterface, the policy attempts to match the forwarding class (and optionally, also the drop-probability) of outbound packets to one of its entries. If there is a match, the DSCP value of the outbound packet is changed to the value specified by the policy. If the forwarding class of the packet does not match a rule of the rewrite-rule policy, the DSCP value is changed to 0.
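The matching behavior described above can be sketched in Python. This is a model for illustration only, not SR Linux configuration or code. The policy layout (a dictionary keyed by forwarding class and optional drop probability) and the name `rewrite_dscp` are hypothetical, and the precedence of a drop-probability-specific entry over a forwarding-class-only entry is an assumption.

```python
def rewrite_dscp(policy, fc, drop_probability):
    """Return the egress DSCP for a packet under a rewrite-rule policy.

    policy maps (fc, drop_probability) or (fc, None) to a DSCP value;
    a None drop probability means the entry matches on forwarding class only.
    """
    # Assumed precedence: the most specific entry wins.
    if (fc, drop_probability) in policy:
        return policy[(fc, drop_probability)]
    # Entry that matches on forwarding class alone.
    if (fc, None) in policy:
        return policy[(fc, None)]
    # No rule for this forwarding class: the DSCP is changed to 0.
    return 0


# Hypothetical policy: fc1 is marked AF11 (10), or AF13 (14) when the
# drop probability is high; fc5 is marked EF (46).
example_policy = {
    ("fc1", None): 10,
    ("fc1", "high"): 14,
    ("fc5", None): 46,
}
```

With this policy, a packet in fc1 with low drop probability leaves with DSCP 10, a packet in fc1 with high drop probability leaves with DSCP 14, and a packet in fc3 (which has no rule) leaves with DSCP 0.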
On 7220 IXR-D2 and D3 or 7220 IXR-H2 and H3 systems, if no DSCP rewrite-rule policy is applied to a subinterface, the incoming packet's DSCP remains unchanged at egress.
Example:
The following example creates a rewrite-rule policy:
For information about how to apply a DSCP rewrite policy, see section 15.6.3.2.
On a 7220 IXR-D2 and D3, you can use a classifier policy to classify ingress packets received from any remote VXLAN VTEP. The policy applies to payload packets after VXLAN decapsulation has been performed.
Example:
The following example shows how the DSCP classifier policy created in the example in section 15.6.1.1 can be used for VXLAN traffic:
Queue-templates are groups of configuration information that apply to a set of queues. On 7250 IXR systems, the controlled set of queues are VOQs; on 7220 IXR-D2 and D3 or 7220 IXR-H2 and H3 systems, the controlled set of queues are egress queues.
The maximum number of queue-templates per system varies by platform. On 7250 IXR systems, the maximum is 8 queue-templates; on 7220 IXR-D2 and D3 or 7220 IXR-H2 and H3 systems, the maximum is 63 queue-templates.
The following parameters are configurable inside a queue template:
If a queue (VOQ or egress queue) does not have a queue-template binding, it inherits the settings of a default queue-template. The default queue-template has a platform-specific MBS default value, no defined queue utilization thresholds, no WRED slopes, and no ECN slopes. The default queue-template cannot be displayed, but its effect is visible by reading the state of individual queues that lack a queue-template binding.
Example:
The following example creates a queue template that you could use for any of the following:
Note: This example is only the starting point of a full configuration. Subsequent sections build on this example to create a full configuration.
In a queue-template, the maximum-burst-size parameter sets the maximum length of an egress queue or set of VOQs. The queue depth is also known as the Maximum Burst Size (MBS). The maximum-burst-size parameter must be configured with a non-zero value in order to configure WRED slope and ECN slope parameters.
On the 7250 IXR, the maximum-burst-size parameter applies to a set of VOQs. If the parameter is not configured, or is set to 0, the effective MBS of these VOQs is 256MB.
On the 7220 IXR-D2 and D3 or the 7220 IXR-H2 and H3, the maximum-burst-size parameter applies to a set of egress queues. If the parameter is not configured or is set to 0, the effective MBS of these egress queues is calculated based on a fair allocation algorithm. You can assign a non-zero MBS value to multicast queues, but this configuration is not recommended (especially if multicast traffic is being shaped by configuring peak-rate-percent), since it can lead to a shortage of multicast-related buffering resources on 7220 IXR-D2 and D3 or 7220 IXR-H2 and H3 systems.
Example:
The following example specifies the queue depth with a set maximum-burst-size:
In a queue template, WRED policies can be configured to handle congestion when queue space is depleted. Without WRED, once a queue reaches its maximum fill size, the queue discards any packets arriving at the queue (known as tail drop).
WRED policies manage queue depth. They help to prevent congestion by starting random discards once the queue reaches a user-configured threshold value. This avoids the impact of discarding all the new incoming packets. By starting random discards at this threshold, an end-system can adjust its sending rate to the available bandwidth.
The WRED curve algorithm is based on two user-configurable thresholds (min-threshold-percent and max-threshold-percent) and a discard probability factor (max-probability).
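As an illustration of how the three parameters interact, the following Python sketch computes a discard probability from the queue fill level. It assumes the conventional WRED curve shape (no discards below the minimum threshold, a linear ramp up to max-probability at the maximum threshold, and discard of everything beyond it). The function name and the use of an instantaneous fill percentage are assumptions, and the hardware calculation may differ.

```python
def wred_drop_probability(fill_percent, min_threshold_percent,
                          max_threshold_percent, max_probability):
    """Return the discard probability (in percent) at a given queue fill.

    All arguments are percentages: fill and thresholds are relative to the
    queue's maximum burst size, and max_probability is the discard
    probability reached at the maximum threshold.
    """
    if fill_percent < min_threshold_percent:
        return 0.0          # below the slope: nothing is discarded
    if fill_percent >= max_threshold_percent:
        return 100.0        # beyond the slope: behaves like tail drop
    span = max_threshold_percent - min_threshold_percent
    # Linear ramp from 0 at the minimum threshold to max_probability
    # at the maximum threshold.
    return max_probability * (fill_percent - min_threshold_percent) / span
```

With thresholds of 40 and 80 and a max-probability of 20, a queue that is 50 percent full has a discard probability of 5 percent under this model.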
On the 7220 IXR-D2 and D3 or the 7220 IXR-H2 and H3, a WRED slope can be configured to apply only to TCP or to non-TCP traffic. This can be useful because TCP has built-in mechanisms to adjust its sending rate in response to packet drops. TCP-based senders lower the packet transmission rate when some of the packets fail to reach the far end.
Example:
The following example specifies a WRED slope for low drop probability traffic flowing through a set of VOQs on a 7250 IXR. This WRED slope applies to both TCP and non-TCP traffic.
Example:
The following example specifies a WRED slope for TCP traffic that is classified as low drop probability flowing through an egress queue on the 7220 IXR-D2 and D3 or the 7220 IXR-H2 and H3.
Some IP applications support the ECN mechanism. With ECN, IP packets originated by such applications are not discarded when they enter a congested queue; instead, they are marked in a special way. The marking uses the two ECN bits in the traffic class field of the IPv4 or IPv6 packet header. The receiver of IP packets marked as having experienced congestion can signal to the sender (through Layer 4 or higher protocols) that it should reduce its sending rate. The advantage of this feedback mechanism is that the sending rate can be dropped more gradually than the normal response of a TCP sender to packet discards. A more gradual back-off can result in higher effective throughput in the network.
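The marking described above can be illustrated with the ECN codepoints defined in RFC 3168, where the two ECN bits are the low-order bits of the IPv4 ToS byte or IPv6 traffic class byte and the DSCP occupies the upper six bits. The following Python sketch is illustrative only; the function names are hypothetical, and the decision of when to mark (the ECN slope) is not modeled.

```python
# ECN codepoints from RFC 3168 (low-order two bits of the byte).
ECN_NOT_ECT = 0b00   # sender does not support ECN
ECN_ECT1 = 0b01      # ECN-capable transport, codepoint 1
ECN_ECT0 = 0b10      # ECN-capable transport, codepoint 0
ECN_CE = 0b11        # congestion experienced


def split_traffic_class(traffic_class):
    """Split the 8-bit ToS/traffic class byte into (DSCP, ECN)."""
    return traffic_class >> 2, traffic_class & 0b11


def mark_congestion(traffic_class):
    """Set CE on an ECN-capable packet; leave other packets unchanged.

    A packet that is not ECN-capable cannot be marked; how such a packet
    is handled under congestion is outside this sketch.
    """
    dscp, ecn = split_traffic_class(traffic_class)
    if ecn in (ECN_ECT0, ECN_ECT1):
        return (dscp << 2) | ECN_CE
    return traffic_class
```

For example, a byte of `0xBA` (DSCP 46, ECT(0)) becomes `0xBB` (DSCP 46, CE), while `0xB8` (DSCP 46, not ECN-capable) is returned unchanged.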
An ECN slope is similar to a WRED slope. It is based on two user-configurable thresholds (min-threshold-percent and max-threshold-percent) and a marking probability factor (max-probability).
On the 7220 IXR-D2 and D3 or the 7220 IXR-H2 and H3, you can have one ECN slope per drop-probability level of traffic flowing through an egress queue. On the 7250 IXR you can only have one ECN slope per queue and it applies to all drop-probability levels.
For an ECN slope to be used, explicit-congestion-notification must be configured. On 7250 IXR systems, this configuration also requires you to specify an ecn-dscp-policy; this is the DSCP rewrite policy that will be used when an ECN field rewrite must be performed.
Example (7250 IXR):
The following example specifies an ECN slope applicable to a 7250 IXR system:
Example (7220 IXR-D2 and D3 or 7220 IXR-H2 and H3):
The following example specifies an ECN slope applicable to a 7220 IXR-D2 and D3 or 7220 IXR-H2 and H3 system:
If you apply a DSCP classifier policy to input traffic on a subinterface, incoming packets are evaluated against the policy, and matching packets are assigned to the forwarding class and drop probability specified by the policy. If no classifier policy is applied to the subinterface, the system default DSCP classifier (with the reserved name default) is used.
Note: On 7220 IXR-D2 and D3 or 7220 IXR-H2 and H3 systems, separate classifier policies for IPv4 and IPv6 traffic are not supported, but you can apply a common policy that applies to both IPv4 and IPv6 traffic.
Example (7250 IXR):
The following example applies a DSCP classifier policy to inbound IPv6 traffic on a subinterface with a 7250 IXR system:
Example (7220 IXR-D2 and D3 or 7220 IXR-H2 and H3):
The following example applies a DSCP classifier policy to inbound traffic on a subinterface with a 7220 IXR-D2 and D3 or 7220 IXR-H2 and H3 system:
When a rewrite-rule policy is applied to output traffic on a subinterface, outbound packets are evaluated against the policy. All packets, with some exceptions, are subject to DSCP remarking by this policy. If no rewrite-rule policy is applied to the subinterface, the DSCP marking of the traffic leaving the subinterface is unchanged, unless it is ECN-capable traffic forwarded by a 7250 IXR system or VXLAN traffic originated by a 7220 IXR-D2 and D3 or 7220 IXR-H2 and H3 system. For these exceptions, DSCP may be remarked even in the absence of a rewrite-rule policy applied to the egress subinterface. Note that on all platforms the DSCP marking of self-generated traffic is not affected by rewrite rule policies.
Note: Separate rewrite policies for IPv4 and IPv6 egress traffic are supported on 7250 IXR systems. Common rewrite policies that apply to both IPv4 and IPv6 traffic are supported on 7220 IXR-D2 and D3 or 7220 IXR-H2 and H3 systems.
Example (7250 IXR):
The following example applies a rewrite-rule policy to outbound IPv4 traffic on a subinterface with a 7250 IXR system:
Example (7220 IXR-D2 and D3 or 7220 IXR-H2 and H3):
The following example applies a rewrite-rule policy to outbound traffic on a subinterface with a 7220 IXR-D2 and D3 or 7220 IXR-H2 and H3 system:
Each unicast queue and each multicast queue of an egress port is associated with a scheduler node. The mapping of queues to scheduler nodes is platform-dependent and cannot be configured.
On 7250 IXR systems, there are two scheduling nodes per port; one for unicast traffic and one for multicast traffic. The two scheduling nodes have a WRR relationship, but the parameters cannot be adjusted. There is one PIR scheduling loop per scheduling node. The scheduling loop serves the strict priority classes first (in descending order of FC), and then the WRR classes (by weight), limiting each forwarding class to its PIR (expressed as a percentage of the egress port bandwidth). By default, the PIR of each forwarding class is 100%. Note that multicast traffic handled by the multicast scheduler node is unscheduled and is not subject to the ingress VOQ buffering that applies to unicast traffic.
On 7220 IXR-D2 and D3 systems, the unicast queue and multicast queue for a particular forwarding class make up a queue pair. Each of the eight possible queue pairs of an egress port is associated with a scheduler node. Each scheduler node is served as strict priority (SP) or weighted round robin (WRR). If it is served as WRR, then the scheduler node also has an associated weight. The scheduling loop serves the SP nodes first and then the WRR nodes by weight. The serving order of SP queues is in descending order of FC: fc7 first, then fc6, then fc5, and so on.
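The serving order described above can be modeled with a short Python sketch. It is a simplification for illustration only: it assumes every node always has traffic waiting, it treats a weight as a number of service turns per round, and the name `service_order` and the node layout are hypothetical. Hardware schedulers typically account for bytes rather than turns.

```python
def service_order(nodes):
    """Return the order in which scheduler nodes are served in one round.

    nodes is a list of dicts with 'fc' (int, 0 to 7), 'strict' (bool) and,
    for WRR nodes, an optional 'weight' (int, defaults to 1).
    """
    # SP nodes come first, in descending order of forwarding class.
    strict = sorted((n for n in nodes if n["strict"]),
                    key=lambda n: n["fc"], reverse=True)
    order = [n["fc"] for n in strict]

    # WRR nodes share the remaining turns in proportion to their weights.
    wrr = [n for n in nodes if not n["strict"]]
    remaining = {n["fc"]: n.get("weight", 1) for n in wrr}
    while any(remaining.values()):
        for n in wrr:
            if remaining[n["fc"]] > 0:
                order.append(n["fc"])
                remaining[n["fc"]] -= 1
    return order


example_nodes = [
    {"fc": 5, "strict": True},
    {"fc": 7, "strict": True},
    {"fc": 1, "strict": False, "weight": 2},
    {"fc": 0, "strict": False, "weight": 1},
]
```

Calling `service_order(example_nodes)` returns `[7, 5, 1, 0, 1]`: the two SP nodes in descending FC order, followed by the WRR nodes, with fc1 receiving two turns for each turn given to fc0.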
On 7220 IXR-H2 and H3 systems, there is a one-to-one mapping of queues to scheduler nodes. Each scheduler node can be served as SP or WRR. A WRR node has a configurable weight. The scheduling loop serves the SP nodes first and then the WRR nodes by weight. The serving order of SP queues is as follows:
The following examples configure a queue or scheduler node for strict priority. Note that when strict priority is set to false, the associated queue or scheduler node is configured as WRR. When strict priority is set to true, any configured weight is ignored.
Example (7250 IXR):
Example (7220 IXR-D2 and D3 or 7220 IXR-H2 and H3):
The following examples configure a queue or scheduler-node for WRR. Queues or scheduler nodes that are not configured with a specific weight have a weight of 1.
Example (7250 IXR):
Example (7220 IXR-D2 and D3 or 7220 IXR-H2 and H3):
The following example sets the maximum percentage of port bandwidth that is available to traffic of a particular forwarding class. By default, traffic belonging to any FC can use up to 100% of port bandwidth. The example is applicable to 7250 IXR, 7220 IXR-D2 and D3, and 7220 IXR-H2 and H3 systems.
Example:
Buffer utilization differs between the 7250 IXR, 7220 IXR-D2 and D3, and 7220 IXR-H2 and H3. See Table 18.
Hardware | Buffer memory |
7250 IXR |
|
7220 IXR-D2 and D3 |
|
7220 IXR-H2 and H3 |
|
The following examples show overall buffer usage. Depending on the hardware deployed, the output will vary.
Examples:
The following example shows overall buffer usage on a 7250 IXR system:
The following example shows overall buffer usage on 7220 IXR-D2 and D3 or 7220 IXR-H2 and H3 systems:
To display traffic statistics for each output queue on an interface, use the show interface <id> queue-detail command in running or candidate mode.
Example:
The following example displays output queue statistics for an interface on a 7250 IXR system. The output on a 7220 IXR-D2 and D3 or 7220 IXR-H2 and H3 system is similar, but will show slightly different information.
You can reset the queue statistics counters for an interface.
Examples:
The following example resets all statistics counters on an interface:
The following example resets statistics counters for a specified egress queue (multicast) on an interface:
A QoS profile resource refers to the number of classifier and rewrite policies that are applied to interfaces on a line card. Each classifier or rewrite policy that is applied to an interface on a line card counts as one profile resource used.
For example, if you create classifier policy dscp1 and apply it to input IPv4 traffic on an interface, and apply the same dscp1 policy to input IPv6 traffic on a different interface on the same line card, it counts as two classifier profile resources used.
The SR Linux supports up to 15 classifier profile resources and up to 32 rewrite profile resources per line card. You can display the number of QoS profile resources in use for each line card.
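The counting rule can be sketched in Python as follows. The sketch only illustrates the arithmetic described above; the tuple layout, the function names, and the interface names in the usage note are hypothetical, and the limits are the ones stated in the preceding paragraph.

```python
from collections import Counter

CLASSIFIER_LIMIT = 15   # classifier profile resources per line card
REWRITE_LIMIT = 32      # rewrite profile resources per line card


def profile_usage(bindings):
    """Count profile resources used per (line card, kind).

    bindings is an iterable of tuples:
    (line_card, kind, policy_name, interface, address_family),
    where kind is "classifier" or "rewrite". Each distinct application
    of a policy counts as one resource, even if the policy is reused.
    """
    used = Counter()
    for line_card, kind, _policy, _interface, _family in set(bindings):
        used[(line_card, kind)] += 1
    return used


def free_resources(used, line_card):
    """Return the remaining classifier and rewrite resources for a line card."""
    return {
        "classifier": CLASSIFIER_LIMIT - used[(line_card, "classifier")],
        "rewrite": REWRITE_LIMIT - used[(line_card, "rewrite")],
    }
```

Applying policy dscp1 to IPv4 traffic on one interface and to IPv6 traffic on another interface of line card 1, for example `(1, "classifier", "dscp1", "ethernet-1/1.1", "ipv4")` and `(1, "classifier", "dscp1", "ethernet-1/2.1", "ipv6")`, uses two classifier resources and leaves 13 free.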
Example:
The following example displays the number of used and free classifier and rewrite profile resources for a line card: