I was asked about the difference between RSVP and CSPF, and I thought: “hmm, that would probably be like comparing a car’s engine with the car’s GPS”. You definitely need the engine to go anywhere, but if you are like me you also need the GPS to get anywhere without getting lost or arriving 30 minutes late! LOL!
CSPF and RSVP are two different things, and yet they work closely alongside each other and are often mentioned together in documents related to MPLS Traffic Engineering. I can see why someone could get confused.
But think of them as the GPS and the engine: “CSPF calculates the route to the destination, and RSVP sets up a path following that route, so that traffic can be forwarded.”
I’ll go over the details of how this happens, but as always, before we jump into the details, let’s review some basics and get some context, so that things make sense. If you already know these basic concepts, just bear with me for a little, we’ll get to the more advanced stuff.
The first two concepts we need to talk about are:
- MPLS (Multiprotocol Label Switching), and
- LSPs (Label Switched Paths):
What is MPLS (Multiprotocol Label Switching)?
MPLS is in essence a way to forward traffic across a network without having to do Layer 3 lookups. Originally, the benefit of MPLS was faster packet processing and scaling, which is still true, but today MPLS is the forwarding mechanism that enables Service Providers to offer a variety of services, including L3VPN (Layer 3 Virtual Private Network), L2VPN (Layer 2 Virtual Private Network), EVPN (Ethernet VPN), VPLS (Virtual Private LAN Service), and IPv6, all over the same IPv4 infrastructure.
But perhaps the major benefit of using MPLS for traffic forwarding is the ability to perform TE (Traffic Engineering), which as the name implies is the ability to engineer (define) how traffic will be forwarded across the network beyond simple routing protocol metrics. We will cover TE extensively later, and you will see then how CSPF comes into play.
But how does MPLS give a Service Provider this flexibility?
The answer is actually fairly simple: by enabling devices in the core of the network to make forwarding decisions based on a label attached to packets, instead of having to perform an L3 lookup for a matching route. This also implies that the routers in the core of the network do not need to learn all the routes that the edge devices learn from the customers. They just need to know how to forward traffic based on these labels.
As an example, consider the following network, and think of R1, R6 and R7 as the edge, and R2-R5 as the core.
When R1 has traffic to send to 10.6.6/24, it looks at its IPv4 routing table (inet.0), and finds an entry that specifies that traffic for this destination should be sent to R2, with an MPLS label = 100.
R2 will receive traffic with label =100 and will know that it needs to send it to R3 with an MPLS label = 200, and so on.
R2, R3, and R4 do not check inet.0. They are receiving traffic with an MPLS label, so they look at a different table (mpls.0), which contains mappings between the incoming label and the outgoing interface and label. No L3 lookups!
If a second prefix were added behind R6, the only thing needed to allow traffic to reach this new destination from R1 would be an additional route in R1’s routing table. Nothing else. The labels already in place would take care of the rest.
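To make the idea concrete, here is a minimal Python sketch of the forwarding logic described above. It is only an illustration: the dictionaries stand in for inet.0 on R1 and mpls.0 on the core routers, labels 100 and 200 come from the example, and everything else (label 300, R4 popping the label, the extra prefix 10.6.7.0/24) is an assumption I made up to complete the path.

```python
# Hypothetical mpls.0 tables: incoming label -> (action, outgoing label, next hop).
# Labels 100 and 200 come from the example; label 300 and the pop at R4 are assumptions.
MPLS_TABLES = {
    "R2": {100: ("swap", 200, "R3")},
    "R3": {200: ("swap", 300, "R4")},
    "R4": {300: ("pop", None, "R6")},
}

# Stand-in for inet.0 on the ingress router R1: prefix -> (label to push, next hop).
R1_ROUTES = {"10.6.6.0/24": (100, "R2")}


def forward(prefix):
    """Return the (router, label carried on arrival) hops followed by a packet."""
    label, router = R1_ROUTES[prefix]
    hops = [("R1", None)]
    while label is not None:
        hops.append((router, label))
        # Core routers only look at the incoming label, never at the IP destination.
        action, out_label, next_hop = MPLS_TABLES[router][label]
        label = out_label if action == "swap" else None
        router = next_hop
    hops.append((router, None))
    return hops


def add_prefix(prefix, same_path_as):
    """A new prefix behind the same egress only needs a new entry on R1."""
    R1_ROUTES[prefix] = R1_ROUTES[same_path_as]
```

Calling `forward("10.6.6.0/24")` walks R1, R2, R3, R4 and R6. After `add_prefix("10.6.7.0/24", "10.6.6.0/24")` the new prefix follows the same labels, and none of the core tables had to change.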
What does this MPLS label look like?
What we commonly refer to as an MPLS label is actually a fixed-length (32-bit) shim header that is added (pushed) to the packet, between the L2 and L3 headers, as it enters the network. In other words, at the edge (R1 in our example).
This header consists of the following fields:
– Label (20 bits): the actual label value that we think of when someone says MPLS label.
– Traffic Class (3 bits): formerly known as the Experimental (EXP) bits, and renamed in RFC 5462. Used for class of service.
– Bottom-of-stack bit (1 bit): indicates whether this label is the last label in the stack or not, since multiple labels can be added to a single packet at a given time, as shown in the diagram below. As an example, packets for an L3VPN have two labels: a VPN label and a transport label. We will see more details about this too.
– Time to live (8 bits): decremented at every hop, and by default copied from the IP TTL.
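If you like to see bits, here is a small Python sketch that packs and unpacks this shim header using the standard field widths (20-bit label, 3-bit Traffic Class, 1-bit bottom-of-stack, 8-bit TTL). The function names are mine, and the default TTL of 64 is just an assumption for the example.

```python
import struct


def encode_shim(label, traffic_class=0, bottom_of_stack=True, ttl=64):
    """Pack one 32-bit MPLS shim header in network byte order."""
    if not 0 <= label < 2**20:
        raise ValueError("label must fit in 20 bits")
    if not 0 <= traffic_class < 8:
        raise ValueError("traffic class must fit in 3 bits")
    if not 0 <= ttl < 256:
        raise ValueError("TTL must fit in 8 bits")
    word = (label << 12) | (traffic_class << 9) | (int(bottom_of_stack) << 8) | ttl
    return struct.pack("!I", word)


def decode_stack(data):
    """Decode shim headers one by one until the bottom-of-stack bit is set."""
    entries = []
    for offset in range(0, len(data), 4):
        (word,) = struct.unpack_from("!I", data, offset)
        entry = {
            "label": word >> 12,
            "traffic_class": (word >> 9) & 0x7,
            "bottom_of_stack": bool((word >> 8) & 0x1),
            "ttl": word & 0xFF,
        }
        entries.append(entry)
        if entry["bottom_of_stack"]:
            break
    return entries
```

For example, `encode_shim(100, 0, True, 64).hex()` gives `00064140`. A two-label stack is simply two headers back to back, with the bottom-of-stack bit set only on the last one: `encode_shim(299856, bottom_of_stack=False) + encode_shim(299776)`.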
So, we now know what the labels look like, and that the routers in the core of the network do not need to perform L3 lookups but check their mpls.0 table to determine the outgoing interface and label, based on the incoming label.
We also heard of this term Traffic Engineering. The ability to assign labels, and to map incoming labels to outgoing labels and interfaces in a certain way, is what gives us the ability to perform traffic engineering.
In a topology like this:
Without MPLS and TE, the routers would be performing L3 lookups, and would be relying on an IGP such as ISIS, to learn where destinations are and how to forward traffic.
In the example above, any traffic for either 10.6.6/24 or 20.6.6/24 behind R6 would follow the upper path going through R2, R3, and R4, because the ISIS metrics are lower across this path.
As a result, the links on the top could get congested, while the links at the bottom are underutilized or not used at all.
Playing with the metrics would not help, as it would just move ALL traffic from one interface to another.
What we would probably want to happen is: traffic going to 10.6.6/24 is sent along the upper path, and traffic going to 20.6.6/24 along the bottom path. Yep! MPLS can help us achieve this. We would of course need proper labels and mappings along the paths, and proper routing information on the ingress router (R1).
This is what traffic engineering is at a high level!
But, how are these labels assigned? How do R2–R5 know what to do when labels 100 or 200 come in?
And how are labels mapped to prefixes or destinations (both L2 and L3)? How does R1 know that to send traffic to 10.6.6/24 it needs to use label 100, and to send traffic to 20.6.6/24 it needs to use label 200?
Well, these assignments and mappings can be done manually (maybe you don’t want to, but you can), or dynamically using protocols such as BGP, LDP or RSVP, or a combination of them. It depends on the type of service that you are trying to build (L3VPN, VPLS, EVPN…), what level of TE you need, and scaling.
We are going to use an L3VPN as an example.
Layer 3 Virtual Private Networks (VPNs), and labels.
Here is our example scenario:
We are a Service Provider and our customer has two sites which need to exchange traffic.
The customer connects to our network using routers referred to as Customer Edge routers (CE1 and CE2). The routers that the CE devices connect to are referred to as Provider Edge routers (PE1 and PE2), and the routers in the core of our network are just Provider routers (P1, P2, and P3).
The customer advertises its site prefixes to the local PE routers, and expects those prefixes to be received on the other side.
For instance, when CE1 advertises prefix 172.20.1/24 to PE1, the customer expects this prefix to be advertised to CE2 from PE2, so that CE2 knows how to forward traffic to 172.20.1/24.
And that means PE1 will need to advertise that information to PE2 somehow. How is this routing information going to be propagated across the network?
Also, once the routes are propagated, the customer expects traffic sent from SITE 1 (from CE1 to PE1) and destined to SITE 2 to be delivered on the other side (from PE2 to CE2).
PE1 and PE2 have the routes and know how to forward the traffic, but are the routers inside the Provider’s network (P routers) going to also learn the customer’s prefixes, and forward the traffic based on destination address?
The short answers for the previous two questions:
- PE1 and PE2 will be exchanging customer routes using IBGP
- No, as we described before, the P routers will NOT have the customer routes, and will be forwarding traffic based on MPLS labels.
What we need to understand now is how these MPLS labels are assigned, and how they are mapped to the customer and to the customer routes. Basically, we need to understand how the entries in inet.0 and mpls.0 are created across the network.
We are going to add some additional details to our scenario to make it more interesting:
We now have two customers with two sites each, which are connected to the same PE routers (PE1 and PE2).
The customers are using the same subnets for SITE 1 and SITE 2 (172.16.10/24 and 172.16.20/24, respectively), and our job is to provide communication between their sites.
To provide this connectivity, we are going to build an L3VPN (Layer 3 Virtual Private Network) for each customer.
We need to take care of communication between the CE and the PE routers first.
Because we need to keep the customer’s routing information and traffic separated, both for privacy and to avoid conflicts caused by the customers using overlapping address spaces, we start by creating a routing-instance for each customer on the PE routers. This will create a separate routing table and corresponding forwarding table for each of them.
The interfaces on the PE router that connect to the customer’s CE routers will be placed in the corresponding routing-instance. As a result, the local and direct routes for these interfaces will be placed in the corresponding routing table.
NOTE: for now, I am going to make the routing-instances virtual-routers, but later we will see that we need to change that and add additional attributes to the instances.
To exchange routing information with the customers, we configure EBGP within each routing instance. Any BGP route that we receive from CE1 or CE2 will be placed in the corresponding routing table, and be separated from the other customer’s routes.
In our example, the customer prefixes are configured under the loopback interface of CE1 and CE2, and are redistributed into BGP using a policy.
Here are the relevant configuration statements for CE1, CE2 and PE1.
We check that the BGP sessions have been established, and that we are receiving routes from the CE devices:
In order to propagate the customers’ routing information across the network, as we said before, we are going to use IBGP.
We don’t want to have multiple IBGP sessions between the PE routers to propagate the customers’ routes. In other words, we don’t want to have an IBGP session between the PE routers to advertise the routes from customer 1, and another IBGP session to advertise the routes from customer 2. We will have a single IBGP session!
This IBGP session will be configured OUTSIDE of any customers’ routing instance. Essentially, we are creating this single IBGP session that will be shared to advertise information for both customers, or any other customer that is added in the future.
But if we have a single IBGP session, how do we keep the routes that belong to each customer separated, and how do we make sure that the receiving router places the routes in the correct routing table?
Let’s figure that out.
ISIS, MPLS, and RSVP are already configured on all routers, and PE1 and PE2 have routes to reach each other’s loopback interface via ISIS:
We create a single IBGP session between the loopback interfaces of PE1 (10.100.100.5) and PE2 (10.100.100.4).
The session came up, but NO routes are being advertised by either side.
We need to configure this BGP session to advertise the routes present in the Customers’ routing instances. We also need the routes that we received via this BGP session to be installed in the Customers’ routing instances.
Think about it for a moment: we need a BGP session that is NOT configured within any routing instance, to advertise routes that belong to routing instances, and to install received routes into those routing instances.
A few configuration changes are in order within the routing instances and under the IBGP session:
1) Routing instances type and additional attributes:
Here is where we go back to the routing-instances we created before, and make the changes I told you we would need:
We are changing the instance type to vrf, and adding a vrf-target, a route-distinguisher and the command vrf-table-label, under each routing instance:
NOTE: Routes will only be accepted if there is a routing instance with a matching route target configured.
I think I can probably write an article with more examples about the route-target and route-distinguisher, and also about vrf-table-label, which I know can be confusing, but for now:
– route-distinguisher = a value prepended to the customer prefix to make it unique across customers.
– route-target = tag (a BGP community) that will tell the receiving side which routing table a route should be installed in.
– vrf-table-label = configures the router to assign a VPN label per VRF instead of per interface.
A route distinguisher converts an IPv4 prefix into a VPN-IPv4 prefix, which is advertised as an inet-vpn unicast route (BGP family inet-vpn unicast NLRI).
This makes the route unique even when the customers’ address spaces overlap:
The route target on incoming VPN-IPv4 routes is compared with the route-targets configured on the receiving router:
The vrf-table-label assigns a label per VRF, instead of per interface as shown:
2) Address family under BGP:
This will configure BGP to advertise the VPN-IPv4 prefixes.
Back to our scenario, once we add these attributes under the routing instances, the routes advertised by BGP from PE1, for example, will look like this at a high level:
The receiver will treat these routes as TWO distinct routes because they are TWO different VPN-IPv4 prefixes.
Each route then will be installed into the proper VRF routing-instance based on the matching vrf-target community.
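Here is a rough Python sketch of what the receiving PE does with these advertisements. It is a simplified model, not how Junos implements it: the route distinguisher and route target values, the label for customer2, and the names `VpnRoute` and `install` are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VpnRoute:
    rd: str            # route distinguisher (hypothetical values below)
    prefix: str        # the customer's IPv4 prefix
    route_target: str  # BGP extended community carried with the route
    vpn_label: int     # VPN label assigned by the advertising PE


# Hypothetical vrf-target configured under each routing instance on the receiver.
VRF_IMPORT_TARGETS = {
    "customer1": "target:65000:1",
    "customer2": "target:65000:2",
}


def install(routes):
    """Model bgp.l3vpn.0 plus the per-customer tables built from received routes."""
    bgp_l3vpn = {}
    vrf_tables = {name: {} for name in VRF_IMPORT_TARGETS}
    for route in routes:
        matching = [n for n, rt in VRF_IMPORT_TARGETS.items() if rt == route.route_target]
        if not matching:
            continue  # no routing instance imports this target, so the route is not kept
        # RD + prefix is the key, so overlapping prefixes stay distinct here.
        bgp_l3vpn[(route.rd, route.prefix)] = route
        for name in matching:
            # Inside the VRF the route is a plain IPv4 prefix again.
            vrf_tables[name][route.prefix] = route
    return bgp_l3vpn, vrf_tables
```

Feeding it two routes for 172.16.10.0/24, one with RD `10.100.100.5:1` and one with RD `10.100.100.5:2`, leaves two entries in the shared table and one entry in each customer table, which is exactly the separation we were after.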
If you read my article about rib-group use cases, you might remember that I mentioned that policies are created automatically to make these advertisements and route installation happen. So, all you really need is:
- family inet-vpn under BGP, and
- instance-type vrf, the route-target and the route-distinguisher under each routing instance.
If you wanted to, you could configure your own policies to be more specific, but right now we are letting the routers create the policies for us.
We can see that routes are being exchanged now when we use the show bgp summary command.
Notice the new routing tables listed in the show bgp summary command. There is a bgp.l3vpn.0 table where ALL routes matching ANY configured route-target are placed, and the customer-specific routing tables where only the routes matching the specific route-target configured under each routing instance are placed.
We can also see that both routers are advertising the direct and BGP routes present in each routing instance, when we enter the show route advertising-protocol command.
And if we use the extensive option, we can validate that the routes have the correct route distinguisher, and route target.
In this last command output, I want you to pay attention to the highlighted field. Each route was assigned a VPN label:
Which translates into:
When traffic arrives, the receiving PE (PE1 in this example) will know which routing table to search for a matching route so that traffic can be forwarded:
Let’s take a look at what these routes look like on the other side (PE2).
We try the show route receive-protocol command and find that no routes are displayed. However, we notice that there are some hidden routes. You might have noticed that when we were looking at the advertised routes before.
So we try again, this time including the hidden option, and we find our routes:
But why are they hidden?
Checking the details for one of the routes will provide the answer:
As you might know, an unusable BGP next-hop is a next-hop that cannot be resolved to a physical (directly connected) next-hop.
If you know the BGP advertisement rules, and attributes, your first instinct might tell you: “oh! I need a policy to change the next-hop on PE1, before it sends it to PE2 because it is IBGP, and IBGP does NOT change the next-hop…”
But if you look at the protocol next-hop in the output above you will realize that the next-hop was changed! This happens without you creating a policy with the next-hop self action, because it is an L3VPN route. So, that is not the reason for the next-hop being unusable.
The router is trying to resolve the BGP next-hop 10.100.100.5, and it is not finding a route, but we know for a fact, there is a route! 10.100.100.5 is the loopback of PE1, and there is an ISIS route with next-hop = 10.100.34.1 (P3) in the routing table:
The BGP session would not be up if we didn’t have this route to start with.
Well, while it is true that we have a route to 10.100.100.5, this route is in inet.0.
The route that the router needs to resolve the next-hop must be in inet.3. That is because the route is an inet-vpn route. One way to understand and remember this requirement is this:
The route is an L3VPN route, which means it is a route for a destination that will be reachable across an MPLS network. In other words, the packets need to be sent with a label!!! A label that is meaningful for the P routers that will be forwarding the traffic across the service provider’s network.
inet.3 contains routes created by LDP or RSVP, or added manually, and these routes contain labels.
Also, think about it this way, if PE2:
- Resolved the route’s next-hop using the ISIS route currently present in inet.0,
- Installed the route with next-hop = P3
- Sent packets going to 172.16.10.1 to P3, without a label, or even with the VPN label that was advertised by PE1 via BGP, would P3 know what to do with these packets?
NO, it would NOT!!
The routes that were advertised by CE1 to PE1 were NEVER redistributed into ISIS. And label 299776 was advertised by PE1 to PE2, and only to PE2, using BGP. The P routers are NOT even running BGP, and label 299776 only has any meaning when the traffic arrives at PE1.
So, nope! P3 has NO IDEA how to forward traffic to 172.16.10.1, nor has ANY CLUE what label 299776 means!
So, guess what, if PE2 sent packets without a label, or with the VPN label, P3 would just throw it away!
The bottom line is: we need a route in inet.3! We need a route with a label that has some meaning for P3. A label that tells P3 to send this packet to PE1. Remember the mpls.0 table, which maps incoming labels to outgoing labels and interfaces? P3 needs to have an entry in this table, and PE2 needs to know what label P3 is expecting.
The path with allocated labels at each node, where each node has mappings in mpls.0, is called a Label Switched Path, and it has to be built!
In our example, we will have an LSP named PE2-to-PE1, starting at PE2, going across P3, and terminating at PE1. For this LSP we will call PE2 the ingress, PE1 the egress, and P3 a transit router.
Once everything is in place, traffic going to the customers behind PE1 will flow downstream from PE2 to PE1 (from ingress to egress).
Let’s say that an LSP was built by RSVP as shown below.
P3 now has a route in mpls.0 that says: any traffic coming in with label 299856, should be sent to 10.100.34.1 (PE1) with no label (action is pop).
And PE2 now has a route for the loopback interface of PE1 (10.100.100.5), installed in inet.3 that says: to reach 10.100.100.5 traffic should be sent to 10.100.35.1 (P3) with a label of 299856.
The routes that we were receiving from PE1, and that were hidden, have a BGP next-hop = 10.100.100.5.
So, the next-hop for the route is 10.100.100.5. The egress of the LSP is 10.100.100.5 (route in inet.3). Bingo!
PE2 can now resolve the BGP next-hop using the route in inet.3, and can install the customers’ routes in the customers’ VRF routing tables as active routes (no longer hidden).
This final route in customer1.inet.0, for example, says: to reach 172.16.10/24, send the traffic to 10.100.35.1 with two labels, inner label = 299776, and outer label = 299856.
The outer label will allow forwarding the traffic across the network from PE2 to PE1. The inner label will be used by PE1 to determine which customer this packet belongs to.
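The resolution logic we just went through can be sketched in a few lines of Python. This is a simplified model under my own assumptions: tables are plain dictionaries keyed by address, and the function name `resolve_vpn_route` is made up. The addresses and labels are the ones from our example.

```python
# The ISIS route exists in inet.0, but VPN routes never consult this table.
INET0 = {"10.100.100.5": {"protocol": "IS-IS"}}


def resolve_vpn_route(vpn_route, inet3):
    """Resolve the protocol next hop of a VPN route against inet.3 only."""
    lsp = inet3.get(vpn_route["protocol_next_hop"])
    if lsp is None:
        # No labeled path to the remote PE: the route stays hidden.
        return {"state": "hidden", "reason": "next hop unusable"}
    return {
        "state": "active",
        "next_hop": lsp["next_hop"],
        # Inner label from BGP (VPN label), outer label from the LSP (transport label).
        "labels": {"inner": vpn_route["vpn_label"], "outer": lsp["label"]},
    }


vpn_route = {"prefix": "172.16.10.0/24", "protocol_next_hop": "10.100.100.5", "vpn_label": 299776}

before_lsp = resolve_vpn_route(vpn_route, inet3={})
after_lsp = resolve_vpn_route(
    vpn_route, inet3={"10.100.100.5": {"next_hop": "10.100.35.1", "label": 299856}}
)
```

`before_lsp` comes back hidden even though `INET0` has a route to 10.100.100.5, and `after_lsp` is active with the two labels we saw in the final route.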
Once the LSP has been established (labels have been propagated, and routing table entries have been created along the path), BGP has advertised the VPN-IPv4 route, and the next-hop resolved, here is how traffic will be forwarded:
1) A packet is received by PE2 from CE2 customer1.
- PE2 performs a route lookup in customer1.inet.0.
- The table indicates that the packet should be sent to P3 with two labels (inner label = 299776, and outer label = 299856).
2) The packet is sent with two labels from PE2 to P3
- The packet is received by P3.
- P3 looks at the outer label and consults its mpls.0 table.
- The table indicates that any packet that comes in with a label of 299856, should be sent out to PE1, and that the outer label must be removed (pop), leaving only the inner label.
3) The packet is sent to PE1 with only the inner label 299776
- PE1 receives the packet with the remaining label and consults its mpls.0 table
- The mpls.0 table indicates that the label must be removed (pop), and a route lookup must be performed using customer1.inet.0
- The customer1.inet.0 table indicates that the packet should be sent to CE1 Customer1.
4) The packet is sent to CE1 without any labels.
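The four steps above can be replayed with a small Python sketch. The tables below are stand-ins for customer1.inet.0 and mpls.0 on each router, filled with the labels from our example. The function names and the trace format are my own invention, and the label stack is modeled as a list whose last element is the outer (top) label.

```python
import ipaddress

# Stand-ins for the tables on each router (labels taken from the example).
VRF_PE2 = {"customer1": {"172.16.10.0/24": {"next_hop": "P3", "push": [299776, 299856]}}}
MPLS_P3 = {299856: {"action": "pop", "next_hop": "PE1"}}
MPLS_PE1 = {299776: {"action": "pop", "lookup_table": "customer1"}}
VRF_PE1 = {"customer1": {"172.16.10.0/24": {"next_hop": "CE1"}}}


def lookup(table, destination):
    """Longest-prefix match of an IP destination against a routing table."""
    addr = ipaddress.ip_address(destination)
    matches = [p for p in table if addr in ipaddress.ip_network(p)]
    if not matches:
        raise LookupError(f"no route to {destination}")
    return table[max(matches, key=lambda p: ipaddress.ip_network(p).prefixlen)]


def send_from_ce2(destination, vrf="customer1"):
    """Return (router, action, label stack after the action) for each step."""
    trace = []
    # 1) PE2: IP lookup in the customer table, push inner (VPN) then outer (transport) label.
    stack = list(lookup(VRF_PE2[vrf], destination)["push"])
    trace.append(("PE2", "push", tuple(stack)))
    # 2) P3: label lookup on the outer label only, pop it (PHP).
    assert MPLS_P3[stack[-1]]["action"] == "pop"
    stack.pop()
    trace.append(("P3", "pop", tuple(stack)))
    # 3) PE1: pop the VPN label, then IP lookup in the table that the label points to.
    entry = MPLS_PE1[stack.pop()]
    route = lookup(VRF_PE1[entry["lookup_table"]], destination)
    trace.append(("PE1", "pop+lookup", tuple(stack)))
    # 4) The packet leaves towards the CE without any labels.
    trace.append((route["next_hop"], "deliver", tuple(stack)))
    return trace
```

Notice that P3 never looks at the destination address or the inner label, and that PE1 only performs an IP lookup after the VPN label told it which table to use.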
Now, why is P3 removing the outer label?
Have you heard of PHP (Penultimate Hop Popping)? The router second to last (penultimate), in this case P3, removes (pops) the label before sending the packet to the last router. This is the default behavior but is configurable.
And could the path be established across P1 and P2 instead?
Sure! Remember Traffic Engineering? We will cover that soon.
Would that affect our L3VPN?
Nope! What changes is the path followed by traffic across the core of the network. The LSP will be installed in inet.3 the exact same way. There will be no changes to the IBGP route advertisement. The next-hop resolution would be the same. There will be a route for the customer prefix in customer.inet.0, and so on.
The routing information along the path will look like this:
In this case, P2 has a route in mpls.0 indicating that any packet received with a label of 299856, has to be sent to P1 with a label of 299792 (label swap), and then P1 has a route in mpls.0 indicating that any packet coming in with label 299792, has to be sent to PE1, and that the label should be removed.
Traffic will be forwarded along this path as shown below:
Let’s now get into the details of:
- How the labels are assigned
- How they are mapped together so that one router knows what the next router along the path is expecting
- What determines the sequence of routers used to build this path (e.g. through P3? or through P2 and P1?)
Here is where we start talking about label distribution, and creating LSPs, using either LDP, or RSVP, and here is also where we will get into what CSPF is.
Label Switched Paths (LSPs) – Label distribution
STATIC LSPS:
LSPs can be created manually. That’s right, manually! That means:
Going to the ingress router, in this case PE2, and manually creating the route in inet.0 that says: to go to PE1, send traffic to P2 with a label of 1009793 or whatever.
Then going to P2 and manually adding a route to mpls.0 saying that traffic coming with label 1009793 should be sent out of interface ge-0/0/0, with next-hop P1 and label 1009712, and so on.
I have a say at every single hop!
I guess good for an emergency! But: Tedious! Prone to errors! Not scalable! Say no more!
DYNAMIC (OR SIGNALED) LSPS:
A dynamic LSP is created by a protocol that assigns/distributes labels between routers.
Initiating this process could be as simple as turning on the protocol on all the routers and seeing the labels get assigned (LDP), or as complex as telling the ingress router which routers you want the path to follow, how many hops you can accept, what kind of links you want to include, how much bandwidth you want for the LSP, the LSP’s priority, whether you want protection, … (RSVP).
You could also use BGP-LU (BGP Labeled Unicast), for much more complex scenarios, larger scaling, or inter-provider/inter-domain (IGP boundaries) MPLS applications. Not what we are going to talk about in the rest of the article but worth mentioning it!
Also, for dynamic label distribution, there are different distribution modes:
Downstream vs. Upstream:
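Is the label for a hop assigned by the downstream router (the one closer to the egress) and advertised to its upstream neighbor, or assigned by the upstream router (the one closer to the ingress)?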
On Demand vs Unsolicited
Are labels assigned upon request? Or without request?
Ordered Control vs Independent control
Does the router advertise a label after receiving a label from a neighbor downstream or when it is the egress? OR does it advertise a label when it learns a new prefix without waiting?
Which type of label distribution is implemented by LDP and by RSVP? Let’s find out.
LDP – Label Distribution Protocol
This is the easiest way to distribute labels, create LSPs, and add routes to inet.3. I always describe it to people as: “Turn it on! Let it run!”. You might have heard me saying that before.
LDP establishes label-switched paths (LSPs) by mapping network-layer prefixes to labels, and distributing those labels.
In Junos, the label distribution is initiated by the egress router (the owner of a prefix), which advertises a label for that prefix to its neighbors. Then its neighbors map a label to that prefix, and advertise it to their neighbors, and so on.
The result of this process is a tree of LSPs that converges on the egress router.
If we wanted to turn on LDP in our scenario right now, it would be as easy as going to every router and typing:
We would also need to enable MPLS processing on the interfaces with:
Of course, there are some additional things that we could configure to tune up the protocol behavior but the above commands are sufficient to get things going.
After enabling LDP everywhere, we would see how the routers quickly discover their neighbors, establish LDP sessions and exchange labels.
And before you know it, they will have routes in inet.3 for the primary address of the loopback interface of all the other routers in the topology:
In Junos, LDP follows the Downstream-unsolicited, ordered control mode by default:
- Junos can only generate labels for routes it is an egress router for, i.e. directly connected routes.
- By default, a router only binds a label to the primary address of its own loopback interface, and advertises it to its neighbors.
- The router thus becomes the egress router for the LSPs that will be established in the opposite direction and towards its loopback interface.
- A Juniper device will never create a label for a prefix it does not own! For instance, if your router learns about prefix 10.2.2/24 from its neighbor R2, via ISIS, your router will NOT bind and advertise a label for that prefix unless R2 advertises a label first!
Always keep in mind that in Junos, an LDP router ONLY initiates the label distribution for the primary address of its loopback (lo0.0) interface, by default.
The primary address is the LOWEST address configured under the interface unless explicitly defined with the primary knob.
In our example, PE1’s loopback interface is configured with the address 10.100.100.5, and is currently advertising a label for that address.
If we added another address under the lo0.0 interface on PE1:
Because this address is lower than 10.100.100.5, it becomes the primary address of lo0.0, and it immediately gets advertised INSTEAD of 10.100.100.5.
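The selection rule is easy to model in Python. This is only a sketch of the rule as described above (lowest address wins unless one is explicitly marked as primary). The extra address 10.100.100.1 is a made-up example, and `primary_address` is my own helper name.

```python
import ipaddress


def primary_address(addresses, explicit_primary=None):
    """Pick the primary address: the explicit one if set, otherwise the lowest."""
    if explicit_primary is not None:
        if explicit_primary not in addresses:
            raise ValueError("explicit primary must be configured on the interface")
        return explicit_primary
    return str(min(ipaddress.ip_address(a) for a in addresses))


BGP_NEXT_HOP = "10.100.100.5"

# Only the original loopback address: LDP advertises a label for the BGP next-hop.
only_original = primary_address(["10.100.100.5"])

# Hypothetical lower address added: it takes over as the primary address.
with_lower = primary_address(["10.100.100.5", "10.100.100.1"])

# Marking the original address as primary restores the previous behavior.
pinned = primary_address(["10.100.100.5", "10.100.100.1"], explicit_primary="10.100.100.5")
```

Comparing each result against `BGP_NEXT_HOP` tells you whether the remote PE would still find a matching LDP route in inet.3 to resolve the VPN routes.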
And…. what is the problem with that???
Well, remember that the BGP next-hop is still 10.100.100.5, so guess what? Changing the primary address of the loopback breaks our L3VPN if we are using LDP. Oh, ooops!
We could change this behavior with a policy (an LDP egress policy). But we are not going to cover that right now.
Let’s just remove the new address, and focus once again on the 10.100.100.5 address, the BGP next-hop of our customers’ routes.
Labels for the 10.100.100.5/32 prefix are assigned and advertised upstream hop by hop as shown below:
1) PE1 assigns a label to its loopback interface address, and advertises it to P1 and P3.
NOTE: our router is advertising label = 3 which is the implicit null label that instructs the neighbor router to pop the label before sending traffic. In other words, the router is requesting PHP.
2) P1 and P3 take note of the labels advertised to them by PE1, assign a label to this prefix, and advertise it to P2 and PE2, respectively.
3) P2 takes note of the label advertised by P1, assigns a label to the prefix and advertises it to PE2
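Here is a small Python simulation of this hop-by-hop process (downstream-unsolicited, ordered control). It is a sketch under assumptions: the adjacency list is my reading of our topology, the labels for P1 and P3 are the ones used in this example, and the labels for P2 and PE2 are made up. In this model every router advertises its binding to all of its neighbors.

```python
from collections import deque

# Assumed adjacencies for the example topology.
NEIGHBORS = {
    "PE1": ["P1", "P3"],
    "P1": ["PE1", "P2"],
    "P3": ["PE1", "PE2"],
    "P2": ["P1", "PE2"],
    "PE2": ["P2", "P3"],
}

IMPLICIT_NULL = 3

# Label each router binds to 10.100.100.5/32 from its own label space.
# P1 and P3 use 299856 as in the example; the P2 and PE2 values are hypothetical.
LOCAL_LABELS = {"P1": 299856, "P3": 299856, "P2": 299872, "PE2": 299888}


def distribute_labels(egress):
    """Return (local bindings, bindings received from each neighbor)."""
    local = {egress: IMPLICIT_NULL}          # the egress requests PHP
    received = {router: {} for router in NEIGHBORS}
    queue = deque([egress])
    while queue:
        router = queue.popleft()
        for neighbor in NEIGHBORS[router]:
            # Unsolicited: the binding is advertised without being requested.
            received[neighbor][router] = local[router]
            # Ordered control: a router binds its own label only after it has
            # received a binding from a neighbor (or if it is the egress).
            if neighbor not in local:
                local[neighbor] = LOCAL_LABELS[neighbor]
                queue.append(neighbor)
    return local, received
```

After `distribute_labels("PE1")`, PE2 holds two bindings for the prefix, one from P3 and one from P2. Which one it actually uses is a separate decision that we will look at in a moment.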
As the labels propagate from hop to hop, each router adds routes to its routing tables (mpls.0, inet.0, inet.3, and customer.inet.0) as shown:
1) P1 and P3 routes to 10.100.100.5:
These routes show no label, because PE1 advertised label = 3 (implicit null) to them.
2) P2 route to 10.100.100.5
P2 received a label of 299856 from P1 for prefix 10.100.100.5/32, thus the route includes this label. The router will push a label of 299856 when sending traffic to P1 for destination 10.100.100.5.
When P1 receives a packet with a label of 299856, it will pop the label, and send the packet to PE1:
3) PE2 route to 10.100.100.5:
The route selected by PE2 points towards P3, with a label of 299856.
As a result, the router will push a label of 299856 when sending traffic to P3 for destination 10.100.100.5.
When P3 receives a packet with a label of 299856, it will pop the label, and send the packet to PE1:
4) PE2 route to 172.16.10/24:
Now that PE2 has a route to 10.100.100.5 in inet.3, as we have seen before, it can resolve the next-hop of the customer routes:
Some of you might wonder why, even though PE2 received labels for 10.100.100.5 from both P2 and P3, it chose the one from P3.
The reason is: LDP follows the IGP!
If you check the routes in both inet.0 and inet.3 for 10.100.100.5/32 you will see how both routes are using ge-0/0/0.0, and the next-hop is 10.100.34.1.
LDP relies on the decisions made by ISIS. If I change the metric of interface ge-0/0/0.0 under ISIS:
Immediately the routes in both inet.0 and inet.3 switch to interface ge-0/0/2.0.
However, that will affect the metric of OTHER routes as well. In this case, the routes to P3 would also be affected:
Again, changing the metrics potentially changes the decisions made for multiple destinations. We have no way to override what ISIS is deciding or to tell LDP to do something different, for specific destinations.
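To see why a metric change is such a blunt tool, here is a minimal shortest-path sketch in Python. It is plain Dijkstra, not the actual ISIS implementation. The link metrics (10 everywhere, then 100 on the PE2 to P3 link) are assumptions, and links are treated as symmetric to keep it short.

```python
import heapq


def next_hops(links, source):
    """Dijkstra: return {destination: first hop out of source}."""
    graph = {}
    for (a, b), metric in links.items():
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    first_hop = {}
    visited = set()
    heap = [(0, source, None)]
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if hop is not None:
            first_hop[node] = hop
        for neighbor, metric in graph[node].items():
            if neighbor not in visited:
                # Remember which neighbor of the source this path started with.
                via = neighbor if node == source else hop
                heapq.heappush(heap, (cost + metric, neighbor, via))
    return first_hop


# Assumed metrics: every link costs 10.
LINKS = {
    ("PE2", "P3"): 10, ("P3", "PE1"): 10,
    ("PE2", "P2"): 10, ("P2", "P1"): 10, ("P1", "PE1"): 10,
}

before = next_hops(LINKS, "PE2")
after = next_hops({**LINKS, ("PE2", "P3"): 100}, "PE2")
```

With these numbers, `before` reaches PE1 and P3 through P3, while `after` reaches both through P2. We only wanted to move the traffic for PE1, but the traffic for P3 moved too, and since LDP just follows these decisions, it moved with them.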
This highlights one of the main limitations of LDP: NO TRAFFIC ENGINEERING!!!
Which of course becomes one of the main advantages of RSVP: IT SUPPORTS TRAFFIC ENGINEERING!!! YES! Let’s talk about what you all have been waiting for….
RSVP – Resource Reservation Protocol
You can think of RSVP as the opposite of LDP.
First of all, the idea of just turning the protocol on, and boom! here are your labels, does not apply.
RSVP implements the Downstream-on-demand, ordered control label distribution mode. Thus, labels are NOT distributed unless requested, and not until the neighbor downstream has assigned a label. We will see this in more detail. But the point is: you need to configure the LSP so that your router requests labels to be allocated along the path to the destination (egress), using RSVP.
But what is RSVP?
Originally, RSVP was developed as a generic resource-reservation protocol, with the ability to request resources to be allocated along a specific path for a specific flow. It was built with QoS in mind!
The protocol’s details are described in RFC 2205, which introduces it as: “used by routers to deliver quality-of-service (QoS) requests to all nodes along the path(s) of the flows and to establish and maintain state to provide the requested service”. This clearly states that the protocol’s purpose in life was to build a path that could guarantee a service level for flows.
In the case of RSVP for MPLS, the allocation of resources includes the labels that will be used to forward the traffic.
RSVP is not a routing protocol and relies on the underlying routing protocol to define the path that its messages should follow to allocate resources, and that will be followed by the traffic flows afterwards.
RSVP is not a data transport protocol either and only works in the control plane. Again, it sets the path that will be followed by the traffic, including the labels that need to be added to the traffic so that it actually follows that path.
RSVP is essentially a signaling protocol!!!
Well, so is LDP, isn’t it?
Yes, BUT RSVP has a key advantage over LDP, that makes it ideal as a signaling protocol for MPLS applications: Extensibility!
Like other protocols such as ISIS, RSVP encodes information inside containers referred to as TLV (Type Length Value) objects. New objects can be defined to carry additional information, which is exactly what was done to make RSVP meet the requirements of MPLS services.
Using these additional objects, RSVP can describe to the nodes along the path the desired characteristics of the LSP being requested, and can provide responses confirming the acceptance of the requested attributes, and advertising the labels that have been allocated.
RSVP uses PATH messages which travel downstream from ingress to egress, and RESV (reservation) messages which travel upstream from egress to ingress.
Think of a PATH message as a request: “Can you allocate labels, and bandwidth and create a state for an LSP to a.b.c.d named LSP-blah?”. And of course, think of the RESV message as the reply: “Yes, sure! Use LABEL=100”.
Inside these two messages there are some common objects, and some message specific objects. Some of the objects are mandatory, and some optional:
PATH MESSAGE | RESV MESSAGE |
---|---|
Mandatory Objects: | Mandatory Objects: |
SESSION OBJECT: LSP identification | SESSION OBJECT: LSP identification |
LABEL_REQUEST OBJECT: how a label allocation is requested. | LABEL OBJECT: the label that was assigned by the router |
| | STYLE OBJECT: the reservation style (fixed-filter, wildcard-filter or shared-explicit) |
Optional Objects: | Optional Objects: |
RECORD_ROUTE OBJECT (RRO): The complete list of nodes that the PATH message is actually going through. | RECORD_ROUTE OBJECT (RRO): The complete list of nodes that the RESV message is actually going through. |
HOP OBJECT: Contains the previous hop IP address | HOP OBJECT: Contains the previous hop IP address |
EXPLICIT_ROUTE OBJECT (ERO): a list of devices that the LSP MUST go through. Nodes (hops) in this list can be specified as: Strict Hop = this address must be touched next (directly connected hop); Loose Hop = this address has to be touched at some point (let the IGP figure out how to get there). The ability to define this path is one of the most important tools to implement Traffic Engineering. You might remember that with LDP I have no say on which path is used for an LSP? Well, I CAN do it with RSVP. | |
SESSION_ATTRIBUTE OBJECT: which defines additional characteristics such as the LSP priorities, whether we want fast reroute or not. | |
SENDER_TSPEC OBJECT: requested bandwidth reservation | |
ADSPEC OBJECT: contains the lowest MTU of the interfaces along the path. Updated hop by hop. | FLOWSPEC OBJECT: copy of the ADSPEC value received by egress = lowest MTU along the entire path. |
NOTE: there are also messages to indicate failures or to tear down the LSP: PATH ERR, RESV ERR, PATH TEAR, and RESV TEAR, but we are not going to discuss those today.
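To keep the two messages straight, here is a minimal Python sketch of a few of the objects from the table. The class and field names are my own illustrative choices (this is not a real RSVP library), and only a subset of the objects is modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PathMessage:
    """Travels downstream (ingress -> egress). Illustrative model only."""
    session: str                                   # SESSION: identifies the LSP
    label_request: bool = True                     # LABEL_REQUEST: "please allocate a label"
    ero: List[str] = field(default_factory=list)   # EXPLICIT_ROUTE (optional)
    rro: List[str] = field(default_factory=list)   # RECORD_ROUTE (optional)
    bandwidth_mbps: float = 0.0                    # requested bandwidth (optional)


@dataclass
class ResvMessage:
    """Travels upstream (egress -> ingress). Illustrative model only."""
    session: str                                   # SESSION: same LSP identification
    label: int                                     # LABEL: label assigned by the sender
    rro: List[str] = field(default_factory=list)   # RECORD_ROUTE (optional)


# The request/reply exchange described earlier, with a made-up LSP name.
path = PathMessage(session="LSP-blah", bandwidth_mbps=100)
resv = ResvMessage(session=path.session, label=100)
print(path)
print(resv)
```

The only thing tying the reply to the request is the shared session value, which is the point of the SESSION object appearing in both columns.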
Now that we know what RSVP is, and the basics of how it works, let’s look at how we configure it in Junos.
CONFIGURING RSVP SIGNALED LSPs
We first deactivate LDP:
And right after that we see that our BGP routes are hidden again (you still remember those, right?):
Quick refresher: these routes are customer routes that were advertised by PE1 to PE2, and that are hidden because the next-hop cannot be resolved in inet.3. And also remember, the next-hop has to be resolved in inet.3 because these are vpn-ipv4 routes advertised using family inet-vpn unicast.
This time we are going to fix these routes using RSVP instead of LDP.
We need to enable RSVP on the interfaces first.
NOTES:
- enabling MPLS on the interface is also required but we already did that.
- RSVP and MPLS also need to be enabled on all the P routers. Already did that.
We check that the interfaces are indeed running RSVP with the show rsvp interface command:
Right now the entire bandwidth of the interfaces is available as we have not set up any LSPs requiring bandwidth to be reserved. In fact, we have not created any LSPs at all!
We need to configure the LSP(s) that we want. They do not get created automagically like in LDP. Also, because LSPs are unidirectional, we need to configure LSPs from PE1 to PE2, and LSPs from PE2 to PE1 so that we can have bidirectional traffic.
We are actually going to configure two LSPs in each direction, so that we can look at other things later. Also, for now, we are going to disable CSPF. That way we can see the difference it makes, later on.
We create the LSPs under protocols MPLS as follows:
And now we check the status:
Humm, looking at the output of show mpls lsp transit command on the P routers, we notice that all 4 LSPs are going through P3. More on that in a minute.
Let’s take a look at the extensive output for one of these LSPs first:
Looking at the RRO we can tell the LSPs are going through 10.100.35.1 (P3) and then 10.100.34.2 (PE2).
P3 allocated label = 299776. PE2 signaled label = 3, the implicit null label, which requests PHP to be performed. You can also see in the command output that penultimate hop popping is implemented.
Another important detail you can get from the output is this: “Follow destination IGP metric”. This means that the path to reach the egress address of the LSP was found based on the IGP decisions.
When we configured:
we are telling PE1: “hey! I want you to build an LSP with egress = 10.100.100.4”. (no further instructions)
PE1 has to figure out how to get there, so it looks for a route to 10.100.100.4 and finds:
Then it sent an RSVP PATH message to P3 requesting the LSP to be set up.
P3 received the request, and also checked its routing table: “how do I get to 10.100.100.4?” and sent a PATH message to PE2.
PE2 accepted the request and responded with an RSVP RESV message with label = 3.
P3 received the response from PE2 and allocated label 299776
The response is received by PE1 who now knows the LSP has been established and knows what label to use to send traffic to 10.100.100.4
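The exchange above can be sketched in a few lines of Python. This is only a toy model: the next-hop table, the function name and the label step of 16 are assumptions made for illustration, and real RSVP carries far more state than this.

```python
IMPLICIT_NULL = 3  # label 3 asks the penultimate hop to pop the label (PHP)

# Hypothetical IGP next-hop tables: router -> {destination: next router}.
NEXT_HOP = {
    "PE1": {"PE2": "P3"},
    "P3": {"PE2": "PE2"},
}


def signal_lsp(ingress, egress, first_label=299776):
    """Return (path, labels); labels[r] is the label router r advertises upstream."""
    # PATH phase: each router consults its own routing table for the egress.
    path = [ingress]
    while path[-1] != egress:
        path.append(NEXT_HOP[path[-1]][egress])

    # RESV phase: walk back from egress toward ingress, allocating labels.
    labels = {}
    next_label = first_label
    for router in reversed(path[1:]):
        if router == egress:
            labels[router] = IMPLICIT_NULL
        else:
            labels[router] = next_label
            next_label += 16  # arbitrary step, only to keep labels distinct
    return path, labels


path, labels = signal_lsp("PE1", "PE2")
print(path)    # ['PE1', 'P3', 'PE2']
print(labels)  # {'PE2': 3, 'P3': 299776}
```

Notice that nothing in the PATH phase looks at bandwidth or at any other path: every hop just follows its own best route to the egress.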
The process followed to create LSP PE1-to-PE2_2 is the exact same, and being that we are using the IGP and the egress address is the same, the LSP will be set up following the exact same path, just with a different label assigned by P3:
In the opposite direction we configure:
And again, the routing table will be consulted along the way to establish the LSPs:
During the LSP setup process, RSVP states and routing table entries are created on all nodes, and it is kind of nice to see how they all match up as we compare them. Let’s take a look at that, using different options.
LSP status:
RSVP States:
Routing tables:
And BTW, now that we have a route for the other PE’s loopback interface our BGP routes are NO LONGER HIDDEN!
OK, cool! No more hidden routes, but wasn’t that a lot of work for something that we solved super easily with LDP before?
True! But! Doesn’t it bother you that ALL 4 LSPs are going through the same path? And that the other path is being wasted? And that with LDP there is nothing we can do about it, and that routers will use the path with the best IGP metric, period? RSVP (with some help) can take care of this, no problem!!
CONFIGURING ERO – Explicit Route Objects.
Here is what we are going to do now: We are going to tell RSVP which path we want each LSP to take.
PE1 will look at the ERO for LSP PE1-to-PE2_1, for example, and will find that the first hop in the list is 10.100.15.1. The router will then figure out how to reach 10.100.15.1. Yes, the router is still relying on the IGP, but it is looking for a route to the next hop in the path that we defined, not to the egress.
The routing table indicates that 10.100.15.1 is reachable via ge-0/0/3.0 so the PATH message is sent out of that interface. The message will include the ERO object.
P1 receives the PATH message, removes itself from the list, and looks up the address of the next hop in the list in its routing table.
P2 receives the PATH message, removes itself from the list, and looks up the address of the egress router:
PE2 receives the path message and just as before: PE2 sends a RESV message back, this time to P2. P2 sends a RESV message to P1, and P1 sends a RESV message to PE1.
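Here is a rough Python sketch of that ERO walk. It assumes a strict ERO of 10.100.15.1 followed by 10.100.12.2 and a simplified one-address-per-router mapping; the function and table names are made up for the example.

```python
# Which router owns which address (addresses from this lab; the mapping is
# simplified to one address per router for the sake of the example).
OWNER = {"10.100.15.1": "P1", "10.100.12.2": "P2", "10.100.100.4": "PE2"}


def forward_path_message(ingress, ero, egress_address):
    """Follow a strict ERO hop by hop and return the routers visited."""
    remaining = list(ero)
    visited = [ingress]
    while remaining:
        # The current router sends the PATH message to the first hop left in
        # the list; the receiving router then removes its own entry.
        next_hop = remaining.pop(0)
        visited.append(OWNER[next_hop])
    # ERO exhausted: the last router looks up the egress address itself.
    egress = OWNER[egress_address]
    if visited[-1] != egress:
        visited.append(egress)
    return visited


print(forward_path_message("PE1", ["10.100.15.1", "10.100.12.2"], "10.100.100.4"))
# ['PE1', 'P1', 'P2', 'PE2']
```

The RESV messages then simply retrace this list in reverse order.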
For the other LSP the same process is followed along the path through P3.
The result is that the two LSPs will follow two completely different paths, just as we wanted:
What can I do with that?
Imagine I have two prefixes behind PE2 (not related to the VPN scenario): 150.1.1/24 and 150.2.2/24. I can use LSP PE1-to-PE2_1 to send traffic to 150.1.1/24 and LSP PE1-to-PE2_2 to send traffic to 150.2.2/24.
I can configure the router to install routes in inet.0 for 150.1.1/24 and 150.2.2/24 pointing out of the corresponding LSP:
Traffic Engineering! Basic, but still Traffic Engineering!
Now, let’s remove the EROs and the routes, and play with something else.
Request Bandwidth Reservations.
Now we are adding a bandwidth requirement to our LSPs on PE1 as follows:
The two LSPs are up:
And when we check the extensive output we can see that each requested 500Mbps, and we know that both requests were granted, otherwise the LSPs would not have come up.
Each of the routers along the path (and notice that both LSPs are back to the original path through P3) reserved 500Mbps on the interfaces for each of the LSPs. We can check by looking at the show rsvp interfaces command:
Notice that the interfaces ge-0/0/4 on PE1 and ge-0/0/4 on P3 have two reservations, and are out of available bandwidth!!!
So, what would happen if I requested a third LSP to be set up?
Let’s try first without requesting bandwidth:
Not a problem:
Let’s add a bandwidth request now:
Now the third LSP is down:
I am sure you know why:
Now, what should bother you is not that the LSP is DOWN but the fact that it is DOWN even though we still have 1Gbps of bandwidth going across P1 and P2.
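The arithmetic behind that failure is simple enough to sketch in Python. The class below is a toy admission check, assuming a 1Gbps interface that is fully reservable; the names are mine and this is not how Junos implements it internally.

```python
class Interface:
    """Toy model of RSVP bandwidth bookkeeping on one interface."""

    def __init__(self, name, reservable_mbps):
        self.name = name
        self.reservable_mbps = reservable_mbps
        self.reserved_mbps = 0.0

    @property
    def available_mbps(self):
        return self.reservable_mbps - self.reserved_mbps

    def admit(self, requested_mbps):
        """Reserve bandwidth if it fits; return True on success."""
        if requested_mbps > self.available_mbps:
            return False
        self.reserved_mbps += requested_mbps
        return True


link = Interface("ge-0/0/4", reservable_mbps=1000)
# Two 500Mbps LSPs, a third LSP with no bandwidth, then the third with 100Mbps.
results = [link.admit(bw) for bw in (500, 500, 0, 100)]
print(results)              # [True, True, True, False]
print(link.available_mbps)  # 0.0
```

An LSP that asks for nothing always fits, which is why the third LSP came up fine until we added the bandwidth request.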
If we are simply relying on the IGP the only thing we can do is create an ERO! Though, do we want to do that every single time? Does it mean that before creating a new LSP and requesting bandwidth I need to check the interfaces on all routers to see which path has enough bandwidth? Sounds like a nightmare!!!
What if we stop relying on the plain simple IGP that only looks at static metrics, and use something a little bit smarter that can, for example, realize that there is more than one path, and that if one does not have enough resources, a different one might?
Yes, you know exactly where I am going! The moment you’ve been waiting for: CSPF!!!!
As soon as we enable it on the third LSP, magic happens and it comes up!
I am sure you want to know how this magic works!
Constrained Shortest Path First – CSPF
You are probably familiar with the Shortest Path First algorithm used by both ISIS and OSPF.
Both ISIS and OSPF are link state routing protocols which advertise information about links (including type, IP addresses & metrics), how they connect to other devices running ISIS/OSPF in the network, plus other attributes such as area number, router ids, and so on. All this information is placed in a Link State Database which you can think of as the detailed map of the entire network.
Dijkstra’s Shortest Path First algorithm, commonly referred to as just SPF, uses information in the link state database to calculate the shortest path to each destination. The shortest path is essentially the path with the smallest total metric from the router running the algorithm to the destination. Thus, this calculation is based on a static value associated with each link in the network.
Junos OSPF calculates this metric using 10^8/bandwidth, while ISIS uses a fixed value of 10, regardless of the interface type. While these values can be manipulated in different ways, they are static.
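If you want to see plain SPF at work, here is a small Python sketch of Dijkstra’s algorithm. The topology is a simplified guess at this lab (PE1, P1, P2, P3, PE2, every link with the ISIS default metric of 10), so treat the graph as an assumption.

```python
import heapq

# Hypothetical topology modeled on this lab: every IS-IS link costs 10.
GRAPH = {
    "PE1": {"P1": 10, "P3": 10},
    "P1": {"PE1": 10, "P2": 10},
    "P2": {"P1": 10, "PE2": 10},
    "P3": {"PE1": 10, "PE2": 10},
    "PE2": {"P2": 10, "P3": 10},
}


def spf(graph, source, target):
    """Dijkstra: return (total_metric, path), or None if unreachable."""
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, metric in graph[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + metric, neighbor, path + [neighbor]))
    return None


print(spf(GRAPH, "PE1", "PE2"))  # (20, ['PE1', 'P3', 'PE2'])
```

The only input is the metric on each link, so the answer is the same no matter how busy the links through P3 are.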
When we create an LSP without CSPF, we are letting each router along the path decide how to set up the LSP. Remember that each node checks its routing table to determine which node comes next on the way to the egress. With an ERO we have some control over this, but the process is manual, and again static.
When we created the 3 LSPs before, all 3 without CSPF, the routers were choosing the exact same path to reach the egress. The first 2 LSPs were reserving the whole bandwidth of the interfaces. Thus the third LSP, which requested 100Mbps, could not be created.
When we turned on CSPF, the LSP came up because CSPF realized that though the LSP couldn’t be set up across P3, there was another path that could provide the required bandwidth. CSPF takes into account what’s really available on the interfaces, vs. the requirements of the LSP.
Obviously, this information needs to be available for CSPF to run the calculations. The information would be found in a different database called the TED (Traffic Engineering Database).
CSPF looks at the requirements, looks at what’s available and builds a strict ERO that is passed to RSVP for signaling. Thus, CSPF can solve the problems we ran into before:
- Building an LSP when no bandwidth is available over one path
- Creating some LSPs over one path, and other LSPs over other paths, so that we don’t create congestion over a given path and underutilize others.
- Creating EROs automatically given a set of requirements. More scalable.
Now, where does the information in the TED come from?
Both ISIS and OSPF can carry Traffic Engineering Information. ISIS does it by default, OSPF requires the traffic-engineering statement.
While the two protocols still feed their respective Link State Databases, and SPF calculations are performed to create ISIS and OSPF routes, the two protocols can also feed information into the TED:
To carry this information ISIS uses Traffic Engineering objects as defined in https://tools.ietf.org/html/rfc5305, while OSPF uses opaque LSA type 10, as defined in https://tools.ietf.org/html/rfc3630
NOTE: If both protocols are injecting information into the TED, ISIS is preferred over OSPF when there is overlapping information.
Like the Link State Database, the TED contains link information (topology information), but including additional attributes:
- Actual Bandwidth (StaticBW)
- Reservable bandwidth
- Available bandwidth (AvailBW)
- Available bandwidth at each LSP priority level
- MPLS administrative groups (colors)
- IGP Metric and Traffic Engineering Metric (which can be configured differently, as shown in the diagram)
- Local and remote IP addresses
- IDs of the local and remote router
Summarizing what we’ve seen so far, and just to make sure you are following:
SPF | CSPF (Constrained SPF) |
---|---|
Input from Link State Database | Input from TED + LSP configuration (constraints) |
Decisions based on OSPF/ISIS metrics (calculated automatically, manually configured or fixed value) | Decisions based on OSPF/ISIS metrics + TE attributes |
SPF builds routes => routing table (e.g. inet.0) | CSPF creates ERO for LSP signaling. CSPF determines the path (ERO), and RSVP does the signaling! |
And how does CSPF determine the ERO?
CSPF is a modified shortest-path-first algorithm. It uses the contents of the TED plus user-defined LSP constraints to figure out how the LSP should be set up. In other words, CSPF searches for a path that meets all the LSP constraints, using information in the TED.
Constraints can include:
- Bandwidth
- Administrative groups (link colors)
- Maximum hop count (for fast reroute detours)
- User defined ERO (User might specify that the path has to go through a particular router and CSPF figures out the rest)
- LSP Priority
At a high level, CSPF looks at the topology, scratches out any nonqualifying links (links that do not meet all the constraints) and performs SPF over the remaining topology.
The result is either an ERO that is handed to RSVP for signaling:
1 Mar 29 21:17:14.826 CSPF: computation result accepted 10.100.15.1 10.100.12.2 10.100.24.2
or an error message:
1 Mar 29 21:04:47.329 CSPF failed: no route toward 10.100.100.4
NOTE: When you see this error message, your first instinct will likely be to check: show route 10.100.100.4/32 and if you get a route back, sure you will say: “then why is the router telling me it has no route???”
Well, when CSPF says “I found no route to destination” what it is really saying is: “I did not find a path all the way from ingress to egress (10.100.100.4) that meets all the requirements for the LSP!”
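The “prune, then run SPF” idea, including the “no route” outcome, can be sketched in Python as follows. The TED contents are hypothetical values that mirror our scenario (links through P3 fully reserved), links are treated as bidirectional for simplicity, and only the bandwidth and color constraints are modeled.

```python
import heapq

# Hypothetical TED: (a, b) -> attributes. In a real TED each direction of a
# link has its own entry; here one entry covers both directions.
TED = {
    ("PE1", "P1"): {"metric": 10, "avail_mbps": 1000, "color": "green"},
    ("P1", "P2"): {"metric": 10, "avail_mbps": 1000, "color": "green"},
    ("P2", "PE2"): {"metric": 10, "avail_mbps": 1000, "color": "green"},
    ("PE1", "P3"): {"metric": 10, "avail_mbps": 0, "color": "green"},
    ("P3", "PE2"): {"metric": 10, "avail_mbps": 0, "color": "green"},
}


def cspf(ted, source, target, bandwidth_mbps=0, color=None):
    """Prune non-qualifying links, then run SPF. Return the hop list or None."""
    graph = {}
    for (a, b), attrs in ted.items():
        if attrs["avail_mbps"] < bandwidth_mbps:
            continue  # not enough bandwidth: scratch the link out
        if color is not None and attrs["color"] != color:
            continue  # wrong administrative group: scratch the link out
        graph.setdefault(a, {})[b] = attrs["metric"]
        graph.setdefault(b, {})[a] = attrs["metric"]

    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return path[1:]  # hops after the ingress, like an ERO
        if node in visited:
            continue
        visited.add(node)
        for neighbor, metric in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + metric, neighbor, path + [neighbor]))
    return None  # the case the log reports as "no route toward <egress>"


print(cspf(TED, "PE1", "PE2"))                       # ['P3', 'PE2']
print(cspf(TED, "PE1", "PE2", bandwidth_mbps=100))   # ['P1', 'P2', 'PE2']
print(cspf(TED, "PE1", "PE2", bandwidth_mbps=2000))  # None
```

The last call is the interesting one: every router is still reachable in the routing table, yet no path survives the pruning, so the calculation fails.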
Let’s say we configure an LSP on PE1 with PE2 as egress, with requirements:
- Bandwidth = 250 Mbps
- Only green links.
CSPF will prune the links going through routers P5 and P6 because they are not green.
CSPF will also prune the links through P7 and P8, because there is not enough bandwidth available across the entire path.
CSPF runs SPF calculations over the remaining topology. Because TE-METRIC of the link between P1 and P2 is lower (assuming all other links have the same metric), the resulting ERO will be P1, P2, PE2. RSVP will then signal the LSP.
But what if there is more than one path that meets the requirements?
CSPF then looks at the metric, and for equal-cost paths, selects the one with the least number of hops.
If we changed the topology like this:
the path across P3 will be selected because it has the fewest hops.
Pretty straightforward, but what if both paths have the same number of hops and the same total metric?
In this case, CSPF would use one of these tie-breaking options, or distribution criteria:
Option | Behavior |
---|---|
Random | Default. Chooses a path at random. Tends to balance the total number of LSPs over all available paths |
Least fill | Prefers the path with the most available bandwidth = largest Minimum available bandwidth ratio |
Most fill | Prefers the path with the least available bandwidth = smallest Minimum available bandwidth ratio |
Where:
Our sample topology looks like this now:
The Minimum available bandwidth ratio for the top path is 50%
The Minimum available bandwidth ratio for the bottom path is 60%
If we configured our LSP like this:
CSPF would select the bottom path (largest Minimum available bandwidth ratio), in other words, it would select the path with the most available bandwidth all the way across.
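A quick Python sketch of the tie-breaker, under one assumption worth flagging: I am taking a link’s ratio to be available bandwidth divided by reservable bandwidth, and a path’s Minimum available bandwidth ratio to be the smallest ratio among its links. The per-link numbers are invented just to reproduce the 50% and 60% values.

```python
import random


def min_available_ratio(path_links):
    """Smallest available/reservable ratio among the links of one path."""
    return min(avail / reservable for avail, reservable in path_links)


def pick_path(candidates, mode="random", rng=random):
    """candidates: name -> list of (available_mbps, reservable_mbps) per link."""
    if mode == "least-fill":
        return max(candidates, key=lambda n: min_available_ratio(candidates[n]))
    if mode == "most-fill":
        return min(candidates, key=lambda n: min_available_ratio(candidates[n]))
    return rng.choice(sorted(candidates))  # default: random


# Hypothetical per-link numbers: top path bottoms out at 50%, bottom at 60%.
CANDIDATES = {
    "top": [(800, 1000), (500, 1000), (900, 1000)],
    "bottom": [(600, 1000), (700, 1000), (600, 1000)],
}
print(pick_path(CANDIDATES, "least-fill"))  # bottom
print(pick_path(CANDIDATES, "most-fill"))   # top
```

Note that the top path has the single emptiest link (90% available), but it is the worst link along the path that decides.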
NOTE: CSPF calculations have other rules, that I didn’t cover, but I wanted to give you the general idea. For more details you can check:
https://kb.juniper.net/InfoCenter/index?page=content&id=KB30996
Now that we know how CSPF works and interacts with RSVP, let’s see how our LSP PE1-to-PE2_3 (back to the original topology) came up right away as soon as we turned on CSPF:
We can figure out what happened before and after we made the change by looking at the extensive output of the show mpls lsp command and looking at some of the logs.
The highlighted entries in the log (for example: “4 Mar 31 02:36:14.232 10.100.35.1: Requested bandwidth unavailable”) indicate that the router was attempting to set up the path going across P3 (10.100.35.1), but it was failing because there was not enough bandwidth. We know now that without CSPF or a manual ERO, each router looks at its own routing table for a route to the egress node.
Since we already had two other LSPs reserving 500Mbps each, along that path via P3:
when the third LSP requested 100Mbps, the RSVP signaling was failing. You can actually see the RSVP messages:
And also check the LSP log:
As soon as we removed the no-cspf from the LSP configuration, CSPF took over and figured out a path that could provide enough bandwidth to set up the LSP.
The ERO calculated by CSPF is 10.100.15.1 10.100.12.2 10.100.24.2, where 10.100.15.1 is the address of the ge-0/0/3.0 interface on P1, 10.100.12.2 is the address of the ge-0/0/0.0 interface on P2, and 10.100.24.2 is the address of ge-0/0/2.0 on PE2 (the egress router).
You can see CSPF in action by looking at the LSP’s logs:
Remember that once CSPF calculates the ERO, it is passed, along with the other LSP attributes, to RSVP for signaling.
There are many other things that CSPF can do for us, including using link coloring (administrative groups) for TE, Fast Reroute, and LSP optimization, but I think this article is already long enough. Maybe I’ll cover some of these in another article.
For now just remember:
- L3VPN traffic uses two labels:
- inner = VPN label advertised by BGP,
- outer = label to reach the remote PE (advertised by LDP or RSVP)
- L3VPN routes need to be resolved in inet.3.
- Routes in inet.3 can be installed manually, or dynamically with LDP and RSVP, which are signaling protocols.
- LDP is fast and easy but not too smart!
- RSVP is smarter, and provides traffic engineering, but requires someone to tell it what to do.
- CSPF is the magic that figures things out for RSVP and makes it look smart!
AND of course as I said at the beginning:
CSPF is like the GPS, and RSVP is like the car engine!