The Basics of L2TPv3

Layer 2 Tunneling Protocol Version 3 (L2TPv3) lets you tunnel layer 2 frames across an IP network. In a modern network, those frames will most likely be Ethernet frames, but the protocol also supports transport of PPP, HDLC, ATM, and Frame Relay. It’s a point-to-point technology; frames enter a single ingress port and are transported to an egress port. A distinguishing feature of L2TPv3 compared to other L2VPN technologies is that you only need basic IP connectivity between the ingress and egress (PE) routers. No additional protocols, such as LDP or BGP, are used, and there’s no MPLS encapsulation.

At the ingress port, the frame that is about to be tunneled is given an IPv4 header with the source and destination of the tunnel, and an L2TPv3 header with some additional information. The IP protocol number is 115. Once the packet reaches its destination, the encapsulation is removed and the frame is forwarded out the egress port. The ingress and egress ports are referred to as “attachment circuits”.

The L2TPv3 Header has the following format:

[IP Delivery Header] (20 bytes)
[L2TPv3 Header]
 [Session ID] (4 bytes)
 [Cookie] (0, 4 or 8 bytes)
 [Pseudowire Control Encapsulation] (4 bytes by default)
[Layer 2 Payload]

The session ID is used at the egress router to identify which tunnel a particular packet belongs to. In other words, all received packets that belong to the same L2TPv3 tunnel should have the same session ID. The Pseudowire Control Encapsulation is used for sequencing packets. The purpose of the cookie field is not explained well in the configuration guide, so I’ve had to refer to the RFC. Section 4.1 says that it’s an optional field that provides a further guarantee that a received packet belongs to the session identified by the session ID. It protects against “malicious packet insertion attacks”.
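To make the encapsulation concrete, here’s a small Python sketch (the function names are my own) of how an egress router could use the session ID and cookie. The IPv4 delivery header and the optional Pseudowire Control Encapsulation word are left out for simplicity:

```python
import struct

def encapsulate(session_id: int, cookie: bytes, frame: bytes) -> bytes:
    """Build the L2TPv3-over-IP payload: 4-byte session ID, then the
    cookie (0, 4 or 8 bytes), then the tunneled layer 2 frame."""
    assert len(cookie) in (0, 4, 8)
    return struct.pack("!I", session_id) + cookie + frame

def decapsulate(payload: bytes, expected_cookie: bytes):
    """Egress side: read the session ID and verify the cookie. A
    mismatching cookie suggests packet insertion, so drop the packet."""
    (session_id,) = struct.unpack_from("!I", payload)
    clen = len(expected_cookie)
    if payload[4:4 + clen] != expected_cookie:
        raise ValueError("cookie mismatch: drop packet")
    return session_id, payload[4 + clen:]
```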

Static vs. Dynamic L2TPv3

In the Cisco implementation at least, there’s a distinction made between static and dynamic L2TPv3 tunnels. Both static and dynamic tunnels require explicitly configuring the tunnel endpoints, but the dynamic style has a control plane that negotiates session parameters like the session ID and cookie. With static configuration, the operator manually defines the session ID and (optionally) the cookie at both ends of the tunnel. With dynamic configuration, the two routers involved handle those things via the L2TPv3 control plane. If you have multiple dynamic L2TPv3 tunnels between a set of PE routers, a single control channel handles all of them.

Control channel parameters can be defined in a template that can then be reused for all tunnels. According to the configuration guide, you can still use the control channel for authentication and dead-peer detection even if you configure a static tunnel, meaning that a static tunnel will automatically be torn down if the hello exchange fails, e.g. because passwords don’t match. Basic testing does show this to be the case, which makes the dynamic/static categories somewhat confusing.

Port mode vs. VLAN mode

When tunneling Ethernet frames, two different “modes” can be used. The first option is to configure the ingress and egress attachment circuits on physical Ethernet ports. With this type of configuration, all incoming frames are encapsulated in an unmodified state and sent to the other side of the tunnel. The tunnel can be used to carry multiple VLANs, i.e. extending an 802.1Q trunk, or just a single VLAN if the attached CE device is sending untagged frames.

The other option is to configure the attachment circuits on vlan subinterfaces associated with a specific vlan-id (via the encapsulation dot1q X command). With this configuration, frames tagged with the configured ingress vlan-id are sent to the egress subinterface and “retagged” with the vlan-id configured there. Normally you would use the same 802.1Q tag on both attachment circuits, but if you don’t, you could leak traffic between the VLANs (assuming that your attached devices don’t run spanning tree, which would disable the ports due to mismatched vlan-ids).

Basic Configuration Example on IOS-XE

A very basic dynamic L2TPv3 configuration, using little more than the bare minimum of options, looks like this:

1. Configure the l2tp-class.

This is where you configure the control plane parameters such as authentication and the hello timers. There’s also an older plain text password authentication option available. I’ve lowered the hello interval to 5 seconds from the default of 60.

l2tp-class CLASS_NAME
 digest secret 0 CISCO hash SHA1
 hello 5

2. Configure the Pseudowire Class.

These are configuration parameters for the pseudowire itself. Note that the l2tp-class template is referenced in the pseudowire-class. The loopback interface is the tunnel source address.

pseudowire-class PW_CLASS
 encapsulation l2tpv3
 protocol l2tpv3 CLASS_NAME
 ip local interface Loopback0

3. Configure the Attachment Circuits.

The tunnel endpoints are configured with the ‘xconnect’ command, which references the pseudowire-class. The IP address is the remote tunnel endpoint. The number “17” is the virtual circuit ID.

interface GigabitEthernet5
 xconnect 17 encapsulation l2tpv3 pw-class PW_CLASS

Verification is done with ‘show l2tp’ (additional options are available):

csr1#show l2tp

L2TP Tunnel and Session Information Total tunnels 1 sessions 1

LocTunID RemTunID Remote Name State Remote Address Sessn L2TP Class/
 Count VPDN Group 
1386111325 524726897 csr7 est 1 CLASS_NAME

LocID RemID TunID Username, Intf/ State Last Chg Uniq ID 
 Vcid, Circuit 
2577805196 3111658915 1386111325 17, Gi5 est 00:04:32 0



L2TPv3 IOS-XE Configuration Guide

RFC 3931: “Layer Two Tunneling Protocol – Version 3 (L2TPv3)”

MPLS Configuration on IOS Software

6PE and 6VPE

This is a summary of the IPv6 over MPLS chapter in the book MPLS Fundamentals. The chapter starts off with a 20 page introduction to IPv6, but I will move directly to the actual mechanisms for transporting IPv6 over an IPv4 MPLS network: 6PE and 6VPE.

IPv6 over MPLS

Let’s say that you have a standard IPv4 based MPLS network where you offer MPLS VPNs and other such services, and now you want to start supporting IPv6 in this network. One way of doing this would be to move to a dual stack solution, which would involve implementing an IPv6 IGP, MP-BGP, and IPv6 LDP (or MPLS-TE). At the time of this book’s release, LDP wasn’t even implemented for IPv6.

Another approach is to maintain the MPLS network as it stands, but implement mechanisms on the PE routers that allow you to transport IPv6 packets as normal labeled packets on the P routers. This is exactly what the 6PE and 6VPE solutions do. The key selling point of these two solutions is that you do not need IPv6 support in the core; only PE routers are dual stack.

The difference between 6PE and 6VPE is whether the IPv6 routes are in the global routing table or in VRFs. 6PE serves the same role as plain IPv4 over MPLS, and 6VPE is the equivalent of an MPLS VPN.

Both 6PE and 6VPE exploit the fact that as long as a packet somehow can be forwarded along an LSP from ingress to egress PE, P routers do not care about anything but the transport label. When using a BGP route in an IPv4 MPLS VPN (or just IPv4 over MPLS), the top label is found by looking at the BGP next hop of the route. The ingress looks at this IPv4 next hop, finds the label associated with it, and by using this label, the packet will be forwarded to the egress PE.

If we had an IPv6 MPLS VPN with an IPv6 IGP in the core, VPNv6 prefixes through MP-BGP and IPv6 LDP, the BGP next hop would be an IPv6 address, and the router would find the correct transport label for that FEC using the IPv6 CEF table. Now, imagine that instead of the BGP route having an IPv6 next hop address, the next hop was an IPv4 address. If that was the case, the ingress PE router would impose the same VPN label, but the transport label would be found in the IPv4 FIB.

The egress PE wouldn’t care either way because the VPN label would be the same, and the packet would still be forwarded out the same interface based on that router’s LFIB. 6PE and 6VPE are based on that idea; as long as BGP provides the ingress PE with a VPN label, it doesn’t matter exactly how the transport label is handled as long as the packet reaches the egress PE.


When using 6PE, a CE router is connected to an interface on the PE router that’s in the global IPv6 routing table. Between PE routers there’s an MPLS network with BGP, an IGP, and LDP. The IGP and LDP only need to be IPv4 capable, and BGP only needs an IPv4 session. That BGP session does, however, need to be activated for the IPv6 unicast address family to make it possible to advertise IPv6 prefixes from PE to PE. Additionally, BGP needs to attach a label to each IPv6 prefix with the ‘send-label’ command. IPv6 routes pointing to the CE router should be redistributed into BGP. If we had two PE routers, PE1 and PE2, with the loopback addresses and, PE1’s BGP configuration would look roughly like this:


router bgp 100
neighbor remote-as 100
neighbor update-source Loopback0

address-family ipv6 unicast
neighbor activate
neighbor send-label
redistribute static

The PE router with the IPv4 address will receive IPv6 routes with a label and a next hop of ::FFFF: This next hop is an “IPv4-mapped IPv6 address” and it tells the router that if it were to use that IPv6 BGP route, it should use the transport label for the IPv4 address

Besides the usual BGP/MPLS commands, a useful IOS verification command for 6PE is ‘show bgp ipv6 unicast neighbors’. It should show an ipv4 address as the neighbor address and that the “ipv6 mpls label capability is advertised and received”.


With 6VPE, the CE facing interface on the PE router is in a VRF. This makes 6VPE pretty much exactly like an MPLS VPN, except that the transport label is derived from an IPv4 address instead of IPv6. The key features are:

  • An MPLS core with IPv4 IGP and IPv4 LDP and/or TE.
  • The PE routers are IPv6 capable.
  • The PE routers have IPv6 VRFs on interfaces towards CEs.
  • BGP advertises VPNv6 prefixes between PEs and they are imported into VRFs based on route targets.
  • The data plane uses a transport label and a VPN label.
  • There’s some kind of IPv6 routing between CE and PE.
  • BGP next hop on ingress PE is an IPv4-mapped IPv6 address.
  • You can run MPLS VPN for IPv4 and 6VPE at the same time, and even on the same interface.

Configuration of 6VPE is pretty much identical to the MPLS VPN case. Instead of configuring an IPv4 specific VRF, you use the ‘vrf definition’ command to create a VRF with address family support. In that VRF, you define the RD and RTs as you would in an MPLS VPN. You enable the VPNv6 address family on the BGP sessions, and you don’t need to configure ‘send-label’ like in the 6PE case since that’s already the default behavior when using VPN prefixes.

Using PE routers with loopback addresses and .2, a sample configuration could look something like this:


//defining the VRF
vrf definition VRF_NAME
rd 1:1
address-family ipv6
route-target export 1:1
route-target import 1:1

//assigning the VRF to an interface
interface ethernet1
vrf forwarding VRF_NAME

//configuring the VPNv6 and VRF address families in BGP
router bgp 100

address-family vpnv6
neighbor activate
neighbor send-community both
address-family ipv6 vrf VRF_NAME
neighbor 2001:12::1 remote-as 65000 //eBGP session with the CE
neighbor 2001:12::1 activate

Important verification commands are:

#show vrf ipv6 name
Verify that the VRF is properly configured.

#show ip bgp neighbors
Verify that the VPNv6 AF is active.

#show bgp vpnv6 unicast vrf name prefix
Verify that expected prefixes are available.

For Internet access from the 6VPE VPN, you have the same options as with an MPLS VPN. Briefly, these are:

  • “Non-VRF Internet Access”, a method that involves configuring sub-interfaces on the CE router that are used to receive the Internet table, or to route a static default.
  • “VRF Internet Access”, where Internet routes are put into a VRF.
  • “Static and static VRF routes providing Internet access” is a method where a static route is configured in the PE’s VRF with a next hop in the global routing table. Return traffic is routed to the VRF by configuring a static route in the global table with a next hop in the VRF.

Basic VPLS

This is a summary of the VPLS chapter in the book MPLS Fundamentals. The post is short because the chapter is short. If you don’t know what AToM is, you should probably read this post first.

VPLS – Virtual Private LAN Service

VPLS emulates an Ethernet segment over an MPLS backbone. It’s similar to AToM, but it creates a point-to-multipoint service instead of just point-to-point. Because it emulates a LAN, it has to provide features such as MAC address learning and broadcast/multicast replication that are not necessary for a point-to-point tunnel.

VPLS Architecture

Ethernet switches do the following:

  • Forward frames.
  • Forward frames with unknown destination addresses (flooding).
  • Replicate broadcast and multicast traffic.
  • Prevent loops.
  • Dynamically learn MAC addresses.
  • Age out MAC addresses.

VPLS needs to emulate these functions.

All PE routers that belong to a particular emulated LAN maintain a full mesh of pseudowires between them. Loops are prevented by using split horizon: a frame received on a pseudowire is not forwarded out another pseudowire. Because there’s a full mesh, broadcast and multicast traffic can be replicated and sent to all PEs even with split horizon in effect. Unknown unicast frames are also replicated and sent to all PEs. PE routers learn MAC addresses by looking at source addresses in frames, and entries age out just like in a normal Ethernet switch.
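The forwarding rules above can be sketched as a toy model in Python (class and port names are invented; a real PE does this in hardware):

```python
class VplsPe:
    """Toy model of a VPLS PE forwarding decision: dynamic MAC learning
    plus split horizon (never forward from one pseudowire to another)."""

    def __init__(self, acs, pws):
        self.acs = set(acs)   # attachment circuits (local Ethernet ports)
        self.pws = set(pws)   # pseudowires to the other PEs (full mesh)
        self.mac_table = {}   # MAC -> port it was learned on

    def forward(self, src_mac, dst_mac, in_port):
        self.mac_table[src_mac] = in_port           # learn the source MAC
        out = self.mac_table.get(dst_mac)
        if out is not None and out != in_port:
            return {out}                            # known unicast
        # Unknown unicast / broadcast: flood everywhere except the
        # ingress port, but split horizon means a frame received on a
        # pseudowire is never sent out another pseudowire.
        flood = (self.acs | self.pws) - {in_port}
        if in_port in self.pws:
            flood -= self.pws
        return flood
```

Because the pseudowires form a full mesh, dropping pseudowire-to-pseudowire forwarding still lets every PE reach every other PE, which is why split horizon is enough to prevent loops.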

VPLS Data Plane

Like AToM, VPLS uses a label stack with two labels: a VC label and a tunnel label. The tunnel label is used to switch the packet between PE routers. When the packet arrives at the egress PE, the exposed VC label is used to direct the packet to the correct attachment circuit. If a frame has an 802.1Q tag, it is removed as the frame enters the VPLS network.

The MAC table is used to forward frames to and from the physical Ethernet ports and the pseudowires (the PE router can have physical ports and pseudowires for the same VLAN).

VPLS Signaling

VPLS uses targeted LDP sessions between PE routers. This is configured in a Virtual Forwarding Instance (VFI), which is a collection of data structures on the PE router that handles the forwarding. Each VPLS instance is tied to a VLAN interface on the PE and given a VC ID / VPN ID. The same ID is used for all VFIs that belong to the same VPLS instance.

Basic VPLS Configuration

Define a VPLS instance:

router(config)#l2 vfi NAME manual

Specify a VPN ID:

router(config-vfi)#vpn id NUMBER

Configure neighbors for the full mesh:

router(config-vfi)#neighbor x.x.x.x encapsulation mpls

Associate the VFI with a VLAN interface:

router(config-if)#xconnect vfi NAME

All participating PEs have the same configuration, except that each lists different neighbors.

The main verification commands are:

#show vfi NAME
Shows the local attachment circuit and the LDP neighbors.

#show mpls ldp neighbor
Shows the targeted LDP sessions between PEs.

#show mpls l2transport summary
Shows pseudowires and what labels are used for them.

#show mpls l2transport binding
Shows local and remote VC labels for each pseudowire.

#show mpls forwarding-table
Shows the LFIB for verification of the data plane.

VPLS and Tunneling Layer 2 Protocols

By default, layer 2 protocols are not tunneled across the VPLS network. For example, if you look at the CDP neighbors from a CE router, you’ll see the PE router, not the remote CEs. Similarly, STP BPDUs are blocked. You can change this behavior with the l2protocol-tunnel command on the CE facing physical interface on the PE router.
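As a hedged sketch (the interface name is hypothetical, and the supported protocol keywords vary by platform), enabling tunneling of CDP and STP on the CE facing port might look something like this:

```
interface GigabitEthernet0/1
 l2protocol-tunnel cdp
 l2protocol-tunnel stp
```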

Not forwarding STP BPDUs does not cause a problem in most cases because split horizon prevents loops. However, in certain scenarios it’s necessary to run STP, like when you are multihoming a site to two or more PEs. If this is the case, and you enable tunneling of STP BPDUs, you should do this everywhere to avoid “severe network problems and instability”.

Trunk Port Between the CE and PE

In the basic VPLS configuration above, the CE facing physical interface on the PE is an access switchport. An option is to make this port a trunk port with multiple allowed VLANs. If this option is used, a separate VFI instance has to be created for each VLAN that you are tunneling. Each VLAN then gets its own VLAN interface on the PE where the VFI gets attached.

Hierarchical VPLS

Hierarchical VPLS (H-VPLS) involves connecting a second layer of routers to the PE router. You’re basically adding an “access layer” to the PE router. The benefit of this is to get less signaling and packet replication in the MPLS core. H-VPLS uses two types of routers: N-PE routers (network PE) that are the normal PE routers that connect directly to the VPLS cloud, and U-PE (user PE) routers that connect to the N-PE on the customer side.

There are two forms of H-VPLS:

  • H-VPLS with dot1q tunneling.
  • H-VPLS with MPLS in the access layer.

H-VPLS with Dot1q (QinQ) Tunneling in the Access Layer

Just like with AToM, it’s possible to combine QinQ tunneling with VPLS. The idea is that if the customer has say 50 VLANs, you can avoid using 50 VLANs in the VPLS network by encapsulating all of them in a single provider VLAN. The QinQ encapsulation happens at the U-PE router, and when that “double tagged” frame reaches the N-PE, it gets MPLS labels and is forwarded to other N-PEs.

H-VPLS with MPLS in the Access Layer

In this design, there are pseudowires from the U-PE routers to the N-PE. To get correct forwarding in this scenario, you must disable split horizon on the N-PEs, since frames received on one pseudowire must be able to be forwarded out other pseudowires.

Quality of Service

The default behavior is to copy the 802.1Q priority bits to the EXP bits at the label imposition. You can change this behavior with a policy map applied to the vlan interface.

Limiting MAC Addresses

A potential issue in VPLS is that the PE routers get overwhelmed by having to learn too many MAC addresses, either because sites have too many hosts or because a denial-of-service attack attempts to overwhelm the devices with spoofed packets. Either way, you can limit the number of MAC addresses with this command:

mac-address-table limit [vlan vlan] [maximum maximum] [action {warning|limit|shutdown}] [flood]

Routing Peering

In both AToM and VPLS, CE routers are directly connected at the network layer, which means that the service provider does not have to worry about customer routing. From the customer’s perspective, routing protocol behavior will not be the same in AToM and VPLS, because one is a point-to-point circuit and the other is point-to-multipoint. The routing protocol configuration is simpler in the VPLS case than with a larger number of point-to-point circuits.

Any Transport over MPLS – A Short Summary

This is a summary of the AToM chapter in the book MPLS Fundamentals.

Any Transport over MPLS

An MPLS Layer 3 VPN has the disadvantage of requiring IP connectivity between the customer and the provider. A customer may want to worry only about their own devices at the network layer and have complete control over what it looks like. Another reason to use AToM is legacy equipment that cannot use IP. Before AToM, providers had to maintain their old Frame Relay and ATM networks to offer this layer 2 service, while also offering MPLS Layer 3 VPNs on their MPLS backbone. With AToM, a provider can offer a layer 2 service that uses the MPLS backbone.

AToM is a point-to-point service that’s sometimes referred to as Virtual Private Wire Service (VPWS). Just like with MPLS VPN, the “intelligence” is found in the PE routers and P routers only switch labeled packets between different PEs.

Transporting Layer 2 Frames

There are two solutions for creating a point-to-point pseudowire that can transport layer 2 frames:

  • AToM
  • Layer 2 Tunneling Protocol version 3 (L2TPv3)

The main difference between these two is that AToM uses an MPLS network to label switch the encapsulated frames from ingress to egress, while L2TPv3 uses IP as the tunnel encapsulation.

Both solutions use the pseudowire concept that makes it appear to the connected layer 2 devices that they exist on the same physical wire. A frame enters the ingress PE router, gets encapsulated, and is then transported to the egress PE where it is decapsulated.

The interfaces on the PE routers where frames enter and leave the pseudowire are called the Attachment Circuits (AC).

AToM Architecture

If you are familiar with MPLS VPN, you know that BGP is used between PE routers to signal what label the remote PE router expects to see. AToM uses a very similar architecture, but LDP is used between PE routers instead of BGP.

A targeted LDP session between PE routers is set up, and using this session, a PE router advertises the so called VC label to its LDP peer. An ingress PE router uses this VC label and a tunnel/transport label to create a label stack that can transport the frame to the egress PE.

The VC label identifies the pseudowire or virtual circuit, and makes it so that when the egress PE receives a packet with this label, it knows that it contains a layer 2 frame destined for a certain attachment circuit.

The tunnel label is found by looking at the IP address used for the targeted LDP session with the egress PE. This is a loopback address on that router that’s advertised in the IGP, and the ingress PE should have a label binding for this address via LDP. By using this label as the top label, the packet should be label switched to the egress PE (an alternative approach is to create an LSP between PEs using MPLS-TE).

If you look at the LFIB on the egress PE you should see the VC label as a local label with the AC as the outgoing interface.

The TTL is set to 255 in the tunnel label and to 2 in the VC label.
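As an illustration, the two-label stack can be built with a few lines of Python (the label values are arbitrary; each stack entry is a 20-bit label, 3-bit EXP, bottom-of-stack bit, and 8-bit TTL):

```python
import struct

def label_entry(label: int, exp: int, bos: int, ttl: int) -> bytes:
    """Pack one 4-byte MPLS label stack entry:
    20-bit label | 3-bit EXP | 1-bit bottom-of-stack | 8-bit TTL."""
    return struct.pack("!I", (label << 12) | (exp << 9) | (bos << 8) | ttl)

def atom_stack(tunnel_label: int, vc_label: int) -> bytes:
    """Tunnel label on top with TTL 255, VC label at the bottom of the
    stack with TTL 2, matching the TTL values noted above. The label
    values themselves are whatever LDP handed out."""
    return label_entry(tunnel_label, 0, 0, 255) + label_entry(vc_label, 0, 1, 2)
```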

Signaling the Pseudowire

The AToM pseudowire is signaled using a targeted LDP session between two PE routers. The main goal of this session is to exchange the VC labels. In this exchange, LDP uses two TLVs: a Label TLV that advertises the label, and a Pseudowire Identifier FEC TLV that contains various parameters used by AToM. The PW FEC TLV contains:

  • C-bit: Set to 1 if the control word is present.
  • PW-type: A field that specifies what type of pseudowire it is.
  • Group ID: Used to group pseudowires. In IOS, all pseudowires with the same AC are given the same ID which allows LDP to withdraw the label for multiple pseudowires in one message (e.g. because the AC interface goes down).
  • PW-ID: This is a 32 bit identifier for the pseudowire. In IOS, the PW-ID is called VC-ID and it shows up in various show commands.
  • Interface parameters: Contains things like MTU, interface description and request VLANs. MTU must match on both sides.

The PW FEC TLV is used to tie the two unidirectional LSPs used by AToM together.

Signaling the Status of the Pseudowire

There are two methods:

Label Withdraw: An older method where the PE routers send a Label Withdraw message if the AC goes down. The PE can also send a Label Mapping release message if the PE wants to restart forwarding frames with sequence number one.

PW Status TLV: A newer method that uses the PW Status TLV and supports more states than the label withdraw method. Both PE routers must support it, or they revert to the older label withdraw method.

The Control Word

The control word is a 32 bit field added between the label stack and the transported layer 2 frame. It’s required for some layer 2 protocols, but optional for others. The ingress PE router adds the control word and the egress PE removes it. The egress PE knows to expect the control word due to the signaling or configuration of the pseudowire.

Format of control word:

[0000][Flags][B][E][Length][Sequence Number]
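Following the layout above, here’s a Python sketch of packing and parsing the control word, assuming 4 flag bits, 2 fragmentation bits (B/E), a 6-bit length, and a 16-bit sequence number, which sums to 32 bits (the function names are my own):

```python
import struct

def control_word(flags: int, b: int, e: int, length: int, seq: int) -> bytes:
    """Pack the 32-bit control word. The leading 0000 nibble is what
    keeps an AToM payload from being mistaken for IPv4 (first nibble 4)
    when routers inspect packets for load balancing."""
    word = (flags << 24) | (b << 23) | (e << 22) | (length << 16) | seq
    return struct.pack("!I", word)

def parse_control_word(data: bytes) -> dict:
    (word,) = struct.unpack("!I", data)
    assert word >> 28 == 0, "first nibble must be 0000"
    return {"flags": (word >> 24) & 0xF, "b": (word >> 23) & 1,
            "e": (word >> 22) & 1, "length": (word >> 16) & 0x3F,
            "seq": word & 0xFFFF}
```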

The control word has the following functions:

Padding small packets:

  • If an AToM packet is smaller than the minimum for the encapsulation type (e.g. Ethernet’s 64 bytes), padding is added. The egress PE knows that padding was added because the length field of the control word will be non-zero.

Carrying control bits of the transported frames:

  • Certain layer 2 flags are set in the control word, depending on the encapsulation type. An example is the Discard Eligible flag in Frame Relay.

Preserving the sequence:

  • By default, AToM does not use sequence numbers, but it can be enabled. If it’s enabled, the sequence number is carried in the control word. Out of order packets are then discarded.

Load Balancing

  • If IP packets are labeled, the router can look at the first nibble after the labels to identify the payload. In the IPv4 case, the first nibble is 4. This information is then used when load balancing packets. When labeling a frame like in AToM, there’s a chance that the first nibble of the payload is 4 without it actually being an IPv4 packet. The control word makes it so that the first nibble is always 0000, preventing this confusion.

Fragmentation and Reassembly

  • PE routers decide whether to support fragmentation for AToM packets. The B (beginning) and E (end) fields in the control word are used for this.
  • The following combinations of the flags are possible:
    00 – the entire payload is in a single packet.
    01 – the first fragment.
    10 – the last fragment.
    11 – an intermediate fragment.
  • The ingress PE fragments and the egress PE reassembles.
  • P routers cannot fragment, which means that an AToM packet that is too big for the core is always dropped.

MPLS MTU in the MPLS Backbone

With an IP payload of 1500 bytes, an AToM packet can reach 1530 bytes:

  • 1500 byte IP payload.
  • 8 bytes from two labels.
  • 4 bytes from the control word.
  • 4 bytes Ethernet 802.1Q tag.
  • 14 byte Ethernet II header (no FCS)

The main difference compared to say an MPLS VPN packet is that we’re now transporting an extra layer 2 header. This can cause MTU issues unless you set the MPLS MTU to 1530 to avoid fragmentation. Path MTU Discovery or other methods to reduce the size of the IP payload can also be utilized.
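The arithmetic above can be double-checked with a few lines of Python:

```python
# Worst-case EoMPLS packet size with a 1500-byte IP payload, itemized
# as in the list above (no FCS counted)
overhead = {
    "ip_payload": 1500,
    "two_mpls_labels": 8,
    "control_word": 4,
    "dot1q_tag": 4,
    "ethernet_ii_header": 14,
}
total = sum(overhead.values())
print(total)  # 1530 -> the core MPLS MTU must be at least this
```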

Basic AToM Configuration

The first step is to select the encapsulation of the AC interface. For Ethernet this isn’t required, since an Ethernet interface uses Ethernet encapsulation by default, but on serial links you have several options such as PPP, HDLC, Frame Relay and so on.

The next step is to configure the actual AToM pseudowire on the same interface using the ‘xconnect’ command:

router(config-if)#xconnect peer-router-id vcid encapsulation mpls

The peer-router-id is the LDP router id of the remote PE. The vcid is a number that uniquely identifies the pseudowire. Encapsulation can be mpls or l2tpv3 (remember, L2TPv3 is a pseudowire over IP instead of MPLS).

The main verification command is #show mpls l2transport vc vcid.

The command #show mpls l2transport hw-capability can be used to see which encapsulations and AToM features an interface supports.

Instead of specifying the encapsulation as mpls directly on the xconnect command, you can specify a “pseudowire class” containing the parameters for the pseudowire. This is not required if you use the default parameters for everything, but if you want to, for example, use the preferred path feature, it’s configured in the class. It looks something like this:

pseudowire-class CLASS_NAME
encapsulation mpls
other parameters
interface ethernet1
xconnect 100 pw-class CLASS_NAME

Transported Layer 2 Protocols

This section contains discussion on specific features and nuances of transporting various layer 2 protocols. Because they’re legacy protocols at this point, the Frame Relay and ATM sections have been drastically shortened compared to the coverage in the book.


  • Transporting these protocols requires that you specify the correct encapsulation type on the AC.

Frame Relay:

  • There are two methods:
    • DLCI to DLCI where each virtual circuit is mapped to its own pseudowire.
    • Port to Port where all virtual circuit are carried in one pseudowire.


ATM:

  • There are two methods:
    • ATM AAL5
    • ATM Cell Relay


Ethernet:

Ethernet frames enter the AC at the ingress PE and exit the AC at the egress PE. There are some nuances to this: the pseudowire can operate in two modes, port mode and VLAN mode. In port mode, all incoming frames are carried transparently across the backbone. In VLAN mode, the PE routers look at the 802.1Q tag in the frames before forwarding them. The PW ID FEC TLV is used to signal to the remote PE router which type of pseudowire they’re dealing with.

EoMPLS Forwarding

The ingress PE removes the Preamble, Start of Frame Delimiter, and FCS fields from the incoming Ethernet frame. The control word is added, and the frame is labeled. If the frame has an 802.1Q tag, it is kept. The packet is then forwarded.

VLAN ID Rewrite

If VLAN mode is used, it’s possible to have different VLAN IDs on the pseudowire’s two ACs. If this is the case, the VLAN ID is automatically rewritten.

EoMPLS Scenarios

This section contains three AToM examples:

  • EoMPLS Carrying Simple Ethernet
  • EoMPLS Carrying an Ethernet Trunk
  • EoMPLS Carrying One VLAN

EoMPLS Carrying Simple Ethernet

In this scenario, port mode is configured on two PE routers. The configuration consists of only a few lines on each PE’s AC interface.

interface ethernet1
no ip address
xconnect 2000 pw-class one
pseudowire-class one
encapsulation mpls

Using this configuration, all incoming frames are transported to the remote PE defined by the IP address. The vcid is 2000. A verification command for the pseudowire is #show mpls l2transport vc 2000 detail.

EoMPLS Carrying an Ethernet Trunk

In this scenario, the PE configuration is unchanged from the previous one, but on the CE side, some kind of trunk configuration is used that lets you send frames with multiple different 802.1Q tags. The PE routers will transport these frames regardless of what their VLAN ID is.

EoMPLS Carrying One VLAN

Here, AToM in VLAN mode is demonstrated. On the PE router you define an individual pseudowire for each VLAN that you want to transport. This is done by configuring dot1q subinterfaces:

interface ethernet1
no ip address
interface ethernet1.100
encapsulation dot1q 100
xconnect 2000 pw-class one
interface ethernet1.200
encapsulation dot1q 200
xconnect 2001 pw-class one // note that the vcid is different.
pseudowire-class one
encapsulation mpls

An advantage of this approach is that since each VLAN is a different pseudowire, they can be routed differently through the MPLS network.

If you use the command #show mpls l2transport vc you will see that the pseudowires are running in VLAN mode.

Dot1q Tunneling over AToM

QinQ is a technique for tunneling 802.1Q Ethernet frames across an Ethernet backbone by encapsulating them in a second Ethernet header. It’s possible to combine this with AToM such that incoming frames are given a second 802.1Q header before label imposition.

AToM Tunnel Selection

You can configure an MPLS-TE tunnel and then have the pseudowire use that LSP instead of the default path. This feature uses the “preferred-path” parameter in the pseudowire-class configuration. Note that this is not the same as building TE tunnels between the PE routers and making them available to IP routing with static routes or autoroute announce; that requires no special consideration, and AToM treats such a tunnel as a normal transport LSP to the egress PE. Preferred-path lets you pick a tunnel that is not used as the next hop for the remote PE.

Configuration would look like this:

TE tunnel:
interface tunnel1
 ip unnumbered loopback0
 tunnel mode mpls traffic-eng
 tunnel destination
 tunnel mpls traffic-eng path-option 1 {dynamic | explicit name PATH}

The pseudowire class:
pseudowire-class NAME
encapsulation mpls
preferred-path interface tunnel1 disable-fallback

The Pseudowire:
interface ethernet1
xconnect pw-class NAME

Another option is to use an IP address as the preferred path instead of a TE tunnel. This lets you configure a secondary loopback on the remote PE that is routed differently from the main LDP router ID; traffic sent toward that address can then take a different path than the default LSP.

If the preferred path fails, the default behaviour is to fall back to normal forwarding. You can disable fallback with the disable-fallback parameter on the preferred-path command.

AToM and QoS

Because AToM packets are labeled packets, you need to use the EXP bits in the labels if you want to do QoS while the packets are in the MPLS network. There are three possibilities for how to handle this:

  • Statically configuring the EXP bits on ingress.
  • Marking EXP according to IP precedence.
  • Using information from the frame header to set EXP.

For static configuration, you configure an inbound policy map that sets the EXP bits to some value for all incoming frames. Marking according to IP precedence is naturally only possible if the frame’s payload is IP (which will almost always be the case). Setting EXP based on the frame header involves matching the CoS in the 802.1Q header and then setting the EXP bits to the desired values. The default behaviour is for the router to copy the 802.1Q CoS to the EXP bits.
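
That default CoS-to-EXP copy can be sketched in Python using the label stack entry layout from RFC 3032 (the label and TTL values below are arbitrary):

```python
def mpls_shim(label: int, exp: int, bottom: bool, ttl: int) -> int:
    """One 32-bit MPLS label stack entry: 20-bit label, 3-bit EXP,
    1-bit bottom-of-stack flag, 8-bit TTL (RFC 3032 layout)."""
    return (label << 12) | ((exp & 0x7) << 9) | (int(bottom) << 8) | (ttl & 0xFF)

def exp_from_cos(cos: int) -> int:
    """The default behaviour described above: copy the 802.1Q CoS (0-7)
    into the EXP bits unchanged."""
    return cos & 0x7

shim = mpls_shim(label=17, exp=exp_from_cos(5), bottom=True, ttl=255)
assert shim >> 12 == 17              # label
assert (shim >> 9) & 0x7 == 5        # EXP copied from CoS 5
assert (shim >> 8) & 0x1 == 1        # bottom of stack
```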

Advanced AToM Features

These features are mentioned, but not covered in the book:

  • L2VPN Interworking that allows different encapsulation types to use the same pseudowire. There’s translation between the two encapsulations.
  • L2VPN Inter-Autonomous Networking.
  • L2VPN Pseudowire Switching that stitches two different pseudowires together.
  • Local Switching that allows a PE router to switch layer 2 frames from one AC to another without sending them across the MPLS network.

LDP – RIB, LIB, LFIB, Bound Addresses

The Label Distribution Protocol (LDP) is a fairly simple protocol, at least compared to monstrosities like BGP or OSPF. However, the fact that LDP is coupled to your IGP in some ways introduces a few difficult aspects. One example of this is LDP-IGP synchronization and why that’s needed, and another is the interaction between the IP RIB, the FIB, the LIB, the LFIB and the so called “bound addresses”. Let’s explore these concepts.

A network that runs LDP always uses one of the IGPs to provide IP reachability. You could use EIGRP for this, but it’s almost always OSPF or IS-IS for various reasons that I won’t go into here. The IGP does its normal exchange of information which results in a database of destinations and what the correct next hop is to reach each destination. Colloquially, we almost always refer to this database as the “routing table”, but in technical literature, people like to use the term “IP Routing Information Base”.

Because the IP RIB isn’t formatted in a way that’s ideal for packet forwarding, the information is reformatted and converted into the FIB, the Forwarding Information Base. When an IP packet enters the router, the FIB is used to determine what outgoing interface should be used to correctly forward the packet.

When an LDP session is established between two routers, an exchange of “label bindings” is done. A label binding consists of three things:

  • The LDP router ID.
  • The IP prefix.
  • The label.

The router is essentially saying “if you want to send a labeled packet to me for this prefix, use this label”. When an LDP neighbor receives the label bindings, they are installed in the Label Information Base (LIB). The LIB contains label bindings received from all LDP neighbors, assuming that there’s no filtering going on. If a router has three LDP neighbors, the LIB could, conceptually, look like this:
local binding: 17
remote binding:, label 19
remote binding:, label 32
remote binding:, label 28

This is the LIB entry for a single IP prefix. The three remote addresses are the LDP router IDs of this router’s LDP neighbors. The local label, 17, is this router’s label binding for the prefix. If a packet is received with label 17 as the top label, the router knows that it should forward it to one of the LDP peers that has provided a label for the same prefix. However, the LIB doesn’t contain any information that helps the router make a correct choice between the three available routers, and even if a choice was made, say randomly, the router does not know what outgoing interface to use.

We could compare that to the IP RIB, which contains all information needed to pick the correct outgoing interface given a certain destination IP address on the incoming packet. That LDP cannot work exactly like the IGP in this regard is of course because LDP is a protocol for distributing label bindings, not a routing protocol. To fix our problem of not having an outgoing interface, LDP needs to piece together information from three sources: the IP RIB, the LIB and the LDP neighbor table.

From the IP RIB, the IP next hop for the FEC is found. FEC stands for Forwarding Equivalency Class and it’s MPLS terminology for packets that get the same forwarding treatment. A label binding is a label that is bound to a FEC, as MPLS Fundamentals puts it.[1]

In our case, the FEC is an IPv4 prefix, but a label can be bound to other types of FECs. Since LDP isn’t a routing protocol, it simply uses the loop free topology created by the IGP, in the sense that labeled packets are forwarded using the same interfaces as unlabeled packets for the same IP prefix. Or put differently: the outgoing interface for a prefix is also the outgoing interface used for packets that arrive with the local label for that FEC.

After the outgoing interface and next hop IP address have been found in the IP RIB, LDP must figure out which LDP peer sits on the other side of that link. The reason for this is simple: if the router doesn’t know which peer it’s sending the labeled packet to, it doesn’t know which label to use, since the LIB only maps peers to labels. For example, knowing that the outgoing interface for the FEC is gigabit0/1 with some next hop address is useless by itself when the LIB looks like this:
local binding: 17
remote binding:, label 19
remote binding:, label 32
remote binding:, label 28

This is where the Bound Addresses come in. When an LDP session is formed, the peers exchange information about what IP enabled interfaces they have. This would look something like this:

LDP Peer Identifier:
Addresses bound to peer:

The IP next hop is found in the bound address list of one of the LDP peers, and the router therefore knows that it must use that peer’s label from the LIB. Returning to the LIB, we find that the outgoing label for the FEC via that peer is 32.

Using the fact that the outgoing label is 32, that our own local label is 17, and that the outgoing interface is gigabit0/1, the router can build the so called Label Forwarding Information Base (LFIB). The LFIB is the data structure used to forward incoming labeled packets. In this case it simply says that if the incoming label is 17, that label is swapped to 32, and the packet is forwarded out gigabit0/1.
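
The whole RIB + LIB + bound addresses dance can be summarized in a few lines of Python. This is only a conceptual sketch; the prefix, peer and next-hop names are placeholders, but the labels match the LIB above:

```python
# Conceptual inputs mirroring the LIB above. The prefix, peer and
# next-hop names are placeholders; the labels are the ones from the text.
FEC = "prefix-X"

rib = {FEC: {"next_hop": "nh-B", "interface": "gigabit0/1"}}   # from the IGP
lib = {FEC: {"local": 17,
             "remote": {"peer-A": 19, "peer-B": 32, "peer-C": 28}}}
bound_addresses = {"peer-A": {"nh-A"},   # interface addresses each peer
                   "peer-B": {"nh-B"},   # announced at session setup
                   "peer-C": {"nh-C"}}

def build_lfib_entry(fec):
    """Combine the RIB next hop, the bound addresses and the LIB labels
    into one LFIB entry."""
    next_hop = rib[fec]["next_hop"]
    # Which LDP peer owns the next-hop address?
    peer = next(p for p, addrs in bound_addresses.items() if next_hop in addrs)
    return {"in_label": lib[fec]["local"],
            "out_label": lib[fec]["remote"][peer],
            "interface": rib[fec]["interface"]}

assert build_lfib_entry(FEC) == {"in_label": 17, "out_label": 32,
                                 "interface": "gigabit0/1"}
```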

[1] MPLS Fundamentals, page 68.

MPLS-TE with Class-Based Tunnel Selection

Someone asked about class-based tunnel selection on a forum that I visit, and I thought that it would make a decent blog post. The question:

Now I want to simulate Class based tunnel selection in MPLS Core network running with IS-IS. Could be OSPF as well. Would be grate if in PE is configured VRF as well. And if I send packet from one PE vrf to another, different packets goes to different path.[1] 

From this question we are able to extract the following:

  • We need to create a MPLS-TE based VPN between two PE routers.
  • Transport across the core needs to use at least 4 different LSPs to allow packets to use different paths in both directions.
  • BGP needs to be involved to distribute VPNv4 or VPNv6 prefixes between PE routers.
  • Class-Based Tunnel Selection should be used on the PE routers to direct packets into a specific tunnel based on the packet marking.

1. Using MPLS-TE for VPNs

The data plane of an MPLS Layer 3 VPN involves attaching two labels to each packet at the ingress PE router. The bottom label is called the VPN label and it’s sent from the egress PE to the ingress PE by BGP. It’s used by the egress PE router to switch the packet to the correct outgoing interface. The top label is called the IGP label and its purpose is to get the packet to the correct egress PE based on the BGP next hop of the destination. Now, in most cases the IGP label is distributed by LDP, but this doesn’t necessarily have to be the case, as we’ll see in this blog post.
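
As a conceptual sketch (with made-up label values), the two-label data plane looks like this in Python:

```python
# Made-up label values: the VPN label advertised by the egress PE over
# BGP, and the IGP label that gets the packet to that PE.
VPN_LABEL = 16
IGP_LABEL = 20

def impose(packet, vpn_label, igp_label):
    """Ingress PE: push the VPN label first, then the IGP label on top."""
    return [igp_label, vpn_label, packet]

def egress_dispose(stack):
    """Egress PE, after the IGP label has been removed (e.g. by PHP):
    the VPN label selects the customer-facing outgoing interface."""
    vpn_label, packet = stack
    vpn_table = {16: "interface-to-CE"}   # hypothetical per-VRF label table
    return vpn_table[vpn_label], packet

stack = impose("ip-packet", VPN_LABEL, IGP_LABEL)
assert stack[:2] == [20, 16]                      # IGP label on top
assert egress_dispose(stack[1:]) == ("interface-to-CE", "ip-packet")
```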

2. What is MPLS-TE?

In a nutshell, MPLS Traffic Engineering is a way to create an LSP between a ‘head end router’ (the LSP’s first router) and an IP destination. The traffic engineering aspect of it is that you have great control over how this LSP is created. For example, you can specify every single hop that the LSP should take, which allows you to route packets in ways that would be impossible using other methods, such as traditional IP routing protocols. In the next section we’ll start looking at TE in a specific topology while building towards a solution for the person’s question.

3. Information Distribution

The IGP plays an important role in TE and it must be a link state protocol. Because IS-IS and OSPF are the only link state protocols that we use, that’s what you have to choose from. Here I’ll use OSPF since almost everyone that knows IS-IS knows OSPF, but the opposite isn’t the case. A link state IGP normally only carries information about links between nodes and the IP prefixes attached to these nodes. When TE is activated, the IGP also starts incorporating TE specific information about each link that is enabled for TE. In the case of OSPF, that information is exchanged between routers in a new LSA called the type 10. Before demonstrating this, let’s look at the topology that I’ll use throughout this post:


Loopbacks use 192.168.1.x where x is the router number. Links use 10.1.xy.z where x is the lower router number, y the higher one and z the local router number. E.g. is R2’s side of the R2 – R4 link. OSPF is configured as the IGP on all links and loopbacks on the provider routers. I’m using Cisco IOS on all routers.

Our first task will be to activate the TE capabilities of OSPF. That involves the following steps:

  1. Globally activate TE using the command router(config)#mpls traffic-eng tunnels. On its own this doesn’t do much besides making the other TE commands available. It also, presumably, starts some TE related process(es).
  2. Each interface must also be configured with the same command: router(config-if)#mpls traffic-eng tunnels.
  3. Under the OSPF router process, TE must be enabled for an area (0 in this case), and a TE router-id must be chosen.
    router(config-router)#mpls traffic-eng area 0
    router(config-router)#mpls traffic-eng router-id loopback0

There are a few different ways that we can verify that it’s configured properly. The command ‘show mpls traffic-eng tunnels summary’ should show that processes are running and that forwarding is enabled:

R2#show mpls traffic-eng tunnels summary
Signalling Summary:
    LSP Tunnels Process:            running
    Passive LSP Listener:           running
    RSVP Process:                   running
    Forwarding:                     enabled
    Head: 0 interfaces, 0 active signalling attempts, 0 established
          2 activations, 2 deactivations
          0 SSO recovery attempts, 0 SSO recovered
    Midpoints: 0, Tails: 0
    Periodic reoptimization:        every 3600 seconds, next in 2244 seconds
    Periodic FRR Promotion:         Not Running
    Periodic auto-bw collection:    every 300 seconds, next in 144 seconds

Verifying that the interfaces have TE enabled on them can be done with ‘show mpls interfaces’. The tunnel column should say yes:

R2#show mpls interfaces
Interface              IP            Tunnel   BGP Static Operational
GigabitEthernet0/2     No            Yes      No  No     Yes        
GigabitEthernet0/3     No            Yes      No  No     Yes

So what have we actually achieved by making OSPF “TE aware”? If we take a look at the OSPF database, we’ll see a new LSA called Type 10:

R2#show ip ospf database


                Type-10 Opaque Link Area Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum Opaque ID         183         0x80000013 0x009B10 0         338         0x80000013 0x00871F 0         64          0x80000013 0x00732E 0         40          0x80000013 0x005F3D 0         338         0x80000013 0x001FC7 2         64          0x80000013 0x000C19 2         40          0x80000013 0x00C1E4 2         183         0x80000013 0x008D58 3         338         0x80000013 0x009925 3         64          0x80000013 0x00DA10 3         40          0x80000013 0x00A816 3         183         0x80000013 0x003E9F 4   

Each LSA represents one interface and it holds TE specific information (there’s also an LSA for each router that carries the MPLS-TE router-id). Let’s take a look at one of these LSAs:

R2#show ip ospf database opaque-area

            OSPF Router with ID ( (Process ID 1)

                Type-10 Opaque Link Area Link States (Area 0)

  LS age: 633
  Options: (No TOS-capability, DC)
  LS Type: Opaque Area Link
  Link State ID:
  Opaque Type: 1
  Opaque ID: 2
  Advertising Router:
  LS Seq Number: 80000013
  Checksum: 0x1FC7
  Length: 132
  Fragment number : 2

    Link connected to Point-to-Point network
      Link ID :
      Interface Address :
      Neighbor Address :
      Admin Metric : 1
      Maximum bandwidth : 125000000
      Maximum reservable bandwidth : 125000000
      Number of Priority : 8
      Priority 0 : 125000000    Priority 1 : 125000000  
      Priority 2 : 125000000    Priority 3 : 125000000  
      Priority 4 : 125000000    Priority 5 : 125000000  
      Priority 6 : 125000000    Priority 7 : 125000000  
      Affinity Bit : 0x0
      IGP Metric : 1

    Number of Links : 1

The first paragraph gives us some general information, like the OSPF router id of the advertising router. The interface is then described, and using these LSAs, each router builds a database of each TE enabled link. In a normal SPF calculation OSPF finds the lowest cost path to destinations using the type 1,2,3,4,5 and 7 LSAs that we’re used to from IP routing. Similarly, the type 10 LSAs are used to find paths for TE LSPs. The big difference is that when you configure the TE tunnel on the head end router, you can impose various constraints on it based on the information carried in the type 10 LSA.

The most well known of these constraints is the bandwidth requirement. By configuring the ‘ip rsvp bandwidth’ command at the interface level, we can tell the TE database that a certain amount of bandwidth can be reserved in the outgoing direction of that interface. We can then specify that the TE tunnel must use a certain amount of bandwidth, and if that much is available on a path between head end and tail end, the tunnel satisfies the constraints imposed on it. When the tunnel is signaled, the reservable bandwidth is decreased by the amount that the tunnel uses. In my topology I have configured that a full 100% of the interface bandwidth is available to TE, but as we’ll see later, to create the LSPs that we’re interested in here, we don’t need to specify a bandwidth constraint at all.

Other information carried in the LSA includes the administrative weight (admin metric in the output), which is also called the “TE metric”, the attribute flags, the IGP metric, the priority values and the reservable bandwidth per priority. None of these will be of further relevance in this post, but know that they too can be used to impose constraints on the TE tunnel and thereby control which interfaces a specific TE LSP can traverse.

4. Creating the LSPs

Now that all of our four routers (R2, R3, R4 and R5) have exchanged TE information through the type 10 LSA and built identical databases, we’re able to create the needed LSPs. But what is an LSP exactly? The acronym stands for Label Switched Path and it’s defined as a series of routers that a packet takes as it’s label switched through the network. To accomplish this, a router needs to be aware of what label packets will arrive with, what new label it should swap the incoming label to, and what outgoing interface it should use to send the packet further down the LSP. In the case of MPLS-TE, that information is distributed using the RSVP protocol.

However, before RSVP can do its thing, the head end router (the first router of the LSP) needs to come up with a path that the LSP should follow. There are two main ways that this can be done, dynamically and explicitly. With a dynamic path calculation, the head end uses the TE database to find the lowest metric path to the destination that satisfies any configured constraints, like the required bandwidth. In our case, dynamic paths would not be a good idea because we want the LSPs to take specific paths, and we want to know what LSPs go where.
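
A dynamic path calculation is essentially a constrained shortest-path computation over the TE database. Here’s a minimal Python sketch of the idea, using this post’s topology and made-up metric/bandwidth numbers:

```python
from heapq import heappush, heappop

# TE database as the head end sees it: per-link TE metric and reservable
# bandwidth (the kind of data carried in the type 10 LSAs). Mirrors this
# post's topology (R2-R3-R5 and R2-R4-R5) with made-up numbers.
links = {("R2", "R3"): (1, 125000000), ("R3", "R5"): (1, 125000000),
         ("R2", "R4"): (1, 125000000), ("R4", "R5"): (1, 125000000)}

def dynamic_path(src, dst, required_bw=0):
    """Lowest-metric path over the links that satisfy the bandwidth
    constraint - the essence of a dynamic path calculation."""
    usable = {ab: m for ab, (m, bw) in links.items() if bw >= required_bw}
    heap, seen = [(0, src, [src])], set()
    while heap:
        cost, node, path = heappop(heap)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for (a, b), metric in usable.items():
            if a == node and b not in seen:
                heappush(heap, (cost + metric, b, path + [b]))
    return None   # no path satisfies the constraints

cost, path = dynamic_path("R2", "R5")
assert cost == 2 and path in (["R2", "R3", "R5"], ["R2", "R4", "R5"])
```

With no bandwidth requirement (as in our scenario), every TE link is usable and the calculation degenerates into a plain lowest-metric search.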

On R2 we want one LSP that uses the R2 -> R3 -> R5 path and one that uses R2 -> R4 -> R5. Similarly, on R5 we want one LSP that arrives on R2 via R3 and one that goes via R4.


We need separate LSPs in each direction since an LSP is unidirectional. On R2 and R5, the top LSP that goes through R3 will be named tunnel3, and the bottom LSP will be named tunnel4.

Configuring the tunnels is done in two steps, and the first step is to define the explicit paths with the ip explicit-path command. Here I have defined one path that uses the link addresses on the R2 – R3 – R5 path, and one that uses the path via R4:

R2#show run | s ip explicit-path  
ip explicit-path name VIA_R3 enable
ip explicit-path name VIA_R4 enable

Equivalent configuration has been added on R5 for the LSPs in the opposite direction. Once the explicit paths are defined, we’re ready to do the actual tunnel configuration on R2 and R5. A minimal TE tunnel configuration requires the following:

  • Creating a tunnel interface and setting the tunnel mode to MPLS-TE.
  • A tunnel destination.
  • Setting the tunnel IP address by referencing a loopback address with the ip unnumbered feature.
  • Defining a path option.

(Note here that I don’t mention constraints like bandwidth or using the attribute flags/affinity bits. These features can be used, but it’s optional. If you for example don’t care about the bandwidth reservation aspect of MPLS-TE, you don’t have to use it, and in this case I think that it’s unnecessary.)

On R2, the configuration of tunnel3 and tunnel4 look like this:

interface Tunnel3
 ip unnumbered Loopback0
 tunnel mode mpls traffic-eng
 tunnel destination
 tunnel mpls traffic-eng path-option 1 explicit name VIA_R3
 no routing dynamic
interface Tunnel4
 ip unnumbered Loopback0
 tunnel mode mpls traffic-eng
 tunnel destination
 tunnel mpls traffic-eng path-option 1 explicit name VIA_R4
 no routing dynamic

The two tunnels’ configuration is identical, except for the fact that the explicit path is different. The command ‘no routing dynamic’ is added by default and prevents the router from sending routing updates over the interface. We can see the items that I listed before: a destination, tunnel mode mpls traffic-eng, and an explicit path configured as the path option. On R5 I have made the equivalent configuration, where the only difference is that the tunnel destination is R2’s loopback instead of R5’s.

Once the tunnel configuration is made, RSVP signals the LSP based on the path option. The head end router initiates this signalling process by sending a message called ‘Path’ to what the path calculation has determined is the next hop. In our case the explicit path defines where the Path message is sent. The next hop router then determines if it has enough bandwidth on its outgoing interface to allow the signalling to take place. Normally, that should not be a problem because when the head end makes the path calculation it takes the available bandwidth of links into account.

However, that calculation is based on the information that it has in its link state database and that information is not guaranteed to be exactly the same as the local information at all times. Another tunnel could reserve the bandwidth while the path setup is in progress. In our topology, that’s obviously not a problem, and besides, we haven’t even defined a bandwidth requirement.

Assuming that the Path message passes this admission control (the bandwidth reservation check), it is sent to the next router in the list of hops that the head end has calculated. Eventually the Path message reaches the final router, called the tail end router. The tail end router knows that it’s the tail because it’s the final address in the list of hops found in the Path message.

The tail end router replies with a ‘Resv’ (for Reservation) message that follows the same path of routers that the Path message did. The Resv message is what actually creates the LSP, because each router uses it to send the label that it wants to see on incoming packets to its upstream neighbor. When a router later receives a packet with that label in the data plane, it knows what LSP the packet belongs to, and that it should swap the top label to the label that it has received from its downstream neighbor on the LSP. When the Resv message is received by the head end router, the LSP is fully signaled. RSVP is also used to periodically refresh each LSP to prevent it from timing out.
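
The label distribution performed by the Resv message can be sketched in Python. The labels and hop names here are made up; the point is only to show that each router tells its upstream neighbor what label to use, and records the swap toward its downstream neighbor:

```python
def signal_lsp(hops, first_label=16):
    """hops lists the routers head -> tail. Returns each router's swap
    table (in_label -> out_label), built as the Resv travels tail -> head."""
    tables = {router: {} for router in hops}
    downstream_label = None            # the tail end has no outgoing label
    label = first_label
    for router in reversed(hops[1:]):  # the Resv flows tail -> head
        advertised = label             # label this router wants to receive
        label += 1
        tables[router][advertised] = downstream_label
        downstream_label = advertised
    tables[hops[0]]["push"] = downstream_label   # head end imposes this label
    return tables

t = signal_lsp(["R2", "R3", "R5"])
assert t["R2"]["push"] == 17    # head end imposes the label R3 advertised
assert t["R3"][17] == 16        # R3 swaps 17 -> 16 toward R5
assert t["R5"][16] is None      # tail end removes the label
```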

In our scenario we explicitly signaled 4 LSPs, but each tunnel configuration only has a single explicit path, with no other options. This means that if, say, R3 fails, the two LSPs traversing that router will fail, and they will not be rerouted to a different path. You would likely want them to instead use the path via R4. We could solve that by either adding a second explicit path option with a higher number, or adding a dynamic path option. I’m choosing the latter here, which results in the following tunnel configuration:

interface Tunnel3
 ip unnumbered Loopback0
 tunnel mode mpls traffic-eng
 tunnel destination
 tunnel mpls traffic-eng path-option 1 explicit name VIA_R3
 tunnel mpls traffic-eng path-option 2 dynamic
 no routing dynamic

As we can see, we now have two path options. The explicit option will still be used first because it has the lower number, but should it fail, the dynamic one will be tried.

Verification of the signaled LSPs can be done with the ‘show mpls traffic-eng tunnels’ command. Here I’m looking at tunnel3 on R2:

R2#show mpls traffic-eng tunnels tunnel 3

Name: R2_t3                               (Tunnel3) Destination:
    Admin: up         Oper: up     Path: valid       Signalling: connected
    path option 1, type explicit VIA_R3 (Basis for Setup, path weight 2)
    path option 2, type dynamic


  InLabel  :  - 
  OutLabel : GigabitEthernet0/2, 16
  RSVP Signalling Info:
       Src, Dst, Tun_Id 3, Tun_Instance 1
    RSVP Path Info:
      My Address:   
      Explicit Route: 
      Record   Route:   NONE
      Tspec: ave rate=0 kbits, burst=1000 bytes, peak rate=0 kbits
    RSVP Resv Info:
      Record   Route:   NONE
      Fspec: ave rate=0 kbits, burst=1000 bytes, peak rate=0 kbits


The command is quite verbose so I’ve omitted some of the information. What I have included tells us that the tunnel is up and that it’s using path option 1, our explicit path. We’re also able to tell exactly how the LSP is routed (the Explicit Route hop list) and what label packets will get when they’re sent out the tunnel interface (16).

5. VRFs and BGP

Before continuing with the MPLS-TE discussion, let’s take a detour to BGP. In our network we want to combine TE tunnels with MPLS VPNs, and that means that we need VRFs and BGP. I won’t spend too much time on this section since it’s not the main focus of the blog post, but at a minimum we need a VRF on each PE router and a BGP session between the two PEs. I’ll then connect each CE router to its PE using BGP. Each CE will advertise its loopback address into BGP.


The VRFs are configured with appropriate route targets to let routes from R1 reach R6, and vice versa. R2’s configuration is very simple and looks like this:

R2(config-router)#do show run | s router bgp 200
router bgp 200
 bgp log-neighbor-changes
 neighbor remote-as 200
 neighbor update-source Loopback0
 address-family vpnv4
  neighbor activate
  neighbor send-community extended
 address-family ipv4 vrf CUST_A
  neighbor remote-as 100
  neighbor activate
R2(config-router)#do show run | s ip vrf    
ip vrf CUST_A
 rd 200:100
 route-target export 200:1
 route-target import 200:1

The point here is to have BGP send the customer routes as VPNv4 prefixes from the egress PE to the ingress PE. That update includes a VPN label, which we can see with ‘show bgp vpnv4 unicast vrf CUST_A’:

R2#show bgp vpnv4 unicast vrf CUST_A
BGP routing table entry for 200:100:, version 4
Paths: (1 available, best #1, table CUST_A)
  Advertised to update-groups:
  Refresh Epoch 2
  300, imported path from 200:300: (global) (metric 3) (via default) from (
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Extended Community: RT:200:1
      mpls labels in/out nolabel/16
      rx pathid: 0, tx pathid: 0x0

This output shows us that R2 needs to use label 16 when sending packets towards R5. When R5 receives packets with label 16, it knows what output interface to use for the packet (the interface towards R6 in this case).

The VPN will not work at this point and the problem is that there’s currently no way to get the VPN labeled packets from ingress to egress PE. In a “normal” MPLS VPN that’s handled by a combination of LDP and the IGP, but as you might expect, we’re instead going to use the TE tunnels that we’ve built between the two PE routers.

6. Forwarding Traffic Down Tunnels

As it stands, our TE tunnels cannot be used for anything. The path is signaled and ready to go, but directing traffic into the tunnels is a separate step that doesn’t happen automatically. There are numerous ways of doing this like static routing, policy based routing, something called autoroute, etc. In our scenario, we’ll use the autoroute feature.

When configuring autoroute on a TE tunnel interface you’re telling the router to see the tunnel as a directly connected link to the tunnel tail. The tunnel can then be used for packets that are destined to the tunnel tail end or to destinations beyond the tail.

What happens in this case when autoroute is enabled on all four tunnel interfaces (remember, LSPs are unidirectional so each PE has two tunnels) is that the PEs start load sharing over these tunnels. The IGP/underlay path is no longer used at all for traffic between the two PE loopbacks, and because of this, all VPN traffic is funneled into the tunnels as well. Curiously, the tunnels’ metric is 3, the same as the IGP cost to reach the tunnel destination, so you might imagine that the router would load share between the IGP paths and the tunnel paths, but this does not happen for destinations on the tail itself.

There’s a bit more to autoroute than this, but for our purposes it’s good enough to know that when autoroute is enabled, the ingress PE routers will start using their two tunnels to load share traffic destined for the egress PE. It’s easy to verify this with show ip route.
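
Conceptually, autoroute changes the next-hop selection like this (a Python sketch with placeholder names; the real behaviour is of course implemented inside the IGP’s SPF):

```python
def next_hops(dest, igp_paths, tunnels):
    """If any autoroute tunnel reaches dest, the tunnels replace the
    native IGP next hops; otherwise normal IGP forwarding applies."""
    if dest in tunnels:
        return tunnels[dest]
    return [ifc for ifc, _metric in igp_paths[dest]]

# Placeholder data mirroring the scenario: two equal-cost IGP paths to
# the egress PE, and two autoroute tunnels terminating on it.
igp_paths = {"egress-PE": [("GigabitEthernet0/2", 3), ("GigabitEthernet0/3", 3)]}
tunnels = {"egress-PE": ["Tunnel3", "Tunnel4"]}

assert next_hops("egress-PE", igp_paths, tunnels) == ["Tunnel3", "Tunnel4"]
assert next_hops("egress-PE", igp_paths, {}) == ["GigabitEthernet0/2",
                                                 "GigabitEthernet0/3"]
```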

Before autoroute:

R2#show ip route
Routing entry for
  Known via "ospf 1", distance 110, metric 3, type intra area
  Last update from on GigabitEthernet0/2, 00:00:28 ago
  Routing Descriptor Blocks:
  *, from, 00:00:28 ago, via GigabitEthernet0/3
      Route metric is 3, traffic share count is 1
  , from, 00:00:28 ago, via GigabitEthernet0/2
      Route metric is 3, traffic share count is 1

Configuring autoroute:

R2(config)#int tun3
R2(config-if)#tunnel mpls traffic-eng autoroute announce

After autoroute:

R2#show ip route
Routing entry for
  Known via "ospf 1", distance 110, metric 3, type intra area
  Last update from on Tunnel3, 09:42:13 ago
  Routing Descriptor Blocks:, from, 09:42:13 ago, via Tunnel3
      Route metric is 3, traffic share count is 1
  *, from, 09:42:13 ago, via Tunnel4
      Route metric is 3, traffic share count is 1

There’s an obvious problem with this. While we can now send packets from R1 to R6, we’re not making any kind of decision based on what class packets belong to. This is where Class-Based Tunnel Selection needs to be configured.

7. Class-Based Tunnel Selection

When you have multiple tunnels between the same head- and tail-end it’s possible to make it so that packets choose a tunnel based on the EXP bits in the packet’s label. Because multiple tunnels between the same head and tail is exactly what we have here, we should be able to make CBTS happen. We’ll discuss this from the perspective of a packet that enters R2 from R1, but as with the other things that we have configured in this post, the concepts and configuration are exactly the same when packets enter R5 from R6.

When packets enter R2 from R1, they obviously aren’t marked with EXP bits, since they’re normal IP packets and don’t have a label. This is not a problem because the tunnel selection happens after the input classification and marking. We can therefore set the EXP bits on incoming packets with a policy map and have the tunnel selection be based on that. To not make things needlessly complicated, we’ll have a real time class that should take the top path via R3 (tunnel 3), and everything else will be sent via R4 (tunnel 4).

Classification of the real time traffic will be based on the customer having marked those packets with CS5 or EF. Such packets then have the EXP bits set to 5 on all their labels (two labels in this case – TE label and VPN label). All other packets will drop down to the default class and will not get any marking. This results in a very simple class-map and policy-map that looks like this:

R2#show run | s class-map|policy-map
class-map match-any REALTIME
 match dscp ef 
 match dscp cs5 
policy-map CUST_A_INGRESS
 class REALTIME
  set mpls experimental imposition 5

The first step in making tunnel3 carry the EXP 5 packets is to configure that tunnel interface with ‘tunnel mpls traffic-eng exp 5’. Tunnel 4 is configured with ‘tunnel mpls traffic-eng exp default’ to indicate that all other EXP markings should use that tunnel. Alternatively, you could explicitly configure tunnel 4 with all seven values that are not 5.

The next step is to configure a so called “master tunnel”. A master tunnel is an MPLS-TE tunnel interface that groups the tunnels that are involved in the CBTS, and then becomes the output interface for all packets that would normally use our tunnel 3 and 4. Once packets (conceptually) enter the master tunnel it directs the packets to the correct member tunnel based on the packet’s EXP value. This is the configuration of my master tunnel on R2:

R2#show run int tun10
interface Tunnel10
 ip unnumbered Loopback0
 tunnel mode mpls traffic-eng
 tunnel destination
 tunnel mpls traffic-eng autoroute announce
 tunnel mpls traffic-eng exp-bundle master
 tunnel mpls traffic-eng exp-bundle member Tunnel3
 tunnel mpls traffic-eng exp-bundle member Tunnel4
 no routing dynamic
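
Stripped of everything else, the selection step that the master tunnel performs per packet is just this (a Python sketch mirroring the configuration above):

```python
# Member tunnels keyed by their configured EXP value, as in the config.
members = {5: "Tunnel3", "default": "Tunnel4"}

def select_member(exp: int) -> str:
    """Return the member tunnel for a packet's EXP value, falling back
    to the member configured with 'default'."""
    return members.get(exp, members["default"])

assert select_member(5) == "Tunnel3"     # the real time class
assert select_member(0) == "Tunnel4"     # unmarked traffic
assert select_member(3) == "Tunnel4"     # any other marking
```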

If we take a look at show mpls traffic-eng tunnels tunnel10, we see that this isn’t a normal TE tunnel with a signaled LSP:

R2#show mpls traffic-eng tunnels tunnel10

Name: R2_t10                              (Tunnel10) Destination:
  Status: Master
    Admin: up         Oper: up     Signalling: N/A

    Member Tunnels:         Member Autoroute: Inactive
    Tunnel3: Config Exp:  5 
    Tunnel4: Config Exp:  default

      Time since created: 14 minutes, 18 seconds
      Number of LSP IDs (Tun_Instances) used: 0

The master tunnel is more of a configuration mechanism that’s relevant only locally at the head end. In the output above we can tell that tunnel 3 handles EXP 5 and tunnel 4 everything else (‘default’). Looking at the forwarding toward the egress PE, the master tunnel is now the outgoing interface:

R2#show ip route
Routing entry for
  Known via "ospf 1", distance 110, metric 3, type intra area
  Last update from on Tunnel10, 00:13:21 ago
  Routing Descriptor Blocks:
  *, from, 00:13:21 ago, via Tunnel10
      Route metric is 3, traffic share count is 1

At this point our MPLS-TE based VPN with CBTS should be operational. However, the post wouldn’t be complete without some kind of verification that packets actually take the correct paths based on the DSCP markings set at the customer side. The method I’ve chosen for this verification is to apply an incoming policy-map on R3 and R4 that matches EXP 5 and the customer prefixes. Assuming that our tunnel selection works, we should see an increase in EXP 5 packets on R3 when we send pings with DSCP 46. When ICMP packets are sent with no marking, we should see an increase in the counter on R4. We’ll also see some “noise” in the counters due to control plane packets, but this should be insignificant compared to our large number of ICMP echoes.
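For reference, the counter policy on R3 and R4 can be as simple as this sketch (the interface name is just an example; the policy does no marking or dropping, it only counts):

```
class-map match-all EXP_5
 match mpls experimental topmost 5
!
policy-map EXP_5_COUNTER
 class EXP_5
!
interface GigabitEthernet0/1
 service-policy input EXP_5_COUNTER
```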

After sending 100 packets with DSCP 46 and 100 without any marking from R1 to R6, these are the results:

R3#show policy-map interface

  Service-policy input: EXP_5_COUNTER

    Class-map: EXP_5 (match-all)  
      100 packets, 12200 bytes
      5 minute offered rate 1000 bps
      Match: mpls experimental topmost 5 

    Class-map: class-default (match-any)  
      10 packets, 1308 bytes
      5 minute offered rate 0000 bps, drop rate 0000 bps
      Match: any 

On this router we expect to see packets with EXP 5 because R2 directed the packets down the tunnel that’s configured with an explicit path that traverses this router.

R4#show policy-map interface

  Service-policy input: EXP_5_COUNTER

    Class-map: EXP_5 (match-all)  
      0 packets, 0 bytes
      5 minute offered rate 0000 bps
      Match: mpls experimental topmost 5 

    Class-map: class-default (match-any)  
      115 packets, 13910 bytes
      5 minute offered rate 1000 bps, drop rate 0000 bps
      Match: any 

On R4 we’re not seeing any EXP 5 packets and everything is counted in the default class.

I passed CCNP:SP 642-885 SPADVROUTE

I took this exam today and it seems appropriate to write down some random thoughts while it’s still fresh.

  • Overall, this is a reasonably fair exam, and the questions are almost entirely on topics from the blueprint.
  • No questions had language that was tricky for the sake of being tricky.
  • A single question was definitely on a topic that wasn’t on the blueprint, and it was hard enough that I probably failed it despite roughly knowing what that feature did.
  • The main topics were BGP, Multicast and IPv6. In terms of difficulty, I found the multicast questions the easiest and BGP the hardest. This may or may not be a reflection of me spending a lot more time on multicast. I was a tiny bit unprepared for the technical details of the XR based BGP questions. I know BGP fairly well on a conceptual level, and while there were some questions that you could call “conceptual”, the bulk of the questions were on the specific BGP features on the blueprint. BGP as a whole is probably tested on the SPROUTE exam that I haven’t taken.
  • The simulation-style questions were disappointing in the sense that very few commands were actually implemented. I also couldn’t see which commands I could use with “?”, and show run was disabled. The “?” issue may have been a keyboard layout problem: in my previous Cisco exams the layout was English, here it was Swedish, and the ? key simply didn’t work. I don’t know if this was intended. I would have been able to answer all the questions easily with a real router in front of me, but not knowing which commands were available, I had to give uncertain answers to several of these questions.
  • Too many questions felt like trivia rather than testing overall knowledge of the protocols. In particular, one question gave you CLI output that went something like <name of feature> x y z w. The question was then what each parameter meant. I couldn’t remember the exact order you entered these parameters and I probably failed the question despite knowing what the feature did and how it is used.
  • One or two questions of the type where you are told to pick, say, 2 or 3 alternatives based on some statement had either more correct answers than you were allowed to pick, or fewer. One question in particular was on a feature that I knew inside out, but I was forced to pick 3 when only 2 were actually correct.
  • This exam does have some issues, but if you know the material, you will pass anyway. The passing score was the lowest I’ve seen on any Cisco exam so far (though I believe I’m not actually allowed to disclose it), and most questions were actually pretty straightforward.

So how did I prepare?

  • I wrote a post last year about how I planned to prepare:
  • I mostly followed that plan, but the big game changer was the release of the book I like to call IP Routing: Long Title. “IP Routing on Cisco IOS, IOS XE, and IOS XR: An Essential Guide to Understanding and Implementing IP Routing Protocols” is pretty much the ideal book for studying for this exam because it contains both IOS and XR configuration and explanation. I read all the relevant chapters in that book; the High Availability chapter was pretty much my only source for those topics, and I didn’t feel like I missed anything.
  • XR Fundamentals never played a big part in my preparations. It’s probably a useful book in general, but for this exam, I only found it relevant for learning about the XR BGP processes.
  • I spent a lot more time on multicast than what was really necessary to pass this exam. If I had just focused on BGP on XR and the IPv6 topics, I could have passed this exam two months ago, but it doesn’t feel like I wasted time. There’s nothing wrong with going deeper than what you actually need just to pass some exam.

What’s next?

  • I still have the EDGE and CORE exams left to complete CCNP:SP, and that pretty much involves learning MPLS from the ground up. I configured some rudimentary L3 VPNs last year (and read the first half of the book MPLS Fundamentals), but that’s pretty much the extent of my MPLS knowledge at this point.
  • My man srg on irc says that EDGE builds on CORE, which makes sense, so when I do start preparing for actual exams again, CORE will be my target.

DHCP In IPv6 – version 2

I wrote a post on DHCPv6 a few months ago, but after having studied the topic a bit more I was unhappy with it and felt that a rewrite was needed. In particular, a talk by Enno Rey at Troopers 15 made me realize a few things. Go watch it:

This post is still just the very basics, but I believe that it’s slightly more correct than my previous version.

1. Introduction

To understand DHCPv6 we first need to understand that IPv6 has significant differences from IPv4 in how it handles assigning an address to an interface. As you probably know, IPv4 supports manual configuration and addressing via (stateful) DHCP. IPv6 keeps these methods but adds two additional ones in the form of stateless autoconfiguration and stateless DHCP. That brings us to a total of four distinct methods of assigning addresses (and additional information such as DNS servers) to hosts:

  • Stateless Address Autoconfiguration (SLAAC)

  • Stateful DHCPv6

  • Stateless DHCPv6

  • Manual configuration

These methods rely on a set of ICMPv6 messages that will be covered in the next section.

2. Neighbor Discovery Protocol

Neighbor Discovery Protocol (NDP) is an umbrella term for a set of link-local functions performed by five different ICMPv6 messages. These messages do a number of things, such as:

  • Automatically assign addresses to interfaces without using DHCP.

  • Verify that assigned addresses are unique on the link.

  • Map between IP and MAC addresses.

  • Inform that a DHCP server should be used instead of autoconfiguration.

The five messages performing these functions are:

  • Router Advertisement (RA)

  • Router Solicitation (RS)

  • Neighbor Advertisement (NA)

  • Neighbor Solicitation (NS)

  • Redirect

The RA and RS are the most relevant ones to the discussion about DHCPv6 so for the sake of brevity, there will be no further mention of the NA and NS, or redirects.

3. Router Advertisement and Router Solicitation

These two messages are sent between routers and hosts. The RS is the simpler of the two and is sent by a host when one of its interfaces wakes up and it wants to acquire an IP address (and other information contained in the RA). The source address of the RS is typically a link local address belonging to the host and the destination is the “all routers” multicast address FF02::2. The RS also contains the sender’s MAC address.

The RA is sent by routers to the all nodes multicast address FF02::1, either when triggered by receiving an RS, or periodically based on a configurable timer. The RA’s purpose is to provide a host with information such as what global unicast prefix is used on the link, what default gateway to use and whether SLAAC or DHCPv6 should be used by the host. The RA also contains information about how long an autoconfigured address is valid.

4. The Router Advertisement “M” and “O” flags 

The RA contains two 1-bit fields called M (managed) and O (other) that are used to inform a host of which methods can be used for address assignment. The default in Cisco IOS is that both of these bits are zero, which means that a host shouldn’t assume that a DHCPv6 server is available. If the M flag is 1, however, addressing information can be received from a DHCP server, and the host may initiate that process. The O flag can be used when the M flag is zero to give a host that uses autoconfiguration additional (“other”) information such as DNS servers via DHCP. In other words, we have two different styles of DHCPv6:

  • Stateful DHCPv6 – M flag is 1 -> use DHCP for all information.

  • Stateless DHCPv6 – M flag is 0, but O flag is 1 -> use SLAAC for addresses, but get additional information from DHCP.
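On a Cisco IOS router, these two styles map to two interface-level RA settings. A sketch (the interface name is an example; you would normally set one flag or the other, not both):

```
interface GigabitEthernet0/0
 ! Stateful DHCPv6: set the M flag in outgoing RAs
 ipv6 nd managed-config-flag
 ! Stateless DHCPv6: set only the O flag in outgoing RAs
 ipv6 nd other-config-flag
```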

A book I’m using as a source for this post, IPv6 Fundamentals, doesn’t mention what purpose the O flag serves when the M flag is 1. The book implies that if the M flag is 1, it doesn’t matter what the O flag is because the host could get all of its information from DHCP anyway due to M flag = 1.

An IETF informational document found here: discusses this problem of M and O flag behaviour. It points out that M and O are “advisory” and that there is some inconsistency in how various operating systems react to these flags.

5. The DHCP Exchange

Let’s now move on and assume that the host has agreed to use DHCP. Just like with DHCP for IPv4, a series of messages are sent back and forth between the client and the server, with the goal being to agree on what address the client should use. There are 13 messages in total, but only four of these are involved in the initial exchange. UDP ports 546 (client) and 547 (server/relay) are used. This process is used for stateful DHCPv6:

  1. Solicit. This message is sent by the client to the “all DHCP relay agents and servers on the link” address FF02::1:2 in order to find a DHCPv6 server.

  2. Advertise. The server has received the solicit message and is now making itself known to the client.

  3. Request. The client is requesting an address from one of the servers that has sent it an advertise message. If multiple advertisements are received, one of them is picked according to a formula found in some RFC somewhere.

  4. Reply. The server responds to the request with a reply that assigns an address and other information.

If stateless DHCPv6 is used, the exchange is much shorter: the client sends an Information-request message directly and the server answers with a Reply. No Solicit/Advertise/Request negotiation takes place, since no address needs to be assigned.

In addition to these four main messages, there are a whole slew of messages involved in the maintenance of assigned addresses, like renew and release messages.
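To see the stateful exchange in a lab, a minimal DHCPv6 server on an IOS router might look like this sketch (the pool name is made up, and the prefix and DNS address use documentation space):

```
ipv6 dhcp pool LAN_POOL
 ! Hand out addresses from this prefix (stateful)
 address prefix 2001:DB8:1::/64
 dns-server 2001:DB8::53
!
interface GigabitEthernet0/0
 ! Serve DHCPv6 on this interface and set the M flag in RAs
 ipv6 dhcp server LAN_POOL
 ipv6 nd managed-config-flag
```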

It’s important to note that this exchange does not include a default gateway. The three big pieces of information that hosts typically need to become operational are the default gateway, a globally routed address and a DNS server. DHCPv6 cannot supply the default gateway; the host always gets it from the RA. The DNS server can be acquired from the DHCP server or through the RA (the RDNSS option), although the RA method is not implemented everywhere.

6. Disabling SLAAC

If you are used to DHCP in IPv4, you would probably assume that if the client gets an address from the DHCPv6 server, it no longer uses SLAAC. That is not the case, and if you simply enable the M flag, your client will get one address from DHCP and one from SLAAC (Windows 8 solicits for DHCPv6 information even if the M flag isn’t set). Whether the SLAAC-derived address or the DHCPv6 address is used as the source address for outgoing packets depends on a complicated algorithm. Note: it’s possible that an OS simply ignores addressing from DHCPv6 if it is able to configure an address via SLAAC – these flags are advisory.

Having multiple addresses in this fashion is generally not what people want, and you can disable SLAAC in one of two ways. The first method is to remove the prefix from the RA’s prefix information field using the ”no advertise” configuration parameter for the RA on the router. The second option involves the so-called A flag and will be covered in the next section.
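The first method, sketched on IOS (the interface name and prefix are examples):

```
interface GigabitEthernet0/0
 ! Stop including this prefix in RAs entirely, so hosts cannot SLAAC from it
 ipv6 nd prefix 2001:DB8:1::/64 no-advertise
```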

7. IPv6 On-Link Flag

Let’s say that SLAAC is prevented by removing the prefix from the RA. The assumption one might make after having removed the SLAAC-based address is that everything looks good: we have one address, and that address has been acquired from a DHCPv6 server – assuming of course that all clients in your network actually support stateful DHCPv6 and are properly configured to use it. Notably, Android does not support DHCPv6.

An IPv4 host decides whether another IP address exists on the same link using the configured subnet mask, and this determines whether the packet can be delivered directly or must be sent off-link via a gateway. In IPv6, this same type of behaviour only applies if the so-called on-link flag is set for the prefix. If an IPv6 address is acquired through SLAAC, the on-link flag will be set for that prefix, and Neighbor Discovery will be performed when sending packets to other addresses that fall within the prefix.

However, when DHCPv6 is used, the on-link flag will not be set. This is a feature of the protocol, and if no further steps are taken after removing the SLAAC based address by removing the prefix from the RA, clients will no longer perform ND to acquire the MAC addresses of other hosts on the link. Instead, all packets will be delivered to the default gateway. If the default gateway determines that the destination exists on the same link, it will send the packet out the same interface and send a redirect message to the host. If the host operating system behaves correctly, it will use this redirect message to update what MAC address it uses when it sends packets to the IP address in question.

Unintuitively, we’ve fundamentally changed host behaviour by changing how it acquires its address. According to Enno Rey in his talk, DHCPv6 behaves this way with the on-link flag because the protocol was designed for a service provider environment where hosts weren’t supposed to talk directly to one another like you would expect them to do in a typical Enterprise environment. If the host operating system works correctly, this isn’t necessarily a problem because the redirect should take care of it. You may not want to make that leap of faith.

It is not possible to set the on-link flag through DHCPv6, but there’s a feature on the routers that you can use to get the same effect. Router Advertisements have a flag called the A flag (autonomous address-configuration flag), and if it is set, the prefix in the RA can be used for SLAAC. Clearing this flag tells hosts not to acquire an address through SLAAC, solving the duplicate address issue we talked about in the previous section, while the prefix itself stays in the RA and is still treated as on-link – even though the address the host is using was really acquired from DHCPv6.
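On IOS, clearing the A flag while still advertising the prefix (and thus keeping it on-link) might look like this sketch – the prefix and the valid/preferred lifetimes are example values:

```
interface GigabitEthernet0/0
 ! Advertise the prefix with A=0: hosts don't SLAAC from it,
 ! but the prefix is still advertised as on-link
 ipv6 nd prefix 2001:DB8:1::/64 2592000 604800 no-autoconfig
```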

8. DUID and IAID

In DHCP for IPv4, the MAC address is used as an identifier. This lets you do things like always give a specific address to a specific NIC. This is not the case in DHCPv6; instead, a DHCP Unique Identifier (DUID) is used. The DUID is generated by the host OS and then inserted into DHCP messages to identify the client. Generation of the DUID is typically based on the MAC address of a NIC plus some kind of time variable. The idea is that it is generated once and then stored in, for example, the registry in Windows. Even if the NIC is changed, the same identifier is presented to the DHCPv6 server. Other methods of generating this identifier can also be used. The IAID is an additional identifier that is generated for each interface on the client.

9. Rapid Commit

Instead of using the normal four-message exchange, it’s possible to reduce it to just Solicit and Reply. This functionality needs to be supported by both the client and the server and is initiated by the client setting the rapid commit option in the Solicit message. Upon receiving such a Solicit, the server immediately sends a Reply (also with the rapid commit option set) with an address. One problem with this, brought up in IPv6 Fundamentals, is that by removing the Request part of the exchange, several servers might respond to the Solicit with rapid commit Replies containing addresses. The servers cannot know which of these addresses is actually picked by the client and must potentially maintain state for addresses that aren’t used.
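On IOS, rapid commit has to be enabled on both ends, which might look like this sketch (pool and interface names are examples):

```
! Server side: allow the two-message exchange for this pool
interface GigabitEthernet0/0
 ipv6 dhcp server LAN_POOL rapid-commit
!
! Client side: request rapid commit in the Solicit
interface GigabitEthernet0/1
 ipv6 address dhcp rapid-commit
```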

10. Relay Agent

By default, DHCPv6 is constrained to the local link due to the solicit messages being sent to a multicast address with link local scope (FF02::1:2). This is similar to how DHCP in IPv4 cannot get off the local network due to the reliance on broadcasting. In an environment with a centralized DHCP service, we must configure a DHCPv6 relay agent on each host facing router interface.

The feature operates pretty much exactly like you would expect. The router receives DHCP messages from clients, and because it has the relay agent configured, it knows that the messages must be sent off-link. The messages are encapsulated in another message called Relay-forward and are either unicast to a particular DHCP server or multicast to FF05::1:3 (a site-local scope address used by DHCPv6 servers). This is manually configured.

The server receives the encapsulated messages and uses that information to encapsulate its replies in such a way that they get back to the correct relay agent. The relay agent then decapsulates and sends the messages back to the client.
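On IOS, the relay agent is a single interface command on each host-facing interface. A sketch (the server address is a made-up documentation-space example):

```
interface GigabitEthernet0/0
 ! Host-facing interface: relay client DHCPv6 messages
 ! to the central server as Relay-forward messages
 ipv6 dhcp relay destination 2001:DB8::5
```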

CCNP Notes – Link Aggregation/EtherChannel

Notes that I wrote when studying for CCNP:RS a year ago. I posted them somewhere else and am transferring them here for safekeeping. Continue reading

CCNP Notes – VLAN and VTP

These are some notes that I wrote when studying for the CCNP Switch exam a year ago. I’m simply transferring them here for safekeeping. Continue reading