Fundamentals of rapid spanning tree

Basic concepts of rapid spanning tree

1, A BPDU is the packet used for communication between switches.
2, The root bridge is the switch with the lowest bridge ID, that is, the lowest priority, with the lowest MAC address as the tie-breaker.
3, A switch port can have the role of root port, designated port, alternate port or backup port. Ports are also categorized as edge ports (port-fast in classic spanning tree) and point-to-point ports.
4, Port states: discarding, learning, forwarding.
5, The hello interval is 2 secs by default. BPDU information from a neighbor is aged out after 3 * hello interval without a BPDU, thus 6 secs by default.
6, When a topology change BPDU is received, the switch immediately flushes its CAM (MAC address table) instead of waiting for a timer, and puts all of its non-edge ports in sync mode.
7, During sync, each port negotiates its new state with its peer; the switch that has the superior BPDU makes its port the designated port.

8, Every bridge generates and sends BPDUs every hello interval, but only designated ports send BPDUs; alternate and backup ports do not.
9, Unlike classic STP, an RSTP bridge does not relay the BPDU from the root bridge. Instead, it generates its own BPDU carrying its own bridge ID (BID) and the root ID learned from the root bridge. Thus all bridges know the root ID.

10, The sync process is used to determine whether two neighbors agree on the root bridge. It starts with sending/receiving a proposal BPDU and ends with receiving/sending an agreement BPDU.
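
The notion of a "superior" BPDU used above can be sketched as a tuple comparison. This is only an illustration, not an implementation of the standard: the Bpdu class and its field names are assumptions made for this example, and the comparison order (root ID, root path cost, sender bridge ID, sender port; lower wins) follows the usual spanning tree priority vector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bpdu:
    # Hypothetical, simplified BPDU. Bridge IDs are (priority, MAC) tuples,
    # with MACs written as lowercase colon-separated hex so they sort correctly.
    root_id: tuple
    root_path_cost: int
    sender_id: tuple
    sender_port: int

    def vector(self):
        # Lower values win at every position of the vector.
        return (self.root_id, self.root_path_cost, self.sender_id, self.sender_port)


def is_superior(a: Bpdu, b: Bpdu) -> bool:
    """Return True if BPDU a is superior to (better than) BPDU b."""
    return a.vector() < b.vector()


mine = Bpdu((32768, "00:00:00:00:00:0b"), 0, (32768, "00:00:00:00:00:0b"), 1)
peer = Bpdu((4096, "00:00:00:00:00:0a"), 0, (4096, "00:00:00:00:00:0a"), 1)
print(is_superior(peer, mine))  # True: the peer advertises a lower root ID
```

In the proposal/agreement handshake, the side for which is_superior holds is the one that ends up with the designated port.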

Topology change in rapid spanning tree

Scenario 1: Topology change on edge port (up or down)
It does not generate a topology change in the network.

Scenario 2: Topology change on NONROOT switch, link up on designated port in switch A
1, At first the ports on both switches are in the discarding (blocking) state until they receive each other's BPDU.
2, If switch A receives an inferior BPDU, it sends out a proposal BPDU; after it gets an agreement BPDU from the peer side, switch A puts the port into the forwarding state.
3, If switch A receives a superior BPDU, it realizes that the peer switch should be the root bridge, and A starts RSTP convergence:
a) The peer switch sends a proposal BPDU for negotiation. Switch A blocks all of its non-edge ports, puts them into sync mode, ages out its CAM (MAC address table) immediately, then replies to the peer switch with an agreement BPDU. After that, switch A and the peer forward traffic on this link;
b) All of the non-edge ports on switch A send proposal BPDUs towards their peers, and each peer switch repeats step a) for port negotiation.

Scenario 3: Topology change on NONROOT switch, link down on designated port in switch
1, The switch stops receiving BPDUs from the peer side; after 3 x hello interval, the switch considers the link down;
2, If the BPDUs the switch stopped receiving were inferior BPDUs, the switch does not need to do anything;
3, If the BPDUs the switch stopped receiving were superior BPDUs, the switch assumes that its path to the root bridge is lost. If the switch has an alternate port, that port immediately becomes the new root port and starts forwarding; if the switch does not have one, it starts the RSTP convergence process, which spreads as a wave of handshakes rippling outwards towards the periphery of the network.


Fundamentals of spanning tree

Basic concepts

Basic concepts of spanning tree:
1, A BPDU is the packet used for communication between switches.
2, The root bridge is the switch with the lowest bridge ID (priority, then MAC address).
3, A switch port can be a root port or a designated port; a port that is neither is put in blocking mode.
4, Port states: blocking, listening, learning, forwarding.
5, The hello interval is 2 secs by default; max age is 10 * hello interval, thus 20 secs by default; listening period: 15 secs; learning period: 15 secs.
6, When a topology change happens, a blocking port takes 30 to 50 secs to reach the forwarding state, depending on the topology change scenario.

A topology change does not always cause an STP recalculation (root bridge re-election). However, every bridge that receives the topology change notification shortens the aging time of its CAM (MAC address table) to 15 secs, and in the meantime blocked ports that have to change state take 30 to 50 secs to reach the forwarding state. Not every blocked port necessarily moves to forwarding; some blocked ports stay blocked even after the topology change.

Spanning tree convergence

1, Each switch declares itself the root bridge by sending its own hello BPDU. The BPDU includes the bridge ID (priority and MAC address).
2, Once a switch receives a superior BPDU (one with a lower priority.mac value) from a peer, it stops sending its own BPDU; instead it relays the superior BPDU after adding the cost of the receiving interface.
Interface cost: 100M: 19, 10M: 100
3, After the root bridge is selected, it generates a BPDU every 2 secs by default; the other switches relay this BPDU, adding their cost.
4, The port on which the root's BPDU is received is selected as the root port. If more than one port receives BPDUs, the port with the lowest cost (shortest path to the root) is selected as root port and the other port is blocked (alternate port).
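
The election steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function names and the dictionary layout are made up for this example, bridge IDs are modeled as (priority, MAC) tuples, and tie-breaking between ports with equal cost is ignored.

```python
# Port costs quoted in the notes above (100 Mbit/s: 19, 10 Mbit/s: 100).
PORT_COST = {100: 19, 10: 100}


def elect_root(bridge_ids):
    """bridge_ids: list of (priority, mac) tuples; the lowest tuple wins."""
    return min(bridge_ids)


def pick_root_port(received):
    """received: {port_name: (advertised_root_cost, link_speed_mbps)}.

    The cost of the receiving interface is added to the advertised cost,
    and the port with the lowest total becomes the root port.
    """
    totals = {port: cost + PORT_COST[speed] for port, (cost, speed) in received.items()}
    root_port = min(totals, key=totals.get)
    return root_port, totals[root_port]


bridges = [
    (32768, "00:1a:00:00:00:02"),
    (32768, "00:1a:00:00:00:01"),
    (36864, "00:1a:00:00:00:00"),
]
print(elect_root(bridges))
print(pick_root_port({"Fa0/1": (0, 10), "Fa0/2": (19, 100)}))
```

In the second call the direct 10M link to the root costs 100, while the two-hop 100M path costs 19 + 19 = 38, so the indirect path wins.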

Topology change in spanning tree

In most cases a topology change does not cause an STP recalculation; recalculation is triggered only when the root bridge is lost.

Scenario 1: Topology change on port-fast port (up or down)
The switch does not send out a TCN (topology change notification).

Scenario 2: Topology change on NONROOT switch, link down on designated port in switch A
1, Switch A generates a TCN BPDU and sends it out through its root port.
2, Each NONROOT switch that receives the TCN forwards a TCN upstream via its own root port and sends a TCA (topology change acknowledgement) back out of the port on which the TCN arrived; at the same time, these switches set their CAM timeout to 15 secs (equal to the learning period).
3, Finally the root bridge receives the TCN. It then generates topology change BPDUs, which are flooded to the rest of the switches, including those that have not seen the TCN yet.
4, All switches that receive the topology change BPDU reset their CAM timeout to 15 secs (MAC addresses are removed from the table after 15 secs).
5, In most cases MAC addresses are relearned immediately.

Scenario 3: Topology change on NONROOT switch, link down on root port in switch B
1, Switch B declares itself root bridge by sending hello BPDUs out of the rest of its ports.
2, The switches that are connected to B and have no other link towards the root bridge receive BPDUs from switch B, but no more BPDUs from the root bridge are relayed to them. After the max age timeout (10 * hello interval, 20 secs by default), these switches conclude that the root bridge is lost and restart spanning tree convergence. It takes max age (20 secs) + listening (15 secs) + learning (15 secs) before the new topology is in place.
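
The 30 to 50 secs range mentioned in these notes follows directly from the default timers. A small sketch of the arithmetic, where the function name and the direct_failure flag are only illustrative:

```python
HELLO = 2              # seconds, default hello interval
MAX_AGE = 10 * HELLO   # 20 s, as described in the notes above
FORWARD_DELAY = 15     # listening and learning each last this long


def convergence_time(direct_failure: bool) -> int:
    """Seconds until a blocked port reaches the forwarding state.

    A directly detected failure skips the max age wait; an indirect
    failure must first let the stored BPDU age out.
    """
    wait = 0 if direct_failure else MAX_AGE
    return wait + 2 * FORWARD_DELAY  # listening + learning


print(convergence_time(direct_failure=True))   # 30
print(convergence_time(direct_failure=False))  # 50
```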

spanning tree features

BPDU guard: The switch puts the interface into the err-disabled state when it receives a BPDU on that interface.

BPDU filter: The switch drops BPDUs on the interface where BPDU filter is enabled, but does not put the interface into the err-disabled state.

Root guard: The switch puts the interface into the root-inconsistent (blocked) state when it receives a BPDU on that interface that is superior to that of the current root bridge.

Portfast: The switch does not send a TCN message when an interface with port-fast enabled goes from up to down or from down to up.

UplinkFast and BackboneFast will be described on a separate page.

Loop guard will be described on a separate page.

Fundamentals of multicast

In order to make multicast work we need to study the following 3 areas:

1, multicast addresses (IP and MAC)
2, IGMP protocol (lets a host join or leave a multicast group)
3, IP multicast routing protocol (PIM is the most widely used)

1, Multicast address

IP addresses 224.0.0.0 - 239.255.255.255 are all multicast addresses. Some well-known ones:
224.0.0.1 The All Hosts multicast group addresses all hosts on the same network segment.
224.0.0.2 The All Routers multicast group addresses all routers on the same network segment.
224.0.0.4 This address is used in the Distance Vector Multicast Routing Protocol (DVMRP) to address multicast routers.
224.0.0.5 The Open Shortest Path First (OSPF) All OSPF Routers address is used to send Hello packets to all OSPF routers on a network segment.
224.0.0.6 The OSPF All Designated Routers (DR) address is used to send OSPF routing information to designated routers on a network segment.
224.0.0.9 The Routing Information Protocol (RIP) version 2 group address is used to send routing information to all RIP2-aware routers on a network segment.
224.0.0.10 The Enhanced Interior Gateway Routing Protocol (EIGRP) group address is used to send routing information to all EIGRP routers on a network segment.
224.0.0.13 Protocol Independent Multicast (PIM) Version 2
224.0.0.18 Virtual Router Redundancy Protocol (VRRP)
224.0.0.19–21 IS-IS over IP
224.0.0.22 Internet Group Management Protocol (IGMP) version 3
224.0.0.102 Hot Standby Router Protocol version 2 (HSRPv2) / Gateway Load Balancing Protocol (GLBP)

MAC multicast addresses range from 01-00-5E-00-00-00 to 01-00-5E-7F-FF-FF.
Only the low 23 bits of the MAC address are available, while an IP multicast address has 28 variable bits, so the IP to MAC mapping is 32 to 1 (2^5 IP addresses share each MAC address). The last 23 bits of the IP address are copied into the last 23 bits of the MAC address.
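
The mapping can be checked with a short sketch. The function name is made up for this example; it uses only the Python standard library.

```python
import ipaddress


def multicast_ip_to_mac(ip: str) -> str:
    """Map an IPv4 multicast address to its Ethernet multicast MAC."""
    addr = ipaddress.IPv4Address(ip)
    if not addr.is_multicast:
        raise ValueError(f"{ip} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF       # keep the last 23 bits of the IP
    mac = 0x01005E000000 | low23       # fixed prefix 01-00-5E, 25th bit is 0
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


print(multicast_ip_to_mac("224.1.1.1"))    # 01:00:5e:01:01:01
print(multicast_ip_to_mac("239.129.1.1"))  # 01:00:5e:01:01:01 (same MAC)
```

The two addresses differ only in the 5 bits that are not copied, which is why they collide on the same MAC address.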

2, IGMP

IGMP lets a host join and leave a multicast group.
IGMP v1: two types of message: membership query and membership report.
The router sends membership queries to multicast IP 224.0.0.1, and a host sends a membership report to the multicast IP of the group it wants to join.
Once the router receives a membership report from a host via interface x, it adds interface x to the mroute table. An mroute table entry times out after, for example, 3 mins. If a host no longer wants to be in the multicast group, it simply stops sending membership reports; the router keeps forwarding multicast traffic towards the host until the timer expires.

IGMP v2:
A host can send a membership leave message instead of letting the group time out on the router. The leave message is sent to the all-routers multicast IP 224.0.0.2.
When host x sends a leave message, the router issues a group-specific membership query on the interface where the leave message was received. This query is sent to the address of the specific multicast group instead of the generic 224.0.0.1 address.
When there are multiple routers on one network segment, the router with the lowest IP address is elected as the active router (querier) that sends membership queries.
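
The querier election rule (lowest IP wins) is easy to get wrong if addresses are compared as strings. A minimal sketch, with a made-up function name, that compares them numerically:

```python
import ipaddress


def elect_querier(router_ips):
    """Return the router IP that becomes the IGMPv2 querier (lowest IP)."""
    return str(min(ipaddress.IPv4Address(ip) for ip in router_ips))


routers = ["192.168.1.10", "192.168.1.2", "192.168.1.254"]
print(elect_querier(routers))  # 192.168.1.2
```

A plain string comparison would pick 192.168.1.10 here, which is why the addresses are converted first.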

IGMP snooping:
This feature is used on switches. The switch snoops the IGMP traffic between hosts and routers and maintains a multicast table (CAM table); each entry contains a multicast destination MAC address and the interfaces that should receive traffic with that destination MAC.
On the switch, only IGMP traffic is sent to the core (CPU) of the switch for processing; the actual multicast data traffic (non-IGMP) is not forwarded to the core, which reduces its load.

In a scenario where there is no router, multicast can be made to work in one of the following ways:
1, configure one of the SVI interfaces on the switch to send membership queries
2, configure an SVI interface with PIM sparse mode
3, use broadcast instead of multicast by disabling IGMP snooping
4, configure a static mrouter port
5, configure a static multicast entry

3, IP multicast routing protocol

There are 2 modes of multicast routing: dense mode and sparse mode.
dense mode:
Multicast traffic is sent out of all interfaces of the router except the interface on which the traffic came in. If a downstream router does not want the traffic (it has received no membership report), it must send a message back to the upstream router saying that it does not want this multicast traffic (a prune).

There are a number of dense mode routing protocols:
DVMRP (Distance Vector Multicast Routing Protocol)
MOSPF (Multicast OSPF)
PIM Dense Mode (this is the most widely used)

PIM dense mode performs an RPF (reverse path forwarding) check. The router checks the source IP of the multicast traffic: if traffic from source IP A is supposed to arrive on portX, traffic with source IP A that arrives on portY is dropped. The check uses the unicast routing table. If the reverse path for the source differs from the one in the unicast routing table, a static mroute can be configured. The RPF check consults the static mroute table first, then the unicast routing table.
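
The lookup order of the RPF check can be sketched as follows. This is a simplified model, not router code: the table layout (prefix string mapped to interface name) and the function names are assumptions made for this example.

```python
import ipaddress


def lookup(table, source):
    """Longest-prefix match; table maps prefix strings to interface names."""
    src = ipaddress.ip_address(source)
    matches = [
        (ipaddress.ip_network(prefix), interface)
        for prefix, interface in table.items()
        if src in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def rpf_check(source, in_interface, static_mroutes, unicast_routes):
    """Accept the packet only if it arrived on the interface towards its source.

    Static mroutes are consulted first, then the unicast routing table.
    """
    expected = lookup(static_mroutes, source) or lookup(unicast_routes, source)
    return expected == in_interface


unicast = {"10.1.0.0/16": "portX", "0.0.0.0/0": "portZ"}
print(rpf_check("10.1.2.3", "portX", {}, unicast))                         # True
print(rpf_check("10.1.2.3", "portY", {}, unicast))                         # False
print(rpf_check("10.1.2.3", "portY", {"10.1.2.0/24": "portY"}, unicast))   # True
```

The last call shows a static mroute overriding the unicast table when the multicast path differs from the unicast path.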

PIM sparse mode:
Dense mode floods multicast traffic until a router asks for it to stop.
Sparse mode sends multicast traffic only when a router requests it.
In sparse mode there is an RP (Rendezvous Point). A router that receives an IGMP membership report sends a PIM join message towards the RP. Routers find the RP by static configuration or by dynamic learning. If no RP is configured on a router that runs only PIM sparse mode, multicast will not function.

Routers that are not interested in multicast traffic can filter out all multicast traffic.

There are 2 protocols that can be used for dynamic RP configuration: Auto-RP and BSR (Bootstrap Router). Auto-RP is a Cisco proprietary protocol and BSR is a standard protocol.

When more than one RP is configured for a group, MSDP (Multicast Source Discovery Protocol) is used to set up TCP sessions between those RPs, so that they can share information about active sources with each other.

4, MLD and MLD snooping

According to wikipedia “Multicast Listener Discovery (MLD) is a component of the Internet Protocol Version 6 (IPv6) suite. MLD is used by IPv6 routers for discovering multicast listeners on a directly attached link, much like Internet Group Management Protocol (IGMP) is used in IPv4. The protocol is embedded in ICMPv6 instead of using a separate protocol. MLDv1 is similar to IGMPv2 and MLDv2 similar to IGMPv3. The protocol is described in RFC 3810 which has been updated by RFC 4604.”

MLD snooping is for IPv6 multicast, like IGMP snooping for IPv4 multicast traffic.

OSPF details that are easy to overlook:
1, A passive interface does not send hello packets and therefore does not form an adjacency with any other router. The subnet attached to this interface is still advertised in OSPF.
2, An SPF calculation within the area is triggered only when the update contains changes to type 1 or type 2 LSAs. Type 1 and type 2 LSAs are exchanged only within the area.
3, An ABR is a router that has at least one interface connected to the backbone area and another one connected to another area. An ABR generates 2 type 1 LSAs of its own: one for the backbone area and one for the other area it connects to.
4, OSPF uses a reference bandwidth of 100 Mbps for cost calculation. The cost is the reference bandwidth divided by the interface bandwidth. For example, for 10 Mbps Ethernet it is 100 Mbps / 10 Mbps = 10. Note: if "ip ospf cost <cost>" is configured on the interface, it overrides the calculated cost. With a reference bandwidth of 100 Mbps, any interface of 100 Mbps or more has the same cost of 1, because fractions are not allowed. So OC-3, FastEthernet and GigabitEthernet all have an OSPF cost of 1 with the default reference bandwidth.
5, The ‘default-information originate’ command makes an ASBR generate a default route into the OSPF area, but only if a default route 0.0.0.0 already exists in its routing table.
6, In Cisco configuration, use “area x range 10.10.0.0 255.255.252.0” for network summarization on an ABR, and “summary-address …” on an ASBR for external route summarization. OSPF DOES NOT perform auto-summarization.
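
The cost formula from point 4 can be written as a small function. The function name and the Mbps parameters are chosen for this example; the clamp to 1 reflects the "no fraction is allowed" rule above.

```python
def ospf_cost(interface_mbps: float, reference_mbps: float = 100) -> int:
    """Reference bandwidth divided by interface bandwidth, never below 1."""
    return max(1, int(reference_mbps // interface_mbps))


for name, mbps in [("Ethernet", 10), ("FastEthernet", 100),
                   ("OC-3", 155), ("GigabitEthernet", 1000)]:
    print(name, ospf_cost(mbps))
```

Raising the reference bandwidth, for example ospf_cost(1000, reference_mbps=10000), is what lets faster links get distinct costs again.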

OSPF process in short

1, form adjacency

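init: each router sends hello packets to 224.0.0.5 (if the network type is broadcast) or to the unicast IP address of the neighbor (if the network type is non-broadcast);

2-way: router A has received a hello packet from router B, and that hello packet includes A's router ID; the DR/BDR are elected at this point if needed (point-to-point and point-to-multipoint networks do not elect a DR/BDR)

exstart: the two routers negotiate the master/slave roles and the initial sequence number for the database exchange

exchange: the routers describe their LSA databases to each other and then request the LSAs they are missing

full: adjacency formed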

When a DR/BDR is present in the network, all other routers form adjacencies only with the DR and BDR; they do not form adjacencies with each other and stay in the 2-way state with each other. When no DR/BDR is present (point-to-point or point-to-multipoint networks), an adjacency is formed on each link.

Below is the table describing the process:

[Figure: OSPF adjacency process]

2, Each router now has all the information (the LSA database). It runs the Dijkstra algorithm to calculate the shortest path towards each network. Each router therefore builds its whole routing table from the LSA database, so it is critical for OSPF to guarantee that every router has the same LSA database.

3, OSPF routes are installed in the routing table according to the administrative distance (110) on Cisco and the default preference (10) on Juniper.
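
The shortest path calculation in step 2 can be illustrated with a compact Dijkstra sketch. The graph below is a made-up four-router topology with link costs, standing in for the LSA database; a real SPF run also tracks next hops and network LSAs, which are left out here.

```python
import heapq


def shortest_paths(graph, source):
    """Dijkstra: return the lowest total cost from source to every node.

    graph: {node: {neighbor: link_cost}}
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


graph = {
    "R1": {"R2": 10, "R3": 1},
    "R2": {"R1": 10, "R4": 1},
    "R3": {"R1": 1, "R4": 1},
    "R4": {"R2": 1, "R3": 1},
}
print(shortest_paths(graph, "R1"))
```

From R1 the direct link to R2 costs 10, but the path R1, R3, R4, R2 costs only 3, so that is the one the calculation keeps.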

OSPF network types & OSPF network area types

OSPF network types

1, point to point

2, broadcast

3, point-to-multipoint broadcast

4, point-to-multipoint non-broadcast

5, non-broadcast (NBMA)

Point-to-point and point-to-multipoint networks do not need to elect a DR and BDR, because a point-to-multipoint network (both broadcast and non-broadcast) works like several point-to-point links. Where an election takes place, the router with the highest priority becomes DR and the router with the second highest priority becomes BDR. A Juniper router has a default priority of 128; a Cisco router has a default priority of 1.
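
The priority rule can be sketched as below. This is a deliberately simplified model with assumed names: it ranks all routers at once, breaks priority ties by the highest router ID, and treats priority 0 as ineligible. It does not model the fact that a real election is not preemptive, so an existing DR keeps its role when a higher-priority router joins later.

```python
import ipaddress


def elect_dr_bdr(routers):
    """routers: {router_id: priority}. Returns (dr, bdr); either may be None."""
    eligible = [
        (priority, ipaddress.IPv4Address(router_id))
        for router_id, priority in routers.items()
        if priority > 0  # priority 0 never becomes DR or BDR
    ]
    ranked = [str(router_id) for _, router_id in sorted(eligible, reverse=True)]
    dr = ranked[0] if ranked else None
    bdr = ranked[1] if len(ranked) > 1 else None
    return dr, bdr


segment = {"1.1.1.1": 1, "2.2.2.2": 1, "3.3.3.3": 0, "4.4.4.4": 100}
print(elect_dr_bdr(segment))  # ('4.4.4.4', '2.2.2.2')
```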

On the other hand, both network type 2 (broadcast) and network type 5 (non-broadcast) need to elect a DR and BDR in order to reduce the number of OSPF packets (hello, LSA, etc.) in the network.

Normally OSPF builds adjacencies by multicasting hello packets to 224.0.0.5, but in some non-Ethernet networks (mostly frame relay), broadcast is not available, so hello packets are sent to the unicast IP address of the peer.

Network area types

1, Backbone area (area 0): can receive all LSA types

2, Standard area: can receive type 3 info from the ABR (area border router) as well as external routes

3, Stub area: receives normal type 3 LSAs plus a default route as a substitute for all external routes

4, Totally stubby area: receives only one type 3 LSA, a default route towards the outside of the area

5, Not-so-stubby area (NSSA): works like a stub area or totally stubby area, BUT can carry external routes from an ASBR inside the area as type 7 LSAs, which the ABR translates into type 5 LSAs for the other areas.

LSA types:

type 1: router LSA, generated by every router

type 2: network LSA, generated and sent only by the DR

type 3: network summary LSA, generated and sent only by the ABR; summarization reduces the number of routes sent to the other areas

type 4: ASBR summary LSA, advertises the presence of an autonomous system boundary router (ASBR)

type 5: external route, generated by an ASBR and flooded to the other areas (an external route from an NSSA is translated into type 5 by the ABR)

type 7: external route inside an NSSA (not-so-stubby area)

Type 1 and type 2 LSAs do not cross the area border.

ASA drops packets unexpectedly

We have the following scenario for connection:

A ——– outside inte–ASA–inside inte———B

A has a TCP connection with B, but the connection was interrupted at some point during the communication. I captured packets on both the inside and outside interfaces of the ASA to find out what was going on, and I found that some packets arriving on the inside interface were dropped by the ASA:
those packets showed up on the inside interface but never appeared on the outside interface; instead, the ASA replied to B on behalf of A. As a result A kept sending
retransmissions but got no reply, and on timeout A sent a FIN packet to close the connection. B, on the other side, kept communicating the whole time until it got the FIN from A; in response B sent back ACK and FIN packets, and these ACK and FIN packets were also caught and dropped by the ASA:

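A ----------------------ASA----------------------- B
---> packet1 ------------|---> packet1 ------------>
<--- packetBtoA ---------|<--- packetBtoA ----------
........
---> pktAtoB n ----------|---> pktAtoB n ---------->
---- no traffic ---------|<--- pktBtoA n+1 ---------
---- no traffic ---------|---> pktAtoB n repeat --->
---> pktAtoB retrans ----|---> pktAtoB retrans ---->
---- no traffic ---------|<--- pktBtoA n+1 ---------
---- no traffic ---------|---> pktAtoB n repeat --->
........
after 5 retransmissions or timeout
---> FIN ----------------|---> FIN ---------------->
---- no traffic ---------|<--- ACK -----------------
---- no traffic ---------|<--- FIN -----------------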

A closed the connection because it got no reply from B; B closed the connection too after receiving the FIN (presumably after the timeout for a half-closed TCP connection),
while the ASA still kept this connection in its connection table until the idle timeout.

In order to find out why the ASA dropped the packets, we can use capture with the following command:
ASA>capture drop type asp-drop all

asp-drop Capture packets dropped with a particular reason

This captures all packets dropped by the ASA. In most cases, if there is a drop reason such as “tcp-paws-fail”, the ASA prints the drop reason for one packet only; the other packets of the same connection that were dropped for the same reason appear in the output without a drop reason, until another drop reason appears.

In our case, we hit the ASA bug ‘ASA drops packet as PAWS failure’. After consulting a Cisco engineer, we got the following information: “to know if your version is affected or not, you need to look at the known fixed releases. So, since version 9.1.(7.12) is the first version in the train 9.1.7 that fixed this bug, this mean all other versions before 9.1.7.12 in the same train 9.1.7 are affected with this bug.”
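
The rule quoted above (every version before the first fixed release in the same train is affected) can be expressed as a small check. This is a hypothetical helper, not a Cisco tool: the function names are invented, and it assumes versions are written in plain dotted form such as 9.1.7.12, with the first three numbers identifying the train.

```python
def parse_version(text):
    """Turn '9.1.7.12' into (9, 1, 7, 12) so parts compare numerically."""
    return tuple(int(part) for part in text.split("."))


def is_affected(running, first_fixed):
    """True/False within the same train, None if the trains differ.

    A different train has its own first fixed release, so this check
    cannot say anything about it.
    """
    run, fixed = parse_version(running), parse_version(first_fixed)
    if run[:3] != fixed[:3]:
        return None
    return run < fixed


print(is_affected("9.1.7.9", "9.1.7.12"))   # True
print(is_affected("9.1.7.12", "9.1.7.12"))  # False
print(is_affected("9.2.4.5", "9.1.7.12"))   # None
```

Comparing integers rather than strings matters here, since "9" sorts after "12" as text.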