Services Affected
Internet
Wide Orbit
SBS
BOC Services
All other services
Sites Affected
Santa Barbara
Santa Maria
Time Affected
11:45 AM - 12:45 PM (60 minutes)
Resolution (skip to the next section for a layman's explanation of switching loops)
The spanning-tree root bridge for VLAN 802 changed on the Santa Barbara switch. This apparently triggered a loop between Santa Barbara and Santa Maria on the other VLANs carried over the same connection. The loop drove CPU usage up on all switches at both locations, which took down all services. Once we were able to log in to the Santa Barbara switch, we disabled the port connecting the two locations, which broke the loop. We then added protection on the ports connecting the two locations to keep this from happening again. After that we re-enabled the port, and the issue was cleared.
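The report does not say why the root bridge for VLAN 802 changed. As background, spanning tree elects one root bridge per VLAN by comparing bridge IDs, and the lowest ID wins. The Python sketch below is a simplified, hypothetical model of that comparison. The switch names, priorities and MAC addresses are made up. It also ignores the VLAN number that per-VLAN spanning tree adds to the priority. Both switches add the same value for a given VLAN, so the ordering is unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    name: str
    priority: int  # configurable STP bridge priority (default 32768); lower wins
    mac: str       # base MAC address, used as the tie-breaker


def bridge_id(bridge: Bridge) -> tuple[int, int]:
    # Bridge IDs compare priority first, then the MAC address as a number.
    return (bridge.priority, int(bridge.mac.replace(":", ""), 16))


def elect_root(bridges: list[Bridge]) -> Bridge:
    # The bridge with the numerically lowest bridge ID becomes the root.
    return min(bridges, key=bridge_id)


# Hypothetical switches and values, for illustration only.
sb = Bridge("santa-barbara", 32768, "00:1a:2b:00:00:10")
sm = Bridge("santa-maria", 32768, "00:1a:2b:00:00:20")
print(elect_root([sb, sm]).name)  # santa-barbara (equal priority, lower MAC)

# Lowering one switch's priority is enough to move the root for that VLAN.
sm_tuned = Bridge("santa-maria", 4096, "00:1a:2b:00:00:20")
print(elect_root([sb, sm_tuned]).name)  # santa-maria
```

The point of the sketch is that a single priority or topology change on one switch can move the root. That change reshapes which ports forward and which block.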
Brief explanation of switching loops and why they are bad:
Switches love traffic. They love to take traffic in and send it back out, and they do it quickly and reasonably well. Primarily, our switches send and receive our video traffic. What they hate (or, more correctly, are unable to handle) is loops. Loops can be physical or virtual; there is no difference in how they behave. Virtual loops are more advanced and more difficult to troubleshoot. In this outage, the loop was virtual.
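Here is one way to picture a virtual loop. The same physical cabling can be loop-free for one VLAN and looped for another, depending on which VLANs each link carries. The hypothetical Python sketch below keeps only the links that carry each VLAN and uses union-find to check for a cycle. The switch names A, B, C and VLANs 10 and 20 are invented for illustration. The sketch does not model spanning tree, which would normally block one port per loop. It shows what the topology looks like without that protection.

```python
def has_loop(links: list[tuple[str, str]]) -> bool:
    """Return True if the undirected links contain a cycle (union-find)."""
    parent: dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # a and b were already connected: this link closes a loop
        parent[root_a] = root_b
    return False


def looped_vlans(trunks: dict[tuple[str, str], set[int]]) -> set[int]:
    """For each VLAN, keep only the links that carry it and check for a loop."""
    all_vlans = set().union(*trunks.values())
    return {
        vlan
        for vlan in all_vlans
        if has_loop([link for link, vlans in trunks.items() if vlan in vlans])
    }


# Hypothetical trunks: every VLAN rides A-B and B-C, but only VLAN 20 rides A-C.
trunks = {("A", "B"): {10, 20}, ("B", "C"): {10, 20}, ("A", "C"): {20}}
print(looped_vlans(trunks))  # {20}
```

Physically, all three links exist, but only VLAN 20 sees a closed ring. This is why a loop that lives at the VLAN level can be hard to spot by looking at the cabling.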
Normally, switches are installed in a topology like the one below:
The picture on the left shows the connections, and the picture on the right shows how traffic flows. Green arrows mean good traffic flow.
Here's a picture of a loop (note the connection between B and C):
Here, you can see that traffic starting out on switch B goes around to switch C but then keeps going (the transition from green to red). The traffic never stops. It keeps circling the loop, consuming more and more resources until none are left and the switches can do nothing but process this merry-go-round of horror.
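The traffic never stops for two reasons. An Ethernet frame carries no hop count or time-to-live. A switch also floods a broadcast frame out of every port except the one it arrived on. The toy Python model below counts copies of a single broadcast frame after each hop. The switch names are hypothetical, and the model ignores spanning tree, end hosts and MAC learning. The normal topology is assumed to be A connected to B and to C. In that topology the frame dies out. In the A/B/C loop it circulates forever. In a hypothetical four-switch full mesh, the copies double every hop.

```python
from collections import Counter


def flood(links: list[tuple[str, str]], origin: str, hops: int) -> list[int]:
    """Count in-flight copies of one broadcast frame after each hop."""
    neighbors: dict[str, set[str]] = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Keys are (switch receiving the frame, switch it came from); values are copies.
    in_flight = Counter((n, origin) for n in neighbors[origin])  # origin floods all ports
    counts = [sum(in_flight.values())]
    for _ in range(hops - 1):
        nxt: Counter = Counter()
        for (switch, came_from), copies in in_flight.items():
            # Flood out of every port except the one the frame arrived on.
            for n in neighbors[switch] - {came_from}:
                nxt[(n, switch)] += copies
        in_flight = nxt
        counts.append(sum(in_flight.values()))
    return counts


normal = [("A", "B"), ("A", "C")]
looped = normal + [("B", "C")]
mesh = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
print(flood(normal, "B", 5))  # [1, 1, 0, 0, 0]
print(flood(looped, "B", 5))  # [2, 2, 2, 2, 2]
print(flood(mesh, "B", 5))    # [3, 6, 12, 24, 48]
```

Real switches also deliver every copy to their host ports and process much of this traffic in software. This is how a loop can push CPU usage up on every switch that carries the looped VLANs.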