So who saw some of the Internet drop off the face of the planet yesterday morning???
Many ISPs around the world were affected.
Unfortunately we did, because one of our smaller transit providers hit the magic 524,288 (512k) routes in the BGP table of its edge routers. Those routers were Cisco 7600 series (for many years the workhorse of many ISPs' BGP edges), which can't hold more than 512k routes unless IPv6 is disabled on them. Once the memory allocated for routes filled up, they started dropping some of the new ones :-/
So they got turned off.
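The "magic" 524,288 is simply 512k counted in binary units (512 × 1024 = 2^19). The minimal Python sketch below works through why more IPv4 room costs IPv6 room, assuming a shared forwarding table of 1,048,576 entries in which each IPv6 route uses two entries. Those table sizes are illustrative assumptions, not figures from our provider, so check your own platform's documentation. The route count and the 1000k reallocation used below are made-up examples, and the function names are hypothetical.

```python
# Illustrative assumptions only; check vendor documentation for real hardware.
TOTAL_ENTRIES = 1024 * 1024       # assumed shared forwarding table of 1M entries
DEFAULT_IPV4_LIMIT = 512 * 1024   # the "512k" default IPv4 share = 524,288 routes
IPV6_ENTRY_COST = 2               # assumed: each IPv6 route occupies two entries


def ipv4_headroom(ipv4_routes: int, ipv4_limit: int = DEFAULT_IPV4_LIMIT) -> int:
    """Routes that still fit in the IPv4 share; negative means routes are being lost."""
    return ipv4_limit - ipv4_routes


def ipv6_capacity(ipv4_limit: int, total: int = TOTAL_ENTRIES,
                  v6_cost: int = IPV6_ENTRY_COST) -> int:
    """IPv6 routes that fit in whatever the IPv4 allocation leaves over."""
    if not 0 <= ipv4_limit <= total:
        raise ValueError("IPv4 allocation must fit inside the table")
    return (total - ipv4_limit) // v6_cost


if __name__ == "__main__":
    print(DEFAULT_IPV4_LIMIT)                 # 524288
    print(ipv4_headroom(524_300))             # -12: the IPv4 share has overflowed
    print(ipv6_capacity(DEFAULT_IPV4_LIMIT))  # 262144 IPv6 routes at the default split
    print(ipv6_capacity(1000 * 1024))         # 12288 IPv6 routes after favouring IPv4
```

Raising the IPv4 share buys headroom for the growing global table, but under these assumptions every extra IPv4 entry comes straight out of IPv6 capacity, which is the sense in which fixing the IPv4 problem sacrifices IPv6.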
Below is a brief description of what happened from our Technical Director, Matthew Hattersley, along with a link to a fuller explanation.
Internet Touches Half Million Routes: Outages Possible Next Week - Renesys
"Essentially, the issue here is the amount of memory that routers allocate to storing routes. Certain models of routers and line cards have a hard limit of 512k (some even lower, though those are mostly gone nowadays). Others have a soft limit.
A good example is the Cisco 7600 (with an RSP720-3CXL). For many years this router has been the workhorse of many ISP BGP edges. It is hardy and reliable, but because it has been so reliable, a lot of operators have taken it for granted. By default the RSP720-3CXL allocates enough memory for 512k IPv4 routes, leaving the rest for IPv6 and other, rarer kinds of routes (multicast etc.). Yesterday this limit was hit for a short period, at which point these routers essentially ran out of memory to store routes and started dropping them. This meant different things for different operators, as the routes dropped were largely random. (Imagine losing the default route that propagates around your network: the loss of that one route loses you everything.)
As regards the impact on Talk Straight, there was no direct effect. Our core is based on the Cisco ASR 9000 range of routers, which can handle around 4 million routes. However, like everyone else, we saw issues reaching certain upstream peers. For us this meant our customers couldn't get to Office 365 and Rackspace, two major online applications/networks that our client base relies on.
This was traced to a third-party 7600 series router. To fix the issue, the upstream provider (like many others) reallocated its memory to give more space to IPv4 routes, sacrificing IPv6 in the process.
What does this mean going forwards?
Well, for Talk Straight / Schools Broadband, not a lot. We'll probably remove that upstream peer from our routing mix to avoid issues in the future. For the Internet at large, the people running these older platforms have two options: replace their equipment or sacrifice IPv6 routing. We'll probably see a mix over the next few months, but eventually they will have to retire this kit, as IPv6 isn't going anywhere."
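To illustrate why losing even one route can matter so much, here is a toy Python model of a fixed-capacity forwarding table with longest-prefix-match lookup. `ToyFib` and its methods are hypothetical names made up for this sketch; real line cards do this in hardware. The sketch also simplifies: it drops whichever route arrives once the table is full, whereas the real losses were largely random. The prefixes are documentation/private ranges and the next-hop labels are invented.

```python
import ipaddress
from typing import Dict, List, Optional


class ToyFib:
    """Hypothetical fixed-capacity IPv4 forwarding table (illustration only)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.routes: Dict[ipaddress.IPv4Network, str] = {}  # prefix -> next hop
        self.dropped: List[ipaddress.IPv4Network] = []      # routes that did not fit

    def install(self, prefix: str, next_hop: str) -> bool:
        """Install or update a route; return False if a new route finds the table full."""
        net = ipaddress.IPv4Network(prefix)
        if net not in self.routes and len(self.routes) >= self.capacity:
            self.dropped.append(net)  # no room left: this route is simply lost
            return False
        self.routes[net] = next_hop
        return True

    def lookup(self, address: str) -> Optional[str]:
        """Longest-prefix match; None means no matching route, i.e. unreachable."""
        addr = ipaddress.IPv4Address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


if __name__ == "__main__":
    fib = ToyFib(capacity=2)
    fib.install("10.0.0.0/8", "peer-a")
    fib.install("192.0.2.0/24", "peer-b")
    fib.install("0.0.0.0/0", "upstream")  # table full: the default route is dropped
    print(fib.lookup("192.0.2.10"))        # peer-b
    print(fib.lookup("198.51.100.1"))      # None: no specific route and no default
```

If the route that fails to fit happens to be the default route, every destination without a more specific entry becomes unreachable, which matches the "loss of one route loses everything" scenario described above.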
If any of you have any questions don't hesitate to get in touch.
This actually goes a long way to explaining why some quite large sites were down yesterday (lots of eBay EU sites for example).
Sounds like the issue from yesteryear when Juniper released an update that caused a core dump failure... Could be a similar thing again.