London Grid for Learning (LGfL) thread in Regional Broadband Consortiums (RBC): Our Synetrix Connection: Down for 8 Hours So Far
  1. #1

    Our Synetrix Connection: Down for 8 Hours So Far

    Hi,

    Is anyone else on Synetrix missing all connectivity today? I know they were doing upgrade work in Camden today, but we've been down since 8:40 a.m. with nothing, and no improvement yet.


    Jim H

  2. #2


    No experience with LGfL, but under EMBC I was able to log support calls on Saturdays and on-call support staff would get back to me.

    Mind you, I had TCR access for a while that told me more about this kind of scheduled stuff.

    Sorry to hear of the troubles nonetheless. What was the expected downtime?

  3. #3

    Originally posted by kmount:
    No experience with LGfL, but under EMBC I was able to log support calls on Saturdays and on-call support staff would get back to me.

    Mind you, I had TCR access for a while that told me more about this kind of scheduled stuff.

    Sorry to hear of the troubles nonetheless. What was the expected downtime?
    The expected downtime for users was no more than two hours, according to the announcement on their site. The work was scheduled to take place from 8 a.m. to 4 p.m.

    I would be relieved if others were down, because otherwise it feels like our school has been forgotten about.


    Jim H

  4. #4

    I am happy to report that we came back up around 10 minutes ago, and services are returning to normal.

    I am very, very relieved...

  5. #5
    nicholab
    Why are they doing work in the middle of the day?!!!!!!
    Where are the details of the down time?

  6. #6
    gonzodad
    Hi nicholab, this post was from last month, when the 13th was a Saturday. It was planned maintenance that overran. For more details, see post #19 onwards in the thread "Changing ISP from Synetrix".

    cheers
    gonz

  7. #7
    Face-Man
    LGfL have distributed this explanation.

    I currently can't add attachments, so I will copy and paste it below, but the best quote is:

    Unfortunately, due to the reliance on internet access to send SMS alerts and access to the email filtering platform to send email updates, both of which were affected by the issue, these updates did not go out to subscribed customers in a timely manner.
    Last edited by Face-Man; 15th April 2010 at 10:01 AM.

  8. #8
    Face-Man
    Here is the document

    Major Incident Report

    Internet Access Issues on 9th April 2010

    Version   Date      Changes / Comments
    1.0       13/4/10   First version issued to LGfL

    Management Summary

    Synetrix apologise for the inconvenience to LGfL members for the Internet access issues on 9th April 2010 and the high impact this had on users within London.
    The incident was caused by a combination of issues, starting with what we believe to be a large scale Denial of Service attack. These issues are described later in this report.

    Issues that affect service are always disappointing. Whilst we pride ourselves on the quality of our services, we accept that problems will occasionally occur and we strive to learn from them and continually improve our performance; hence the purpose of this report is to:

    1. Explain the issue and the cause.
    2. Review the management of the issue.
    3. Review lessons learned and describe any corrective actions that have been or will be put in place.


    Synetrix welcomes your feedback, as this provides the opportunity to further improve and provide the highest possible standard of service to all of our customers and users.

    Issue summary and resolution
    There were two distinct parts to the issue:

    Outage part 1 – firewall only: 10:30 – 13:30
    The impact of this issue was limited to inbound access to on-site services from the Internet and outbound Internet access from sites not using the Netsweeper filtering system.

    The cause of this was high firewall CPU usage due to unidentified external traffic hitting the firewall. The volume of traffic and its impact on the firewall CPU meant that firewall throughput was reduced to the point of it being effectively “down”. It is believed that this traffic was either a large scale Denial of Service attack, or the effects of a virus outbreak.

    Outage part 2 – total Internet outage: 13:30 – 14:50
    All internet users were impacted by this issue, with both inbound and outbound traffic affected.

    This was caused by a combination of events:
    - While investigating the firewall issue and trying to isolate the traffic, the link on the Earls Court Juniper MX960 connecting to the untrusted interface of the firewall was disabled and later re-enabled.
    - Due to an issue with the Juniper MX960, the change to the configuration to re-enable the interface was not correctly synchronised between the master and backup routing engines. This out-of-sync state caused the interface to show as activated in the configuration while remaining operationally down.
    - The BGP routing configuration on the MX960 was configured with the untrusted interface of the firewall as the next hop. Therefore, when the firewall became unavailable due to the interface being down, advertisement of the routes to external providers ceased, causing the total outage.
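
    The chain of events above can be pictured with a toy model. This is only a sketch under stated assumptions, not Synetrix's configuration or any Juniper API: the class names, the interface name and the documentation prefixes are hypothetical. It shows why an interface that is "up" in the configuration but operationally down stops every route that uses it as next hop from being advertised.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    name: str
    configured_up: bool      # what the configuration says
    operationally_up: bool   # what the link is actually doing


@dataclass
class Route:
    prefix: str
    next_hop: Interface


def advertised_prefixes(routes):
    """Return only the prefixes whose next hop is operationally up."""
    return [r.prefix for r in routes if r.next_hop.operationally_up]


# Hypothetical interface and documentation prefixes, for illustration only.
# The interface is enabled in the configuration but is not actually up.
untrust = Interface("fw-untrust", configured_up=True, operationally_up=False)
routes = [Route("192.0.2.0/24", untrust), Route("198.51.100.0/24", untrust)]

print(advertised_prefixes(routes))  # [] : nothing is advertised upstream
```

    In this model, setting operationally_up back to True brings both prefixes back, which matches the effect the report describes once the interface was reset.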

    Resolution – 14:50
    The Juniper MX960 interface was disabled and re-enabled, and the BGP routing process restarted. This resolved the issue with the BGP routes and traffic started flowing.
    At this point the attack on the firewall had ceased and there were no further issues.



    Brief timeline of events

    10:30 – Synetrix monitoring systems picked up problems with the firewall. Investigations showed high CPU usage on the firewall.
    10:40 – A small number of calls were received into the Service Desk regarding Internet access from sites not using URL filtering (sites using Netsweeper were OK).
    11:00 – Investigations discovered that the high CPU was related to external traffic, but this proved difficult to isolate. Various troubleshooting measures started, culminating in temporarily isolating the firewall by disabling the interface on the Juniper MX960 router at approximately 13:25.
    13:30 – Complete loss of internet access, including sites using Netsweeper.
    13:50 – Initial investigations pointed to external peering with upstream providers. The theory was that upstream providers were blocking our routes.
    14:00 – Discussions were started with all 3 upstream Internet providers.
    14:20 - 14:30 – Discussions with upstream providers concluded that our routes were not being correctly advertised to them. Investigations refocused on the Juniper MX960 routing configuration.
    14:50 – After an interface on the Juniper MX960 was reset and the BGP peering restarted, service resumed.
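
    For anyone wanting the durations spelled out, here is a small sketch that works them out from the start and end times given in the report. The helper name minutes_between is made up for this example.

```python
from datetime import datetime


def minutes_between(start, end):
    """Minutes between two same-day HH:MM times."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


part1 = minutes_between("10:30", "13:30")  # firewall-only outage
part2 = minutes_between("13:30", "14:50")  # total Internet outage
total = part1 + part2

print(part1, part2, total)  # 180 80 260
```

    That is 3 hours of partial outage followed by 1 hour 20 minutes of total outage, 4 hours 20 minutes in all.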





    Issue management and communication
    Throughout the issue, a lead ticket was raised and updated with progress, and Talk2Synetrix was updated.

    Unfortunately, due to the reliance on internet access to send SMS alerts and access to the email filtering platform to send email updates, both of which were affected by the issue, these updates did not go out to subscribed customers in a timely manner.


    Lessons learned and corrective actions
    Juniper MX960 configuration issue
    According to the Juniper knowledge base, there is a known issue: under certain circumstances, if a change is made before the previous change has completed synchronisation to the backup routing engine, the router can become out of sync.

    Now that this issue is known and understood, it has been communicated to the engineers who work on these routers and will be added to the Synetrix knowledge base for future reference.
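
    One way to picture the safeguard implied here is a guard that refuses a new change until the previous one has reached the backup routing engine. The sketch below is a toy model only: RoutingEnginePair and its methods are hypothetical, it is not a Junos API, and the report does not say how Synetrix engineers enforce this in practice.

```python
class RoutingEnginePair:
    """Toy model of a master and backup routing engine (not a Junos API)."""

    def __init__(self):
        self.master_version = 0
        self.backup_version = 0

    def in_sync(self):
        return self.master_version == self.backup_version

    def commit(self):
        # Refuse a new change while the previous one is still synchronising.
        if not self.in_sync():
            raise RuntimeError("previous change has not finished synchronising")
        self.master_version += 1

    def finish_sync(self):
        # The backup catches up with the master.
        self.backup_version = self.master_version


pair = RoutingEnginePair()
pair.commit()          # e.g. disable the interface
try:
    pair.commit()      # re-enable too early: rejected by the guard
except RuntimeError as exc:
    print("blocked:", exc)
pair.finish_sync()
pair.commit()          # the second change is now accepted
```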

    Route issue
    The BGP routing configuration has been changed so that the non-availability of the firewall interface will no longer affect the advertisement of Internet routes for traffic flowing through Netsweeper, or traffic for customers not using the core firewalls at all.

    Communications systems
    It is clear that the automated SMS and email alerts failed to perform their function, due to their reliance on parts of the network that were affected by the failure. We have started a review process to ascertain where these dependencies lie and how they can be engineered out of the communication solution. This is likely to take several weeks and in the meantime this awareness means that manual SMS updates will now be sent to customers who have registered for them on Talk2Synetrix, should such an event recur.
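
    The dependency review described above amounts to asking, for each alert channel, whether it relies on anything that can fail along with the service it reports on. A minimal sketch, assuming a hand-maintained mapping: the channel names and dependencies are taken loosely from the report, and the "mobile network" dependency for manual SMS is an assumption made for illustration.

```python
def channels_at_risk(channel_deps, failed_components):
    """Map each alert channel to the failed components it depends on."""
    failed = set(failed_components)
    at_risk = {}
    for channel, deps in channel_deps.items():
        hit = sorted(set(deps) & failed)
        if hit:
            at_risk[channel] = hit
    return at_risk


channel_deps = {
    "automated SMS": ["internet access"],
    "automated email": ["email filtering platform"],
    "manual SMS": ["mobile network"],  # assumption: an out-of-band path
}
failed = ["internet access", "email filtering platform"]

print(channels_at_risk(channel_deps, failed))
```

    With the 9th April failure set, both automated channels are flagged and only the manual SMS path is left, which is consistent with the interim measure in the report.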

    Firewall capacity
    The core firewalls are to be upgraded within the next 2 months, adding significant extra capacity. This will enable them to withstand a higher level of virus or DoS traffic before service is affected.

  9. #9

    powdarrmonkey
    Unfortunately, due to the reliance on internet access to send SMS alerts and access to the email filtering platform to send email updates, both of which were affected by the issue, these updates did not go out to subscribed customers in a timely manner.
    D'oh *facepalm*

  10. #10


    Originally posted by powdarrmonkey:
    D'oh *facepalm*
    Yeah, it'd be funny if we weren't reliant on the people making those basic mistakes.

  11. #11
    nicholab
    I am waiting for fibre to the cabinet; then we can have a decent-speed backup.
