I don't have any ACLs yet, so I don't think I can configure any more logging than what I'm already getting from show logging, can I?
I do know that DNS is fine - I can always resolve names to IPs.
One notable development: since turning off the IPTV server last night, I haven't had the problem once. Going to leave it off for the rest of the day, especially as I haven't been using my workstation intensively this morning, but we could be on to something here. I already know that the problem was happening before multicast-routing was enabled, but could just having the multicast traffic there in the first place cause something like this? I've had this IPTV box running for 2 years without any issues before turning on L3.
You can turn on full debugging, but that will own the switch CPU. To be more focused you could use
debug ip ?
to pick out some stuff to do with it.
This document may be worth a look: Diagnostics Guide for the HP ProCurve Routing Switches 9304m, 9308m, and 9315m - Software Release 07.6.04 or Greater (S and ftp://ftp.hp.com/pub/networking/soft...032-Chap03.pdf
It's likely that enabling multicast has tipped your core over the edge on a problem it had previously been dealing with without noticeable impact.
I'd set up a monitor/mirror port on your core and run a packet capture to see if there was anything obvious showing up. <- this is my 'new' favourite trick, I am always reluctant to do it, but it always shows me something I wasn't expecting to see.
I wonder whether you've got a loop somewhere and either STP is flapping or Broadcast Suppression is kicking in and the CPUs/buffers are flooding, resulting in packet loss.
I've had issues with IGMP snooping and multicast. If it isn't configured properly on every switch that the multicast traffic passes through, you'll run into problems. Spanning tree being what it is, unless your topology includes multiple redundant links I don't know that that is where you'd have an issue, although you won't see these ports as blocked per se unless you are actually looking at their STP state.
To test the multicast theory you can disable IGMP snooping on all your switches. This is a temporary test and I do not recommend this as a permanent solution but it may point to whether or not this is a multicast problem.
A topology diagram, if you are able to provide one, would also help diagnose what the issue might be.
Edit: Having just read about turning off the IPTV server I'd point at a multicast routing issue. Try and find out what multicast IP that server is using, there may be a conflict with another multicast service.
OK, the multicast server remained off yesterday as I was doing a lab build all day, but there still seem to be no recurrences of the problem. I plan to fire it back up tomorrow when no-one else is in to reproduce the issue and do some more troubleshooting.
I know the details of the multicast IPTV server extremely well as it's a personal project. It has been running pretty much 24/7 for 2 years with no issues, so it's a little odd that it's suddenly causing problems now. IGMP is configured on all switches on the VLANs which carry the IPTV: with ~100Mbps of multicast traffic being pumped out by the IPTV, it's very obvious from the traffic graphs if you miss a switch (as I have once in the past). It broadcasts SAP on the normal 224.2.127.254 address, and the IPTV channels are on the following addresses:
Unless I've missed something, the IPTV streams are all in the administratively scoped multicast address space, and there are no other multicast servers on the network (aside from the WDS server, but I never use the multicast features).
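For anyone wanting to double-check the scoping claim, here is a minimal sketch in Python using only the standard library. The group addresses in the example are hypothetical placeholders, not the actual channel list from this network. It assumes "administratively scoped" means 239.0.0.0/8 as defined in RFC 2365.

```python
import ipaddress

# Administratively scoped IPv4 multicast block (RFC 2365).
ADMIN_SCOPED = ipaddress.ip_network("239.0.0.0/8")


def classify(group: str) -> str:
    """Classify an IPv4 address as admin-scoped multicast, other multicast, or neither."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        return "not multicast"
    if addr in ADMIN_SCOPED:
        return "administratively scoped"
    return "other multicast"


if __name__ == "__main__":
    # Hypothetical example groups; substitute the real stream addresses.
    for group in ["239.255.1.1", "239.255.1.2", "224.2.127.254"]:
        print(group, "->", classify(group))
```

Note that the global-scope SAP group itself sits outside 239.0.0.0/8, so it would show up as "other multicast" here even when every stream is administratively scoped.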
I've attached a switch topology graphic. The only time I have more than one connection between any two switches is between the core and the switch immediately below it in the rack for my servers. The 4 links are trunked, and there are no reported errors with that so it is acting as a single 4Gbps link.
At this point I'm beginning to wonder if this switch just can't hack that much IGMP/multicast and the routing at the same time, regardless of whether the traffic is to-standard or not. The CPU and memory aren't getting much above 50%, but it wouldn't be the first problem I've had with sub-par performance from HP kit.
Right, well I've had a thoroughly frustrating day with this, but I have some answers.
It's definitely a problem with the multicast traffic combined with ip multicast-routing. I must have left it on initially when I thought I had ruled it out, because I've been able to literally toggle the problem on and off today by enabling and disabling multicast-routing while having a PING running. The problem occurs even if PIM is disabled, so the multicast traffic isn't even being routed - just the presence of multicast traffic on a single VLAN with routing on is enough.
If I have multicast-routing on with no multicast traffic: everything is fine.
If I have multicast traffic with no multicast-routing: everything is fine.
If I have even the slightest amount of traffic (as little as a single SAP announce) with multicast-routing on, the problem will occur.
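If anyone wants to repeat this kind of toggle test, a rough sketch of how a log of ping results could be turned into outage windows is below, so they can be lined up against the times multicast-routing was enabled or disabled. The function name and the (timestamp, success) sample format are assumptions made for illustration; this is not how the test above was actually recorded.

```python
from typing import Iterable, List, Tuple


def outage_windows(samples: Iterable[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Return (first_failure, last_failure) timestamps for each run of failed pings.

    `samples` is an iterable of (timestamp, ok) pairs in time order, where
    `ok` is True when the ping got a reply.
    """
    windows = []
    start = None
    last_fail = None
    for timestamp, ok in samples:
        if not ok:
            if start is None:
                start = timestamp  # a new run of failures begins
            last_fail = timestamp
        elif start is not None:
            windows.append((start, last_fail))  # run ended by a successful reply
            start = None
    if start is not None:
        windows.append((start, last_fail))  # log ended while still failing
    return windows


if __name__ == "__main__":
    # Hypothetical log: one sample per second, True means a reply was received.
    log = [(0, True), (1, False), (2, False), (3, True), (4, False)]
    print(outage_windows(log))
```

Comparing the windows against the times of each config change makes it easier to show that the loss starts and stops with the setting rather than by coincidence.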
Another tell-tale symptom I've discovered today is that when the multicast is being routed, the routing performance is poor. SD streams are OK, but on anything but the default VLAN, HD streams are garbled with MPEG decoding artefacts on playback. That doesn't happen on the default VLAN (where the traffic originates), even when testing from the farthest point of the topology over a wireless WDS bridge.
There are documented limits for multicast routing on this switch, which get easier to reach the more VLANs you have, and at one point I had thought it might be a load problem due to the amount of multicast traffic. Then when it happened with just the SAP announcements running, which are a tiny amount of text traffic on a single multicast address, I came to the conclusion that the ProCurve 5300 series is just a pile of junk. There are no pertinent messages logged to the debug log when the problem occurs (even with debug all), and there was nothing unusual shown on a monitoring port trace using Wireshark. The switch simply doesn't do what it's supposed to.
Thanks for all your help and suggestions, guys. It has helped me narrow down the issue and preserve my sanity. I will be having an interesting chat next time our local ProCurve specialist calls to make a sale.
Multicast routing is always one of the more difficult things. I remember when I was doing Cisco stuff you needed a specific point revision of the right firmware to get it running well with all the streams. Have you tried updating the firmware to the latest version?
Firmware is the latest, last update was in May 2010 so not expecting any further updates. I already have a way around not being able to route the IPTV (the other VLANs will get a unicast stream instead) so I'm not going to lose any more sleep over it, but it would have been nice to know the switch was rubbish before spending a week growing grey hairs trying to figure it out!
It has got Procurve written all over it ;)
Glad to hear you at least got your sanity saved!