Hardware thread: problem with iSCSI traffic dropping (Technical)
28th February 2011, 12:31 PM #1
Problem with iSCSI traffic dropping
Got a bit of an ongoing issue with our 2 Hyper-V (SCVMM) hosts connected to our Dell MD3200i SAN.
I'm getting thousands of errors with event IDs 9, 39 and 139 on both Hyper-V servers, plus the SAN itself is logging disconnection errors.
Everything seems to be functioning okay, but services (cluster drives) drop out briefly, usually out of hours, thankfully.
Both of our servers have eight 1 Gb LAN ports, and the SAN has two controllers with 4 ports each (2 x 4 ports).
The configuration for each server is:
- 3 ports aggregated to 3 Gbps on our main network subnet, 172.16.*.*
- 1 port connected to our other Hyper-V server on the subnet 172.15.*.*
- 4 ports connected separately to our SAN switch, with a different subnet for each port
This should give us failover capacity if one controller goes down or if a NIC fails.
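The per-port subnet layout can be sanity-checked on paper before touching the live hosts. Below is a minimal Python sketch. It assumes, since the thread does not say, that each of the four iSCSI subnets holds one host port plus one port on each SAN controller. Every address in it is hypothetical.

```python
import ipaddress

# HYPOTHETICAL addressing: the thread does not give the real iSCSI subnets.
# Assumed layout: one subnet per host iSCSI port, and each subnet also holds
# one port on SAN controller 0 and one port on SAN controller 1.
HOST_PORTS = {f"iscsi{n + 1}": f"192.168.{130 + n}.101/24" for n in range(4)}
SAN_PORTS = {
    (ctrl, n): f"192.168.{130 + n}.{ctrl + 1}/24"  # (controller, port) -> address
    for ctrl in (0, 1)
    for n in range(4)
}


def controllers_reachable(host_ports, san_ports):
    """For each host port, list the SAN controllers that share its subnet."""
    reachable = {}
    for name, addr in host_ports.items():
        net = ipaddress.ip_interface(addr).network
        reachable[name] = sorted(
            {ctrl for (ctrl, _), san in san_ports.items()
             if ipaddress.ip_interface(san).network == net}
        )
    return reachable


def redundancy_problems(host_ports, san_ports):
    """Return human-readable problems; an empty list means the layout looks sane."""
    problems = []
    nets = [ipaddress.ip_interface(a).network for a in host_ports.values()]
    if len(set(nets)) != len(nets):
        problems.append("two host iSCSI ports share a subnet")
    for name, ctrls in controllers_reachable(host_ports, san_ports).items():
        if ctrls != [0, 1]:
            problems.append(f"{name} reaches controllers {ctrls}, not both")
    return problems


if __name__ == "__main__":
    print(redundancy_problems(HOST_PORTS, SAN_PORTS) or "layout OK")
```

The check only covers the addressing. Whether MPIO actually uses every path is a separate question.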
By the way, both servers run Windows Server 2008 R2 SP1 with Failover Clustering and the latest version of SCVMM.
The SAN is reporting everything as OK; all adapters are IPv4 with jumbo frames (9000) enabled.
The server adapters are a mix of Broadcom and Intel, and the servers are Dell R610s, if that helps.
Oh, and the SAN switch is a D-Link 24-port gigabit managed switch (with jumbo frames enabled), separate from the main network.
If anyone has any suggestions I would be very appreciative. If you need any more information, please let me know.
I just realised this might be better suited to the virtualisation section, although... it is hardware.
Last edited by sacrej; 28th February 2011 at 12:35 PM.
28th February 2011, 01:30 PM #2
Couple of questions:
- Do the network adapters have iSCSI offloading, and if so, are you using the drivers that provide this?
- Are you able to monitor traffic to check the host is definitely using all 4 NICs for iSCSI, not just preferring 1 all the time?
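The second question can be checked by taking two readings of per-NIC byte counters and comparing each NIC's share of the traffic. The sketch below uses made-up counter values and hypothetical NIC names. On a live host, the readings could come from Performance Monitor or from `psutil.net_io_counters(pernic=True)`. The 5% and 70% thresholds are arbitrary.

```python
def traffic_share(before, after):
    """Fraction of the bytes moved between two samples that each NIC carried."""
    deltas = {nic: after[nic] - before[nic] for nic in before}
    total = sum(deltas.values())
    if total == 0:
        return {nic: 0.0 for nic in deltas}
    return {nic: d / total for nic, d in deltas.items()}


def unbalanced(shares, low=0.05, high=0.70):
    """NICs carrying almost nothing or almost everything (thresholds are arbitrary)."""
    return sorted(nic for nic, s in shares.items() if s < low or s > high)


# Hypothetical byte counters (bytes sent + received) from two readings.
BEFORE = {"iscsi1": 0, "iscsi2": 0, "iscsi3": 0, "iscsi4": 0}
AFTER = {"iscsi1": 400_000, "iscsi2": 300_000, "iscsi3": 280_000, "iscsi4": 20_000}

if __name__ == "__main__":
    shares = traffic_share(BEFORE, AFTER)
    for nic, share in sorted(shares.items()):
        print(f"{nic}: {share:.0%}")
    print("check:", unbalanced(shares))
```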
28th February 2011, 02:27 PM #3
Okay: because the servers have 2 x 4-port NICs (an Intel Gigabit ET Quad Port Server Adapter and a Broadcom BCM5709C NetXtreme II), we put two iSCSI ports on the Intel and two on the Broadcom. The idea is that if one card failed completely, there would still be 2 ports serving iSCSI traffic on the server.
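The card-level redundancy argument can be checked by failing each card in turn. Here is a small sketch using the two adapter models named in the post; the port names are hypothetical.

```python
# Hypothetical port names; the card models are the ones named in the post.
ISCSI_PORTS = {
    "iscsi1": "Intel Gigabit ET Quad Port",
    "iscsi2": "Intel Gigabit ET Quad Port",
    "iscsi3": "Broadcom BCM5709C NetXtreme II",
    "iscsi4": "Broadcom BCM5709C NetXtreme II",
}


def ports_left_after_card_failure(port_to_card):
    """For each card, the iSCSI ports that would survive if that whole card failed."""
    return {
        card: sorted(p for p, c in port_to_card.items() if c != card)
        for card in sorted(set(port_to_card.values()))
    }


if __name__ == "__main__":
    for card, left in ports_left_after_card_failure(ISCSI_PORTS).items():
        print(f"{card} fails -> {len(left)} iSCSI ports left: {left}")
```

This only models a whole-card failure. It says nothing about the SAN switch, which is a single device in this setup.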
The following settings are enabled on the Broadcom adapters:
IPv4 Checksum Offload - Tx/Rx on
IPv4 Large Send Offload - on
Jumbo MTU - 9000
RSS Queues - 8
Priority and VLAN - enabled
Rcv Buffer - 750
Receive Side Scaling - on
TCP Connection Offload - on
Transmit Buffer - 1500
and on the Intel adapters:
Enable PME - disabled
Flow Control - Rx/Tx
Gigabit Master Slave Mode - auto detect
Header Data Split - disabled
Interrupt Moderation - enabled
Interrupt Moderation Rate - Adaptive
IPv4 Checksum Offload - Rx/Tx
Jumbo Packets - 9014 bytes
Large Send Offload - on
Duplex - auto negotiation
Log Link State Event - on
Max RSS CPUs - 8
Preferred NUMA Node - default
Priority and VLAN - enabled
Rcv Buffer - 250
Receive Side Scaling - on
RSS Queues - 1 queue
TCP Checksum Offload - Rx/Tx
Transmit Buffer - 512
UDP Checksum Offload - Rx/Tx
Virtual Machine Queues - disabled
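The two drivers expose different property names, so one way to spot mismatches is to parse both settings lists and compare only the settings they share. Below is a minimal sketch with a few settings transcribed from the post in its "name - value" form. It uses a hypothetical alias table that maps the two jumbo settings to a single key.

```python
# A few settings transcribed from the post, in its "name - value" form.
BROADCOM = """\
Jumbo MTU - 9000
RSS queues - 8
Rcv Buffer - 750
Receive side scaling - on
Transmit buffer -1500
"""
INTEL = """\
Jumbo packets - 9014bytes
RSS queues - 1 queue
Rcv buffer- 250
Receive side scaling - on
Transmit buffer - 512
"""

# Hypothetical alias table: the two drivers name the jumbo setting differently.
ALIASES = {"jumbo mtu": "jumbo", "jumbo packets": "jumbo"}


def parse_settings(text, aliases=None):
    """Parse 'name - value' lines into {normalised name: normalised value}."""
    aliases = aliases or {}
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition("-")
        if not sep:
            continue  # not a 'name - value' line
        key = " ".join(name.lower().split())
        settings[aliases.get(key, key)] = value.strip().lower()
    return settings


def differing(a, b):
    """Settings present on both adapters whose values differ."""
    return {k: (a[k], b[k]) for k in sorted(a.keys() & b.keys()) if a[k] != b[k]}


if __name__ == "__main__":
    diff = differing(parse_settings(BROADCOM, ALIASES), parse_settings(INTEL, ALIASES))
    for key, (broadcom, intel) in diff.items():
        print(f"{key}: Broadcom={broadcom!r} Intel={intel!r}")
```

The diff is only as good as the alias table. Settings with no counterpart on the other driver (for example TCP Connection Offload) simply drop out of the comparison.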
We also use Broadcom Advanced Control Suite 3 (BACS3), in which 2 Broadcom ports and 1 Intel port are teamed to form the primary LAN connection.
The failover link cable uses an Intel port,
which leaves 2 Intel and 2 Broadcom ports for iSCSI.
According to BACS3, the Broadcom adapters have the following offload capabilities: TOE (TCP Offload Engine), LSO (Large Send Offload), CO (Checksum Offload), RSS (Receive Side Scaling).
The Intel adapters show: LSO, CO, RSS.
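Two of the four iSCSI ports sit on each card type, so a quick set comparison shows which offloads differ between the two halves of the multipath set. This trivial sketch uses exactly the flags quoted above.

```python
# Offload capabilities as reported by BACS3 in the post.
ADAPTERS = {
    "Broadcom": {"TOE", "LSO", "CO", "RSS"},
    "Intel": {"LSO", "CO", "RSS"},
}


def capability_gaps(adapters):
    """For each adapter, offloads that some other adapter has but this one lacks."""
    union = set().union(*adapters.values())
    return {name: sorted(union - caps) for name, caps in adapters.items()}


if __name__ == "__main__":
    print(capability_gaps(ADAPTERS))
```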
And yes, all 4 NICs are showing traffic, although some more than others (the SAN reports this too).
Also, all iSCSI cables are brand-new 0.5 m Cat6.
Thank you for your fast response.
Last edited by sacrej; 28th February 2011 at 02:38 PM.
28th February 2011, 03:18 PM #4
Am I right in thinking that all 4 links are dropping at once, across both cards? If so, it would seem to rule out host hardware as an issue. That still leaves host software, the switch, or the SAN, unfortunately.
Are you using the Microsoft iSCSI Initiator on the Hyper-V hosts, or a third-party one? Also, what model is the D-Link switch?
Last edited by AngryTechnician; 28th February 2011 at 03:25 PM.
28th February 2011, 03:33 PM #5
Well, I'm not sure. In the SAN event log I'm getting 3 errors at 26 minutes past and 3 errors at 56 minutes past every hour (1 of these errors is for RAID controller 1 and 2 are for controller 0).
The switch is this
And yes, we are using the Microsoft initiator, but it was set up via the Dell software.
Hope that helps.
28th February 2011, 04:35 PM #6
Are they always at those specific times? Or always 30 minutes apart?
28th February 2011, 04:42 PM #7
Roughly every 30 minutes (give or take a few here and there).
That's strange, because the hosts log the errors slightly later (about 5 minutes later). One server reports 2 errors every 30 minutes; the other reports either 8 or 11 (alternating).
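The timing pattern can be quantified once the event times are exported from both sides. The sketch below uses hypothetical timestamps shaped like the pattern described above: SAN errors at :26 and :56, with host errors about 5 minutes later. A constant lag like this could simply be clock skew between the SAN and the hosts. It is worth comparing their clocks before treating the SAN event as the "first" one.

```python
import statistics
from datetime import datetime, timedelta


def intervals_minutes(times):
    """Minutes between consecutive events (times must be sorted)."""
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]


def median_lag_minutes(san_times, host_times):
    """Median minutes from each SAN event to the next host event at or after it."""
    lags = []
    for t in san_times:
        later = [h for h in host_times if h >= t]
        if later:
            lags.append((min(later) - t).total_seconds() / 60)
    return statistics.median(lags) if lags else None


# Hypothetical timestamps: SAN errors at :26/:56, host errors 5 minutes later.
BASE = datetime(2011, 2, 28, 12, 26)
SAN = [BASE + timedelta(minutes=30 * i) for i in range(4)]
HOST = [t + timedelta(minutes=5) for t in SAN]

if __name__ == "__main__":
    print("SAN intervals:", intervals_minutes(SAN))
    print("median host lag:", median_lag_minutes(SAN, HOST))
```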
28th February 2011, 04:57 PM #8
One thing to add: at the moment it is generally only one server (named Hyperv-2) that drops drives, e.g.:
i.e. "Cluster Shared Volume 'Volume6' ('Cluster Disk 6') is no longer available on this node because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished."
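If the host's event messages are exported as text (for example from Event Viewer), a regular expression can count the drops per volume. This shows whether one volume is affected more than the others. The pattern below is written only against the message quoted above, so other message variants will not match it.

```python
import re
from collections import Counter

# Pattern for the message text quoted above; other message variants will not match.
CSV_LOST = re.compile(
    r"Cluster Shared Volume '(?P<volume>[^']+)' \('(?P<disk>[^']+)'\) "
    r"is no longer available on this node because of "
    r"'(?P<status>[A-Z_]+)\((?P<code>[0-9a-fA-F]+)\)'"
)


def count_csv_drops(messages):
    """Count dropped-volume messages per (volume, status) pair."""
    counts = Counter()
    for msg in messages:
        m = CSV_LOST.search(msg)
        if m:
            counts[(m["volume"], m["status"])] += 1
    return counts


SAMPLE = (
    "Cluster Shared Volume 'Volume6' ('Cluster Disk 6') is no longer available on "
    "this node because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will "
    "temporarily be queued until a path to the volume is reestablished."
)

if __name__ == "__main__":
    print(count_csv_drops([SAMPLE, SAMPLE]))
```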
28th February 2011, 06:22 PM #9
Can you check your scheduled tasks event logs on the hosts to see if there is some background task being run periodically that might correspond with the timing of the errors?
Event Viewer > Applications and Services Logs > Microsoft > Windows > TaskScheduler > Operational
In particular any tasks that have to do with the disks...
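Once the Task Scheduler history and the error times are exported, matching them up is a simple windowed comparison. This sketch uses made-up task names and times and an arbitrary 5-minute window. Because the SAN and host timestamps were reported to differ by about 5 minutes, the window may need widening.

```python
from datetime import datetime, timedelta


def tasks_near_errors(task_runs, error_times, window=timedelta(minutes=5)):
    """Return (task, run_time, error_time) for task runs within `window` of an error."""
    return [
        (task, run, err)
        for task, run in task_runs
        for err in error_times
        if abs(err - run) <= window
    ]


# Hypothetical data: task names and times are made up for illustration.
TASK_RUNS = [
    ("ExampleTaskA", datetime(2011, 2, 28, 3, 0)),
    ("ExampleTaskB", datetime(2011, 2, 28, 12, 24)),
]
ERRORS = [datetime(2011, 2, 28, 12, 26), datetime(2011, 2, 28, 12, 56)]

if __name__ == "__main__":
    for task, run, err in tasks_near_errors(TASK_RUNS, ERRORS):
        print(f"{task} ran at {run:%H:%M}, error at {err:%H:%M}")
```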
28th February 2011, 11:51 PM #10
Just questions, sorry, no answers. Looking at your network settings:
Are you using jumbo frames? If so, why are they set to different MTUs (9000/9014)? All devices on the iSCSI network (switch, iSCSI storage, Hyper-V hosts) should have the same MTU.
Also double-check flow control: are your switches configured correctly?
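The 9000/9014 difference may be partly a labelling difference. Intel's Jumbo Packet setting is generally described as a frame size that includes the 14-byte Ethernet header, whereas an MTU figure such as the Broadcom Jumbo MTU presumably counts only the IP packet. This is worth confirming against each driver's documentation rather than taking on trust. Below is a minimal sketch of the arithmetic. It also computes the largest don't-fragment ping payload that should pass end to end if every hop really supports a 9000-byte MTU.

```python
ETHERNET_HEADER = 14  # destination MAC (6) + source MAC (6) + EtherType (2); FCS not counted
IPV4_HEADER = 20      # IPv4 header without options
ICMP_HEADER = 8       # ICMP echo header


def ip_mtu_from_frame_size(frame_bytes):
    """IP MTU implied by a frame-size setting that includes the Ethernet header."""
    return frame_bytes - ETHERNET_HEADER


def max_df_ping_payload(ip_mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return ip_mtu - IPV4_HEADER - ICMP_HEADER


if __name__ == "__main__":
    mtu = ip_mtu_from_frame_size(9014)
    print("Intel 9014-byte frame -> IP MTU", mtu)
    print("ping payload to test with:", max_df_ping_payload(mtu))
```

On Windows, `ping -f -l 8972 <SAN port address>` sends that payload with the don't-fragment flag set. If it fails while a smaller size succeeds, some device in the path is not passing 9000-byte packets.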
1st March 2011, 09:15 AM #11
I'm not sure why the Intel adapters are showing 9014; they were definitely set to 9000 originally, as I was the one who configured them all. I can't just try changing this on the fly, though, as it's a live system with about 11 VMs running.
The switch is set to support jumbo frames and flow control; that was set manually for all designated ports.
There are the usual system tasks, but nothing that matches the times of the problems we are getting.
If it helps, I can provide some kind of temporary remote access so you can have a look, as I realise it's quite a complicated problem to describe.
2nd March 2011, 12:45 PM #12
I'm having a very similar problem; I've got a thread going on the TechNet forums: "Hyper-V Cluster - STATUS_CONNECTION_DISCONNECTED issue".