Problem with iSCSI traffic dropping
  1. #1 (sacrej)
    Problem with iSCSI traffic dropping

    Morning all,

    Got a bit of an ongoing issue with our 2 Hyper-V (SCVMM) hosts connected to our Dell MD3200i SAN.

    I'm getting thousands of errors (event IDs 9, 39 and 139) on both Hyper-V servers, plus the SAN itself is kicking out disconnection errors.

    Everything seems to be functioning okay, but services (cluster drives) drop out briefly, usually out of hours thankfully.

    Both of our servers have eight 1 Gb LAN ports and the SAN has two controllers, so 2 x 4 ports.
    The configuration for each server is:
    3 ports aggregated to 3 Gbps on our main network subnet 172.16.*.*
    1 port connected to our other Hyper-V server on the subnet 172.15.*.*
    4 ports separately connected to our SAN switch, with a different subnet for each port, so:
    server 1.
    172.10.130.103
    172.10.131.103
    172.10.132.103
    172.10.133.103

    server 2.
    172.10.130.104
    172.10.131.104
    172.10.132.104
    172.10.133.104

    SAN ports.
    172.10.130.101
    172.10.131.101
    172.10.132.101
    172.10.133.101
    172.10.130.102
    172.10.131.102
    172.10.132.102
    172.10.133.102

    This should give us failover capacity if one controller goes down, or if a NIC fails.

    By the way, both servers run Server 2008 R2 SP1 with failover clustering and the latest version of SCVMM.

    The SAN is reporting everything as OK, and all adapters are IPv4 with jumbo frames (9000) enabled.
    Server adapters are a mix of Broadcom and Intel... and the servers are Dell R610s, if that helps.

    Oh, and the SAN switch is a D-Link 24-port gigabit managed switch (with jumbo enabled), separate from the main network.

    If anyone has any suggestions I would be very appreciative. If you need any information, please let me know.

    I just realised this might be more suited to the virtualisation section, although... it is hardware.
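    In case it's useful, this is roughly how the sessions and paths can be checked on each host (a rough sketch from an elevated PowerShell prompt, assuming the built-in Microsoft iSCSI initiator and the Windows MPIO tooling are in use; if a vendor DSM is installed, mpclaim should still show which DSM owns the paths):

        # List every iSCSI session, including the initiator/target portal addresses for each connection
        iscsicli SessionList

        # Summarise the MPIO-claimed disks and how many paths each one currently has
        mpclaim -s -d

    If any of the four paths per host aren't listed, that narrows things down before looking at the switch or the SAN.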
    Last edited by sacrej; 28th February 2011 at 12:35 PM.

  2. #2 (AngryTechnician)
    Couple of questions:

    1. Do the network adapters have iSCSI offloading, and if so, are you using the drivers that provide this?
    2. Are you able to monitor traffic to check the host is definitely using all 4 NICs for iSCSI, not just preferring 1 all the time?
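    If it helps with question 2, per-NIC throughput can be sampled from PowerShell on each host along these lines (a rough sketch using the standard Network Interface performance counters; the instance names follow the adapter descriptions rather than the friendly connection names):

        # Sample Bytes Total/sec on every network interface once every 5 seconds for a minute;
        # if the iSCSI load balancing is working, all four iSCSI adapters should show sustained traffic
        Get-Counter '\Network Interface(*)\Bytes Total/sec' -SampleInterval 5 -MaxSamples 12 |
            ForEach-Object {
                $_.CounterSamples |
                    Sort-Object CookedValue -Descending |
                    Select-Object InstanceName, @{n='BytesPerSec'; e={[math]::Round($_.CookedValue)}}
            }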

  3. #3 (sacrej)
    Okay, because the servers have 2 x 4-port NICs (Intel and Broadcom), we put two iSCSI ports on the Intel (Intel Gigabit ET Quad Port Server Adapter) and two on the Broadcom (Broadcom BCM5709C NetXtreme II), in the hope that if one card completely failed there would still be 2 ports serving iSCSI traffic on the server.

    The following settings are enabled on the Broadcom adapters:
    Ethernet@WireSpeed
    Flow Control
    Interrupt Moderation
    IPv4 Checksum Offload - Tx/Rx on
    IPv4 Large Send Offload - on
    Jumbo MTU - 9000
    RSS Queues - 8
    Priority and VLAN - enabled
    Receive Buffer - 750
    Receive Side Scaling - on
    Duplex - auto
    TCP Connection Offload - on
    Transmit Buffer - 1500

    Intel adapters:
    Enable PME - disabled
    Flow Control - Rx/Tx
    Gigabit Master/Slave Mode - auto detect
    Header Data Split - disabled
    Interrupt Moderation - enabled
    Interrupt Moderation Rate - adaptive
    IPv4 Checksum Offload - Rx/Tx
    Jumbo Packets - 9014 bytes
    Large Send Offload - on
    Duplex - auto negotiation
    Log Link State Event - on
    Max RSS CPUs - 8
    Preferred NUMA Node - default
    Priority and VLAN - enabled
    Receive Buffer - 250
    Receive Side Scaling - on
    RSS Queues - 1 queue
    TCP Checksum Offload - Rx/Tx
    Transmit Buffer - 512
    UDP Checksum Offload - Rx/Tx
    Virtual Machine Queues - disabled

    We use the Broadcom Advanced Control Suite 3 (BACS3) as well, where two Broadcom ports and one Intel port are bonded to make the primary LAN connection.

    The failover link cable uses an Intel port.

    Which leaves 2 x Intel and 2 x Broadcom for the iSCSI.

    According to BACS3 the Broadcom adapters have the following offload capabilities: TOE, LSO, CO, RSS.

    The Intel adapters show: LSO, CO, RSS.

    And yes, all 4 NICs are showing traffic, although some more than others (reported on the SAN side too).

    Also, all iSCSI cables are brand new 0.5m Cat6.
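    As a cross-check against those driver settings, the OS-level view of what is actually enabled (TCP chimney offload, RSS and so on) can be dumped read-only with something like this - just a sketch using the built-in netsh tooling:

        # Show the global TCP stack settings (Chimney Offload State, Receive-Side Scaling State, etc.)
        # so they can be compared with the per-adapter driver settings listed above
        netsh int tcp show global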

    Thank you for your fast response.
    Last edited by sacrej; 28th February 2011 at 02:38 PM.

  4. #4 (AngryTechnician)
    Am I right in thinking that all 4 links are dropping at once, across both cards? If so, it would seem to rule out host hardware as an issue. That still leaves host software, the switch, or the SAN, unfortunately.

    Are you using the Microsoft iSCSI connector on the Hyper-V hosts or a third party? Also, what model is the D-Link switch?
    Last edited by AngryTechnician; 28th February 2011 at 03:25 PM.

  5. #5 (sacrej)
    Well, I'm not sure. In the SAN event log I'm getting 3 errors at 26 minutes past and 3 errors at 56 minutes past every hour (1 of these errors is for RAID controller 1 and 2 are for controller 0).
    The switch is this.

    And yes, we are using the Microsoft connector, but it was set up via the Dell software.

    Hope that helps.

  6. #6 (AngryTechnician)
    Are they always at those specific times? Or always 30 minutes apart?

  7. #7 (sacrej)
    Roughly every 30 minutes (give or take a few here and there).

    Which is strange, because on the client side the errors are reported at slightly later times (about 5 minutes later) - one server reports 2 errors every 30 mins, the other reports either 8 or 11 (it alternates).
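    To pin the pattern down it may be worth pulling the exact timestamps of the initiator errors on each host and lining them up against the SAN log. A rough sketch, assuming the errors are logged by iScsiPrt in the System log (adjust the provider and event IDs to whatever Event Viewer actually shows):

        # Group the iSCSI errors by minute to compare with the :26 / :56 pattern the SAN is reporting
        Get-WinEvent -FilterHashtable @{ LogName='System'; ProviderName='iScsiPrt'; Id=9,39,139 } -MaxEvents 1000 |
            Group-Object { $_.TimeCreated.ToString('yyyy-MM-dd HH:mm') } |
            Sort-Object Name |
            Select-Object Count, Name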

  8. #8 (sacrej)
    One thing to add: only one server is generally dropping drives at the moment (named Hyperv-2), i.e.
    "Cluster Shared Volume 'Volume6' ('Cluster Disk 6') is no longer available on this node because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished."

  9. #9 (AngryTechnician)
    Can you check your scheduled tasks event logs on the hosts to see if there is some background task being run periodically that might correspond with the timing of the errors?

    Event Viewer > Applications and Services Logs > Microsoft > Windows > TaskScheduler > Operational

    In particular any tasks that have to do with the disks...
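    The same log can be queried from PowerShell if that's easier to cross-reference against the error times - a rough sketch:

        # Dump recent Task Scheduler activity with timestamps so it can be lined up
        # against the roughly half-hourly error pattern on the hosts and the SAN
        Get-WinEvent -LogName 'Microsoft-Windows-TaskScheduler/Operational' -MaxEvents 500 |
            Sort-Object TimeCreated |
            Select-Object TimeCreated, Id, Message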

  10. #10 (apaton)
    Hi

    Just questions, sorry, no answers. Looking at your network settings:

    Are you using jumbo frames? If so, why are they set to different MTUs (9000/9014)? All devices on the iSCSI network (switch, iSCSI storage, Hyper-V hosts) should have the same MTU.
    Also double-check flow control - are your switches configured correctly?
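    One non-disruptive way to confirm jumbo frames survive end to end is a do-not-fragment ping with a jumbo-sized payload from each host to each SAN port (8972 bytes of ICMP payload plus 28 bytes of IP/ICMP headers = 9000 bytes). A sketch, using one of the SAN addresses from the first post:

        # Fails if any hop (NIC, switch port, SAN port) won't pass a 9000-byte frame without fragmenting
        ping -f -l 8972 172.10.130.101

        # For comparison, a standard 1500-byte frame should always get through
        ping -f -l 1472 172.10.130.101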

    Regards

    Andy

  11. #11 (sacrej)
    I'm not sure why the Intel adapters are showing 9014; they were definitely set to 9000 originally, as I was the one who configured them all. I can't just try changing this on the fly though, as it's a live system with about 11 VMs running.
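    For what it's worth, Intel's "9014 bytes" jumbo setting normally includes the 14-byte Ethernet header, so it usually amounts to the same 9000-byte payload MTU as the Broadcom setting rather than a real mismatch - but it's easy to confirm without touching the live adapters. A read-only sketch, run on each host:

        # Show the MTU the IP stack is actually using on each interface (safe on a live system);
        # the four iSCSI-facing interfaces should all report the same jumbo MTU
        netsh interface ipv4 show subinterfaces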

    The switch is set to support jumbo frames and flow control; that was manually configured for all the designated ports.

    There are the usual system tasks, but nothing that fits in with the times of the problems we are getting.

    If it helps, I can provide some kind of temporary remote access so you can have a look, as I realise it's quite a complicated problem to try to describe.

  12. #12 (RobFuller)
    I'm having a very similar problem; I've got a thread going on the TechNet Forums - Hyper-V Cluster - STATUS_CONNECTION_DISCONNECTED issue.


