Arrived at work this morning to discover that 3 of the VMs on our cluster had stopped responding. I couldn't connect to them, and the event log in Failover Cluster Manager shows that the Resource Hosting Subsystem (RHS) had issues communicating with them, so it attempted to restart itself.
I've looked through the cluster.log generated by Get-ClusterLog for both the cluster and the specific node in question, and I see a few mentions of RHS having issues at the times it all happened, but that's about it. No cause.
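For anyone else digging through a cluster.log by hand, here is a minimal sketch of pulling out just the RHS-related lines so the window around the incident is easier to eyeball. This assumes the log is plain text (as Get-ClusterLog produces); the file path and the keyword list are placeholders to adapt, not anything the log format guarantees.

```python
# Sketch: scan a cluster.log for lines mentioning RHS.
# The path and keywords below are assumptions to adjust for your own log.
import re
from pathlib import Path


def find_rhs_lines(log_path, keywords=("RHS", "rhs.exe")):
    """Return (line_number, line) pairs whose text mentions any keyword."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords))
    hits = []
    text = Path(log_path).read_text(errors="replace")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if pattern.search(line):
            hits.append((lineno, line.rstrip()))
    return hits


if __name__ == "__main__":
    for lineno, line in find_rhs_lines("cluster.log"):
        print(f"{lineno}: {line}")
```

Keeping the line numbers makes it easy to jump back to the full log and read the surrounding context once you've spotted the interesting timestamps.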
The Hyper-V node in question was running half a dozen VMs at the time, all on the same shared storage (another Windows Server 2012 machine, accessed over SMB 3.0 shares). The rest of the VMs on the node had no issues at all, and there are no errors in the logs on the storage server. The core switch is fine as well, so it wasn't down to a network interruption.
So my question is this: what on earth could cause 3 random machines to have issues like this? They're not even all running the same guest OS: one Ubuntu machine, one 2008 R2 machine, and one 2012 machine.
I'm at a bit of a loss!
I had something similar last week, but on a 2008 R2 Hyper-V host.
From what I can tell, it was caused by a single VM blue screening so much that it locked up its VMWP (worker) process on the host.
The longer it BSOD'ed, the worse it got, until all the VMs were unresponsive and I couldn't terminate the VMWP process or the parent VMMS service on the host, to the point where I couldn't even shut down the host without pressing the reset button.