We only have Windows 7 clients and it only happens a couple of times a month. If restarting the server service gets it going again, then I can live with it until a proper cure is found.
Definitely looks like that's true for us. We're just trying to work out if there are any differences in updates etc between the 150 that seemed ok this morning and the 60 we enabled at 12:30
The other thing I've noticed is that the PowerShell command Get-SmbConnection occasionally shows Windows 8.1 talking to 2012 R2 using SMB3 instead of SMB3.02, which is odd. I'm wondering if there is an issue with SMB dialect negotiation.
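In case anyone wants to compare dialects across a lot of clients, here is a rough Python sketch. It assumes you have exported the Get-SmbConnection output to CSV text yourself (for example by piping it through ConvertTo-Csv -NoTypeInformation) with the ServerName, ShareName and Dialect columns. The function name, the expected dialect string and the sample rows are all made up for illustration.

```python
import csv
import io

def unexpected_dialects(csv_text, expected="3.02"):
    """Return (server, share, dialect) for rows whose dialect differs from `expected`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["ServerName"], row["ShareName"], row["Dialect"])
        for row in reader
        if row["Dialect"].strip() != expected
    ]

# Hypothetical sample export, not taken from any real machine.
sample = '''"ServerName","ShareName","Dialect"
"fs01","users","3.02"
"fs01","shared","3.00"
'''

for server, share, dialect in unexpected_dialects(sample):
    print(f"{server}\\{share} negotiated {dialect}")
```

It only does a plain string comparison against the dialect you expect, so it will flag anything different, whether older or newer.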
I'm running a Win7 client and I had as many problems as anyone at our place. We don't have Win8 at all. Sorry :-(
Originally Posted by Michael
Windows Server 2012 R2? Are all your Windows 7 machines running all the latest updates from WSUS?
Originally Posted by Seb1780
The one other thing to try is to have a 2012 or 2012 R2 server without File Services switched on, but could be a DC running AD, DNS, DHCP and Print server roles.
I've had this problem exactly once about 6 weeks ago, and assuming it was the exact same problem, it happened simultaneously with Windows 7 SP1 and Windows 8.1 clients, all running the latest updates for the time. My file server is running Server 2012 (not R2), and also had the most recent patches at the time.
No other roles are installed on the server, it just runs File And Storage Services. For what it's worth, I do have the Data Deduplication role service installed, not that it should make any difference given that local access to files was working fine.
Originally Posted by Seb1780
The "removed software" was, in fact, services that were stopped. Apparently we have made progress as we no longer have the loss of our file server happening several times a day but we are running without various services.
We are bringing the services back one at a time to see what effect each has.
More when I know what happens.
@Seb1780 - Hi any chance you could post a list of services that were disabled as I've also been seeing this problem?
I'll see what info I can get on Monday.
Originally Posted by robriley
ETA: After re-reading the whole thread, I suspect mine is a separate issue from what's being discussed here. I'll try posting my own question later. Leaving what I wrote before in case I'm wrong...
I'm not positive the problem I'm having is exactly the same as what's described here. Here's the reason I'm not sure this is related to what you guys are discussing: All of our clients don't fail at once. It's one client at a time, not all at once. And often as not, after a little waiting, the client WILL eventually reconnect to the file share. Not always, but usually. The problem is, once one person is having the problem, it often occurs on other client machines and the only sure way to fix it all at once is just to reboot the server. Details below:
VMware ESXi 5.5 build 1474528 with Windows Server 2012 R2 (Domain Controller) running DHCP, DNS, AD, File and Printer sharing
Client computers: Mostly Windows XP, some Windows 7.
For us, the problem manifests as a stall on a client PC (most often Windows 7, very rarely Windows XP) when we go to save files in Office 2007 to the mapped network drive that points to the server. It doesn't happen every time, and the length of time a computer has had the file open doesn't seem to affect it either. That makes me think it isn't an opportunistic locking problem. It doesn't appear to be limited to Office files, but that's where we see it most frequently as that's the most frequent type of file in use. When we try to access the mapped network drive during the stalled save, it stalls opening the window for a while, but eventually opens in most cases. Often as not, you can then find a way to pull Office out of its stall and save it properly. It sometimes creates a .tmp file in the directory that was being saved to as well, which isn't unusual for Office. During these stalls, a ping command to both the IP and the name of the server works fine. NSLookup works fine. Address resolution seems to work fine for everything on the network. Here's the weird thing: Event Viewer isn't showing any consistent errors across these instances besides the app hang errors and credentials being submitted and verified by the server. The server isn't showing any consistent errors either.
The events that are being logged on a fairly regular basis are 1001 (hang) and 4648 (log on, resolves successfully). I see a simultaneous event on the server, ID 4776 (credential validation), and then 4624 (logon successful). Once in a while, I also see a warning, ID 2012, in the System log indicating a network error during transmitting/receiving data.
I've tried swapping out our network devices (switches & gateways) and that didn't solve the issue.
I've tried disabling SMB2/3 on the server and the client machines on account of reports that it could cause issues just like this in Office during saves to mapped network drives. No dice so far. This happens at least once a day, and it's driving me bonkers. The ONLY real fix that gives me a few hours of peace is to reboot the server. Thankfully, it reboots inside of 3 minutes, but it destroys workflow in the office for anyone using shared files. (That's everyone in the office.)
Now that I know what service to try restarting (I'd previously tried restarting a bunch of services related to SMB and file sharing, but hadn't tried restarting the Server service), I'll give that a shot next time it happens and see if I can avoid a reboot. I'm going on a month and a week of troubleshooting this, and I've pulled out just about all my hair. Very much looking forward to that list of services they disabled. I can't get funding from my company to get outside help or open a case with Microsoft. So you guys are my best, last hope. ;)
Interesting links worth browsing related to all the searches I've done trying to figure out what the hell is causing this, some of which come from this thread.
Whitepaper on Opportunistic Locking and possible file corruption problems:
Opportunistic Locking and Read Caching on Microsoft Windows Networks
SMB commands for enabling/disabling smb 1, 2/3 on various operating systems:
How to enable and disable SMBv1, SMBv2, and SMBv3 in Windows Vista, Windows Server 2008, Windows 7, Windows Server 2008 R2, Windows 8, and Windows Server 2012
Current hotfixes for 2012 & 2012R2 related to file sharing issues:
List of currently available hotfixes for the File Services technologies in Windows Server 2012 and in Windows Server 2012 R2
Your description of the user experience is exactly what we are seeing here; hung connections, not all machines at the same time, .tmp files being left behind etc.
As I promised an update here it is - there is nothing more to report. We remain functional, but still have some services switched off, and since last week we have not had any repeats of the mass disruption seen prior to the half-term holiday.
Hi @Seb1780 - did you have any luck with the list of disabled services?
Originally Posted by Seb1780
I've had this problem twice since I migrated my file server from 2003 to 2012 R2 in October 2013. It is a pure file server, no additional roles. Win7 clients and a single Win8.1 client (mine, local profile). One user reports that they are having trouble opening or saving a file (in both instances it was a different Excel 2010 spreadsheet), then slowly more and more people lose connectivity. Both times I have tried to access shared drives and my Windows 8.1 machine has locked up and I had to hard reboot it. I haven't experienced the issue since before the half-term, and I did a complete server estate SUU and Windows Update deployment.
For what it's worth, disabling SMB2/3 seems to have stopped me from having to reboot workstations. It still stalls out when they go to access a shared drive after having a stalled save, but it seems to eventually reconcile itself after anywhere from 20 seconds to ~3 minutes. It's not a solution, but it allowed me to keep my system up in most cases. This only applies to Windows XP and Windows 7 Home/Pro.
Basically if someone has a problem saving, I do this and it seems to avoid a workstation reboot:
Let Excel/Word/whatever chug along and try try try in the background. It leaves the 'saving' prompt up with a cancel button, which I do not hit.
Attempt to open the shared drive.
Wait for the shared drive to properly display contents - usually 40 seconds to 2 minutes.
Once it properly displays contents, navigate back to the program trying to save, hit cancel, wait for it to become responsive again, and then try saving again.
Obviously this is by no means a solution, but believe it or not it's the only way I've been able to avoid a reboot on a workstation. Which is (unfortunately) necessary when they have 15 documents open that need to be saved. Here's the kicker... I get through that entire process, and it can take up to 5 minutes... and the only events I see in the logs on either the server or workstation relate to a credential negotiation (Event 4776) that returns a successful result (Event 4624). Once in a while, there's a 2012 network error event on the server, and of course it always registers apphang events related to excel/word/etc and explorer if that froze up for too long trying to display contents of a network share.
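If anyone wants to put numbers on how long the share takes to come back during one of these stalls, a small timing probe along these lines could be left running on an affected workstation. This is only a sketch: the function names are invented, "." is a stand-in for the mapped drive or UNC path, and a listing against a hung share will simply block until Windows gives up or the share responds.

```python
import os
import time

def time_listing(path):
    """Time one directory listing. Returns (seconds, entry_count), count is None on error."""
    start = time.perf_counter()
    try:
        count = len(os.listdir(path))
    except OSError:
        count = None  # share unreachable, access denied, path missing, etc.
    elapsed = time.perf_counter() - start
    return elapsed, count

def probe(path, samples=3, interval=1.0):
    """Take several timings in a row, pausing `interval` seconds between them."""
    results = []
    for i in range(samples):
        results.append(time_listing(path))
        if i < samples - 1:
            time.sleep(interval)
    return results

# "." is a placeholder; point this at the mapped drive (e.g. "Z:/") when testing for real.
for elapsed, count in probe(".", samples=2, interval=0.1):
    print(f"{elapsed:.3f}s entries={count}")
```

Logging the output with a timestamp next to the 4776/4624 events might show whether the slow listings line up with the credential negotiation.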
@bergmbe How you describe your issues mirrors ours almost to a tee. It doesn't affect all clients all the time, but when it does it typically manifests itself as a stall when opening Explorer (usually on Computer, so it stalls refreshing the list of mapped drives) or when attempting to open/save Office documents.
As with the others though, name resolution, pings, RDP - everything else appears to work fine during the periods of this belligerent behaviour.
Our "problematic" server is also a DC, hosting RID and PDC FSMO roles. I performed an "nltest /SC_VERIFY" early last week against the box and it couldn't find its own domain name - a touch worrying - but other DCs could, so I figured the server hadn't correctly promoted itself and moved the two FSMO roles off to other DCs in preparation for a demote/re-promote to see if that cured it.
Now I didn't get around to the demote/repromote but interestingly the DC that got those FSMO roles now fails the "nltest /SC_VERIFY" test with the can't find domain error, and the problematic server now passes it - and I haven't seen the problem in near enough a week.
I know that may not help as people with this issue seem to be running a mix of 2012/R2 in both File server with DC, without DC, Hyper-V modes - but thought I would add it to the pot as anecdotal in case it helps somebody figure out what's going on.