DNS gone for a walk..
Mid-morning I started getting reports from people that they couldn't log on; they were getting "user profile service could not be found". We've had this a couple of times, and a restart of the servers clears it.
Only this time, upon rebooting, I have errors all over the place. AD is erroring because DNS isn't working, and DNS is erroring because AD isn't working.
Trying to load the DNS console, I get a message asking where the DNS server is located. When I choose the local machine, I'm told that access is denied.
Please... someone out there must have an idea....
I ran dcdiag and the file is attached; it doesn't appear to tell me anything I didn't know.
I'd take a guess at DNS scavenging removing the DC record. (I'm almost certainly wrong.)
How many DCs?
How are their IP addresses configured?
What other errors are in your event logs? Post them verbatim (redact sensitive info)!
Check that there is one IP per DC, that it is static, and that the first DNS server entry points to itself and the second to the other DC (if one exists).
How are you able to get out to the internet if your DNS is broken?
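If it helps, the IP/DNS checklist above can be written down as a small check. This is only a rough sketch: the function name, the argument layout and the example addresses are made up for illustration, and treating 127.0.0.1 as "itself" is my assumption rather than anything stated in this thread.

```python
def check_dc_network_config(name, ip_addresses, is_static, dns_servers, other_dc_ips=()):
    """Return a list of problems; an empty list means the settings match the checklist."""
    problems = []

    # One IP address per DC.
    if len(ip_addresses) != 1:
        problems.append(f"{name}: expected exactly one IP address, found {len(ip_addresses)}")

    # The address must be static, not handed out by DHCP.
    if not is_static:
        problems.append(f"{name}: address is not static")

    # First DNS entry should point at the DC itself.
    # Assumption: loopback (127.0.0.1) also counts as "itself".
    own_addresses = set(ip_addresses) | {"127.0.0.1"}
    if not dns_servers or dns_servers[0] not in own_addresses:
        problems.append(f"{name}: first DNS server entry does not point to itself")

    # Second DNS entry should point at another DC, if one exists.
    if other_dc_ips:
        if len(dns_servers) < 2 or dns_servers[1] not in other_dc_ips:
            problems.append(f"{name}: second DNS server entry does not point to another DC")

    return problems


# Hypothetical example values, not taken from the original poster's network.
print(check_dc_network_config("DC1", ["10.0.0.1"], True, ["10.0.0.1", "10.0.0.2"], ["10.0.0.2"]))
```

You would fill the arguments in by hand from the adapter settings on each DC; the sketch does not read anything from the machine.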
2 DCs; the other appears to have DNS running, although it is complaining that it cannot find the other.
Both DCs have static IPs, and both have DNS pointed to themselves and then the other.
I have a direct link to our router, so luckily I can still get online!
A part of me thinks that if I dcpromo the faulty DC and then re-add it, it might suddenly work, although what that could break is beyond me.
AD has lots of errors about being unable to communicate with the Global Catalog, which sounds worrying.
It is almost the end of the school day. If it is still playing up, and it needs to be working at some point tomorrow morning, it is probably worth getting authorisation to drop £200 on a call to Microsoft Support (assuming you don't have a fully activated technet subscription to hand). You don't need to make the call, just get permission to do so (you will need a credit card, which they won't bill if you give them a Purchase Order and invoice address).
Last time we saw something like this it was a misconfigured gateway address on one of the DCs.
Can they ping each other?
Which server is FSMO role holder?
Are they both Global Catalogs?
When did replication last occur successfully?
What happens if you change the faulty DC's IP settings so that it uses the other DC as its primary DNS server? (best to reboot the faulty DC after making that change)
Document every single thing you change, just in case you do need to call MS.
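On the ping question: a successful ping only proves that ICMP gets through, not that the services the DCs need from each other are reachable. Below is a rough sketch that tries a TCP connection to a handful of well-known ports (DNS 53, Kerberos 88, RPC endpoint mapper 135, LDAP 389, SMB 445, Global Catalog 3268). The port list is my own shortlist and is not exhaustive, and the function name is made up.

```python
import socket

# Assumed shortlist of TCP ports used between domain controllers (not exhaustive).
AD_PORTS = {
    53: "DNS",
    88: "Kerberos",
    135: "RPC endpoint mapper",
    389: "LDAP",
    445: "SMB",
    3268: "Global Catalog",
}


def check_ports(host, ports=AD_PORTS, timeout=2.0):
    """Try a TCP connect to each port; return {port: True/False}."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout or name failure.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results

# Usage (hypothetical host name): check_ports("dc2.example.local")
```

A port showing False here is only a hint about where to look next; it does not say why the connection failed.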
If things come back to life after the reboot I'm going to stick with the scavenging prediction:
Check the DHCP server Dynamic DNS Settings, check the DNS server and zone scavenging setting. Read up about DNS/DHCP interaction.
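For anyone reading up on scavenging, the basic rule is that a dynamically registered record carries a timestamp, and it becomes eligible for scavenging once that timestamp is older than the no-refresh interval plus the refresh interval (7 days each by default). Static records have no timestamp and are left alone. A minimal sketch of that rule follows; the function name is made up, and it ignores whether scavenging is actually enabled on the server and aging on the zone.

```python
from datetime import datetime, timedelta


def is_scavengeable(timestamp, now,
                    no_refresh=timedelta(days=7),
                    refresh=timedelta(days=7)):
    """Return True if a record with this timestamp is old enough to be scavenged."""
    if timestamp is None:
        # Static record: no timestamp, so it is never scavenged.
        return False
    # Eligible only after both intervals have fully elapsed.
    return now > timestamp + no_refresh + refresh


# Example dates are arbitrary.
now = datetime(2024, 1, 31)
print(is_scavengeable(datetime(2024, 1, 1), now))   # last refreshed 30 days ago
print(is_scavengeable(datetime(2024, 1, 25), now))  # last refreshed 6 days ago
print(is_scavengeable(None, now))                   # static record
```

If a DC's own records were dynamically registered and never refreshed, this is the condition under which they could disappear.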
If they don't come back after a reboot, try restarting the Netlogon service on the offending DC.
I'm going AFK in a few minutes for the rest of the day. Post your findings and hopefully a better SA than I will pick them up! But don't forget about the MS PSS option.
They can ping each other.
The one that isn't working is the FSMO role holder, although it is claiming it is not valid.
It appears that the faulty one was the Global Catalog, and replication appears to have occurred today.
Do try the IP settings (DNS server) changes I suggested.
Waiting for it to come back up at the moment.....
With the DNS changed to the 2nd DC, I can log on as an admin now. Standard users are told that the profile service is not running and are not allowed to log on. Not sure if this is a step forward or not.
What does DCDiag report now? From both DCs.
How about if you shut down the 'good' DC?
If all is OK, bring it back up. You should now restart all servers sequentially, to ensure Kerberos and GPO processing is happy.
Once your servers are all back up, reboot the workstations and they will then allow users to log on successfully.
Moving forward: use the Active Directory Best Practices Analyzer to check things out.
Look into the DNS scavenging thing.
Resolve remaining errors and warnings in the event log.
If the faulty DC was your only GC server, demoting it would be a bad idea!
Thought I'd give everyone an update. We ended up calling in support from Microsoft; 4 1/2 hours later all was working again. The secure channel between the 2 DCs had failed to authorise.
Once this was resolved, everything else started working. Thanks for all the suggestions; although they didn't get me anywhere, they did save the engineer from doing a lot of them.