I have had a few gremlins emerging on my network over the last 6 months. I put them down as isolated incidents, but they are now escalating into problems. :)
I have the standard printer script everyone and his dog is using, yet sometimes it's not removing previous printers as it should. Sometimes it even picks up a printer people have not used for weeks.
Login scripts are not always applying and mapping drives.
Group policy randomly fails to apply.
All my servers are connected to our main core layer 3 switch using bonded/teamed connections. There are no more than 2 hops to any one machine from the servers, and all links to outlying buildings are bonded gig fibre.
We use roaming profiles at the moment, and profiles are being cached on the local machine (even though deleting cached copies of roaming profiles on the local machine is selected in the GPO).
Anyone have any ideas or know of any tests I can do to find out the problems?
Cheers in advance.
See if your servers are replicating correctly by using a program called Sonar (which can be downloaded from the MS site).
Nice one thanks, will check it out.
Start keeping a detailed record of the incidents and see if you can spot a pattern.
Get the event logs to go with the incidents if they show any problems.
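As a rough illustration of the record-keeping idea, here is a minimal Python sketch that tallies incidents by machine, hour of day and symptom so that any clustering stands out. The machine names, timestamps and symptoms are made up for the example; your own record would come from whatever notes or spreadsheet you keep.

```python
from collections import Counter
from datetime import datetime

# Hypothetical incident records: (timestamp, machine, symptom).
# These entries are invented purely for illustration.
incidents = [
    ("2006-03-01 08:55", "PC-LAB1-04", "drives not mapped"),
    ("2006-03-01 09:02", "PC-LAB1-07", "group policy not applied"),
    ("2006-03-02 08:58", "PC-LAB1-04", "old printer reappeared"),
    ("2006-03-03 13:10", "PC-OFFICE-02", "drives not mapped"),
]


def tally(records):
    """Count incidents per machine, per hour of day, and per symptom."""
    by_machine = Counter()
    by_hour = Counter()
    by_symptom = Counter()
    for stamp, machine, symptom in records:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        by_machine[machine] += 1
        by_hour[when.hour] += 1
        by_symptom[symptom] += 1
    return by_machine, by_hour, by_symptom


by_machine, by_hour, by_symptom = tally(incidents)

# Print the three most frequent values in each grouping.
for label, counter in (("machine", by_machine),
                       ("hour", by_hour),
                       ("symptom", by_symptom)):
    print(label, counter.most_common(3))
```

If one machine, one building or one time of day dominates the counts, that narrows down which event logs are worth pulling first.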
Check the switch event logs to see if anything is going wrong there, and check the ports for above-average transmit/receive errors.
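One way to make "above average" concrete is sketched below in Python. It assumes you have copied the per-port error counters off the switch by hand; the port names, the counts and the factor of 2 are all invented for the example, not taken from any particular switch.

```python
from statistics import mean

# Hypothetical per-port error counters read off the switch manually.
port_errors = {"1/1": 3, "1/2": 0, "1/3": 412, "1/4": 5}


def ports_above_average(errors, factor=2.0):
    """Return ports whose error count exceeds factor times the mean count."""
    average = mean(errors.values())
    return sorted(port for port, count in errors.items()
                  if count > factor * average)


print(ports_above_average(port_errors))
```

A port flagged here is only a candidate; a bad patch lead, a duplex mismatch or one half of a team misbehaving would all show up the same way.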
Is your network flat or subnetted? Do you have too many machines on one subnet? You mention hops, which suggests routers are involved; maybe you could consider a DC per subnet?
If you want to get rid of cached profiles, add a startup script to group policy that runs delprof (this was originally an NT4 reskit command, I'm fairly sure it's part of the 2K / 2K3 reskits too).
I've seen issues like this when machines fail to get a DHCP address, but that's obvious from the logs, so it's unlikely to be that. There's also another cause I've seen but for the life of me can't find the details; I tracked down the fix by using the event error ID.
TBF I can't see it being a replication error, but it's probably worth a check anyway; the DC event logs should tell you if there's a problem. - I was wrong. Not unusual though.
If they are replicating fine, find a machine that is affected, see if it can ping domain.local, and ensure it returns a valid address of one of your domain controllers.
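The same check can be scripted so it is easy to repeat on several affected machines. This is only a sketch: `domain.local` and the addresses in `KNOWN_DCS` are placeholders to replace with your own domain name and DC addresses, and it only tests name resolution, not whether the DC actually answers.

```python
import socket

# Placeholder values: substitute your own domain name and DC addresses.
DOMAIN = "domain.local"
KNOWN_DCS = {"192.168.0.10", "192.168.0.11"}


def resolve(name):
    """Return the set of IPv4 addresses the local resolver gives for name."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET)
    except socket.gaierror:
        return set()  # the name did not resolve at all
    return {info[4][0] for info in infos}


def check(addresses, known_dcs):
    """Split resolved addresses into known DCs and unexpected addresses."""
    return addresses & known_dcs, addresses - known_dcs


def report(domain, known_dcs):
    """Print whether the domain name resolves to known DC addresses."""
    found = resolve(domain)
    if not found:
        print(domain, "did not resolve")
        return
    good, unexpected = check(found, known_dcs)
    print("resolves to DCs:", sorted(good))
    print("unexpected addresses:", sorted(unexpected))
```

Running `report(DOMAIN, KNOWN_DCS)` on an affected machine and on a healthy one, then comparing the output, should show whether the affected machine is getting a stale or wrong answer from DNS.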
Another thing to check, which I had a problem with around 3 weeks ago: under the delegation settings of the group policies, Enterprise Domain Controllers did not have read permissions on the policies that were not applying. I never worked out why, but once they were added back in (with read-only perms) everything returned to normal. Took me over a week to work out the problem!
I've seen some weirdness with bonded/teamed connections on DCs. I just run my DCs on a single Cat5e gigabit link now.
Using the Sonar tool, I found that one of my DCs has failed to replicate. I get an error with event ID 13568, Source: NTFRS.
The File Replication Service has detected that the replica set "DOMAIN SYSTEM VOLUME (SYSVOL SHARE)" is in JRNL_WRAP_ERROR.
Replica set name is : "DOMAIN SYSTEM VOLUME (SYSVOL SHARE)"
Replica root path is : "c:\winnt\sysvol\domain"
Replica root volume is : "\\.\C:"
A Replica set hits JRNL_WRAP_ERROR when the record that it is trying to read from the NTFS USN journal is not found. This can occur because of one of the following reasons.
 Volume "\\.\C:" has been formatted.
 The NTFS USN journal on volume "\\.\C:" has been deleted.
 The NTFS USN journal on volume "\\.\C:" has been truncated. Chkdsk can truncate the journal if it finds corrupt entries at the end of the journal.
 File Replication Service was not running on this computer for a long time.
 File Replication Service could not keep up with the rate of Disk IO activity on "\\.\C:".
Setting the "Enable Journal Wrap Automatic Restore" registry parameter to 1 will cause the following recovery steps to be taken to automatically recover from this error state.
 At the first poll, which will occur in 5 minutes, this computer will be deleted from the replica set. If you do not want to wait 5 minutes, then run "net stop ntfrs" followed by "net start ntfrs" to restart the File Replication Service.
 At the poll following the deletion this computer will be re-added to the replica set. The re-addition will trigger a full tree sync for the replica set.
WARNING: During the recovery process data in the replica tree may be unavailable. You should reset the registry parameter described above to 0 to prevent automatic recovery from making the data unexpectedly unavailable if this error condition occurs again.
To change this registry parameter, run regedit.
Click on Start, Run and type regedit.
Click down the key path:
Double click on the value name
"Enable Journal Wrap Automatic Restore"
and update the value.
If the value name is not present you may add it with the New->DWORD Value function under the Edit Menu item. Type the value name exactly as shown above
I will try this fix later, as this machine is also my DNS server and it says "During this procedure, the data on that particular member becomes unavailable." Will see if that has any effect.