Machine connection failure – “Refused”

Firstly apologies for the slowdown of blog posts in recent times – between work and family life there never seems to be a free moment, I’ve been meaning to write this one up for a couple of months.

A client with a XenDesktop 7.6 farm of Windows 7 pooled desktops was experiencing, seemingly out of the blue, a large number of machine connection failures.  Most failure reasons listed were the extremely informative “Refused” along with the equally descriptive “Timeout” and “None”.

2015-11-02 10_37_19-Management Desktop - Desktop Viewer

Checking the Studio console showed nothing out of the ordinary, all machines were booted and registered.  Logging onto the console of a machine that had refused a connection also showed nothing unusual, in fact the event log didn’t even have a record of an attempted connection.  Very strange.

My first clue came when trying to open the event viewer remotely to one of the affected desktops.

2015-11-02 10_38_01-Management Desktop - Desktop Viewer

The above error appeared which I wasn’t immediately familiar with, but a quick Google pointed me in the right direction.  Most of the results mentioned kerberos errors relating to mismatched SPNs and account names which got me thinking.  Checking DNS for the hostnames and IPs of the affected virtual desktops showed multiple instances of an IP address being assigned to more than one DNS record.

2015-11-02 10_43_33-Management Desktop - Desktop Viewer

When the connection attempts to a virtual desktop were being made, the broker connection process involves a DNS lookup that in this situation resulted in the wrong IP being returned.

What had happened was the DHCP scope and DNS scavenging settings were misaligned meaning expired DHCP leases did not have their DNS records cleaned up.  For a read-through of recommended settings and how the process works have a look at the following two links:

Long story short, once the stale DNS records were manually purged, then the DHCP lease time was extended from 1 day to 14, and the DNS refresh/no-refresh intervals brought down, everything started to behave again.