XenServer 6.2 and updating drivers

A while back I posted an article on updating XenServer 6.0.2, with a quick tip on how to check what drivers are loaded on your system.  For reference that article is here – “XenServer 6.0.2 hotfix and driver disk install summary”.  I’ve been caught out though – my quick tip relies on the module name matching the driver disk filename, which works in all but one instance.  For example, as you can see in the screenshot below, the bnx2x driver’s filename also starts with bnx2x.

XS62ESP1009 Driver Disks

There is an anomaly though – the filename for the Emulex be2net driver does not match!

XS62ESP1009 Emulex

As you can see, the driver name is “be2net” but the filename is “emulex”.  So when running your lsmod command, double-check that the filenames match the driver names for future driver updates, as I’m sure there will be other inconsistencies down the track.

For reference, the command to check your system for any required XenServer 6.2 SP1 Hotfix 9 driver updates (as per CTX141192) would be:

lsmod | egrep 'bnx2x|tg3|bnx2|cxgb4|enic|fnic|be2net|aacraid|hpsa|e1000e|megaraid_sas|qlcnic|qla4xxx|qla2xxx|qlge'
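
If a module does show up and you want to confirm exactly which driver file and version is currently loaded (handy where the driver disk filename doesn’t match the module name, as with be2net above), modinfo will tell you.  For example:

modinfo be2net | egrep '^(filename|version)'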

Happy updating :-)

XenServer iSCSI LUNs not mapping

I’m not a big fan of iSCSI – probably a hangover from years of fibre channel experience before Ethernet-based storage networks became commonplace – and the issue I ran into today didn’t do anything to increase my comfort level.  My experience to date is that fibre channel storage systems seem to “just work”, whereas iSCSI can be finicky and temperamental.

A customer had purchased a new HP P2000 G3 iSCSI storage array to provide a shared storage repository for XenServer HA for the blades hosting their XenDesktop farm, in addition to providing a storage location for templates and a handful of test VMs.  I configured two ports on each P2000 controller and assigned two NICs from each XenServer blade for iSCSI traffic as per the XenServer multipathing best practice guide which you can find at CTX136354.  I enabled multipathing on all hosts in the farm and proceeded to create the storage repository.
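
For reference, if you prefer the CLI to XenCenter, multipathing is enabled per host by setting two other-config keys (do this while the host has no plugged storage repositories, and note the UUID below is a placeholder for your own host UUID) – roughly:

[root@XenHome ~]# xe host-param-set uuid=<host-uuid> other-config:multipathing=true
[root@XenHome ~]# xe host-param-set uuid=<host-uuid> other-config:multipathhandle=dmp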

After creating the SR it was visible on one host only; on the rest it showed with a status of “Unplugged” in XenCenter.  Watching the SMLog file while trying to “repair” the SR from the GUI, or running “xe pbd-plug uuid=…” from the command line, generated errors similar to the following:

Aug 21, 2013 4:27:14 AM Error: Repairing SR P2000_HA - Internal error: Failure("Storage_access failed with: SR_BACKEND_FAILURE: [ non-zero exit; ; Traceback (most recent call last):\n  File \"/opt/xensource/sm/LVMoISCSISR\", line 549, in ?\n    SRCommand.run(LVHDoISCSISR, DRIVER_INFO)\n  File \"/opt/xensource/sm/SRCommand.py\", line 250, in run\n    sr = driver(cmd, cmd.sr_uuid)\n  File \"/opt/xensource/sm/SR.py\", line 136, in __init__\n    self.load(sr_uuid)\n  File \"/opt/xensource/sm/LVMoISCSISR\", line 150, in load\n    self.iscsi = self.iscsiSRs[0]\nIndexError: list index out of range\n ]")

Needless to say this didn’t make a lot of sense.  Eventually, after running out of ideas, some semi-random googling uncovered that the /etc/iscsi/initiatorname.iscsi file was missing (on 10 of 11 servers in the pool!!) and was not recreated by changing the IQN in the XenCenter console.  To fix this, I ran the following commands (note that the initiator name must match what is set in the XenCenter console):

[root@XenHome ~]# echo InitiatorName=iqn.2011-07.com.xenserver01:10f967a6 > /etc/iscsi/initiatorname.iscsi
[root@XenHome ~]# echo InitiatorAlias=XenServer01 >> /etc/iscsi/initiatorname.iscsi
[root@XenHome ~]# /etc/init.d/open-iscsi stop
[root@XenHome ~]# /etc/init.d/open-iscsi start
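
If you’d rather confirm the IQN from the command line than from XenCenter, on 6.x it is (as far as I’m aware) kept in the host’s other-config map – the UUID below is a placeholder for your host UUID:

[root@XenHome ~]# xe host-param-get uuid=<host-uuid> param-name=other-config param-key=iscsi_iqn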

To test iSCSI was now operational, I ran the following command (replace the IP address with the address of your iSCSI SAN):

[root@XenHome ~]# iscsiadm -m discovery -t sendtargets -p 192.168.1.10

A list of target LUNs was returned, and I was able to successfully “repair” the SR and get on with my day.
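
As a final sanity check with multipathing enabled, it’s also worth confirming that each LUN shows two active paths – XenServer’s dom0 uses device-mapper multipath for iSCSI, so the standard tool applies:

[root@XenHome ~]# multipath -ll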

XenServer 6.0.2 hotfix and driver disk install summary

Confused about the multitude of XenServer 6.0.2 hotfixes and associated drivers currently available?  The public hotfixes and drivers are listed here and as of April 2013 total 88!  It’s not as daunting as it looks though – there are currently 21 hotfixes; the remainder are the drivers associated with the various hotfixes.  Of the 21 hotfixes, a number of the older ones are superseded by and included in newer hotfixes, but there are some dependencies so they need to be installed in the correct order.

Firstly, download the required hotfixes.  As mentioned, we don’t need all 21 – so download the following 5 hotfixes (that’s 16 fewer reboots to do!)

Hotfix 6 CTX134130
Hotfix 10 CTX135225
Hotfix 19 (includes hotfixes 1 and 2) CTX137134
Hotfix 20 (includes hotfixes 4, 8, 14, 16, 18) CTX136478
Hotfix 21 (includes hotfixes 1, 3, 5, 7, 11, 13) CTX136479

Hotfix 21 links to an article (CTX136621) listing the various upgraded drivers to go with it.  To work out which drivers you need, fire up a XenServer command line via the physical console, XenCenter or SSH.  Enter the “lsmod” command to return all running modules and drivers, or combine it with the egrep command to narrow the search down.  For example, the driver page shows the modules that require updating as per the graphic below (this is not the complete list – it’s cropped for brevity):

XS602E012-Drivers

So to search for these drivers, issue the command as follows:

[root@XenHost ~]# lsmod | egrep 'bnx2x|bnx2|tg3|cxgb3|cxgb4'

Any lines that get returned indicate the drivers that are installed and need updating.  If no lines get returned, you have no drivers that need updating :)

Now we are prepared, install the hotfixes in the following order:

1 – Hotfix 6
2 – Hotfix 10
3 – Hotfix 20
4 – Hotfix 19
5 – Hotfix 21
6 – Drivers (if required)
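
If you’d rather apply the hotfixes from the command line than through XenCenter, the usual xe patch workflow looks roughly like the below – the filename is just an example (Hotfix 6), so substitute whichever hotfix you’re up to, and reboot when prompted:

# upload the hotfix - the command returns the patch UUID
[root@XenHost ~]# xe patch-upload file-name=XS602E006.xsupdate
# apply it to every host in the pool (use xe patch-apply for a single host)
[root@XenHost ~]# xe patch-pool-apply uuid=<patch-uuid>
# driver disks are supplemental packs, so for step 6 (if required):
[root@XenHost ~]# xe-install-supplemental-pack <driver-disk>.iso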

Time and enthusiasm permitting, I’ll attempt to keep this post updated as future hotfixes get released – happy patching!

XenServer 5.6 SP2 hangs after Hotfix 19 applied

I have recently updated several XenServer installs from base 5.6 SP2 to the latest hotfix levels and encountered an issue that doesn’t seem to be specifically documented on the Citrix support site.  All of these sites had a similar hardware configuration: IBM servers with fibre channel adapters and switching, connecting to IBM DS3400 storage arrays.  On the first of these servers, when applying the hotfix updates I installed them all at once, without rebooting in between.  The server then hung on restart, at the screen below:

Fortunately the process of reinstalling a XenServer host is quite straightforward, so I performed a fresh install, configured the host, added it back into the pool, and installed the hotfixes one by one, rebooting in between.  The server then hung again after installing Hotfix 19.  After another fresh install, performing the tasks in a slightly different order isolated the problem: on the original server build and in my first reinstall, I had enabled RDAC multipathing (so XenServer could use the IBM DS3400 storage array) before installing any hotfixes.  On my last reinstall, I didn’t enable RDAC multipathing until after the hotfixes were all installed.  If RDAC multipathing is enabled prior to Hotfix 19 being installed, the server will hang on restart.

So if this catches you out, a full reinstall is not necessary.  Restart your XenServer host and enter Safe Mode during the startup phase.  To do this, when XenServer is at the boot loader stage, type “menu.c32” (be quick!).

You will then be presented with a blue screen with several options, scroll down to “safe” and hit Enter.

You may see a lot of “Buffer I/O Error” and “end_request: I/O Error” messages – this is normal on a system that has multiple paths to the storage array but doesn’t have a multipath driver enabled.  If you want, you can unplug your server from the fibre channel switch during startup to prevent these errors showing.

Once XenServer has booted in Safe Mode, open a command prompt – either in XenCenter or on the console of the host itself – and re-enable RDAC multipathing as per the admin guide.  I.e. type the following command and reboot (don’t forget to reconnect your storage array if you unplugged it).

/opt/xensource/libexec/mpp-rdac --enable

You should now be back in business!  Based on this experience, I would imagine it’s possible this issue would manifest itself with other XenServer versions and hotfix / service pack installs – every version up to XenServer 6.1 still uses RDAC drivers for LSI storage arrays such as the IBM DS series.  Based on some reading since I encountered this issue, it appears installing certain hotfixes generates a new initrd, which results in some sort of incompatible configuration with the RDAC drivers.  Running the “mpp-rdac --enable” command regenerates the initrd so that it is correctly configured for RDAC.
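
If you want to confirm the RDAC stack is actually active after the reboot, the MPP modules should show up in lsmod – on the installs I’ve seen they’re named mppUpper and mppVhba, though the names may vary with other driver versions:

[root@XenHost ~]# lsmod | egrep 'mppUpper|mppVhba'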

XenDesktop VM PXE boot issue on XenServer

I had a previously stable XenDesktop farm start having issues today, with the client saying they could not connect to any of the Windows 7 desktops in their farm.  A quick investigation revealed most of the desktop VMs had been placed in maintenance mode by the Desktop Delivery Controller.  It will do this automatically if the VMs do not register after several reboots; see CTX126704 for the registry entries that control this behaviour.

After much investigation – looking at the PXE and TFTP services on the Provisioning Servers, checking the DHCP scopes weren’t full, confirming switch ACLs and VLANs were correct, and more – I found that Conflict Detection had recently been enabled on the DHCP server.  It was set to 2 attempts, but it seems this caused enough of a delay that the VMs would time out trying to PXE boot.  Once the setting was reverted to zero, everything was back to normal.
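
For reference, the setting can be checked and reverted directly on the (Windows) DHCP server – a rough example using the legacy netsh interface, where DHCP01 is a placeholder for your DHCP server name:

C:\> netsh dhcp server \\DHCP01 show detectconflictretry
C:\> netsh dhcp server \\DHCP01 set detectconflictretry 0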

Note that this client’s XenDesktop setup was on XenServer 6.0.2, and the XenDesktop VMs were in the same VLAN as the Provisioning Servers so the client VMs could simply broadcast to receive the PXE settings.  This is one of the preferred options for enabling TFTP high availability, as per the Citrix blog post by Nick Rintalan last year here (option 4 in his post).

I haven’t yet checked to see if this issue manifests itself in all environments, but now that I’ve seen it I recall having similar issues at another site where we couldn’t get PXE boot behaving reliably, and we ended up handing out the TFTP server via DHCP options 66 & 67 (a rough example of that fallback is below).  Hopefully this helps if you are troubleshooting a similar issue, and I’d be keen to hear from anyone who has the same issue and whether my fix above worked for you.
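
For completeness, the options 66/67 fallback is set per DHCP scope – a rough netsh sketch, where the server name, scope and PVS host are placeholders (ARDBP32.BIN is the standard Provisioning Services bootstrap file):

C:\> netsh dhcp server \\DHCP01 scope 192.168.10.0 set optionvalue 066 STRING "pvs01.example.local"
C:\> netsh dhcp server \\DHCP01 scope 192.168.10.0 set optionvalue 067 STRING "ARDBP32.BIN"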