ESXi 5.5 Hosts Not Restoring Storage Paths Following SAN Switch Failure

While performing resilience tests for a customer, I found that after a SAN switch failure the ESXi hosts did not restore the lost storage paths once the switch was brought back into service.

The infrastructure in question was an IBM Flex System containing x240 compute nodes running vSphere ESXi 5.5.0 from an internal USB memory key. The compute nodes had IBM Flex System FC5054 4-port 16Gb FC Adapters installed, which are Emulex LPe16000 quad-port Fibre Channel adapters. The Flex System chassis contained two IBM Flex System FC5022 24-port 16Gb ESB SAN Scalable Switches, which are Brocade switches. The storage was presented from an IBM XIV over the Fibre Channel fabric.

When one of the SAN switches was pulled from the rear of the Flex System to simulate a failure, half of the paths to each LUN were lost. Once the SAN switch was brought back into service, however, the paths through it remained “dead”. Rebooting the ESXi host appeared to be the only way to restore these paths.
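One way to quantify the symptom is to count the paths that ESXi reports as dead before the switch is pulled, while it is out, and after it is reinserted. The sketch below assumes it runs in the ESXi shell and that `esxcli storage core path list` prints a `State:` line for each path; the function name `count_dead_paths` is hypothetical. It reads the command output from standard input, so it can also be run against saved output.

```shell
#!/bin/sh
# Hypothetical helper: count the paths whose state is reported as "dead".
# Expects the output of `esxcli storage core path list` on stdin.
count_dead_paths() {
    # Each path block is assumed to contain a line such as "   State: dead".
    # grep -c prints the number of matching lines; it exits 1 when the count
    # is 0, so "|| true" keeps callers running under "set -e".
    grep -c '^[[:space:]]*State:[[:space:]]*dead[[:space:]]*$' || true
}

# Assumed usage on an ESXi host:
#   esxcli storage core path list | count_dead_paths
```

With the driver problem described here, the count would be expected to stay above zero after the switch is back in service, until the host is rebooted.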

I tracked the problem down to the driver that the ESXi hosts were using for the Fibre Channel adapters, version 10.0.100.1-1vmw.550.0.0.1331820. I downloaded the Emulex FC driver for these cards from the VMware website; this was version 10.0.725.203-1OEM.550.0.0.1331820 (https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI55-EMULEX-LPFC-10072744&productId=353).
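To check several hosts for the affected driver, the version string can be compared programmatically. The following POSIX shell sketch compares the dotted numeric part of a version (everything before the first `-`) against 10.0.725.203, the release that fixed the problem in this environment. The function name `lpfc_needs_update` is hypothetical, and treating every lower version as affected is an assumption: only the two versions above were actually tested.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the given lpfc version is older than the
# release that restored path recovery here, exit 1 otherwise.
FIXED_VERSION="10.0.725.203"

lpfc_needs_update() {
    # Strip the build suffix, e.g. "10.0.100.1-1vmw.550..." -> "10.0.100.1".
    current=${1%%-*}
    fixed=$FIXED_VERSION
    # Compare one dotted field at a time, numerically; missing fields count as 0.
    while [ -n "$current" ] || [ -n "$fixed" ]; do
        a=${current%%.*}; b=${fixed%%.*}
        a=${a:-0}; b=${b:-0}
        if [ "$a" -lt "$b" ]; then return 0; fi
        if [ "$a" -gt "$b" ]; then return 1; fi
        # Drop the field just compared (the last field has no trailing dot).
        case $current in *.*) current=${current#*.} ;; *) current= ;; esac
        case $fixed in *.*) fixed=${fixed#*.} ;; *) fixed= ;; esac
    done
    return 1  # versions are equal, so no update is needed
}
```

On a host, the installed version string could be taken from the lpfc line of `esxcli software vib list`, although the column layout should be checked on your build before parsing it.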

I uploaded the VIB file (lpfc-10.0.725.203-1OEM.550.0.0.1331820.x86_64.vib) from the downloaded package to a folder named Drivers on the datastore XIV_TEMPLATES_DS_01. All of the ESXi hosts could access this datastore, so I could then use the following procedure to update the driver on each host:

  1. Put the host into maintenance mode
  2. Enable SSH
  3. Log in to the host via SSH as root
  4. Check the current version with the command
    esxcfg-module -i lpfc
    This command produces a lot of output; the version is displayed on the third line
  5. Update the driver with the command
    esxcli software vib update -v /vmfs/volumes/XIV_TEMPLATES_DS_01/Drivers/lpfc-10.0.725.203-1OEM.550.0.0.1331820.x86_64.vib
    The output should be:
    Installation Result
    Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
    Reboot Required: true
    VIBs Installed: Emulex_bootbank_lpfc_10.0.725.203-1OEM.550.0.0.1331820
    VIBs Removed: VMware_bootbank_lpfc_10.0.100.1-1vmw.550.0.0.1331820
    VIBs Skipped:
  6. Reboot the host
  7. After the host has restarted, enable SSH again and rerun the command from step 4 to check that the version is now 10.0.725.203-1OEM.550.0.0.1331820
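The steps that run inside the SSH session (1, 4, 5 and 6) could be scripted roughly as follows. This is a sketch, not a tested tool. It assumes the ESXi shell provides `/bin/sh` and `sed`, that `esxcli system maintenanceMode set --enable true` and `esxcli system shutdown reboot --reason` behave as documented for ESXi 5.5, and that virtual machines have already been evacuated from the host. The `run` wrapper, the `DRY_RUN` variable and the `update_lpfc` function are hypothetical names. Steps 2, 3 and 7 still have to be done by hand.

```shell
#!/bin/sh
# Sketch of steps 1, 4, 5 and 6, run from an SSH session on the host.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN=${DRY_RUN:-1}
VIB=/vmfs/volumes/XIV_TEMPLATES_DS_01/Drivers/lpfc-10.0.725.203-1OEM.550.0.0.1331820.x86_64.vib

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

update_lpfc() {
    # Step 1: enter maintenance mode (VMs assumed to be migrated already).
    run esxcli system maintenanceMode set --enable true
    # Step 4: show the current version, which is on the third line of output.
    run sh -c 'esxcfg-module -i lpfc | sed -n 3p'
    # Step 5: install the Emulex VIB from the shared datastore.
    run esxcli software vib update -v "$VIB"
    # Step 6: reboot; esxcli expects a reason string for the reboot.
    run esxcli system shutdown reboot --reason "lpfc driver update"
}
```

Sourcing the file and calling `update_lpfc` previews the four commands; `DRY_RUN=0 update_lpfc` would run them. A dry-run default was chosen because the final command reboots the host.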

Once the driver had been updated and the hosts rebooted, the dead paths were restored automatically when a failed SAN switch was brought back into service.

This entry was posted in Flex System, IBM, Storage, VMware, vSphere, XIV.

2 Responses to ESXi 5.5 Hosts Not Restoring Storage Paths Following SAN Switch Failure

  1. renek says:

    Normally you would have a dual-HBA and dual-SAN-switch setup for redundancy, so if one component goes down your virtual machines keep running.

