Saturday, March 23, 2013

VM Disk Timeouts

A pretty common issue when using some SAN back-ends for virtual machines is that the VMs end up crashing, BSOD'ing, or (most commonly) remounting their "disks" read-only when there's a hiccup or failover in the storage system, often requiring a reboot to restore functionality.

Updated 9/16/2013 to incorporate excellent suggestions of commenter Greg Smith.
Updated 5/13/2014 to incorporate on-the-job learning.


The most common fix is to increase the default timeout settings in the guest VM, and sometimes in the host machine as well, since the root cause is usually that the SAN took longer than the default timeout to respond. This is usually because the SAN was involved in a failover, which can take more than 60, or even more than 120, seconds in some cases. I generally recommend setting the timeout to at least 300 seconds, though I'm also perfectly happy with 600 seconds or more. I only really have an issue with anything under 180 seconds or so.

This is in keeping with industry standards, I might add - VMware sets it to 180, NetApp has long requested it be 180, and so on. I don't actually like how the timeouts are handled, and I especially do not like that in many scenarios the timeout is a global value applying to both the SAN-provided storage and any local spinning disk (which never needs or wants a timeout value this long), but them's the breaks, I'm afraid.

Of course, the usual follow-up question from anyone told this is, "Ok, so where do I do that?" and then you're off to Google, and it can be annoying. Enough so that I decided to compile the settings all in one place, and add some scripts and such to simplify applying them (and to make them easy to include in automated deployment tools, for instance). So, here you are.

Windows 2000, 2003, 2008, Vista, & Windows 7

Open the registry editor (regedit) and navigate to:

HKEY_LOCAL_MACHINE / System / CurrentControlSet / Services / Disk

Once there, look for 'TimeOutValue'. If it exists, edit it, and if it does not exist, right-click and choose 'Edit/Add Value' and create it. The type is REG_DWORD, and the value should be set in decimal to the timeout in seconds that you desire (so, I suggest, 300).
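For automated deployments, one option is to generate a .reg file on a build host and import it on the guest with regedit. The following is only a sketch under assumptions: the function name and output file name are made up for illustration, and it assumes regedit accepts a plain-text file with the version 5.00 header. The one detail worth noting is that .reg files store DWORD values as eight hex digits, so 300 decimal is written as 0000012c.

```shell
#!/bin/sh
# Sketch: print a .reg file that sets the disk TimeOutValue.
# make_disk_timeout_reg is a hypothetical helper name, not part of any tool.
make_disk_timeout_reg() {
    timeout_seconds=$1
    # Windows expects CRLF line endings in .reg files.
    printf 'Windows Registry Editor Version 5.00\r\n\r\n'
    printf '[HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\Disk]\r\n'
    # DWORDs are written as 8 hex digits: 300 decimal becomes 0000012c.
    printf '"TimeOutValue"=dword:%08x\r\n' "$timeout_seconds"
}

# Example: make_disk_timeout_reg 300 > disk-timeout.reg
```

Generating the file from the decimal value avoids the easy mistake of typing 300 into a hex field by hand.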

After that, if you're using the Microsoft iSCSI Initiator in the OS instead of having the disk passed in from a hypervisor, you should also modify the timeout value in the iSCSI initiator. On 2008, Vista, and Windows 7, navigate to:
HKEY_LOCAL_MACHINE / System / CurrentControlSet / Control / Class / {4D36E97B-E325-<HostID>
Under this key you'll find a number of subkeys named 0001, 0002 and so on. Expand each subkey until you find the one that has another subkey called 'Parameters'. Within that Parameters subkey is the value you want, MaxRequestHoldTime. Modify it to 300 (decimal). There is another setting in here, LinkDownTime, that you would set instead if you're planning to use iSCSI MPIO on the Windows OS, but there are other things to set for that as well, and they are beyond the scope of this post for now.

These changes are permanent as far as I know, as well as global, so that's all you've got to do. I am unaware of whether you need to reboot for them to take effect; you probably should, to be sure.

Linux (2.6+ non-udev)

So the 'easy' but far from elegant solution is to go in and force the timeout to be higher on every block device that needs it. On 2.6 and later kernels (which provide sysfs) this is done by echo'ing the time in seconds you want into /sys/block/<device>/device/timeout, substituting the device name for <device>. So, for example, if the main disk (sda) was being offered up from the VM host and originated on a SAN, and you wanted to make it time out after 300 seconds, you'd do:

echo 300 > /sys/block/sda/device/timeout

The problem with this is that this isn't permanent, and will only survive until the system is rebooted. The quick and dirty answer to this is to add a command to do this into something like /etc/rc.local or create a full-blown init script that does it (be sure you add the command above the 'exit 0' that often ends the default rc.local file). For completeness, here's a simple script you can call from rc.local (put the contents below into a file, chmod +x it, and then call it from rc.local), that may or may not work for you out of the box (be sure to edit DISKS to be a list of the disks you care about):


#!/bin/bash
#
# nex7.blogspot.com - VM Disk Timeouts - simple script for non-udev 2.6+ kernels
# - edit DISKS to be a list of disks you want to increase the timeout on to TIMEOUT_V

TIMEOUT_V=300
DISKS="sda sdb sdc"

for DISK in $DISKS; do
  echo "$TIMEOUT_V" > "/sys/block/$DISK/device/timeout"
done
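
If maintaining the DISKS list by hand is a nuisance, a variation is to loop over every block device that exposes a timeout file. This is a sketch only: the function name and the optional second argument (an alternate sysfs root, there purely so the loop can be tried against a scratch directory) are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: raise the timeout on every block device that exposes one.
# set_disk_timeouts is a hypothetical helper; the second argument is an
# assumed override of the sysfs root and defaults to /sys.
set_disk_timeouts() {
    timeout_seconds=$1
    sysfs_root=${2:-/sys}
    for timeout_file in "$sysfs_root"/block/*/device/timeout; do
        # Skips the unexpanded glob and any file we are not allowed to write.
        [ -w "$timeout_file" ] || continue
        echo "$timeout_seconds" > "$timeout_file"
    done
}

# Example, as root from rc.local: set_disk_timeouts 300
```

Note that this touches every disk that has a timeout file, local ones included, so the explicit DISKS list is still the better choice when only some disks come from the SAN.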


Or, read on for the better way to do it if you have a fairly modern and mainstream distribution.

Linux (2.6+ with udev)

The slightly more complex but a bit more elegant method that I see, and that I wish the various major Linux distributions would adopt directly into their base releases, is something like what the VMware Tools does when installed on a supported Linux distribution. You can see their own explanation at this link.

The issue with this today is that not only is this only added if you install the VMware Tools, the line it adds to the udev rules only affects disks exposed using VMware, which will not help you if you are using Xen, KVM, VirtualBox, and so on. So, something a bit more agnostic is called for. In building this little blog post, and coming upon this issue (admittedly for the umpteenth time), I decided to go ahead and finally do something about it.

My investigations so far have concluded there is no danger in having 'bad' or unmatched rules in a udev rules file (at worst, you get a warning in syslog on boot from udev complaining about the lines it doesn't like, but it still parses the other rules fine). Thus, a simple single rules file put into /etc/udev/rules.d/ that contains rules for all possible OSes and all possible exposed disks from a variety of virtualization hosts seems like the easiest way to go, so I give you this link. You can run the below command directly (as root) to install on most distributions (be sure /etc/udev/rules.d is where they go):

wget http://www.nex7.com/files/99-virt-scsi-udev.rules && mv 99-virt-scsi-udev.rules /etc/udev/rules.d/ && chmod 644 /etc/udev/rules.d/99-virt-scsi-udev.rules
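
For readers who want to see the general shape of such a rule, here is a minimal sketch that writes a single generic rule to a file. It is not the contents of the file linked above: the function name, the example file name, and the rule itself are assumptions for illustration. The rule matches any SCSI device of type 0 (disk) as it is added and echoes the timeout into its sysfs timeout file; %p is udev's substitution for the device path. Older udev releases may need SYSFS{type} in place of ATTR{type}.

```shell
#!/bin/sh
# Sketch: write one generic udev rule that raises the SCSI disk timeout.
# write_timeout_rule is a hypothetical helper, and the rule below is an
# illustrative example, not the rules file referenced in the post.
write_timeout_rule() {
    rules_file=$1
    timeout_seconds=${2:-300}
    cat > "$rules_file" <<EOF
# Set the command timeout on SCSI disks (type 0) when they are added.
ACTION=="add", SUBSYSTEM=="scsi", ATTR{type}=="0", RUN+="/bin/sh -c 'echo $timeout_seconds > /sys%p/timeout'"
EOF
    chmod 644 "$rules_file"
}

# Example, as root: write_timeout_rule /etc/udev/rules.d/99-example-timeouts.rules 300
```

Because it does not match on vendor or model, a rule like this also applies to local SCSI disks; adding vendor and model matches narrows it to disks from a particular hypervisor.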

After putting it in /etc/udev/rules.d, just reboot. You can verify it is working with this one-liner (you're looking for results that have at least some entries that say '300', if you don't, it either isn't working or you don't have any disks the rules match against):

for file in `find /sys/devices -iname timeout`; do (echo $file && cat $file); done
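
On a system with many disks it can be easier to print only the entries that are still too low. The following is a sketch under assumptions: the function name is made up, and the optional second argument (the directory to search, defaulting to /sys/devices) exists only so it can be tried against a scratch directory. It treats every file named timeout as a candidate, so a non-disk entry could show up in the output.

```shell
#!/bin/sh
# Sketch: list timeout files whose value is below a chosen minimum.
# report_low_timeouts is a hypothetical helper name.
report_low_timeouts() {
    minimum=$1
    search_root=${2:-/sys/devices}
    find "$search_root" -name timeout -type f 2>/dev/null |
    while read -r timeout_file; do
        value=$(cat "$timeout_file" 2>/dev/null) || continue
        case $value in
            ''|*[!0-9]*) continue ;;  # skip empty or non-numeric contents
        esac
        if [ "$value" -lt "$minimum" ]; then
            echo "$timeout_file $value"
        fi
    done
}

# Example: report_low_timeouts 180
```

No output means every timeout file found is at or above the minimum (or that none were found at all).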

And that's it. I've tested the file on CentOS 6.3 on top of KVM, Ubuntu 12.04 on top of KVM, and the VMware ones on a variety of OSes and versions. As far as I know, the presently supported virtualization platforms and guest OSes for this file are:

Hosts

VMware 5+ (disks offered up via scsi)
KVM 1.0+ (disks offered up via ide or scsi - virtio doesn't expose timeout at guest level)
XenServer 5+ (disks offered up via scsi)

Guests

RHEL 5+ / CentOS 5+
Ubuntu 10+

If you run into any problems with this file, please let me know.

FreeBSD 9

There are two variables that appear to be of note - and common wisdom seems to jump between which one to tweak. I'll err on the side of timeout over retry here, but that may not be the best option in all situations. To modify it, and it is a global variable as far as I can tell, you need to modify 'kern.cam.da.default_timeout' and change it from its default of 60 to 300. To modify it permanently, edit your /etc/sysctl.conf and add a line like this:

kern.cam.da.default_timeout=300

If you're curious, the other variable mentioned online is 'kern.cam.da.retry_count', but I am less sure if the advice about it is fair or true.
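
For scripted installs, the sysctl.conf edit can be made repeatable so that running it twice does not leave two conflicting lines. This is a minimal sketch: the function name and the optional third argument (an alternate config file path, for trying it out safely) are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: set name=value in a sysctl.conf-style file, replacing any old entry.
# persist_sysctl is a hypothetical helper; the third argument defaults to
# /etc/sysctl.conf.
persist_sysctl() {
    name=$1
    value=$2
    conf_file=${3:-/etc/sysctl.conf}
    touch "$conf_file"
    # Drop earlier settings of the same name (dots in the name are treated
    # as regex wildcards, which is loose but harmless here).
    grep -v "^[[:space:]]*$name[[:space:]]*=" "$conf_file" > "$conf_file.tmp" || true
    echo "$name=$value" >> "$conf_file.tmp"
    mv "$conf_file.tmp" "$conf_file"
}

# Example, as root: persist_sysctl kern.cam.da.default_timeout 300
```

To apply the value without waiting for a reboot, `sysctl kern.cam.da.default_timeout=300` is the usual syntax, assuming the variable is writable at runtime on your release.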

NexentaStor (and other OpenSolaris-based derivatives)

So the easy way is to modify the sd timeout value. Unfortunately in OpenSolaris today, this value can only be set in /etc/system for all drives, with no config file method of setting it on a per-disk basis that I am aware of. To modify it globally, add this line to your /etc/system file and reboot:

set sd:sd_io_time=300

This is dangerous if any disks exposed to your VM do not come from a SAN, since this is a global value (much like the Windows one). There does exist a method of modifying the live value used by the kernel on a per-disk basis using mdb, but I've decided not to tackle building this into a script that runs on boot and when disks change at this time. If you want more info, check out Alisdair's post on the issue, found here.

32 comments:

  1. In your "Linux (2.6+ with udev)" section, there's a small change I would make to the "find" pipeline that looks for devices without correct timeouts. First it's useful to show both the full filename and the timeout there, which takes two small changes:

    $ for file in `find /sys -iname timeout`; do (echo $file && cat $file); done
    /sys/devices/pci0000:00/0000:00:1f.1/host1/target1:0:0/1:0:0:0/timeout
    30
    /sys/devices/pci0000:00/0000:00:1f.2/host2/target2:0:0/2:0:0:0/timeout
    30
    /sys/class/firmware/timeout
    60

    And if you look at the output from this system I found, it turns out there's this firmware timeout on there too. That doesn't seem as important to tune as the disk timeouts. What I settled on then to validate the disk timeouts are being set correctly was this pipeline, which only navigates /sys/devices where the disks are at. Here's sample output from a tuned VM install:

    $ for file in `find /sys/devices -iname timeout`; do (echo $file && cat $file); done
    /sys/devices/pci0000:00/0000:00:1f.1/host1/target1:0:0/1:0:0:0/timeout
    180
    /sys/devices/pci0000:00/0000:00:1f.2/host2/target2:0:0/2:0:0:0/timeout
    180

    Replies
    1. Good catch. Suggestion incorporated.

  2. should be:

    wget http://www.nex7.com/files/99-virt-scsi-udev.rules
    mv 99-virt-scsi-udev.rules /etc/udev/rules.d/
    chmod 644 /etc/udev/rules.d/99-virt-scsi-udev.rules

  3. Awesome article. Thanks a lot for sharing...

    One uncommon question:
    Do you maybe know, how to configure this disk timeout parameter for an OS X Guest VM? I've tried it already with the one from FreeBSD, but unfortunately OS X doesn't recognize it.

    Any feedback appreciated!
    Thanks - Bojan

  4. This has been the most helpful article about this problem.

    I found this while Googling about the problem I was having on Linux VMs.

    I find interesting that you have a suggested fix for Windows VMs. I've never seen this problem on my Windows VMs (2008 R2 and 7). In fact I've had my datastore offline for nearly an hour and all my Windows VMs recovered gracefully.

    I personally set this to 3600 seconds, because if there is a datastore issue, fixing it in 3 minutes is unlikely. Under an hour is to be expected.

  6. And what about XenServer? XenServer block devices (xvd*) don't have any timeout parameters. We had a very ugly crash with XenServer due to NFS storage timeouts (not enough free space on ZFS storage). We subsequently tested all versions from XenServer 6.2 to 6.5SP1, NFS mount parameters (timeo, hard/soft), and different guest OSes and kernels (Ubuntu, CentOS), but without any positive results. All Linux guests in XenServer crash immediately (<1s) when the NFS server generates a long I/O response.

  8. Thanks for the post and scripts Andrew!

    I am running KVM hypervisors with RHEL/Oracle Linux and Windows guests, and all guests are utilizing virtIO drivers/disks. So, since KVM does not expose timeout values to guests, what would my solution be if Nexenta is taking more than 60 seconds to failover?

    Do I only need to adjust timeout values for all the block devices on the hypervisors? If I adjust /sys/block/sda/device/timeout to 600 on the hypervisor, does this mean my virtIO VM will effectively have a timeout setting of 600 seconds?

    I see that my RHEL VMs don't have a timeout file under /sys/block/, but my Windows VMs still have the registry key. Is this registry key ignored when Windows uses a "Red Hat VirtIO SCSI Disk Device" (that is the description under Device Manager)?

    Thanks!

    Replies
    1. These are very good questions. Too bad there were no responses. Let me know if you were able to get these answers. thanks.

  9. wget http://www.nex7.com/files/99-virt-scsi-udev.rules
    link is not working
