Saturday, September 19, 2009

VMotion keeps failing at 78% with Error bad0007

VMotion was failing at 78% with error message "A general system error occurred: Failed waiting for data. Error bad0007."

This was a brand new cluster, and everything was standard and matched our other clusters (ESX version, host physical configuration, etc.).

I tried the following:

VMotion was failing at 78% for all VMs within the cluster:

  • Tried to VMotion other VMs within this cluster
  • All of the VMs failed at 78%
  • Implemented KB 1003577 by installing Update Manager
  • Created a baseline and updated the 2 ESX hosts involved; both ESX servers are now compliant with the latest patches
  • Created a new cluster called "Test Cluster" and added two ESX hosts to it
  • Cold migrated a VM over and tried a VMotion; it still failed at the same stage (78%)
  • Checked/changed the following variables according to KB 1003577:

Migrate.PageInTimeoutResetOnProgress: set the value to 1.

Migrate.PageInProgress: set the value to 30 if you still get an error after configuring the Migrate.PageInTimeoutResetOnProgress variable.

Migrate.enabled: toggle the setting from 1 to 0, click OK, then flip the value back to 1 and click OK.

Both variables were identical to what was recommended in the KB.

Tried a VMotion again; it failed at 78%.

  • Tried VMotioning servers on different datastores; this failed too
  • Swapped the VMotion network to eliminate a problem with the type of card being used; this made no difference, it still failed at 78%
  • Reduced the amount of RAM assigned to the VM from 4GB to 2GB; it still kept failing
  • Installed VMware Tools on this VM
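For reference, the KB 1003577 variable changes can also be made from the ESX service console instead of the VI Client. The esxcfg-advcfg option paths below are my assumption of how those advanced settings map to the command line, so verify them on your host (with -g) before relying on this sketch:

```
# Sketch only: assumed esxcfg-advcfg paths for the KB 1003577 settings.
esxcfg-advcfg -g /Migrate/PageInTimeoutResetOnProgress   # show the current value
esxcfg-advcfg -s 1 /Migrate/PageInTimeoutResetOnProgress # set to 1
esxcfg-advcfg -s 30 /Migrate/PageInProgress              # set to 30 if errors persist
esxcfg-advcfg -s 0 /Migrate/Enabled                      # toggle VMotion off...
esxcfg-advcfg -s 1 /Migrate/Enabled                      # ...and back on
```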

I created a brand new test VM and it VMotioned fine, so it looked like something was fishy about the VM itself. We then started to look at the logs and found the following.

Checking the contents of the vmware.log for the Virtual Machine (VM) being VMotioned, entries similar to the following appeared:


May 26 12:06:08.162: vmx Migrate_SetFailure: Now in new log file.

May 26 12:06:08.167: vmx Migrate_SetFailure: Failed to write checkpoint data (offset 33558528, size 16384): Limit exceeded

May 26 12:06:08.186: vmx Msg_Post: Error

May 26 12:06:08.186: vmx [vob.vmotion.write.outofbounds] VMotion [c0a8644e:1243364928717250] failed due to out of bounds write: offset 33558528 or size 16384 is greater than expected

May 26 12:06:08.186: vmx [msg.checkpoint.migration.openfail] Failed to write checkpoint data (offset 33558528, size 16384): Limit exceeded.

May 26 12:06:08.187: vmx ----------------------------------------

May 26 12:06:08.190: vmx MigrateWrite: failed: Limit exceeded
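A quick way to pull the relevant failure signatures out of a vmware.log copy. The sample file below is just the excerpt above, and the grep pattern is my own choice, not from any KB; on ESX 3.5 the real vmware.log sits in the VM's directory on the VMFS datastore:

```shell
# Sketch: filter a vmware.log for the VMotion failure signatures seen above.
# The here-doc writes a sample excerpt so the commands are self-contained.
cat > /tmp/vmware.log.sample <<'EOF'
May 26 12:06:08.162: vmx Migrate_SetFailure: Now in new log file.
May 26 12:06:08.186: vmx Msg_Post: Error
May 26 12:06:08.186: vmx [vob.vmotion.write.outofbounds] VMotion failed due to out of bounds write
May 26 12:06:08.190: vmx MigrateWrite: failed: Limit exceeded
EOF
grep -E 'Migrate_SetFailure|outofbounds|MigrateWrite' /tmp/vmware.log.sample
```

The same pattern run against the real log file narrows a multi-megabyte log down to the three lines that matter here.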


When the video RAM amount was commented out, the following error appeared in the log file for the VM:


Aug 18 08:35:38.194: vmx MKS REMOTE Loading VNC Configuration from VM config file

Aug 18 08:35:38.196: vmx DVGA: Full screen VGA will not be available.

Aug 18 08:35:38.196: vmx Msg_Post: Warning

Aug 18 08:35:38.196: vmx [msg.svgaUI.badLimits] The size of video RAM is currently limited to 4194304 bytes, which is insufficient

for the configured maximum resolution of 3840x1200 at 16 bits per pixel.

Aug 18 08:35:38.196: vmx

Aug 18 08:35:38.196: vmx The maximum resolution is therefore being limited to 1180x885 at 16 bits per pixel.

Aug 18 08:35:38.196: vmx

Aug 18 08:35:38.196: vmx ----------------------------------------

Aug 18 08:35:38.199: vmx SVGA: Truncated max res to VRAM size: 4194304 bytes VRAM, 1180x885
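A quick sanity check of the numbers in that log message (my own arithmetic, not from any KB): at 16 bits per pixel each pixel needs 2 bytes, so the configured 3840x1200 resolution needs far more than the 4194304-byte VRAM limit, while the truncated 1180x885 mode fits under it:

```shell
# Bytes of VRAM needed = width * height * bytes-per-pixel (16 bpp = 2 bytes).
needed=$((3840 * 1200 * 2))
limit=4194304
echo "needed=$needed limit=$limit"          # 9216000 > 4194304, does not fit
echo "truncated=$((1180 * 885 * 2))"        # 2088600 <= 4194304, fits
```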


I then checked the .vmx for the problematic server and found:

svga.vramSize= 4194304.

Whereas the newly created VM had the following setting in its .vmx file:


After investigating with the application owner, we found that these are desktop modeling workstations using software from HP. HP had recommended this setting for their software to be functional.

One of the KB articles explains that "Video RAM (VRAM) assigned to the virtual machine is 30MB or less".

Checked with HP support, and they asked me to contact VMware. When we contacted VMware, they said it is a limitation of ESX 3.5 and has been taken care of in ESX 4.0 U1.
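For anyone stuck on ESX 3.5 before the 4.0 U1 fix, the only workaround we came across is the one described above: commenting the HP-recommended line out of the .vmx and accepting the resolution truncation warning. The fragment below is a hypothetical sketch of that edit; check with the application owner first, since HP requires the setting for their software:

```
# Hypothetical .vmx sketch: comment out the HP-recommended line so ESX
# falls back to its default video RAM size. Side effect (see the log
# excerpt above): the maximum resolution gets truncated.
# svga.vramSize = 4194304
```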

........................................................................Crap …................................................................................
