Once in a while you’ll run into Murphy and his law. It happened to me last week. I was doing some maintenance and had to snapshot a big SQL VM. Normally this wouldn’t be a problem, but this time I got an error stating: “A general system error occurred: Unable to save snapshot file”. That was the moment Murphy entered and all the problems started.
After consulting the VMware knowledge base, it seemed that this error usually occurs when there is not enough free space on the disk to hold the snapshot.
Now this machine was a big one and it already had an older snapshot, so this error was not that surprising. But when I tried to consolidate the older snapshot, I got the following error: “The object has already been deleted or has not been completely created”.
This proved to be a problem with Virtual Center. Virtual Center thinks that the snapshot is already deleted, but it’s still there. To resolve this issue, you have to connect your VI-client to the ESX host where the relevant machine is running and remove the snapshot from there.
After removing the old snapshot there still was not enough free space for the new one, so I browsed the datastore to see what was taking up all the space. It turned out there were three more snapshots on the datastore that had not been consolidated and were not visible in the Snapshot Manager.
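If you would rather check this from the service console than from the datastore browser, something like the following could do the job. It is only a sketch: the function name is made up, and it assumes that snapshot data lives in files ending in -delta.vmdk inside the VM's folder, which is the naming used on ESX 3.x/4.x.

```shell
# Hypothetical helper: list the snapshot delta disks in a VM folder,
# largest first, so leftover snapshots that the Snapshot Manager
# does not show become visible.
list_snapshot_deltas() {
    vm_dir="$1"
    # -l shows sizes, -S sorts by file size (largest first).
    # Errors are hidden so a folder without delta files prints nothing.
    ls -lS "$vm_dir"/*-delta.vmdk 2>/dev/null
}

# Usage (placeholders as elsewhere in this post):
# list_snapshot_deltas /vmfs/volumes/<datastorename>/<vmname>
```

More delta files than snapshots in the Snapshot Manager is a hint that some were never consolidated.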
To resolve this problem VMware suggests the following:
1. Shut down the VM
2. Take a new snapshot in Snapshot manager
3. Delete ALL snapshots in Snapshot manager
The “Delete all snapshots” option will consolidate all the snapshots, including the ones that are not visible in the Snapshot Manager. And because the VM is shut down, the new snapshot takes up hardly any space: a powered-off VM has no memory state to save and writes nothing to the new delta disk.
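I did these steps in the Snapshot Manager, but on an ESX host with a service console roughly the same thing can be sketched with vmware-cmd. Treat this as an untested outline, not VMware's official procedure: the function name and the snapshot name "consolidate-helper" are made up, and it assumes the getstate, createsnapshot (name, description, quiesce, memory) and removesnapshots operations of the ESX 3.x/4.x vmware-cmd.

```shell
# Hypothetical helper: take a throwaway snapshot of a powered-off VM
# and then remove all snapshots, which triggers the consolidation.
consolidate_all_snapshots() {
    cfg="$1"   # full path to the VM's .vmx file

    # Refuse to continue unless the host reports the VM as powered off.
    state=$(vmware-cmd "$cfg" getstate) || return 1
    case "$state" in
        *"= off"*) ;;
        *) echo "VM is not powered off: $state" >&2; return 1 ;;
    esac

    # Last two arguments: quiesce=0, memory=0 (nothing to capture on a stopped VM).
    vmware-cmd "$cfg" createsnapshot consolidate-helper temporary 0 0 || return 1

    # Removes every snapshot of this VM, including the one just created.
    vmware-cmd "$cfg" removesnapshots
}

# Usage:
# consolidate_all_snapshots /vmfs/volumes/<datastorename>/<vmname>/<vmname>.vmx
```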
So off to action 1, shutting down the VM. How hard can that be, right? Well, Murphy had other plans. The VM wasn’t responding to anything: no RDP, no console, no guest shutdown (VMware Tools).
Because of this non-responsiveness I was forced to use a hard reset, but even this wouldn’t reset the VM. It looked like the management service of the ESX host wasn’t reporting the correct state of the VM, so I had to dig a little further.
I made a console connection to the ESX host and restarted the management service:
#service mgmt-vmware restart
This, however, didn’t change the state of the VM, and it was still unavailable.
The next thing I tried was stopping the VM from the command line:
#vmware-cmd /vmfs/volumes/<datastorename>/<vmname>/<vmname>.vmx stop
Unfortunately this didn’t help either, so I tried a hard shutdown:
#vmware-cmd /vmfs/volumes/<datastorename>/<vmname>/<vmname>.vmx stop hard
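The escalation from a normal stop to a hard stop could be wrapped in a small function like the one below. This is just a sketch under assumptions: the function name is made up, it relies on vmware-cmd accepting "soft" and "hard" as power-operation modes and on getstate printing a line ending in "= off" for a stopped VM, and it does not wait between the stop request and the state check, which a real script might need to do.

```shell
# Hypothetical helper: try a soft stop first, then a hard stop,
# and report which mode (if any) actually powered the VM off.
stop_vm() {
    cfg="$1"   # full path to the VM's .vmx file
    for mode in soft hard; do
        vmware-cmd "$cfg" stop "$mode"
        # Trust the reported state rather than the exit code of the stop.
        if vmware-cmd "$cfg" getstate | grep -q '= off'; then
            echo "stopped with mode: $mode"
            return 0
        fi
    done
    # Neither mode worked; the VM process has to be dealt with by hand.
    return 1
}

# Usage:
# stop_vm /vmfs/volumes/<datastorename>/<vmname>/<vmname>.vmx
```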
Even this hard shutdown didn’t shut the machine down. The only option left was to kill the VM’s entire process (the world). To do so I took the following steps:
1. #ps auxfww | grep <name of the VM>
2. Search for the PID of the VM
3. #kill -9 <PID>
Step 1 searches the process list for the process of the VM. Step 2 picks out the process identifier (PID) of that process, which is the second column of the ps output. Step 3 hard-kills the process using that PID.
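Steps 1 and 2 can be combined into a small filter that prints only the PIDs. This is a sketch with a made-up function name; it assumes the `ps aux` column layout, where the PID is the second field. The VM name is passed to awk through the environment instead of the command line, so the filter itself does not show up as a match in the process list. The kill is deliberately left as a manual step.

```shell
# Hypothetical helper: read `ps aux`-style output on stdin and print
# the PID (second column) of every line that mentions the VM name.
pids_matching() {
    [ -n "$1" ] || return 1
    VM_NAME="$1" awk 'index($0, ENVIRON["VM_NAME"]) { print $2 }'
}

# Usage:
# ps auxfww | pids_matching <name of the VM>
# kill -9 <PID>
```

Check the printed PIDs against the full ps output before killing anything, since any process whose command line contains the VM name will match.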
After this, it was possible again to make snapshots and use the “Delete all snapshots” function. The process of deleting all snapshots can take quite a bit of time, depending on the size of the snapshots.