Drive:Activated logo

Recovering VMware snapshot after parent changed


Scroll down to the problem or solution section below if you want to cut to the chase. 

I upgraded my Kubuntu installation to Gutsy today - of course, it wasn't as smooth as it should've been. First I had to work out how to do it - the instructions were brief, the screenshots confusing, and the process just didn't feel natural. The 'version upgrade' button only appears once you have satisfied certain conditions, and nothing tells you what those conditions are. It just magically appears when it wants to, after you press a special sequence of buttons.

Then the 'distribution upgrade' process crashed and packages wouldn't install. It ended up working after a few tries.

For some stupid reason, they still haven't fixed the 'failed to set xfermode' bug that heaps of people have encountered and that really cripples the system, because it doesn't boot at all. In fact, the upgrade removes the workaround for it too, which is adding irqpoll to the end of the kernel line for the appropriate entry in /boot/grub/menu.lst.

Plus they introduced a new bug by adding tablet settings into /etc/X11/xorg.conf by default, even if no tablet exists, tripping up the system. And did I mention that the network connection is flaky and standby/hibernate still doesn't work? Linux is still Linux it seems.

Anyway, it all worked out in the end after some googling, so I went to install VMware Server on it so I could run my virtual machines there as well as in Windows. There is no package install available for it, so follow the instructions here, but use this patch instead.

Once all that was working, I ran the VMware Console, about to run my Windows Server 2003 Standard Edition virtual machine, when I thought, hmm..., I don't want this VMware instance fudging with the Windows VMware instance, so I'll create a new virtual machine, and link it to the existing virtual hard disk.

Problem

All sounded cool, until I accidentally linked to the base parent hard disk, and not the latest snapshot. So once I booted it, not only did I not have the latest changes, but when I re-linked it to the latest snapshot, it wouldn't boot anymore. Instead I got the error message, "Cannot open the disk ... Reason: The parent virtual disk has been modified since the child was created."

Did I mention that the virtual machine housed the test instance for this website, including the changes I had been working on all weekend, and I had no other backup?

After a few minutes of cursing and swearing, banging on tables, wondering wtf I had done, and pondering redoing all those changes again, I did what every self-respecting nerd does when they're stuck - turn to google.

Solution

I found these links:

Here is my solution, which is basically a rewrite of the process in the last link above, with a few more details. I used Linux to do the recovery, mainly because it had commands that I needed. I assume you have some Linux command line knowledge, as all this will be performed in the terminal.

  1. Make a copy of the virtual machine folder in case you screw up.
  2. Look at the size of the snapshot virtual hard disk. If it is more than 2GB and you're running a 32-bit OS, or it is more than the amount of memory that you have available, the following method will probably not work. You're welcome to try though.

    The virtual hard disk files all end in .vmdk. The snapshot one has -xxxxxx on the end of the file name, indicating the snapshot number. For example, if my virtual machine was called Windows Server 2003 Standard Edition, my base parent virtual disk will be named Windows Server 2003 Standard Edition.vmdk, and my snapshot may be named Windows Server 2003 Standard Edition-000002.vmdk.
  3. Find out the CID of the base parent virtual hard disk. Because this virtual hard disk will most likely be larger than 2GB, you won't be able to open it in nano, vi etc. As we only need to read from it, we can use a Linux command to print out only the first 20 or so lines.
    head --lines=20 {base parent vmdk path}

    Replace {base parent vmdk path} with the path to the base parent virtual hard disk file, e.g.
    head --lines=20 /media/sda1/"Virtual Machines"/"Windows Server 2003 Standard Edition"/"Windows Server 2003 Standard Edition.vmdk"
    The CID is the 8-character random string on the line starting with CID=. Write this down somewhere.
  4. Now open up the snapshot virtual hard disk in a text editor, and change the parentCID (not CID) to the CID you recorded in the previous step. Then save. You can use nano, vi or some other Linux editor, e.g.
    sudo nano {snapshot vmdk path}
    Make sure to sudo the command, and also be patient - it could take a few minutes, during which the console may remain black; it is loading.

    I chose to do this in Windows instead, using Editpad Lite which is amazingly fast.
  5. That's it, your virtual machine should now start up again.
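
Steps 3 and 4 boil down to reading two values from the top of two files. Here is a rough sketch of a helper that does just that without opening the whole disk. The name vmdk_field is made up for illustration, and it assumes the text descriptor sits within the first 4 KiB of the file and that your head supports -c; if it prints nothing, try a larger number.

```shell
#!/bin/sh
# vmdk_field is a hypothetical helper for this sketch, not a VMware tool.
# Assumption: the text descriptor lies within the first 4096 bytes of the file.
vmdk_field() {
    # $1 = field name (CID or parentCID), $2 = path to the vmdk file
    head -c 4096 "$2" | LC_ALL=C grep -a -m1 "^$1=" | cut -d= -f2 | tr -d '\r'
}

# Demo on a throwaway file that mimics the start of a snapshot vmdk.
demo=$(mktemp)
printf 'KDMV\n# Disk DescriptorFile\nversion=1\nCID=0d55cd6c\nparentCID=b1ce363c\n' > "$demo"
echo "CID:       $(vmdk_field CID "$demo")"
echo "parentCID: $(vmdk_field parentCID "$demo")"
rm -f "$demo"
```

Run it against the base parent with CID and against the snapshot with parentCID; those are the two values that need to agree.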

Further explanation

If you're interested, here's a deeper look into what you just did. At the beginning of each vmdk file is a disk descriptor section, which contains the properties of that virtual hard disk in text. The CID is a random unique identifier that identifies a particular state of the virtual disk - each time a change is made to the virtual hard disk, the CID changes.

In normal operation, the CID property of the base parent virtual hard disk is synced with the parentCID property of the snapshot virtual hard disk to show that the two files work together. The snapshot has to work with the base parent to be useful, as it only contains the differences from the base parent virtual hard disk. It is important to note that it is the snapshot's parentCID property that is synced with the base parent's CID property, not just the two CID properties in the virtual hard disks - the two virtual hard disks are in a parent-child relationship.

When you start up the base parent virtual hard disk on its own, however, changes are made to that virtual hard disk without being in sync with the snapshot, so the CIDs no longer match.

And when the CIDs no longer match, VMware complains because the snapshot is out of sync and the changes in the snapshot may not apply properly to the base parent anymore, possibly resulting in data corruption.

By forcing the CIDs to match again, you effectively trick VMware into thinking it was never out of sync.

Depending on how complex your virtual machine is though, it may be worth recreating your virtual machine after recovering your data because it won't be known where the corruption is, if any. If you did anything to the base parent virtual hard disk before realising and shutting down, e.g. copied files around, the risk of corruption is higher.
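
The parent-child check described above can be sketched in a few lines of shell. This is only an illustration of the comparison, not VMware's actual code: the function names are invented, and it assumes the descriptor is within the first 4 KiB of each file.

```shell
#!/bin/sh
# vmdk_field and check_link are hypothetical names for this sketch.
# Assumption: the descriptor lies within the first 4096 bytes of each file.
vmdk_field() {
    head -c 4096 "$2" | LC_ALL=C grep -a -m1 "^$1=" | cut -d= -f2 | tr -d '\r'
}

check_link() {
    # $1 = parent vmdk, $2 = snapshot (child) vmdk
    parent_cid=$(vmdk_field CID "$1")
    child_parent_cid=$(vmdk_field parentCID "$2")
    if [ -n "$parent_cid" ] && [ "$parent_cid" = "$child_parent_cid" ]; then
        echo "in sync ($parent_cid)"
    else
        echo "out of sync (parent CID=$parent_cid, snapshot parentCID=$child_parent_cid)"
        return 1
    fi
}

# Demo with two throwaway descriptors where the parent's CID has moved on.
dir=$(mktemp -d)
printf 'version=1\nCID=d68511e8\nparentCID=ffffffff\n' > "$dir/base.vmdk"
printf 'version=1\nCID=0d55cd6c\nparentCID=b1ce363c\n' > "$dir/base-000001.vmdk"
check_link "$dir/base.vmdk" "$dir/base-000001.vmdk" || echo "this snapshot would refuse to open"
rm -rf "$dir"
```

Note that it compares the snapshot's parentCID with the parent's CID, never the two CID lines with each other.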

6 Trackbacks

Pingback from  Ojat’s Blog  » Blog Archive   » Problem Dengan VMware Snapshot Disk

Pingback from  The parent virtual disk has been modified since the child was created « A Blog on Tech

Tech4Him tracked back:

Okay, I'm really thanking the good Lord right now. He granted us discernment that kept us from losing an entire day's worth of data for my employer. To Him be all the glory. T

Pingback from  VMWare ESXi - VM Crashes during failed Snapshot Delete | Tech4him - Technology with Integrity

Pingback from  Oracle/VMware/Hardware Week of Hell at APEXtras

Pingback from  /dev/zero » Fixing Vmware virtual disks

41 Comments

Ian said:

Hi Samuel --

One suggestion: instead of opening the snapshot file to replace the parentCID number (which, as you point out, doesn't work if the snapshot is >2GB), use command line utilities to make the change.

I found my parent CID from the base vmdk with:

grep --text -m2 CID= {base vmdk}

and the "wrong" parent CID in the snapshot vmdk:

grep --text -m2 CID= {snapshot vmdk}

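Then replaced the parentCID in the snapshot using a sed command, writing the result to a new file (redirecting sed's output straight back onto its own input file would truncate the file before sed reads it):

sed -e 's/parentCID={wrong CID}/parentCID={right CID}/' {snapshot vmdk} > {snapshot vmdk}.fixed

Once the .fixed file checks out, rename it over the original snapshot.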

That should get it done!
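
A variation on the same idea, as a sketch only: because the new CID is exactly as long as the old one, it is enough to rewrite the first few KiB of the snapshot in place rather than copying the whole file. patch_parent_cid is a made-up name, the 4 KiB window is an assumption about where the descriptor lives, and the size check is there because some sed versions append a newline to a chunk that does not end in one. Work on a backup copy first.

```shell
#!/bin/sh
# patch_parent_cid is a hypothetical helper for this sketch, not a VMware tool.
# Assumption: the parentCID line lies within the first 4096 bytes of the file.
patch_parent_cid() {
    # $1 = snapshot vmdk, $2 = wrong parentCID, $3 = right parentCID
    chunk=$(mktemp) || return 1
    patched=$(mktemp) || return 1
    # Copy out only the start of the file.
    dd if="$1" of="$chunk" bs=4096 count=1 2>/dev/null
    LC_ALL=C sed "s/^parentCID=$2/parentCID=$3/" "$chunk" > "$patched"
    # Refuse to write if nothing matched or if the chunk changed size.
    if cmp -s "$chunk" "$patched"; then
        echo "parentCID=$2 not found, nothing written" >&2
        rm -f "$chunk" "$patched"
        return 1
    fi
    if [ $(wc -c < "$chunk") -ne $(wc -c < "$patched") ]; then
        echo "patched chunk changed size, nothing written" >&2
        rm -f "$chunk" "$patched"
        return 1
    fi
    # conv=notrunc overwrites the start of the file and leaves the rest untouched.
    dd if="$patched" of="$1" conv=notrunc 2>/dev/null
    rm -f "$chunk" "$patched"
}

# Usage (placeholders as in the comment above):
#   patch_parent_cid {snapshot vmdk} {wrong CID} {right CID}
```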

Sam said:

Good idea Ian.

Gotta admit that thought never really crossed my mind as my snapshots were small enough. My Linux command/regexp skills aren't that awesome, so I had no idea about the sed command, but I'm kicking myself for not using grep to find the parentCID and CID lines - so obvious now.

Thanks for the tip!

Oliver said:

Thank you!

That certainly saved me from my own stupidity. Even before I had a chance to lose any sleep.

From now on my snapshots are going to experience very short lives.

Test and commit shall be the new motto.

Francis said:

fantastic advice

Justin said:

You friggen rock!  You saved my 6 hours of a night shift and 2 secs of stupidity!

Lorenz said:

Thank you! Great manual!

fallermax said:

I would like to say thank you very much! This manual was very helpful. Now I will live longer.

If you have a 32-bit Windows system you can open and save big files with the program "winhex". It is very fast - I tried it out because I didn't have Linux on my notebook.

Lucas Violini said:

What a day.. This really really saved me. Now I'll have to re-do our backup policy, keep everybody out of our vmware, but most of all CONGRATULATE you for your skills and knowledge. This saved me and now I have a much better understanding of those freaking snapshots. You are the MEN!

Mike Slass said:

The outline of the fix is this:

1) BACK EVERYTHING UP

2) lookup the CID of the parent disk image

3) lookup the (incorrect) parentCID of the curdled snapshot

  (you'll need both to make the sed command as restrictive as possible)

4) KEEPING THE BACKUP, remove the original of the curdled snapshot file

5) pipe just the beginning of the curdled snapshot through sed to change the parentCID

     and save that as the beginning of the reconstructed snapshot

6) append the rest of the curdled snapshot to the end of the reconstructed snapshot

dd is the tool for snipping pieces of a HUGE file

And here's how it looks in practice:

[root@build12 virtual_machines]# cp -R sea-cm-winvm01 /backup
[root@build12 virtual_machines]# cd sea-cm-winvm01

[root@build12 sea-cm-winvm01]# head -10 /backup/sea-cm-winvm01/sea-cm-winvm01-000001.vmdk
KDMV
# Disk DescriptorFile
version=1
CID=0d55cd6c
parentCID=b1ce363c                           <-- INCORRECT PARENT CID
createType="monolithicSparse"
parentFileNameHint="sea-cm-winvm01.vmdk"
# Extent description
RW 83886080 SPARSE "sea-cm-winvm01-000001.vmdk"

[root@build12 sea-cm-winvm01]# head -10 /backup/sea-cm-winvm01/sea-cm-winvm01.vmdk
KDM
Disk DescriptorFile
version=1
CID=d68511e8                                 <-- CORRECT PARENT CID
parentCID=ffffffff
createType="monolithicSparse"
# Extent description
RW 83886080 SPARSE "sea-cm-winvm01.vmdk"

[root@build12 sea-cm-winvm01]# rm sea-cm-winvm01-000001.vmdk

[root@build12 sea-cm-winvm01]# dd if=/backup/sea-cm-winvm01/sea-cm-winvm01-000001.vmdk count=10 | sed 's/parentCID=b1ce363c/parentCID=d68511e8/' >sea-cm-winvm01-000001.vmdk
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.00722415 seconds, 709 kB/s

[root@build12 sea-cm-winvm01]# dd if=/backup/sea-cm-winvm01/sea-cm-winvm01-000001.vmdk skip=10 seek=10 of=sea-cm-winvm01-000001.vmdk oflag=append
75301238+0 records in
75301238+0 records out
38554233856 bytes (39 GB) copied, 716.488 seconds, 53.8 MB/s

Sam said:

Thanks Mike for that - the solutions for this problem are getting more and more streamlined :) The one thing I'd probably add is to pipe the head commands through grep to pick out only the CID and parentCID lines. A bash script anyone? (Although I'd rather do it line by line just to be sure; it's worth understanding how VMware works underneath anyway.)

You gotta wonder why VMware hasn't automated a solution for this yet given how commonly it seems to happen. Then again, I'm not sure if I want to use their solution, given their track record with VMware Converter - it's extremely slow, and often randomly fails for no obvious reason.

WeSam said:

THANK YOU .... YOU JUST SAVED ME WITH YOUR BLOG...

I used "010 Editor" to edit the 30G file, which was very fast.. no loading time even.

OMG said:

Thank you so much, you saved my life!

redfive said:

A MILLION THANKS!

I messed up the VMDKs of our main production server after attaching the main VMDK to another virtual machine to add some Windows files. When I attached the HD to the original virtual machine, it didn't boot any more and came up with the dreaded "parent modified..." message.

Fixed it on ESX Server 3.5 from the console, with "head --lines=20" and "nano", following your instructions. Worked perfectly! The main file was 137GB and there were 3 snapshot files, about 10GB each. The snapshots were linked from last to first and then to the main file (3->2->1->original).

After fixing the CIDs, the machine worked fine, even after having written and then deleted some files inside the VMDK.

You are a Star!

Angel, Santiago de Compostela, Spain.

Tang said:

I solved the issue by following your steps.

But I didn't work in Linux.

I made a simple tool for Windows.

Main Code:

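try
{
    txtResult.Clear();

    // Show only the first nudLines.Value lines of the file, like "head" does.
    // The using block closes the reader even if reading fails part-way.
    using (StreamReader sr = new StreamReader(txtPath.Text))
    {
        string line;
        for (decimal i = 0; i < nudLines.Value && (line = sr.ReadLine()) != null; i++)
        {
            txtResult.AppendText(line + "\r\n");
        }
    }
}
catch (Exception ex)
{
    MessageBox.Show(ex.ToString());
}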

Martin said:

Great solution!!

It saved me a lot of time, because I don't have to reinstall the whole system.

Thank you very much.

untill said:

I googled, found you, and you just saved my day. Quick, comprehensive, and easy.

Thanks a lot!

Mark Fitzwater said:

Thank you. You saved my life. I moved our primary domain controller only to find it wouldn't start up. AHH.

Your fix did the trick. In ESX 3.5 the files you mention are much smaller now and the main disk is called ***flat.vmdk

Guys... to sum it up : THANK YOU!!!

I too had the bad luck of a non-booting VM.

This page contains more relevant info than the rest of the web...

Again... THANKS, you guys saved me weeks of work!!

Bert

Matt said:

Perfect, this saved my bacon.  We had the issue described, but the problem occurred during a VCB backup.

Ruediger said:

You saved my day. 2 weeks of work were in that snapshot when I just clicked an old "Copy of" .vmx file.

I had more adrenaline than blood in me. If you are ever looking for someone to marry you... ;-)

Thanx Ruediger

T. Lucas said:

Thanks for your post. It got us through quite a pickle last night when ESXi blew up a VM during a snapshot deletion. Great stuff!

WT said:

Thanks for many hours saved

For me this was the most useful blog entry since the beginning of blogs!

Somehow the CID's got messed up with the vmware-mount.pl command, so be careful with this and make a backup before using this command!

Thanks a lot!

puck said:

Just a quick note. You can also get the parent CID from the vmware.log. It will say something like "Content ID mismatch (f6c96825 != f6c96826)."
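
A small sketch for pulling those two values out of the log, assuming the line looks exactly like the one quoted above. mismatch_ids is a made-up name, and since the comment does not say which of the two IDs belongs to the parent, compare both against the CID= line of the parent vmdk before editing anything.

```shell
#!/bin/sh
# mismatch_ids is a hypothetical helper; the log wording it matches is the
# one quoted in the comment above and may differ between VMware versions.
mismatch_ids() {
    # $1 = path to vmware.log; prints the two IDs of every mismatch line
    LC_ALL=C sed -n 's/.*Content ID mismatch (\([0-9a-fA-F]\{8\}\) != \([0-9a-fA-F]\{8\}\)).*/\1 \2/p' "$1"
}

# Usage: mismatch_ids /path/to/vm/vmware.log
```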

Ken said:

After reading all of the Techno Babble, I finally came to an article that I can understand!!! Thank you many times!!!

Marcos said:

I moved my VM (including snapshot) to a different blade, got this error message and had no idea what to do. The VMware forums were not really that helpful or clear!

Thanks to your great article I'm up and running again.

Thanks very much, you saved my bacon >8)

Peter said:

And here's one more sucker you saved with this article! Thank you very much!! Yesterday i noticed my vm-harddisk (60Gb) had grown to use 160 Gb (no typo..) of diskspace. Of course i didn't backup at that time because of lack of backup-facilities/diskspace at that moment.. So went further and further from home... :0)

Anyway, thnx once more!!

regards,

Peter

damian said:

Guys, I am new to VMware and have an ESX 3i server running with 3 VMs. One of my colleagues has tried to clean up some snapshots and is now getting this error. How do I edit/access the vmdk files? Is there a way to do this from the Infrastructure Client, or can I run the Linux commands on the actual VMware server itself? Pretty desperate - this is (or was) a live server. I have data backups but don't want to rebuild if the fix here is valid for my situation.

ricky said:

You are a genius! Thank you

Jones said:

I use the following to fix this problem:

1.  putty into the host

2.  run vmware-cmd -l to find the path of the bad VM

3.  cd /path/to/vm/

4.  cat NAMEOFTHEDISK-xxxxxx.vmdk (for hard disk 1)

5. (A) cat NAMEOFPARENTDISK.vmdk (shown in the previous command's output for parentFileNameHint)

5. (B) keep running cat on each parent .vmdk until you have displayed each snapshot and its parent, all the way back to the base .vmdk disk

example...

[root@VMHost01 Server1]# cat SERVER1-000001.vmdk
# Disk DescriptorFile
version=1
CID=fe498eca
parentCID=66ed665b
createType="vmfsSparse"
parentFileNameHint="SERVER1.vmdk"
# Extent description
RW 35358082 VMFSSPARSE "SERVER1-000001-delta.vmdk"
# The Disk Data Base
#DDB

[root@VMHost01 Server1]# cat SERVER1.vmdk
# Disk DescriptorFile
version=1
CID=66ed665b
parentCID=ffffffff
createType="vmfs"
# Extent description
RW 35358082 VMFS "SERVER1-flat.vmdk"
# The Disk Data Base
#DDB
ddb.adapterType = "buslogic"
ddb.geometry.sectors = "63"
ddb.geometry.heads = "255"
ddb.geometry.cylinders = "2200"
ddb.uuid = "60 00 C2 9e 7c 4c 5e c4-ea f5 d8 1e 6c 36 06 40"
ddb.geometry.biosSectors = "63"
ddb.geometry.biosHeads = "255"
ddb.geometry.biosCylinders = "2200"
ddb.toolsVersion = "7299"
ddb.virtualHWVersion = "4"

6.  Notice the CID and ParentCID entries of the output:

server1-000001.vmdk
CID=fe498eca
parentCID=332a8cca   <---- THIS ONE IS NOT POINTING TO...

server1.vmdk
CID=66ed665b    <---- THIS ONE
parentCID=ffffffff

7.  run the following:

nano server1-000001.vmdk

edit the parentCID by overwriting 332a8cca with 66ed665b

Do CTRL+X and then answer 'Y' to save the changes

8.  now go back and show the output again (use the cat commands like before) each parentCID should be pointing to the parent file that VMWare expects as listed in the parentFileNameHint.

9.  once this is completed if you do not need the snapshot you should also go to the VMClient and go to snapshot manager and delete all snapshots.  If there are no snapshots to delete, create one, then immediately delete it.  This should remove all snapshots.

*** NOTE *** if you have to create a snapshot, you may want to check its CID/ParentCID for all disks to make sure VMWare didn't do something stupid like create a snapshot file with a CID and ParentCID pointing to itself.  If that occurs, just fix the pointers like before and then delete all snapshots.

This works for me 100% of the time when I have any of the corruptRedo log errors, Parent had been modified errors, bad CID/ParentCID issues, or VM in stuck state due to failure to consolidate snapshots after VCB backups
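
The repeated cat in steps 4 to 6 can be sketched as a loop that follows parentFileNameHint from the newest snapshot back to the base disk and flags every link whose parentCID does not match. The names field and walk_chain are invented for this sketch, and it assumes the hint is a plain file name in the same directory as the snapshot. It only reports; the editing in step 7 is still done by hand.

```shell
#!/bin/sh
# field and walk_chain are hypothetical names for this sketch.
# Assumptions: each descriptor is readable within its first 4096 bytes, and
# parentFileNameHint is a file name relative to the snapshot's own directory.
field() {
    head -c 4096 "$2" | LC_ALL=C grep -a -m1 "^$1=" | cut -d= -f2 | tr -d '"\r'
}

walk_chain() {
    # $1 = descriptor of the newest snapshot
    disk=$1
    dir=$(dirname "$disk")
    status=0
    hops=0
    while [ "$hops" -lt 64 ]; do
        want=$(field parentCID "$disk")
        if [ "$want" = "ffffffff" ]; then
            echo "$(basename "$disk"): base disk, end of chain"
            return $status
        fi
        parent="$dir/$(field parentFileNameHint "$disk")"
        if [ ! -f "$parent" ] || [ "$parent" = "$disk" ]; then
            echo "$(basename "$disk"): parent missing or pointing to itself"
            return 1
        fi
        have=$(field CID "$parent")
        if [ "$want" = "$have" ]; then
            echo "$(basename "$disk") -> $(basename "$parent"): ok"
        else
            echo "$(basename "$disk") -> $(basename "$parent"): MISMATCH (parentCID=$want, parent CID=$have)"
            status=1
        fi
        disk=$parent
        hops=$((hops + 1))
    done
    echo "gave up after $hops links, the chain may loop"
    return 1
}

# Usage: walk_chain /path/to/vm/SERVER1-000001.vmdk
```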

Marc BENISTY said:

Thanks for your help guys !

You saved me :)

Particulary regarding the tool to edit a 60GB vmdk file very quickly with no delay !! (010 Editor... great tool !)

George K said:

You are a true Saint.  After accidentally clicking on the vmdk file while backing up my Mac, I got the dreaded error.   5 hours into the ordeal, I finally got things back up and running.  It took hours to back everything up first.  Then, I couldn't find a good editor that could handle a 14GB snapshot on the Mac.   I finally found 0xED which Rocked!!!

www.suavetech.com/.../0xed.html

Great text editor loads the file instantly  -a quick hex edit of the Parent CID on the snapshot file and I was back up and running.   Keep up the good work and thanks for posting such a concise solution.    Going to bed now...

Aaron said:

I'm having a crisis with the same issue!  I need to get the Outlook data off of this silly VMWare Fusion windows XP partition.

Jeff said:

Thanks for this article!! You saved my ass, thank you!

Sepp said:

Thanks a lot for googling around and cutting the information down to the essential point - you saved me a lot of time rescuing my VM :o)

And as it seems, I'm not the only stupid one who screws up links to parent VM-Disks etc. :o)

lemonadecowboy said:

Hi all,

just a maybe stupid question:

can I recover a VM with snapshots from the main *.vmdk, *flat.vmdk and only *delta.vmdk (without the descriptor file *.vmdk)?

more precise, I have:

server.vmdk

server-flat.vmdk

server-000002-delta.vmdk

server-000003-delta.vmdk

I successfully recovered the VM from only *flat.vmdk, however, without snapshots - so is it possible to recover all the remaining files from *delta.vmdk, like *.vmsn and *.vmdk (for *delta.vmdk)?

Rui said:

Hi,

i follow your steps but when i try to start vm i get "Failed to retrieve disk information for: xxx.vmdk" Success

and i can't startup :(

can you help me?

Ulli said:

In case anyone needs more background to solve similar problems see sanbarrow.com/sickbay.html

Jonathan Marianu said:

Thank You very very much.

I was trying to be slick by using a common VMDK on a RAM disk and run multiple concurrent copies of VMs each with their own vmx and snapshots. That part had been working for months.

Then I set one of my vmx's to use the base vmdk as independent-nonpersistent. That broke the entire chain and nothing would boot! I got this sick sick feeling in my stomach. Then I read your blog and I was hopeful. I updated the CID in my snapshots and it all worked.

Thank You, Thank You, Thank You.

Mr. TSE said:

If you have a large number of snapshots, finding the broken CID can take a while.

Here is a script that will do that check and many others for you.

http://vmutils.blogspot.com/

And here is a video to help you avoid surprises with the snapshots.

www.youtube.com/watch
