Tuesday, October 21, 2008

Using ZFS to manage your VM zoo

ZFS - good for VMs

If you need to support or develop for another operating system, or if you need to run that one app for which no Mac version or alternative exists, then you will probably be familiar with one of these products: VMware Fusion, VirtualBox or Parallels. These let you run virtual machines from within your Mac, so you don't have to leave your OS X goodness when playing in foreign OSes.

Unfortunately, once you get past one or two, managing VMs can take a whole lot of disk space, and you often find yourself (re)creating the same thing over and over. Fortunately, ZFS, the new-generation file system from Sun Microsystems, can deliver the sort of features one might want:
  • compression
  • snapshot/rollback
  • clone-able file-systems
I won't bother describing ZFS in more detail - others have done that very well. However, it should be pointed out that OS X 10.5 (Leopard) supports ZFS in read-only mode right out of the box. Speculation about Apple's future plans for ZFS aside, you can start dabbling with ZFS right now, using the binaries from the nice folk over at Mac OS Forge.

This article will walk through the following scenario:
  • need to host a couple of Windows VMs (e.g. a network-connected VM with anti-virus installed for running office apps, and a separate non-networked, lightweight VM for playing DirectX games).
  • need to conserve disk space as much as possible.
If you have never seen the Apple kernel-panic screen, consider yourself warned. There is a high likelihood that you will encounter it. Obviously, ZFS for OS X is still not production ready, so if you use this and end up losing all your files you have no-one to blame but yourself. I warrant that this procedure is not useful for anything other than potentially losing you a whole bunch of data. If you continue past this point, then upon your head be it. Having said that, I have tried to find a narrow but safe path which, if followed diligently, should lead to something useful.

Still going, huh? Well, brave or foolhardy - it's your call.

Step 0: Install ZFS

Download and install the ZFS binaries for Mac from here (please follow the instructions carefully).

Step 1: Procure physical storage

This step is kind of optional. If you have a physical storage device that you want to dedicate to ZFS (whether it be a partition or a whole disk), you can skip this step - just make sure your storage device is mounted. In case anyone needs to be told, DO NOT USE YOUR BOOT DISK FOR THIS!

If, like me, you don't have another physical storage device, you will have to resort to creating a disk image using Disk Utility which will contain the ZFS storage. Make sure you specify the following:
  • Volume Name: winxpvm (or whatever you like)
  • Save As: WINXPVM_ZFS (or whatever you like)
  • Volume Size: 30 GB (or as big as you like - be generous, we are using the sparse bundle format so it won't preallocate the whole amount)
  • Volume Format: Mac OS Extended (Journaled)
  • Encryption: none
  • Partitions: Single partition - GUID Partition Map
  • Image Format: sparse bundle disk image
I've chosen to use the sparse bundle format in order to make the resulting disk images Time Machine friendly.
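
For those who prefer the Terminal, the same image can probably be created with hdiutil instead of Disk Utility. This is only a sketch and not part of the procedure I actually followed: the flag values are assumed equivalents of the Disk Utility settings listed above, so check them against the hdiutil man page on your system. The script prints the command rather than running it unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch: build an hdiutil command matching the Disk Utility settings above.
# Assumption: these flags correspond to the GUI options listed in this step.
VOLNAME="winxpvm"        # Volume Name
IMAGE="WINXPVM_ZFS"      # Save As
SIZE="30g"               # Volume Size

image_cmd() {
    echo "hdiutil create -size $SIZE -type SPARSEBUNDLE -fs HFS+J" \
         "-layout GPTSPUD -volname $VOLNAME $IMAGE"
}

# Dry run by default: print the command. Set RUN=1 to execute it (Mac only).
if [ "${RUN:-0}" = "1" ]; then
    eval "$(image_cmd)"
else
    image_cmd
fi
```

The image still has to be mounted (for example by double-clicking it) before it shows up in the disk list.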

So now we have somewhere to create our file systems: From the Terminal, using the diskutil command we see something like this:
$ diskutil list
0: GUID_partition_scheme *186.3 Gi disk0
1: EFI 200.0 Mi disk0s1
2: Apple_HFS Macintosh HD 186.0 Gi disk0s2
0: GUID_partition_scheme *30.0 Gi disk1
1: EFI 200.0 Mi disk1s1
2: Apple_HFS winxpvm 29.7 Gi disk1s2
Your list will vary. In my example, /dev/disk1 is the disk we've just created, and /dev/disk1s2 is the winxpvm volume.
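
Since picking the wrong disk in the next step would be painful, it may be worth looking the device up by volume name rather than by eye. The helper below is a hypothetical sketch (find_dev is not a standard tool): it assumes a single-word volume name in the third column and the identifier in the last column, as in the listing above, and it runs here against a copy of that listing.

```shell
#!/bin/sh
# Sketch: print the device node for a volume name found in `diskutil list`
# style text on stdin. Only handles volume names without spaces.
find_dev() {
    awk -v name="$1" '$3 == name { print "/dev/" $NF }'
}

# Sample input copied from the listing above; on a Mac, pipe in the
# output of `diskutil list` instead.
sample='0: GUID_partition_scheme *30.0 Gi disk1
1: EFI 200.0 Mi disk1s1
2: Apple_HFS winxpvm 29.7 Gi disk1s2'

printf '%s\n' "$sample" | find_dev winxpvm
```

With the sample above this prints /dev/disk1s2, the slice used in the next step. A name such as "Macintosh HD" would not match, because of the space.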

Step 2: Format disk and create storage pool

Format the disk using ZFS (make sure you specify the correct disk here!):
$ diskutil partitiondisk /dev/disk1 GPTFormat ZFS %noformat% 100%
Started partitioning on disk disk1
Creating partition map
[ + 0%..10%..20%..30%..40%..50%..60%..70%..80%..90%..100% ]
Finished partitioning on disk disk1
0: GUID_partition_scheme *30.0 Gi disk1
1: EFI 200.0 Mi disk1s1
2: ZFS 29.7 Gi disk1s2
For the next few commands we will need to load the ZFS kernel extension:
$ sudo kextload /System/Library/Extensions/zfs.kext

Now, we can create our zpool:

$ zpool create winxpvm /dev/disk1s2
$ zpool upgrade winxpvm

The upgrade command above upgrades the on-disk format. This may not be desirable (and can be safely omitted) if you will be sharing the ZFS disk image with other stock-Leopard users, whose built-in ZFS may not be able to read the newer format.

Now we can see our basic ZFS file system:
$ zfs list
winxpvm 257K 29.0G 170K /Volumes/winxpvm

Step 3: Tweak the ZFS knobs

Having created the core ZFS filesystem we want to tweak a few settings. The following commands will:
  • turn on ZFS compression
  • tune the record-size (i.e. block size)
  • turn off updating the last-accessed time of files
  • turn off Spotlight indexing of our ZFS pool

$ zfs set compression=on winxpvm
$ zfs set recordsize=16K winxpvm
$ zfs set atime=off winxpvm
$ touch /Volumes/winxpvm/.metadata_never_index
$ chmod 444 /Volumes/winxpvm/.metadata_never_index
$ sudo rm -rf /Volumes/winxpvm/.Spotlight-V100

Note that I've set the record-size to 16K, a default cluster size for FAT32 (the exact default depends on the volume size), which is what I will be using as the disk format for my VMs. This step is based on my guess that it will help performance. I don't know enough about how VMware (which is what I'm using) works to be sure. The default record-size for ZFS is 128K. Most Linux distributions use ext3, which has a default 4K block-size. If any ZFS or VMware mavens care to reply regarding the best way to go here, I will be very grateful.
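
One rough way to reason about the guess: when a small part of a large file changes, ZFS rewrites the whole record containing it, so smaller records should mean less data rewritten per guest write. The sketch below only does that arithmetic. It is a simplification, assuming each guest write falls inside a single record and ignoring compression and caching, and the amplification helper is purely illustrative.

```shell
#!/bin/sh
# Sketch: KB rewritten by ZFS per KB written by the guest, assuming each
# guest write lands inside one record and the whole record is rewritten.
# Usage: amplification RECORD_KB GUEST_WRITE_KB
amplification() {
    record_kb=$1
    write_kb=$2
    if [ "$write_kb" -ge "$record_kb" ]; then
        echo 1
    else
        echo $(( record_kb / write_kb ))
    fi
}

echo "128K record, 16K FAT32 cluster: $(amplification 128 16)x"
echo "16K record, 16K FAT32 cluster:  $(amplification 16 16)x"
echo "16K record, 4K ext3 block:      $(amplification 16 4)x"
```

On these assumptions the default 128K record rewrites eight times what a 16K FAT32 cluster write needs, while a 16K record matches it exactly. Whether that shows up as real-world performance is a separate question.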

Step 4: Create the base file-system for our VM

In order to be able to clone ZFS file-systems, we need to have at least a two-level hierarchy of file-systems under the root ZFS filesystem. Then we will be able to do the following:
  • create a "winxpvm/base" file-system to hold our clean VM configuration (by "clean" I mean basic install with minimal customisations).
  • create a snapshot of the base before adding anti-virus software to it
  • clone the base (as at the point before we installed the anti-virus) to a new file-system: "winxpvm/gamesnonet"

$ zfs create winxpvm/base
$ zfs list
winxpvm 248K 29.0G 131K /Volumes/winxpvm
winxpvm/base 18K 29.0G 18K /Volumes/winxpvm/base

$ zfs snapshot winxpvm/base@empty

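Note how I created a snapshot (called "empty") of the empty file-system. If anything goes wrong when I create the VM, I can always roll back to the empty state. While "empty" is still the most recent snapshot, a command like the one below does it; rolling back past later snapshots needs the -r option, which destroys those later snapshots.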
$ zfs rollback winxpvm/base@empty

What we now have is a volume within a volume (from the OS point of view).

Step 5: Install base VM

Obviously this depends on your software and requirements. Just make sure that you create the VM in: /Volumes/winxpvm/base (or your equivalent if you've changed any of the names).

When you're happy with the VM, shut it down and create a "post-installation" snapshot of the base.
$ zfs snapshot winxpvm/base@origin

"Origin" is the snapshot of our fresh, clean VM - the "origin" of all cloned VMs.

Step 6: Clone VM

We've now created the VM in our base file system.
$ zfs list
winxpvm 5.19G 23.8G 79.5K /Volumes/winxpvm
winxpvm/base 5.16G 23.8G 4.15G /Volumes/winxpvm/base
winxpvm/base@empty 20K - 22K -
winxpvm/base@origin 225M - 2.50G -

Now we want to clone the freshly installed VM for other purposes (in my example for installing games). To do this, we clone the base file-system:
$ zfs clone winxpvm/base@origin winxpvm/gamesnonet
Ta-da! Cloning takes all of a second to do, and takes almost no extra space. This is because ZFS employs copy-on-write when cloning file-systems, i.e. it only writes differences from the original. As the clone becomes more and more different from the original, it will take up more space.

We now have two VMs for the price of one. You can open each one in your VM software and customize them as you will. It's worth creating a snapshot in the clone in case we make a mistake when customising it.
$ zfs snapshot winxpvm/gamesnonet@origin

After some customisation of both base and clone, we end up with:
$ zfs list
winxpvm 5.19G 23.8G 79.5K /Volumes/winxpvm
winxpvm@empty 236K - 274K -
winxpvm/base 5.16G 23.8G 4.15G /Volumes/winxpvm/base
winxpvm/base@empty 20K - 22K -
winxpvm/base@origin 225M - 2.50G -
winxpvm/base@antivirus 247M - 2.63G -
winxpvm/base@office 253M - 3.06G -
winxpvm/gamesnonet 26.2M 23.8G 2.51G /Volumes/winxpvm/gamesnonet
winxpvm/gamesnonet@origin 23K - 2.51G -

As you can see, I've installed a few things on my base VM and taken snapshots along the way. Further clones can be made off any snapshot.
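
Once the list grows, it can help to pull out just the snapshots of one file-system when deciding what to clone from. The helper below is a hypothetical sketch (snapshots_of is not a ZFS command): it filters zfs list style text on stdin, and runs here against a copy of the listing above.

```shell
#!/bin/sh
# Sketch: list the snapshots of one file-system from `zfs list` style text
# on stdin. The name is the first column; snapshots look like fs@snapname.
snapshots_of() {
    awk -v fs="$1" 'index($1, fs "@") == 1 { print $1 }'
}

# Sample input copied from the listing above; on a Mac with ZFS loaded,
# pipe in the output of `zfs list` instead.
sample='winxpvm 5.19G 23.8G 79.5K /Volumes/winxpvm
winxpvm@empty 236K - 274K -
winxpvm/base 5.16G 23.8G 4.15G /Volumes/winxpvm/base
winxpvm/base@empty 20K - 22K -
winxpvm/base@origin 225M - 2.50G -
winxpvm/base@antivirus 247M - 2.63G -
winxpvm/base@office 253M - 3.06G -
winxpvm/gamesnonet 26.2M 23.8G 2.51G /Volumes/winxpvm/gamesnonet
winxpvm/gamesnonet@origin 23K - 2.51G -'

printf '%s\n' "$sample" | snapshots_of winxpvm/base
```

This prints the four snapshots of winxpvm/base. Any of them can be handed to zfs clone in the same way as in Step 6, for example winxpvm/base@antivirus with a new file-system name of your choosing.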

If you plan on simultaneously running multiple VMs which share the same origin, make sure that you've customised them so that they have different network MAC addresses, IP addresses and host names. If you don't, many hours of fun and hilarity will ensue.

Step 7: Auto-mount ZFS volumes at login

Since I regularly use my VMs, I wanted to have them auto-mount when I log in. I did this by adding my sparse bundle disk image to my login items in System Preferences.

You will get an error when mounting such ZFS-formatted disk images; however, it can be safely ignored. The volume mounts just fine.

Step 8: Unmounting ZFS volumes

This is where I've had a few kernel panics. The trick seems to be to get ZFS to unmount its mounted file-systems before trying to eject the sparse disk bundle (the "outer" volume). For each file-system in your ZFS sparse disk bundle:
$ sudo zfs unmount {filesystem}
Now you should be able to safely eject.
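
With more than a couple of file-systems, the unmounting is easy to script. The following is a sketch only: unmount_order and unmount_all are hypothetical helpers, and unmounting children before their parent is my assumption about the safest order rather than something I have verified. It prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch: order file-system names so children come before their parents,
# then print (or run) one `zfs unmount` per file-system.
unmount_order() {
    # A parent name is a prefix of its children, so a reverse sort in the
    # C locale puts every child ahead of its parent.
    LC_ALL=C sort -r
}

unmount_all() {
    unmount_order | while read -r fs; do
        if [ "${RUN:-0}" = "1" ]; then
            sudo zfs unmount "$fs"
        else
            echo "sudo zfs unmount $fs"
        fi
    done
}

# File-system names from this article, fed in by hand for the dry run.
printf '%s\n' winxpvm winxpvm/base winxpvm/gamesnonet | unmount_all
```

In the dry run this lists winxpvm/gamesnonet first, then winxpvm/base, and the pool's root file-system winxpvm last.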


ZFS on OS X is not quite up to Apple's production release standards yet. However, for the brave of heart, it can prove quite a useful little tool in a tight corner. This article has explored one tiny aspect of the basics of what ZFS can do. If Apple can port all the awesome goodness that Sun have created, and merge it into their next release (much like they did when they took DTrace and created Instruments), then it will be a pretty cool achievement.