Tag Archives: storage

Failed to deploy template (follow-up)

The workaround that I mentioned in this post turned out not to fix the problem: it returned while doing a bulk deploy of 30 new virtual workstations. It is a nuisance but not a blocker, because I don’t use a non-persistent pool right now.

Next step will probably be to log a support call if we can’t find another possible solution.

Failed to deploy template: The virtual disk is either corrupted or not a supported format

Ran into this error at a customer site and found that there is not a lot of information available online. The first post I found describing the error was on VM/ETC. I have not found a definitive solution yet; at first I suspected that something had gone wrong with a snapshot on the particular template I was using, so I built a new template and did not run into the error for a couple of days.

Then I found these posts (post 1, post 2) on the VMware Forums and have tried the workaround mentioned in this comment, as the template and the VMs to be generated from it were located on the same LUN.

The error has not turned up since, but it also stayed away for a couple of days last time.

SMB virtualization with Equallogic PS5000E

I am currently working on an SMB design that will probably include a Dell/Equallogic PS5000E unit. As this unit uses 7200rpm SATA disks, it will be interesting to see whether its performance matches the metrics we collected.

If it performs the way we have calculated, it could become the cornerstone of a very nice standardized SMB virtualization design.

Storage network switching will be done by Cisco 3750 switches, as they are proven technology. In an ideal world it would be nice to have a side-by-side comparison of the 3750s with cheaper alternatives like the PowerConnect 6224 or comparable ProCurves, but for now the 3750s will do nicely.
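
As a rough illustration of what the storage-facing ports could look like, here is a minimal sketch of per-port settings commonly recommended for iSCSI traffic on a Catalyst 3750. The VLAN number and interface range are placeholders, the exact commands depend on the IOS release, and the jumbo MTU change requires a reload.

    ! Baseline iSCSI port sketch for a Catalyst 3750 (VLAN 100 and ports 1-8 are placeholders)
    system mtu jumbo 9000
    !
    interface range GigabitEthernet1/0/1 - 8
     description iSCSI SAN ports (ESX hosts + PS5000E)
     switchport mode access
     switchport access vlan 100
     flowcontrol receive desired
     spanning-tree portfast
     no cdp enable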

All is well and we still love ESX

Well, in the end it didn’t take an official response from VMware to solve the problem:

It seems that our troubleshooting efforts (rescanning, resignaturing and rebooting the servers) somehow fixed the problem. We checked /var/log/vmkernel and found that after setting LVM.EnableResignature to on and LVM.DisallowSnapshotLUN to off, the host already saw the LUNs as regular LUNs instead of snapshots, but we weren’t able to rename them.
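
For reference, on ESX 3.x these advanced settings can also be checked and changed from the service console with esxcfg-advcfg; below is a minimal sketch of what those steps look like there, with the HBA name (vmhba1) as a placeholder.

    # Check the current values of the two LVM advanced settings
    esxcfg-advcfg -g /LVM/EnableResignature
    esxcfg-advcfg -g /LVM/DisallowSnapshotLUN

    # Set them as described above: resignaturing on, snapshot-LUN blocking off
    esxcfg-advcfg -s 1 /LVM/EnableResignature
    esxcfg-advcfg -s 0 /LVM/DisallowSnapshotLUN

    # Rescan the HBA and follow the vmkernel log while the LUNs are rediscovered
    esxcfg-rescan vmhba1
    tail -f /var/log/vmkernel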

At first we thought the two problems were related, but this morning (after sleeping on it) we started looking in another direction and found an orphaned Shared Storage resource that probably wasn’t cleaned up correctly by ESX during the storage rescan.

After deleting this orphan we were able to rename all the Shared Storage resources. After a final reboot of the host, all Shared Storage resources were detected as normal LUNs, so everything is fixed now. I have made a post in the VMware forums with more details.

The fun thing this morning was that we were running full production while fixing one host: we just VMotioned everything over to the other host, and when we were done we VMotioned everything back. We had no complaints from users while doing that!

I mean, taking 40 servers down for maintenance in the middle of the morning would normally be impossible. After hating it yesterday when it was giving us problems, I am loving it today.

SAN migration final

Well, as expected Murphy had to show up after everything had been going so well in the first part.

After connecting the ESX hosts to the new SAN we ran into what seems to be a documented problem: a LUN is incorrectly detected as being a snapshot, and access to that LUN is therefore restricted.

The result is that you cannot see those Shared Storage resources in the VI client. The solution is to set LVM.DisallowSnapshotLUN to zero and rescan the drives. The problem we had is that ESX still thought these were snapshot LUNs. After some discussion we decided to remove all VMs from the inventory and re-add them. This seems to work.
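
A minimal sketch of those steps from the ESX 3.x service console; the HBA name and the datastore/VM paths below are placeholders:

    # Allow access to LUNs that were flagged as snapshots, then rescan
    esxcfg-advcfg -s 0 /LVM/DisallowSnapshotLUN
    esxcfg-rescan vmhba1

    # Remove a VM from the inventory and re-add it
    vmware-cmd -s unregister /vmfs/volumes/datastore1/vm01/vm01.vmx
    vmware-cmd -s register /vmfs/volumes/datastore1/vm01/vm01.vmx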

We will have to wait for the final word from VMware support on what to do about this.

SAN migration status

For those who are interested: we are now in our second day of the SAN migration from MSA1000s to an EMC CX3-40.

So far everything is going OK, with a few hiccups yesterday evening. The speed of the migration (we use EMC’s SANCopy for the data move) has surprised us, with seven out of nine LUNs ready by six o’clock yesterday evening. Two LUNs from one MSA, containing primarily fileserver data, didn’t copy over normally; we suspect the controller had difficulties with the I/O load. After the EMC consultant rescheduled the copy jobs and moved them to another storage processor on the CX, they completed fine at nine o’clock yesterday evening.

All that is left now (…) is to patch the fibers to the new Cisco MDS9020 SAN switches, configure the zones for the servers/LUNs and bring the servers online with their new storage.
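
As a rough illustration of the zoning step, the sketch below shows a single-initiator zone in the SAN-OS style used on the larger MDS switches; the MDS 9020 runs FabricWare rather than full SAN-OS, so the exact syntax there may differ, and the VSAN number, zone/zoneset names and WWPNs are placeholders.

    ! Single-initiator zone sketch (placeholders throughout)
    conf t
    zone name esx01_hba0__cx340_spa0 vsan 10
      member pwwn 21:00:00:e0:8b:00:00:01
      member pwwn 50:06:01:60:00:00:00:01
    zoneset name fabric_a_zs vsan 10
      member esx01_hba0__cx340_spa0
    zoneset activate name fabric_a_zs vsan 10
    end
    copy running-config startup-config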

SAN migration this weekend

If you were wondering why there seemed to be a lack of activity: we are migrating to our new SAN this weekend.

To have a relatively stable environment during our migration planning, we only converted two fileservers (600GB and 500GB respectively). Although they are the largest fileservers we have, and with this conversion all our fileservers have now been virtualized, the impact on the hosts has been minimal.

Memory usage is now at 54% and 50%, and CPU usage at 16% and 11%, on the two hosts. Delivery of our new hosts has been confirmed for next week, and since the licenses have all been bought and activated already, we can quickly add these servers and carry on with the migrations.

Clariion CX3-40 ordered

As of yesterday afternoon we will be EMC customers. The configuration as described in my earlier post hasn’t changed, so that is what we’ll be ordering. EMC will also assist with the migration from our current HP MSA 1000 based SAN. The highlights of that process will also appear here, as it impacts our virtualization project.

PowerConvert issue + storage status

PowerConvert has been giving us some headaches with licensing: we have a Proof of Concept package, but the license doesn’t allow us to deploy it. A call has been logged with Platespin and we are awaiting their response.

Also, the decision on storage has not been made yet because of political issues. EMC has our preference, but there are other factors at play.

The final configuration looks like this:

  • Clariion CX3-40 with:
    • 2 cages with 146GB 10k FC disks (for transaction systems and VMs)
    • 1 cage with 300GB 10k FC disks (mostly for fileserver data)
    • 1 cage with 500GB LC FC disks (LC = LowCost; for archiving of data and disk-based backup)
  • 2 Cisco MDS 9020 SAN Switches
  • EMC DiskXtender
    • This is software that does policy-based file server archiving (if it goes to EMC we’ll be adding EmailXtender in 2007 for policy-based e-mail archiving)
  • SANCopy
    • This is software to make array-based copies between different SANs.
  • Snapview
    • This is software to make snapshots and clones
  • Navisphere
    • SAN Management software

Archiving will, for now, go straight to the 500GB disks. In the future, taking into account possible changes in data retention requirements and new legislation, we have the option of adding more near-line storage or adding Centeras.

We will keep our MSL5026 robot for now and tackle backup and e-mail archiving in 2007. We wouldn’t have the personnel for it anyway, and it saves some serious money in the budget.

I consider myself lucky that we do not have to be SOX compliant or overly apprehensive because of liability. There is some privacy legislation that we have to live with, but the consequences for the infrastructure are minimal. In practice, BCM and our ISO 27001:2005 certification are more important at the moment.

Have a good weekend,