Tag Archives: error

Failed to deploy template (follow-up)

The workaround I mentioned in this post turned out not to fix the problem: it returned during a bulk deploy of 30 new virtual workstations. It is a nuisance but not a blocker, because I don’t use a non-persistent pool right now.

Next step will probably be to log a support call if we can’t find another possible solution.

Failed to deploy template: The virtual disk is either corrupted or not a supported format

I ran into this error at a customer site and found that there is not a lot of information available online. The first post I found describing the error was on VM/ETC. I have not found a definitive solution yet; at first I suspected something had gone wrong with a snapshot on the particular template I was using, so I built a new template and did not run into the error for a couple of days.

Then I found these posts (post 1, post 2) on the VMware Forums and tried the workaround mentioned in this comment, as the template and the VMs to be generated from it were located on the same LUN.
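The forum workaround boils down to not keeping the template and the VMs deployed from it on the same LUN. A hedged sketch of how you could move a template’s disk to a second LUN from the service console — all datastore names and paths below are made-up examples, not our actual environment:

```shell
# Sketch only: clone the template's disk to a second LUN so the template
# and the VMs deployed from it no longer share a datastore.
# All paths below are hypothetical examples.
vmkfstools -i /vmfs/volumes/lun01/wintemplate/wintemplate.vmdk \
           /vmfs/volumes/lun02/wintemplate/wintemplate.vmdk
# Then re-register the template from its new location and deploy from there.
```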

The error has not turned up since, but then again it also stayed away for a couple of days last time.

Computer account disabled when adding machine to domain

During a VMware View deployment I ran into this one because we chose to create a number of computer accounts beforehand. The answer from jwalsh in the VMware forums described the issue and the solution perfectly:

I’ve seen something very similar. The workaround was to use “username@domain” as the user in the customization specification.
The first time the join is done, it is done just with the username, if this fails then username@domain is used. It may be that the first failed attempt is locking the account in AD.

See the entire topic here: http://communities.vmware.com/message/1149793
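For reference, the same user@domain form applies when the domain join is driven by Sysprep directly. A minimal, hypothetical [Identification] section — the domain, account name, and password below are placeholders, not our real values:

```ini
; Hypothetical Sysprep.inf fragment - all names are examples
[Identification]
JoinDomain = corp.example.com
DomainAdmin = svc-joiner@corp.example.com
DomainAdminPassword = YourPasswordHere
```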

Upload file to datastore error: Failed to login into NFC server

Got this error while trying to upload some ISOs to a datastore ISO folder via the VirtualCenter client.

While looking for causes I found these possibilities:

  • Not enough space in the datastore
  • Port 901/902 not open between the VI Client and the source of the file (and/or the VC Server; I could not find out whether that is also a factor)
  • DNS configuration for the host servers
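The port item above is easy to rule out from the machine running the VI Client. A minimal sketch in Python — the ESX host name in the comment is a placeholder, not one of our servers:

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical example: check the NFC upload port on an ESX host.
# port_open("esx01.example.local", 902)
```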

As a possible solution I also found that restarting the VC Server service could help.

In this particular case it turned out that DNS was not configured for the host servers, so adding the servers to the hosts file of the machine running the VI Client did the trick.
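For completeness, the hosts-file entries looked roughly like this — the addresses and host names here are made up for the example:

```
# hosts file on the machine running the VI Client
# (on Windows: C:\Windows\System32\drivers\etc\hosts)
# Addresses and names below are examples.
192.168.10.11   esx01.example.local   esx01
192.168.10.12   esx02.example.local   esx02
```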

Follow up: Oracle troubles

I will be writing a “lessons learned” post about our project in the near future but for now I will follow up on the Oracle problems we had last month.

What we learned from using Oracle in a VI environment is that memory settings are highly critical to the performance of the VM. We didn’t have this problem on a physical machine, at least not this intrusively, but when Oracle doesn’t have enough memory available it starts swapping very heavily to the virtual disk. This results in a 100% CPU load, processes start to hang, and the users end up with an unresponsive application.

Adding memory up to the amount the server needs (indicated by checking the swap size inside the VM) has fixed these problems.
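A quick way to compare what the instance asks for against what the VM actually has is to look at the memory parameters from SQL*Plus. A diagnostic sketch — the exact parameter set varies a bit between versions:

```sql
-- From SQL*Plus as a privileged user (works on 9i and 10g):
show parameter sga_max_size
show parameter pga_aggregate_target
-- Compare the sum against the memory assigned to the VM,
-- leaving headroom for the OS so the instance doesn't swap.
```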

Oracle troubles

I should have known better than to write that all is well on the Oracle front.

Just now two of the Oracle servers froze with database problems. The DBA tells me they have block corruptions, which he hasn’t seen in five years of running these databases.

More later.

{edit}

Both instances were terminated by PMON due to an unreadable logfile or spfile. The errors look like this:

ORA-00471: DBWR process terminated with error
PMON: terminating instance due to error 471
Instance terminated by PMON, pid = 26973

ORA-00470: LGWR process terminated with error
PMON: terminating instance due to error 470
Instance terminated by PMON, pid = 25260

One instance is a 9.2.0.4 database, the other a 10.2.0.1 database, on separate virtual machines.
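The PMON lines above are what ends up in the alert log; to find where that log lives on each VM, you can query the dump destination from SQL*Plus (a diagnostic sketch, supported on both versions):

```sql
-- alert_<SID>.log sits in the background dump destination:
show parameter background_dump_dest
```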

For Oracle 9i: restore from backup.
For Oracle 10g: open the database with RESETLOGS:

startup mount
recover database until cancel;
recover database until cancel; (a message ‘media recovery complete’ will appear)
alter database open resetlogs;

We are viewing this as an unfortunate incident at the moment but will keep monitoring this closely.

All is well and we still love ESX

Well, in the end it didn’t take an official response from VMware to solve the problem:

It seems that our troubleshooting efforts (rescanning, resignaturing and rebooting the servers) somehow fixed the problem. We checked /var/log/vmkernel and found that after setting LVM.EnableResignature to on and LVM.DisallowSnapshotLUN to off, the host already saw the LUNs as regular LUNs instead of snapshots, but we weren’t able to rename them.

At first we thought the two problems were related, but this morning (after sleeping on it) we started looking in another direction and found an orphaned Shared Storage resource that probably wasn’t cleaned up correctly by ESX during the rescan of the storage.

After deleting this orphan we were able to rename all the Shared Storage resources. After a final reboot of the host, all Shared Storage resources were detected as normal LUNs, so everything is fixed now. I have made a post in the VMware forums with more details.

The fun thing this morning was that we were running full production while fixing one host: we just VMotioned everything over to the other host! And when we were done, we VMotioned everything back, without a single complaint from users.

I mean, taking 40 servers down for maintenance in the middle of the morning would normally be impossible. After hating it yesterday when it was giving us problems, I am loving it today.

SAN migration final

Well, as expected Murphy had to show up after everything had been going so well in the first part.

After connecting the ESX hosts to the new SAN we ran into what seems to be a documented problem: a LUN is incorrectly detected as being a snapshot, and access to that LUN is therefore restricted.

The result is that you cannot see those Shared Storage resources in the VI client. The solution is to set LVM.DisallowSnapshotLUN to zero and rescan the drives. The problem we had is that ESX still thought these were snapshot LUNs. After some discussion we decided to remove all VMs from the inventory and re-add them, which seems to work.
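On ESX 3.x the flag can be set from the service console as well as from the VI client. A hedged sketch — the advanced-option path is the one mentioned above, but the adapter name is an example and will differ per host:

```shell
# Service console, ESX 3.x. vmhba1 is an example adapter name.
esxcfg-advcfg -s 0 /LVM/DisallowSnapshotLUN
esxcfg-rescan vmhba1
```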

We will have to wait for the final word from VMware support on what to do about this.