ESX 3.0.x – vmware-hostd is not cool …

If you’re using autostart for your VMs, be very careful when restarting vmware-hostd, because doing so will SHUT DOWN your VMs!

In 3.0.x, autostart is tied to hostd: the VMs are started automatically when hostd starts and shut down when hostd stops. So you don’t want to be restarting mgmt-vmware if you’re using autostart for your VMs.

interesting iSCSI – started w/ snapshot luns / resignature

“Error: Invalid vmhba name at position 1” uhhh … okay … And when you try logging into VC, vpxa crashes and you get:

Failed to serialize result of method vmodl.query.PropertyCollector.waitForUpdates

You get “Failed to serialize result” when logging into the host directly via the VIC as well, but it doesn’t crash vmware-hostd. So now what??? Well, we checked the SAN and it showed that the LUNs were presented properly. Then we found that running:

killall -HUP vmkiscsid

and then running:

esxcfg-rescan vmhba40

got us going again. Of course, we hit the snapshot LUN problem again, so we set DisallowSnapshotLun to 0 and EnableResignature to 1, rescanned so that the volumes were resignatured, and changed the values back immediately afterwards.
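The recovery sequence above can be written down as a small dry-run script. This is only a sketch: the `run` and `recover_iscsi` helpers are hypothetical names, and the `esxcfg-advcfg` option paths (`/LVM/DisallowSnapshotLun`, `/LVM/EnableResignature`) are assumed from the setting names mentioned in the post, so check them on your host before running with DRY_RUN=0.

```shell
#!/bin/sh
# Dry run by default: commands are printed, and executed only when DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}

# run CMD...: echo the command, then execute it unless this is a dry run.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

# recover_iscsi [HBA]: the sequence described in the post, in order.
recover_iscsi() {
    hba=${1:-vmhba40}
    run killall -HUP vmkiscsid
    run esxcfg-rescan "$hba"
    # ASSUMPTION: option paths derived from the setting names in the post.
    run esxcfg-advcfg -s 0 /LVM/DisallowSnapshotLun
    run esxcfg-advcfg -s 1 /LVM/EnableResignature
    run esxcfg-rescan "$hba"
    # Put both settings back straight after the resignature.
    run esxcfg-advcfg -s 0 /LVM/EnableResignature
    run esxcfg-advcfg -s 1 /LVM/DisallowSnapshotLun
}

recover_iscsi vmhba40
```

Printing the commands first makes it easy to review the order before anything touches the host.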

Kill and Resurrect the Master Boot Record

The MBR is a 512-byte segment at the very beginning (the first sector) of a hard drive. This segment contains two major parts: the boot code in the first 446 bytes and the partition table (plus a 2-byte signature) in the remaining 66 bytes. When you run lilo, grub-install, or fdisk /mbr in DOS, it writes to these first 446 bytes. When you run cfdisk or some other disk-partition program, it writes to the remaining 66 bytes.

Writing directly to your MBR can be dangerous. One typo or mistake can make your entire system unbootable or even erase your entire partition table. Make sure you have a complete backup of your MBR, if not your full hard drive, on other media (like a floppy or anything other than the hard drive itself) before you try any potentially destructive commands.

The MBR is crucial for booting your system and, in the case of the partition table, crucial for accessing your data; however, many people never back up their MBR. Use Knoppix to easily create backups of your MBR, which you can later restore in case you ever accidentally overwrite your partition table or boot code. It is important to double-check each command you type, as typing 466 instead of 446 can mean the difference between blanking the boot code and partially destroying your partition table.
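To see why 466 versus 446 matters, the typo can be tried on a scratch file instead of a real disk. The sketch below is an illustration only: it fills a 512-byte temporary file with 0xFF, zeroes the first 466 bytes, and counts how many zeroed bytes land in the partition table region (offsets 446 to 509).

```shell
#!/bin/sh
# Scratch-file demo only: nothing here touches a real disk.
LC_ALL=C; export LC_ALL

# MBR layout: 446 bytes boot code, 64 bytes partition table, 2 bytes signature.
BOOT_CODE=446
TABLE=64
SIGNATURE=2

img=$(mktemp)

# Fill a 512-byte fake MBR with 0xFF so zeroed bytes are easy to spot.
head -c $((BOOT_CODE + TABLE + SIGNATURE)) /dev/zero | tr '\0' '\377' > "$img"

# The typo: bs=466 instead of bs=446. conv=notrunc keeps the file at 512 bytes.
dd if=/dev/zero of="$img" bs=466 count=1 conv=notrunc 2>/dev/null

# Count the zero bytes that ended up inside the partition table region.
damaged=$(dd if="$img" bs=1 skip=$BOOT_CODE count=$TABLE 2>/dev/null | tr -d '\377' | wc -c)
damaged=$((damaged))   # normalize any padding in the wc output

echo "partition table bytes overwritten: $damaged"
rm -f "$img"
```

The overrun is 466 - 446 = 20 bytes. Each partition table entry is 16 bytes, so the typo wipes the whole first entry and the first 4 bytes of the second.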

6.4.1 Save the MBR

First, before you attempt anything potentially destructive, back up the current MBR. Boot into Knoppix, and type the following command into a terminal:
knoppix@ttyp0[knoppix]$ sudo dd if=/dev/hda of=/home/knoppix/mbr_backup bs=512 count=1
Change /dev/hda to match the drive you wish to back up. In your home directory, you should now see a 512-byte file called mbr_backup.

The dd command is used to create images of entire hard drives [Hack #48], and in this case a similar command is used; however, it contains two new options: bs and count. The bs (block size) option tells dd to input and output 512 bytes at a time, and the count option tells dd to do this only once. The result of the command is that the first 512 bytes of the drive (the MBR) are copied into the file. If for some reason you only want to back up the boot code (although it’s wise to always back up the partition table as well), replace 512 with 446.

Now that you have backed up the MBR, copy it to a safe location, such as another computer or a CD-ROM. The full 512-byte copy of the MBR contains the partition table, so it gets out of sync whenever you change partitions on your drive. If you back up the full MBR, be sure to update your backup whenever you make partition changes.
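It can be worth checking the backup before trusting it. The following is a minimal sketch, not part of the original hack: `backup_mbr` is a hypothetical helper that runs the same dd command, then confirms the file is exactly 512 bytes and ends with the standard 55 aa boot signature.

```shell
#!/bin/sh
# backup_mbr SOURCE DEST: copy the first 512 bytes of SOURCE into DEST and
# sanity-check the result. SOURCE is a drive such as /dev/hda (run as root)
# or any disk image file when experimenting.
backup_mbr() {
    src=$1
    dest=$2
    dd if="$src" of="$dest" bs=512 count=1 2>/dev/null || return 1

    # The backup must be exactly one 512-byte sector.
    size=$(($(wc -c < "$dest")))
    [ "$size" -eq 512 ] || { echo "unexpected size: $size" >&2; return 1; }

    # A valid MBR ends with the boot signature bytes 55 aa.
    sig=$(dd if="$dest" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ] || { echo "warning: no 55aa signature in $dest" >&2; return 2; }

    echo "saved $dest"
}
```

On Knoppix this would be called from a script run with sudo, for example `backup_mbr /dev/hda /home/knoppix/mbr_backup`. A return code of 2 means the copy succeeded but the sector does not look like an MBR, which is worth investigating before you rely on the backup.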

6.4.2 Kill the MBR

Now that you know how to back up, you should know how to totally destroy the MBR. To do this, simply use the same command you use to back up an MBR, but replace the input file with /dev/zero and the output file with the drive, overwriting each byte of the MBR with zero. If you only want to blank your boot code, type:
knoppix@ttyp0[knoppix]$ sudo dd if=/dev/zero of=/dev/hda bs=446 count=1
To clear the complete MBR, including the partition table, type:
knoppix@ttyp0[knoppix]$ sudo dd if=/dev/zero of=/dev/hda bs=512 count=1
While blanking the partition table in effect prevents you from accessing files on the drive, it isn’t a replacement for proper wiping of the complete drive, because the files are still potentially retrievable from the drive. Even the partition table itself is recoverable with the right tools [Hack #55].

6.4.3 Resurrect the MBR

If you deleted your boot sector in the last section, you probably want to restore it now. To do this, copy the backup you made earlier to your home directory in Knoppix and run:
knoppix@ttyp0[knoppix]$ sudo dd if=/home/knoppix/mbr_backup of=/dev/hda bs=446 count=1
Because of the bs=446 element, this command only restores the boot code in the MBR. I purposely left out the last 66 bytes of the file so the partition table would not be overwritten (just in case you have repartitioned or changed any partition sizes since your last MBR backup). If you have accidentally corrupted or deleted your partition table, restore the full 512 bytes to the MBR with:
knoppix@ttyp0[knoppix]$ sudo dd if=mbr_backup of=/dev/hda bs=512 count=1
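Before running these commands against a real drive, the whole save, kill, and resurrect cycle can be rehearsed on a scratch image file. This is a sketch under stated assumptions: the file names are hypothetical stand-ins for /dev/hda and mbr_backup, and conv=notrunc is added because the target is a regular file rather than a device.

```shell
#!/bin/sh
# Rehearsal on a scratch image file; no real disk is touched.
LC_ALL=C; export LC_ALL
work=$(mktemp -d)
disk=$work/disk.img        # stands in for /dev/hda
backup=$work/mbr_backup    # stands in for the MBR backup file

# Fake disk: 446 bytes of 0xAA "boot code", 64 bytes of 0xBB "partition
# table", the 55 aa signature, then one extra sector of data.
{
    head -c 446 /dev/zero | tr '\0' '\252'
    head -c 64 /dev/zero | tr '\0' '\273'
    printf '\125\252'
    head -c 512 /dev/zero
} > "$disk"

# Save the MBR.
dd if="$disk" of="$backup" bs=512 count=1 2>/dev/null

# Kill the boot code only. conv=notrunc stops dd from truncating the image
# file; it only matters for regular files.
dd if=/dev/zero of="$disk" bs=446 count=1 conv=notrunc 2>/dev/null

# The last 66 bytes of the sector (table plus signature) should be untouched.
dd if="$disk" of="$work/tail_now" bs=1 skip=446 count=66 2>/dev/null
dd if="$backup" of="$work/tail_saved" bs=1 skip=446 count=66 2>/dev/null
cmp -s "$work/tail_now" "$work/tail_saved" && table_intact=yes || table_intact=no

# Resurrect the boot code only, then compare the whole first sector.
dd if="$backup" of="$disk" bs=446 count=1 conv=notrunc 2>/dev/null
dd if="$disk" of="$work/sector0" bs=512 count=1 2>/dev/null
cmp -s "$work/sector0" "$backup" && restored=yes || restored=no

disk_size=$(($(wc -c < "$disk")))
echo "table intact after kill: $table_intact"
echo "sector restored: $restored"
echo "image size: $disk_size"
rm -rf "$work"
```

If both checks report yes and the image is still 1024 bytes, the 446-byte commands have left the partition table and the rest of the disk alone.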

6.4.4 How Do I fdisk/mbr?

Knoppix also provides a useful tool called install-mbr that allows you to manipulate the MBR in many ways. The most useful feature of this tool is that it can install a “default” master boot record on a drive, which is useful if you want to remove lilo or grub completely from the MBR so Windows can boot by itself, or so you can install Windows to a hard drive that previously used Linux. The results are the same as if you were to type fdisk /mbr in DOS. To remove the traces of lilo or grub from your MBR, run:
knoppix@ttyp0[knoppix]$ sudo install-mbr /dev/hda
Replace /dev/hda with your drive.

6.4.5 See Also

See the install-mbr manpage by typing man install-mbr in a console.

Find Lost Partitions (from Knoppix Hacks)

OK, so you had a little too much fun with the previous hack, ignored the warnings, accidentally typed 512 when you should have typed 446, and now your partition table is gone. Or maybe you accidentally ran fdisk on the wrong drive. No problem. Just restore from the backup you made before you started. You did back up your MBR, right?

Don’t worry; it happens to the best of us. The last time I trashed my partition table, I was trying to update grub on my laptop using dd. Like an idiot, I followed the instructions to create a grub boot floppy and applied them to install grub on my laptop’s hard drive. Overwriting the first 512 bytes of a floppy with the grub boot sector is fine; overwriting the first 512 bytes of my hard drive is not. I was unable to boot and had no partition table. For many people, this might have been the time to reinstall, but I knew the files and partitions were there; I just couldn’t get to them. If only I had a tool to figure out where the partitions began and ended, I could then recreate my partition table and everything would be back to normal.

Lucky for me, there is such a tool: gpart (short for “guess partition”). Gpart scans a hard drive for signs of a partition’s start by comparing a list of filesystem-recognition modules it has with the sectors it is scanning, and then creates a partition table based on these guesses. Doubly lucky for me, gpart comes included with Knoppix, so I was able to restore my laptop’s MBR without having to take apart the laptop and hook the drive to a desktop machine. I ran gpart, checked over its guesses, which matched my drive, and voila! My partitions were back.

Gpart is an incredibly useful tool, and I am grateful for it; however, it does have its limitations. Gpart works best when you are restoring a partition table of primary partitions. In the case of extended partitions, gpart tries its best to recover the partition information, but there is less of a chance of recovery.
To recover your partition table, run gpart, and then tell it to scan your drive:

knoppix@ttyp0[knoppix]$ sudo gpart /dev/hda

By default, gpart only scans the drive and outputs results; it does not actually write to the drive or overwrite your MBR. This is important because gpart may not correctly guess all of your partitions, so you should check its guesses before you actually write them to disk. Gpart scans through the hard drive and outputs possible partition tables as it finds them. When it is finished scanning the drive, gpart outputs a complete list of partition tables it has found. Read through this list of partitions and make sure that it reflects the partitions you have created on the disk. It might be that gpart can recover only some of the partitions on the drive. Once you have reviewed the partitions that gpart has guessed, run gpart again but with the -W option to write the guessed partition table to the disk:
knoppix@ttyp0[knoppix]$ sudo gpart -W /dev/hda /dev/hda

This isn’t a typo; you do actually put /dev/hda twice in the command. You can potentially tell gpart to write the partition table to a second drive, based on what it detected on the first drive. Once the partition table has been written, reboot and attempt to access the drives again. If you get errors when mounting the drives, check the partitioning within Knoppix with a tool like fdisk, cfdisk, or qtparted to see whether gpart has incorrectly guessed where your partition ends. I’ve had to modify a partition that gpart ended 4 MB too early, but afterwards, the filesystem mounted correctly, and I was able to access all of my files.

It is scary to be in a position where you must think about partition-table recovery. At least with Knoppix and gpart, it’s possible to recover the partition table without completely reinstalling the operating system.
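To double-check what was actually written, the four primary entries can also be decoded by hand from the first sector. This is a rough sketch, not a replacement for fdisk: `show_partitions` is a hypothetical helper that reads the standard 16-byte entries starting at offset 446 and prints the type, boot flag, start sector, and sector count of each non-empty entry. It does not follow extended partitions.

```shell
#!/bin/sh
# show_partitions SOURCE: print the non-empty entries of the MBR partition
# table in SOURCE (a drive such as /dev/hda, or an MBR backup file).
show_partitions() {
    src=$1
    i=0
    while [ "$i" -lt 4 ]; do
        # Each entry is 16 bytes, starting at offset 446; read it as hex bytes.
        set -- $(dd if="$src" bs=1 skip=$((446 + 16 * i)) count=16 2>/dev/null | od -An -v -tx1)
        [ "$#" -eq 16 ] || { echo "short read from $src" >&2; return 1; }
        i=$((i + 1))
        # Byte 5 of the entry is the partition type; 00 means the slot is unused.
        [ "$5" = "00" ] && continue
        # Start sector and sector count are 32-bit little-endian values.
        start=$((0x${12}${11}${10}$9))
        sectors=$((0x${16}${15}${14}${13}))
        echo "entry $i: type=0x$5 boot=$1 start=$start sectors=$sectors"
    done
}
```

Running it against both the drive and an older mbr_backup file gives two listings that can be compared line by line, which makes a partition that ends too early easy to spot.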

Fiber Channel – Node Types

N_PORT (Node Port)
    Used in Point to Point or Switched Fabric topologies.
    N_Ports are connected to each other through the fabric topology.
    Found on HBAs and Storage Processors.
F_PORT (Fabric Ports)
    Used in a Switched Fabric topology.
    Found on a switch – enables HBA & Storage Processor to connect.
NL_PORT (Arbitrated Loop)
    Supports Arbitrated Loop topology.
    These ports are found on HBAs & Storage Processors.
FL_PORT (Fabric port with Arbitrated Loop capabilities)
    Ports on a switch that support connecting to an Arbitrated Loop.
E_PORT (Expansion Port)
    Extension port for interconnecting switches in a multi switch fabric.
    Ports on a switch that connect other switches to the fabric.
G_PORT (Generic Port)
    It can be configured to either an E_Port or an F_Port.
    These ports are found on a switch, not on an HBA or an SP.

How do I resignature VMFS3 volumes that are not snapshots

You could run into this issue after changing the Host Mode setting on your HDS Storage Array. The change results in all of your VMFS3 volumes being seen as snapshot volumes by the ESX server.

A similar situation occurs when you set the SPC-2 flag on your EMC Symmetrix Storage Array.

This procedure lets you change the Host Mode setting/director flags on your array and make all of the VMFS3 volumes visible again.

1. Stop the running VMs on all the ESX servers.

2. Change the Host Mode/Director flags on the Storage Array – now when you rescan, you will see snapshot LUN mentioned in /var/log/vmkernel.

3. Enable LVM Resignaturing on the first ESX server => set LVM.EnableResignature to 1.

-log in to the ESX host with the VI client

-select the configuration tab

-select the Advanced setting option

-select the LVM section

-make sure that the fourth and last option, LVM.EnableResignature, is set to 1.

-save the change

-select storage adapter

-select rescan adapter

-leave the default option and proceed

-you should now be able to see the VMFS

4. Disable LVM Resignaturing

-log in to the ESX host with the VI client

-select the configuration tab

-select the Advanced setting option

-select the LVM section

-make sure that the fourth and last option, LVM.EnableResignature, is set to 0.

-save the change

5. No snapshot messages should now be visible in the /var/log/vmkernel.

6. Re-label the volume

-log in to the ESX host with the VI client

-select Datastores view in inventory view

-select the datastore, right click, select remove to remove the old label as this is associated with the old UUID of the volume

-select Hosts & Clusters view instead of Datastores view

-in the summary tab, you should see the list of datastores

-click in the name field for the volume in question and change it to the original name – you now have the correct original label associated with the resignatured volume

7. Now rescan from all ESX servers

8. Re-register all the VMs

-Because the VMs will be registered against the old UUID, you will need to re-register them in VC.

-log in to the ESX host with the VI client

-select the configuration tab

-select Storage(SCSI, SAN & NFS)

-double-click on any of the datastores to open the Datastore browser

-navigate to the .vmx file of any of the VMs by clicking on the folders

-right click, select ‘add to inventory’

9. Remap any RDMs

-If you have a VM which uses an RDM, you will have to recreate the mapping.

-the problem here is that you may not be able to identify which RDM is which if you used multiple ones.

-if they are different sizes, then this is ok – you should be able to map them in the correct order by their size

-make a note of the sizes of the RDMs and which VMs they are associated with before starting this process

-make a note of the LUN ID before starting this process too – you may be able to use this to recreate the mapping

-if they are all the same size, this is a drag since you will have to map them and boot the VM, and then check them

-if you do not use RDMs, you can ignore this step

10. Powering on the VMs

-start the VM, reply yes if prompted about a new UUID

-if any of the VMs refer to missing disks when they power up, check the .vmx file and ensure that the scsi disk references are not made against the old uuid instead of the label.

-if any of the VMs refer to missing disks when they power up, check the .vmx file and ensure that the scsi disk references are not made against the old label instead of the new label if you changed it.

11. Repeat steps 3 thru 10 for all subsequent ESX servers that are still seeing snapshot volumes.

-if all ESX servers share the same volumes, then this step will not be necessary
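Two of the checks in this procedure (looking for snapshot messages in the vmkernel log, and looking for disk references in a .vmx file that still point at the old UUID) can be scripted. The helpers below are a hypothetical sketch: the function names are made up, and the only format assumption is that disk paths appear in the .vmx file on lines of the form scsiX:Y.fileName = "...".

```shell
#!/bin/sh
# list_disk_refs VMX: print the disk paths that the scsiX:Y.fileName lines
# of a .vmx file point at, one per line.
list_disk_refs() {
    grep -E '^scsi[0-9]+:[0-9]+\.fileName' "$1" |
        sed -e 's/^[^=]*=[[:space:]]*"//' -e 's/"[[:space:]]*$//'
}

# refs_old_uuid VMX UUID: succeed if any disk reference still contains UUID.
refs_old_uuid() {
    list_disk_refs "$1" | grep -F -q -- "$2"
}

# count_snapshot_msgs LOG: number of lines in LOG that mention "snapshot"
# (case-insensitive). LOG would normally be /var/log/vmkernel.
count_snapshot_msgs() {
    grep -i -c 'snapshot' "$1"
}
```

For example, `refs_old_uuid /path/to/vm.vmx <old-uuid> && echo "still points at the old volume"` flags a VM that needs its .vmx edited before power-on, and `count_snapshot_msgs /var/log/vmkernel` should stop growing once step 5 is reached.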

How to Troubleshoot ESX 2.5.x by loading vmkernel manually

Disable the VMware services on startup:

# chkconfig vmware off

This will let you boot into ESX without starting the VMkernel. Reboot the server and allow it to boot into the standard “ESX” mode. You will notice on the next reboot that although ESX was selected, the typical VMware services are skipped. This provides you with a clean slate to manually step through the process of loading the VMkernel to narrow down the root cause of your boot issues.

1. Load the vmnix module:
# /sbin/insmod -s -f vmnixmod
You will get a message about tainted drivers, which can be ignored.

2. Load the VMkernel itself:
# /usr/sbin/vmkloader /usr/lib/vmware/vmkernel

3. Allow the VMkernel to run Linux drivers:
# /usr/sbin/vmkload_mod -e /usr/lib/vmware/vmkmod/vmklinux linux
As we understand it, this is the step in which the final transformations occur to load the management console as a virtual machine.

4. Make sure all devices are enumerated:
# /usr/sbin/vmkchdev -n

The next steps are system specific, based on the hardware installed in the system. This is typically where we see the majority of issues while loading the VMkernel. If the system freezes while loading a specific module, you have narrowed down your issue to a very specific portion of the boot process, and further investigation may be performed with VMware support or other methods. To review which modules need to be loaded, check the contents of your vmkmodule.conf file:

# cat /etc/vmware/vmkmodule.conf

We will use one of our servers as an example configuration.

vmklinux linux
nfshaper.o nfshaper
bcm5700.o vmnic
e1000.o vmnic
aic79xx.o aic79xx

We are now going to load the drivers one by one using vmkload_mod. Since the vmklinux module was already loaded in step 3 above, it is not necessary here. If a module is commented out, it is not required in this step.

Load the packet shaper driver (this is disabled by default):
# /usr/sbin/vmkload_mod /usr/lib/vmware/vmkmod/nfshaper.o nfshaper

Load an Intel e1000 network adapter:
# vmkload_mod /usr/lib/vmware/vmkmod/e1000.o vmnic

Load a Broadcom BCM5700 network adapter:
# vmkload_mod /usr/lib/vmware/vmkmod/bcm5700.o vmnic

Load a SCSI adapter:
# vmkload_mod /usr/lib/vmware/vmkmod/aic79xx.o aic79xx

If any one module hangs the system, you have found your culprit. Document the complete list of steps you followed in case a support call needs to be opened with VMware. The above steps will help narrow problems to a specific area. If the system starts as expected without error in the above process, consult VMware support to help further analyze why the system hangs during its normal boot process.

When all is said and done, do not forget to re-enable the VMkernel services on startup with the following command:

# chkconfig vmware on
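Rather than typing each vmkload_mod line by hand, the commands can be generated from vmkmodule.conf and then run one at a time. This is a minimal sketch that assumes each active line has the form "module tag", as in the example configuration above. `print_module_loads` is a hypothetical helper that only prints the commands and ignores any extra fields on a line.

```shell
#!/bin/sh
# print_module_loads [CONF]: emit one vmkload_mod command per active line of
# CONF (default /etc/vmware/vmkmodule.conf). Nothing is loaded; the output is
# meant to be reviewed and run one command at a time.
print_module_loads() {
    conf=${1:-/etc/vmware/vmkmodule.conf}
    # The "|| [ -n ... ]" part handles a last line with no trailing newline.
    while read -r module tag rest || [ -n "$module" ]; do
        case $module in
            ''|'#'*) continue ;;   # blank line or commented-out module
            vmklinux) continue ;;  # already loaded in step 3
        esac
        echo "/usr/sbin/vmkload_mod /usr/lib/vmware/vmkmod/$module $tag"
    done < "$conf"
}
```

Running the printed commands individually keeps the point of the exercise intact: if the system hangs, the last command you ran names the culprit.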