Thursday, March 19, 2009

ESXTOP series 1

esxtop is a very handy command for checking a host's status from a performance point of view. I have decided to write a series of ESX-related tips.

In this first part we will check network status.

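So, to check it:

1. Run esxtop.

2. Press s to change the refresh delay, which is set in seconds.

3. Type 2 and press Enter for a two-second delay.

4. Press n to switch to the network (NIC) view.

In short, the key sequence is: esxtop, s, 2, n.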

The output looks like this:

[Screenshot: esxtop network view]
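If you want to keep the numbers instead of watching them scroll by, esxtop can also run in batch mode. The sketch below assumes the ESX 3.x esxtop flags -b (batch mode), -d (delay in seconds) and -n (number of iterations); the function name esxtop_capture, the default file name and the DRY_RUN switch are hypothetical conveniences, not part of esxtop. Batch mode writes the counters as CSV rather than showing the interactive n panel, so the network columns have to be picked out of the file afterwards.

```shell
#!/bin/sh
# Sketch: run esxtop non-interactively (batch mode) and save the samples.
# Assumes the ESX 3.x esxtop flags -b (batch), -d (delay, seconds) and
# -n (iterations). esxtop_capture, the default file name and DRY_RUN
# are hypothetical conveniences, not part of esxtop.

esxtop_capture() {
    delay="${1:-2}"       # seconds between samples, like pressing s then 2
    duration="${2:-60}"   # total capture time in seconds
    outfile="${3:-esxtop-$(uname -n)-$(date +%Y%m%d%H%M%S).csv}"

    # Only plain whole numbers are accepted.
    case "$delay$duration" in
        *[!0-9]*)
            echo "delay and duration must be whole numbers of seconds" >&2
            return 1 ;;
    esac
    if [ "$delay" -eq 0 ] || [ "$duration" -lt "$delay" ]; then
        echo "delay must be at least 1 and no longer than duration" >&2
        return 1
    fi

    # Number of samples needed to cover the requested duration.
    iterations=$((duration / delay))

    # DRY_RUN=1 prints the command instead of running it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "esxtop -b -d $delay -n $iterations > $outfile"
        return 0
    fi

    if ! command -v esxtop >/dev/null 2>&1; then
        echo "esxtop not found; run this on the ESX service console" >&2
        return 1
    fi

    esxtop -b -d "$delay" -n "$iterations" > "$outfile"
}
```

For example, esxtop_capture 2 300 /tmp/net.csv samples every two seconds for five minutes (150 samples). The resulting CSV file can then be opened in a spreadsheet.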

Monday, March 16, 2009

Fix sshd_config issue

Sometimes sshd_config gets messed up, either because of a flaky iLO session or because you are not comfortable with vi. Here is what we have to do to fix it.

1. If /etc/ssh/sshd_config has been damaged, you may not be able to ssh into the ESX host.

2. If the host is already connected to VirtualCenter, use "Browse Datastore" in the VI Client to upload a good copy of sshd_config to a datastore under /vmfs/volumes. From there, move it to the /etc/ssh folder. You can use the sshd_config from any other ESX host.


mv /vmfs/volumes/<datastore>/sshd_config /etc/ssh/

(replace <datastore> with the name of the datastore you uploaded the file to)

I only do this once the host has been added to VC, since the upload goes through the VI Client. Then restart the ssh service:

/etc/init.d/sshd restart

This should fix the problem.
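The copy step can be wrapped in a small shell function. This is a sketch, not a tested tool: restore_sshd_config and the .broken backup suffix are hypothetical, it uses cp instead of mv so the datastore copy stays available for other hosts, and it only touches the directory you pass in, so it can be tried on scratch directories first.

```shell
#!/bin/sh
# Sketch: put a known-good sshd_config (copied from another ESX host and
# uploaded to a datastore) back in place, keeping the damaged file.
# restore_sshd_config and the ".broken" suffix are hypothetical names.

restore_sshd_config() {
    src="$1"                    # e.g. /vmfs/volumes/<datastore>/sshd_config
    dest_dir="${2:-/etc/ssh}"   # target directory; /etc/ssh on the host

    if [ ! -f "$src" ]; then
        echo "source file not found: $src" >&2
        return 1
    fi

    # Keep the damaged file for comparison instead of overwriting it.
    if [ -f "$dest_dir/sshd_config" ]; then
        cp -p "$dest_dir/sshd_config" "$dest_dir/sshd_config.broken" || return 1
    fi

    # cp rather than mv, so the datastore copy stays available.
    cp "$src" "$dest_dir/sshd_config" || return 1
    # Make the file readable and writable by root only.
    chmod 600 "$dest_dir/sshd_config"
    echo "sshd_config restored; now restart sshd"
}
```

On the host this would be used as restore_sshd_config /vmfs/volumes/<datastore>/sshd_config && /usr/sbin/sshd -t && /etc/init.d/sshd restart. sshd -t only checks the configuration file for errors, so a bad file is caught before the restart.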

Recovering the ESX root password on different versions of ESX

ESX Server 3.x:

  1. Reboot the ESX Server host.
  2. When the GRUB screen appears, press the space bar to stop the server from automatically booting into VMware ESX Server.
  3. Use the arrow keys to select Service Console only (troubleshooting mode).
  4. Press the a key to modify the kernel arguments (boot options).
  5. On the line presented, type a space followed by the word single.
  6. Press Enter. The server continues to boot into single-user mode.
  7. When presented with a bash prompt such as sh-2.05b#, type the command passwd and press Enter.
  8. Follow the prompts to set a new root user password.
  9. When the password is changed successfully, reboot the host using the command reboot and allow VMware ESX Server to boot normally.

ESX Server 2.x:

  1. Reboot the ESX Server Host.
  2. When the LILO screen appears hit the space bar to stop the server from automatically booting into VMware ESX Server.
  3. At the LILO prompt, select "linux" and add -s to the end of the line, so that it reads: linux -s
  4. Press Enter. The server boots into single-user mode.
  5. When presented with a bash prompt such as sh-2.05b#, type the command passwd and press Enter.
  6. Follow the prompts to set a new root user password.
  7. When the password is changed successfully, reboot the host using the command reboot and allow VMware ESX Server to boot normally. You can then log in as the root user with the new password.


ESX Server 3.5:

A lost root password cannot be recovered; however, it can be reset. The process below outlines how to do this.

  1. Power on the host server. When the ESX bootloader selection screen appears, press a to modify the kernel arguments.
  2. Type a space followed by single to add the single argument to the kernel arguments, then press Enter.
  3. The host server now boots into single-user mode.
  4. Once the host server has booted, you will be presented with a prompt such as sh-2.05b#. At the prompt, enter the command passwd and press Enter.
  5. Enter the new root password and retype it at the prompt. Once it is changed successfully, the message "all authentication tokens updated successfully" appears.
  6. Reboot the host server by typing reboot at the prompt and pressing Enter.

Sources
  1. http://knowledge.xtravirt.com/hot-tips/esx-3x/11-esx-server-3x/104-recovering-from-a-lost-esx-root-password-in-esx-35.html

  2. http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1317898

Friday, March 13, 2009

NetApp filer error "retrying all the portals again, sincethe portal list got exhausted"

One of our ESX hosts, connected to storage over software iSCSI, was logging the following iSCSI messages to the vmkernel log:

Mar 11 05:10:17 esxhost vmkernel: 4:12:19:31.755 cpu0:1076)iSCSI: session 0x35203f90 connect timed out at 38997177
Mar 11 05:10:17 esxhost vmkernel: 4:12:19:31.755 cpu0:1076)<5>iSCSI: session 0x35203f90 iSCSI: session 0x35203f90 retrying all the portals again, sincethe portal list got exhausted
Mar 11 05:10:17 esxhost vmkernel: 4:12:19:31.755 cpu0:1076)iSCSI: session 0x35203f90 to iqn.1992-08.com.netapp:sn.118062462 waiting 60 seconds before next login attempt
Mar 11 05:11:17 esxhost vmkernel: 4:12:20:31.756 cpu0:1076)iSCSI: bus 0 target 0 trying to establish session 0x35203f90 to portal 0, address 10.100.200.137 port 3260 group 2000
Mar 11 05:11:32 esxhost vmkernel: 4:12:20:46.756 cpu1:1074)iSCSI: login phase for session 0x35203f90 (rx 1076, tx 1075) timed out at 39004677, timeout was set for 9004677
Mar 11 05:11:32 esxhost vmkernel: 4:12:20:46.756 cpu0:1076)iSCSI: session 0x35203f90 connect timed out at 39004677
Mar 11 05:11:32 esxhost vmkernel: 4:12:20:46.756 cpu0:1076)<5>iSCSI: session 0x35203f90 iSCSI: session 0x35203f90 retrying all the portals again, sincethe portal list got exhausted

According to NetApp, this is more or less expected behavior from the array itself: by default the filer gives any host that is configured to access any portal group iSCSI access through all of its portals. The ESX host's iSCSI initiator logs into the filer's target IP address. With dynamic discovery, the ESX host issues a "SendTargets" command to the filer, and the filer responds with all available target portals, which by default include every network interface with iSCSI enabled. The ESX host sees this list of portals and attempts to establish a connection on them. It is not clear whether it picks portals at random or follows some logic to decide which one to try first; either way, it eventually tries to log into a target portal to which it has no network route, and that is what generates the errors in the vmkernel log.

To fix this from the filer side, we added an access list entry for the iSCSI initiator node name of the ESX host. By default an iSCSI initiator has access to all available portals, as described above. An access list entry states which iSCSI target portals are available to a given initiator node name, which limits the target portals the filer reports back to the ESX host.
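To check whether a host is affected, and whether the access list change helped, the messages can be counted in the vmkernel log. This is a minimal sketch, assuming the ESX 3.x log location /var/log/vmkernel, GNU grep (for -o) and the message format quoted above; count_portal_exhaustion and portal_attempts are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: summarise iSCSI portal errors in a vmkernel log.
# Assumes the log is /var/log/vmkernel (ESX 3.x) unless a file is given,
# and that GNU grep is available for -o. Function names are hypothetical.

# Print how many times the initiator ran out of portals to try.
count_portal_exhaustion() {
    grep -c "portal list got exhausted" "${1:-/var/log/vmkernel}"
}

# List each portal address:port the initiator tried, with an attempt count.
portal_attempts() {
    grep -o "address [0-9.]* port [0-9]*" "${1:-/var/log/vmkernel}" |
        awk '{ print $2 ":" $4 }' |
        sort | uniq -c
}
```

Running portal_attempts before and after the access list change shows which portals the host is still trying; portals with no route from the host should stop appearing, and the exhaustion count should stop growing.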

Add/Remove a network card on a vSwitch

There may be times when you have to replace a vSwitch's NIC with a different NIC.
To add vmnic0 to vSwitch0 as an uplink, run:
esxcfg-vswitch -L vmnic0 vSwitch0

To remove vmnic0 from vSwitch0, run:
esxcfg-vswitch -U vmnic0 vSwitch0
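Replacing a NIC is just the two esxcfg-vswitch calls in the right order. The sketch below links the new NIC (-L) before unlinking the old one (-U), so the vSwitch is never left without an uplink, and then lists the vSwitches with -l. The function names swap_uplink and run_cmd and the DRY_RUN switch are hypothetical; with DRY_RUN=1 the commands are only printed, which is a safe way to check them before touching a production host.

```shell
#!/bin/sh
# Sketch: replace one vSwitch uplink with another.
# esxcfg-vswitch -L links a physical NIC to a vSwitch, -U unlinks it,
# -l lists the vSwitches. swap_uplink, run_cmd and DRY_RUN are hypothetical.

# Run a command, or just print it when DRY_RUN=1.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

swap_uplink() {
    old_nic="$1"
    new_nic="$2"
    vswitch="$3"

    if [ -z "$old_nic" ] || [ -z "$new_nic" ] || [ -z "$vswitch" ]; then
        echo "usage: swap_uplink OLD_NIC NEW_NIC VSWITCH" >&2
        return 1
    fi
    if [ "$old_nic" = "$new_nic" ]; then
        echo "old and new NIC are the same" >&2
        return 1
    fi

    # Link the replacement NIC first so connectivity is kept ...
    run_cmd esxcfg-vswitch -L "$new_nic" "$vswitch" || return 1
    # ... then unlink the old NIC.
    run_cmd esxcfg-vswitch -U "$old_nic" "$vswitch" || return 1
    # Show the resulting vSwitch configuration.
    run_cmd esxcfg-vswitch -l
}
```

For example, DRY_RUN=1 swap_uplink vmnic0 vmnic1 vSwitch0 prints the three commands (vmnic1 is just an example name); drop DRY_RUN=1 to run them on the service console. esxcfg-nics -l lists the physical NICs available on the host.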