Monday, March 16, 2009

Fix sshd_config issue

Sometimes sshd_config gets damaged, either because of a flaky iLO session or because of a slip in vi. Here is what to do to fix it.

1. If /etc/ssh/sshd_config has been modified incorrectly, you may no longer be able to SSH into the ESX host.

2. If the host is already connected to VirtualCenter, use “Browse Datastore” to upload a good copy of the file to a datastore, which appears on the host under /vmfs/volumes. From there, move it to the /etc/ssh folder. You can use the sshd_config from any other host.

mv /vmfs/volumes/<datastore>/sshd_config /etc/ssh/

I usually do this only after the host has been added to VC. Then restart the SSH service:

/etc/init.d/sshd restart

This should fix the problem.
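The copy step above can be sketched as a small shell helper. This is only a minimal sketch under stated assumptions: the function name restore_sshd_config is made up for illustration, the datastore path in the usage comment is a placeholder, and it assumes the service console has the usual cp and chmod tools. It keeps the damaged file as a .broken copy so it can be compared later.

```shell
# restore_sshd_config (hypothetical helper name): replace a damaged
# sshd_config with a known-good copy and keep the damaged one as a backup.
restore_sshd_config() {
    src="$1"    # known-good copy, e.g. the file uploaded to a datastore
    dest="$2"   # live config, normally /etc/ssh/sshd_config

    # Refuse to overwrite the live file with a missing or empty source.
    if [ ! -s "$src" ]; then
        echo "source file missing or empty: $src" >&2
        return 1
    fi

    # Keep the damaged file so it can be inspected later.
    if [ -f "$dest" ]; then
        cp -p "$dest" "$dest.broken" || return 1
    fi

    cp "$src" "$dest" || return 1
    # sshd_config should not be writable by anyone except root.
    chmod 600 "$dest"
}

# Example, run as root on the service console (<datastore> is a placeholder):
# restore_sshd_config "/vmfs/volumes/<datastore>/sshd_config" /etc/ssh/sshd_config
# Then restart the SSH service as described above.
```

Copying instead of moving leaves the good file on the datastore, so the same copy can be reused if the restore has to be repeated.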

Recover the ESX root password on different versions of ESX

ESX Server 3.x:

  1. Reboot the ESX Server host.
  2. When the GRUB screen appears, press the space bar to stop the server from automatically booting into VMware ESX Server.
  3. Use the arrow keys to select Service Console only (troubleshooting mode).
  4. Press the a key to modify the kernel arguments (boot options).
  5. On the line presented, type a space followed by the word single.
  6. Press Enter. The server continues to boot into single-user mode.
  7. When presented with a bash prompt such as sh-2.05b#, type the command passwd and press Enter.
  8. Follow the prompts to set a new root user password.
  9. When the password is changed successfully, reboot the host using the command reboot and allow VMware ESX Server to boot normally.

ESX Server 2.x:

  1. Reboot the ESX Server Host.
  2. When the LILO screen appears hit the space bar to stop the server from automatically booting into VMware ESX Server.
  3. At the LILO prompt, select "linux" and add -s to the end of the line, so that it reads: linux -s
  4. Press Enter. The server boots into single-user mode.
  5. When presented with a bash prompt such as sh-2.05b#, type the command passwd and press Enter.
  6. Follow the prompts to set a new root user password.

    When the password has been changed successfully, reboot the host using the command reboot and allow VMware ESX Server to boot normally. When the system has finished booting, you can log in as the root user using the new password.


ESX Server 3.5:


A lost root password cannot be recovered, but it can be reset. The process below outlines how to do this.

  1. Power on the host server. When the ESX bootloader selection screen appears, press a to modify the kernel arguments.
  2. Type single to add the single argument to the kernel arguments, and then press Enter.
  3. The host server will now boot.
  4. Once the host server has booted, you will be presented with a prompt such as sh-2.05b#. At the prompt, enter the command passwd and press Enter.
  5. Enter the new root password, and retype it at the prompt. Once the password has been changed, the message "all authentication tokens updated successfully" will appear.
  6. Reboot the host server by typing reboot at the prompt and pressing Enter.

Source
  1. http://knowledge.xtravirt.com/hot-tips/esx-3x/11-esx-server-3x/104-recovering-from-a-lost-esx-root-password-in-esx-35.html

  2. http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1317898

Friday, March 13, 2009

NetApp filer error "retrying all the portals again, sincethe portal list got exhausted"

One of the ESX hosts, which is connected through software iSCSI, was logging the following iSCSI-related messages to the vmkernel log:

Mar 11 05:10:17 esxhost vmkernel: 4:12:19:31.755 cpu0:1076)iSCSI: session 0x35203f90 connect timed out at 38997177
Mar 11 05:10:17 esxhost vmkernel: 4:12:19:31.755 cpu0:1076)<5>iSCSI: session 0x35203f90 iSCSI: session 0x35203f90 retrying all the portals again, sincethe portal list got exhausted
Mar 11 05:10:17 esxhost vmkernel: 4:12:19:31.755 pu0:1076)iSCSI: session 0x35203f90 to iqn.1992-08.com.netapp:sn.118062462 waiting 60 seconds before next login attempt
Mar 11 05:11:17 esxhost vmkernel: 4:12:20:31.756 cpu0:1076)iSCSI: bus 0 target 0 trying to establish session 0x35203f90 to portal 0, address 10.100.200.137 port 3260 group 2000
Mar 11 05:11:32 esxhost vmkernel: 4:12:20:46.756 cpu1:1074)iSCSI: login phase for session 0x35203f90 (rx 1076, tx 1075) timed out at 39004677, timeout was set for 9004677
Mar 11 05:11:32 esxhost vmkernel: 4:12:20:46.756 cpu0:1076)iSCSI: session 0x35203f90 connect timed out at 39004677
Mar 11 05:11:32 esxhost vmkernel: 4:12:20:46.756 cpu0:1076)<5>iSCSI: session 0x35203f90 iSCSI: session 0x35203f90 retrying all the portals again, sincethe portal list got exhausted

According to NetApp, this is somewhat expected behavior from the array itself: by default, the filer provides iSCSI access through all of its portal groups to any host configured to use them.

The ESX host's iSCSI initiator logs into the filer's target IP address. Using dynamic discovery, the ESX host issues a “SendTargets” command to the filer. The filer responds with all available target portals, which by default includes every network interface with iSCSI enabled. The ESX host sees this list of portals and attempts to establish a connection on them. It is not clear to me whether it selects randomly from the available target portals or has some built-in logic for deciding which portal to try first. When the ESX host attempts to log into a target portal to which it has no network route, the errors above are generated in the vmkernel log.

To fix this problem from the filer side, we added an access list for the iSCSI initiator node name of the ESX host. By default, an iSCSI initiator has access to all available portals, as mentioned above. To limit the target portals that are reported back to the ESX host, we created an access list entry that states which iSCSI target portals are available to a given iSCSI initiator node name.
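Before changing anything on the filer, it helps to see which portal addresses the host is actually trying. The following is a minimal sketch that pulls them out of log text like the sample above. The function name list_iscsi_portals is made up for illustration, and the pattern is based only on the "trying to establish session" lines shown in this post, so other ESX builds may word the message differently.

```shell
# list_iscsi_portals (hypothetical helper name): read vmkernel log text on
# stdin and print each portal address:port the initiator tried to log into,
# followed by the number of attempts.
list_iscsi_portals() {
    # Keep only the "trying to establish session" lines and reduce each one
    # to address:port, then count how often every portal appears.
    sed -n 's/.*trying to establish session .* to portal [0-9]*, address \([0-9.]*\) port \([0-9]*\).*/\1:\2/p' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}

# Example (assumes the log lives at /var/log/vmkernel on the service console):
# list_iscsi_portals < /var/log/vmkernel
```

Any address in the output that the host has no route to is a candidate for exclusion through the access list. On the filer itself, Data ONTAP 7-mode manages this list with the iscsi interface accesslist commands; check the exact arguments against the documentation for your ONTAP release.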

Update/Remove Network card for vSwitch

There may be times when you have to replace or remove the NIC attached to a vSwitch.
The following command adds (links) vmnic0 to vSwitch0:
esxcfg-vswitch -L vmnic0 vSwitch0

This removes (unlinks) vmnic0 from vSwitch0:
esxcfg-vswitch -U vmnic0 vSwitch0
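When one NIC replaces another, the order of the two commands matters. The sketch below links the new uplink before unlinking the old one, so the vSwitch is never left without an uplink. The helper name swap_uplink and the DRY_RUN variable are made up for illustration; only the esxcfg-vswitch options -L (link), -U (unlink) and -l (list) are real.

```shell
# swap_uplink (hypothetical helper name): replace one uplink NIC on a vSwitch.
# Set DRY_RUN=1 to print the commands instead of running them.
swap_uplink() {
    old_nic="$1"
    new_nic="$2"
    vswitch="$3"

    if [ -z "$old_nic" ] || [ -z "$new_nic" ] || [ -z "$vswitch" ]; then
        echo "usage: swap_uplink OLD_NIC NEW_NIC VSWITCH" >&2
        return 2
    fi

    run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi

    # Link the new NIC first so the vSwitch never loses its last uplink.
    $run esxcfg-vswitch -L "$new_nic" "$vswitch" || return 1
    # Then unlink the old NIC.
    $run esxcfg-vswitch -U "$old_nic" "$vswitch" || return 1
    # List the vSwitches to confirm the result.
    $run esxcfg-vswitch -l
}

# Example: preview the commands without touching the host.
# DRY_RUN=1; swap_uplink vmnic0 vmnic1 vSwitch0
```

If the vSwitch carries the service console, it is safer to run this from the physical or iLO console rather than over SSH, in case the new NIC turns out not to be cabled correctly.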

Thursday, March 12, 2009

P2Ving an IIS Windows server in a DMZ

Please read this before you attempt it.

We had been attempting to virtualize a dying piece of old hardware that hosted a business-critical application in the DMZ. This was my first experience virtualizing a web server in a DMZ, and I learned a lot from this P2V effort.

  1. When you virtualize any web server in a DMZ, please involve the firewall/network admin.
  2. Ports need to be open between the ESX host that will host the converted machine, the physical machine itself, and the VC client box that performs the conversion.
     ESX host IP -> (all ports) physical server IP -> (all ports) VC client server IP.
     As you can see, we opened all the ports; the reason was to avoid any port-related error during the P2V. Since the rule is restricted to these three IP addresses, we considered this the safest approach from both a P2V and a security perspective.
  3. Before virtualizing, take a backup of the IIS application using the IIS console. Once the backup has completed, shut down all the IIS services and any other services such as antivirus. Also run ipconfig /all > ip.txt to record all the IP addresses.
  4. Use Converter to virtualize the physical box, and choose not to power on the VM and not to install VMware Tools.
  5. Once the box is virtualized, remove all the unwanted hardware using the following link.
  6. Make sure you have removed the VM's NIC, and then power on the system. We do this to have a clean system before adding a NIC. It also safeguards against any IP or application conflict.
  7. Once the machine is powered on, check whether any NIC teaming was configured. I had great difficulty because I used this link and deleted the hidden driver before removing the team from the NIC.
  8. Remove the NIC team, because on a teamed server the IP addresses are assigned to the team, not to the individual NIC adapters.
  9. Once the team is removed from the NIC, uninstall the NIC teaming software and all other hardware-related software that is no longer required. Please reboot whenever the machine asks you to. I thought I would reboot once at the end and landed in trouble. I think we should sometimes have a listening attitude towards Mr. Gates' creativity.
  10. Now follow the link to remove the hidden NIC adapters imported as part of the P2V.
  11. Once all unwanted adapters are removed, proceed with installing VMware Tools.
  12. After this, add the VM NIC, then power on the machine and assign the IP address.

There may be cases where you have to restore IIS or, if your application is .NET based, reinstall .NET. In my case I attempted to P2V the same IIS server three times, thinking that either the application or the P2V had been corrupted. In the end it was .NET that was causing the whole problem, and reinstalling it fixed the application. I am not sure why .NET got corrupted.