Wednesday, August 6, 2008

Lessons and experience: Implementing VLAN trunking

For the past couple of weeks I have been trying to solve an IP address issue. We have DL380 and DL385 hosts that were underutilized. Our network engineer gave me a couple of VLANs and told me to use them. I tried to explain that these VLANs were no good to me until they were trunked on the physical switch port. It started as a formal discussion and ended up as a support request with VMware. VMware tech support asked us to follow KB articles 1004074 and 1004127. Our network engineer followed 1004074 exactly as written, but he forgot to put the port into trunk mode, so traffic could pass only on the native VLAN and not on any other VLAN. So what is the native VLAN? When you install ESX without any VLAN information, it labels the network "VM Network", and your switch configuration looks like this:

interface GigabitEthernet1/29
 switchport
 switchport access vlan 2
 switchport mode access
 no ip address
 no snmp trap link-status
 spanning-tree portfast
end

This is the native VLAN, and traffic is not tagged when the ESX host passes it to the physical switch.
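To make "tagged" versus "untagged" concrete, here is a minimal Python sketch of what an 802.1Q tag adds to an Ethernet frame. The function names, MAC addresses and payload are made up for illustration; treating VLAN ID 0 as "send untagged" is an assumption of this sketch that mirrors how ESX shows an untagged port group as VLAN 0.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def build_frame(dst_mac, src_mac, ethertype, payload, vlan_id=0, priority=0):
    """Build an Ethernet frame (no FCS). vlan_id 0 means 'untagged' in this sketch."""
    if not 0 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be 0 (untagged here) or 1-4094")
    if not 0 <= priority <= 7:
        raise ValueError("priority is a 3-bit field (0-7)")
    header = dst_mac + src_mac
    if vlan_id:
        # Tag control information: 3-bit priority, 1-bit DEI (left 0), 12-bit VLAN ID
        tci = (priority << 13) | vlan_id
        header += struct.pack("!HH", TPID_8021Q, tci)
    return header + struct.pack("!H", ethertype) + payload


def vlan_of(frame):
    """Return the VLAN ID carried by the frame, or None if it has no 802.1Q tag."""
    (tpid,) = struct.unpack("!H", frame[12:14])
    if tpid != TPID_8021Q:
        return None
    (tci,) = struct.unpack("!H", frame[14:16])
    return tci & 0x0FFF  # the low 12 bits are the VLAN ID


if __name__ == "__main__":
    dst = b"\xff" * 6
    src = bytes.fromhex("005056000001")  # hypothetical MAC address
    print(vlan_of(build_frame(dst, src, 0x0800, b"data")))             # untagged
    print(vlan_of(build_frame(dst, src, 0x0800, b"data", vlan_id=2)))  # tagged
```

The tag is four extra bytes after the source MAC address. An access port like the one above never sees those bytes from the host; a trunk port uses them to decide which VLAN a frame belongs to.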

FIG: SITUATION 1

In the above example it was very easy, since I had two NICs: even if I lost connectivity to my VMs while implementing VST, I would still have a connection through my management console (Service Console). I then spoke to VMware tech support, who were really good at explaining VMware networking. Here is how he explained it.

He also forwarded several KB articles:

http://www.vmware.com/resources/techresources/997 and KB articles 1003825, 1004088, 1004048, 1004252 and 1004127. To some extent they were helpful. After working through the KBs, we implemented the change on one of the hosts, the one pictured above, and it was successful. We thought we were ready to implement it across the environment, but we were wrong. Why? Because the other hosts had their own unique wiring: the Service Console and VM Network shared a single NIC. Yes, I know that sounds silly by standard VMware practice, but then again that is the reason I am here; otherwise why would anyone need my services?

Our next show started when we decided to implement it on a host with a single NIC.

FIG: SITUATION 2

Again I planned to test on this host along with my network engineer. He verified the existing configuration, which looked like this:

interface GigabitEthernet1/29
 switchport
 switchport access vlan 2
 switchport mode access
 no ip address
 no snmp trap link-status
 spanning-tree portfast
end

He then tried to wipe out that configuration and enter the following:

interface GigabitEthernet2/9
 description xxxxxxxx
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 1-3,5,224
 switchport mode trunk
 switchport nonegotiate
 no ip address
 no snmp trap link-status
 no cdp enable
 spanning-tree portfast trunk
end

(Don't worry about the description; it will vary from environment to environment.)
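The "switchport trunk allowed vlan 1-3,5,224" line is easy to misread, so here is a small Python sketch that expands such a list into individual VLAN IDs. The helper name parse_allowed_vlans is made up for illustration, and the sketch handles only the plain comma-and-range form used above, not keywords such as "all" or "except".

```python
def parse_allowed_vlans(spec):
    """Expand a list such as '1-3,5,224' into a set of VLAN IDs."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError("bad range: " + part)
            vlans.update(range(low, high + 1))  # ranges are inclusive
        else:
            vlans.add(int(part))
    return vlans


if __name__ == "__main__":
    allowed = parse_allowed_vlans("1-3,5,224")
    print(sorted(allowed))
    print(2 in allowed)  # is VLAN 2 carried on this trunk?
```

With the list from the configuration above, VLAN 2 (the VLAN our VM Network was using) is carried on the trunk, while VLAN 4, for example, is not.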

When he tried to write the configuration to IOS on the switch, it would not accept the command switchport trunk encapsulation dot1q. We had to stop the change and roll it back, and traffic returned to its original state. Everyone was scratching their heads: was the fault on the VMware side or on the network side? Why on earth would it not accept switchport trunk encapsulation dot1q? I did some research and guessed that the issue might be caused by having both the management VLAN and the console VLAN on the same port. So I told my network folks to roll back the changes and let me try something. I thought I would tag the original "Service Console" with VLAN 2 (VM Network was on VLAN 2), but that gave me a warning, and I ended up creating a second "Service Console" and assigning a different IP address to it. Still no luck. I was asked to call VMware tech support and set up a conference with the network folks. My network engineer came back online and was able to trunk the VLANs over the port, but my host was kicked off the network. At this point VMware tech support asked me to open an iLO session and run esxcfg-vswitch -l. The output was as follows:

You can see that VM Network and Service Console are on VLAN 0, so when the following configuration is applied to the switch port

interface GigabitEthernet2/9
 description xxxxxxxx
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 1-3,5,224
 switchport mode trunk
 switchport nonegotiate
 no ip address
 no snmp trap link-status
 no cdp enable
 spanning-tree portfast trunk
end

the Service Console also has to come into VLAN 2 (the original Service Console IP address is part of VLAN 2). So VMware tech support asked me to tag the existing Service Console with VLAN 2 using the following command: esxcfg-vswitch -p "Service Console" -v 2 vSwitch0. Bingo, my host was back on the network. He also deleted my newly created service console using esxcfg-vswitch -D "Service Console2" vSwitch0. We thought this was the whole process for implementing VLAN trunking, so I scheduled another downtime to implement it on some of the production hosts. Again, when we started on those hosts, our network engineer had trouble entering the command switchport trunk encapsulation dot1q, and I was stuck. Everyone was on my case, since I had asked for a second downtime in two days.

Our network folks called their support, who figured out that the Service Console was on VLAN 0, which is the native (untagged) state, and said this was the reason the switch would not accept the command switchport trunk encapsulation dot1q. I was rather shocked: why would a switch refuse a change because the NIC on the host is in native mode?

It is very important to note that when you are making the changes in a setup like "FIG: SITUATION 2", you have to make the changes at the host level first, and only then should the network engineer make the switch configuration changes. So the steps would be:

  1. Open a PuTTY session and run the following command to tag the "Service Console" port group with the VLAN that matches its original IP address:

esxcfg-vswitch -p "Service Console" -v 2 vSwitch0

  2. The network engineer then makes the changes, only after stripping everything out of the switch port:

interface GigabitEthernet2/9
 description xxxxxxxx
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 1-3,5,224
 switchport mode trunk
 switchport nonegotiate
 no ip address
 no snmp trap link-status
 no cdp enable
 spanning-tree portfast trunk
end
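Before scheduling downtime, the two steps can be sanity-checked on paper. The Python sketch below is only an illustration under stated assumptions: the port group names and VLAN IDs are typed in by hand (as read from esxcfg-vswitch -l), the function preflight is a made-up helper, and the trunk's native VLAN is assumed to be 1, the Cisco default, because the configuration above does not set one.

```python
def preflight(port_groups, allowed_vlans, native_vlan=1):
    """Return warnings for port groups that would not work on the trunk.

    port_groups   -- dict of port group name to VLAN ID (0 means untagged)
    allowed_vlans -- set of VLAN IDs allowed on the trunk port
    native_vlan   -- VLAN the switch assigns to untagged frames on the trunk
    """
    warnings = []
    for name, vlan in sorted(port_groups.items()):
        if vlan == 0:
            warnings.append(
                "%s: untagged (VLAN 0), frames would land in native VLAN %d"
                % (name, native_vlan)
            )
        elif vlan not in allowed_vlans:
            warnings.append("%s: VLAN %d is not in the allowed list" % (name, vlan))
    return warnings


if __name__ == "__main__":
    allowed = {1, 2, 3, 5, 224}  # from 'switchport trunk allowed vlan 1-3,5,224'
    before = {"Service Console": 0, "VM Network": 0}  # state seen before tagging
    after = {"Service Console": 2, "VM Network": 2}   # illustrative target state
    for line in preflight(before, allowed):
        print(line)
    print(preflight(after, allowed))
```

If the check reports an untagged Service Console, that is consistent with what we saw: once the port became a trunk, the untagged console traffic no longer ended up in VLAN 2 and the host dropped off the network until the port group was tagged.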


If the steps are not followed in this order, port trunking will never work. I have read many blog posts from Scott Lowe and many KB articles, but nowhere has anyone described an experience like this. I do not claim that I am above Scott Lowe or anyone else, but it was a really good experience for me.

1 comment:

Vikash Kumar Roy said...

Here is what my guru has to say:
There are only 4096 possible VLAN IDs you can use.
VLANs 0-10 should be reserved for network management. You should never create a new VLAN that uses 0 if at all possible, and if you do, you should have plans to get everything off that one.