Thursday, April 2, 2009

vSRM 1: Network Consideration

I have done demo in past for vSRM 1.0 using NetApp SIM. I though of starting a new fresh topic on my blog about Site recovery Manager.

Option 1: Use the Same IP
When we start powering on VMs at the Recovery Site – they may be on totally different networks requiring different IP addresses and DNS updates to allow for user connectivity. The good news is SRM can control and automate this process. One very easy way to simplify this for SRM is to implement “stretched VLANs” where two geographically different locations appear to be on the same VLAN/subnet. However, you may not have the authority to implement this – and unless it is already in place it is a major change to your physical switch configuration, to say the least. It’s worth making it clear that even if you do implement stretched VLANs you may still have to create inventory mappings because of port group differences. For example there may indeed be a VLAN 101 in Boston and a VLAN 101 in Amsterdam . But if the administrative team in Boston call their port groups on a virtual switch BOS-101 and the guys in Amsterdam call theirs AMS-101 then you would still need a port group mapping in the Inventory Mappings tab.

(Source : Administering VMware's Site Recovery Manager chapter 4)

Option 2: Re-IP all the VM’s at the recovery site

This can be done using “dr-ip-customizer.exe” The basic process is to export the current NIC settings to a CSV, edit the CSV with your new IP addresses and then import it back into SRM with the new desired IP addresses. In reality this tool is helping to create the Customization Specifications for you, so you can actually see them from the Edit - Customization Specifications window after you follow the process.

(Source : Site Recovery Manager Administration Guide Page54)

Duncan : If you want to keep it simple in terms of failover I would always prefer stretched vlan's. Especially dependency wise re-ip'ing can be a huge problem. (hardcoded ip's within apps / databases etc)

Lee Dilworth : From the x86 side of things I sometimes get the feeling that for historic reasons a lot of x86 teams were forced to re-ip in DR since
A) Networks didn't support transparent failover
B) Not that many workloads existed on x86 that were classed as critical for the business that they needed to be included in any DR strategy, as we all know in the last 5-10 years this swing has shifted massively and now the vast majority of workloads will be running on x86 based OS's. So maybe its time for a re-think?
What you need to consider are todays "modern" and multi-tier applications (web/msg/rdbms) how many in your portfolio would you expect to work unaided if you simply re-ip'd them? probably not many and confidence is low...for example would you really want to failover lots of oracle/sql srvr/exchange VM's and then give them all brand new ipaddresses. ok they may work if you are VERY confident that the apps respond well to that and that you have been very strict company wide in your use of FQDN's and aliases if not then there's always those hardcoded ipaddresses that can bring your application stacks to their knees if you forget (or miss) to change on. Bottom line, when changing the ip-addresses over you could (not always) introduce a whole extra set of tests that need to be run once the VM's are recovered. The nice thing about SRM is that at least now you can perform these tests in a non-disruptive way which is great but if you feel that in real DR situations you still need to run them no matter how well your testing went then any extra tests

1. Whats our strategy for failover? change ip addresses / keep same address?
2. Do we stretch layer2 vlan?
3. Do we failover layer 3?
4. How do we update DNS if we make changes? is that scripted?
5. Do we need a secure test set of vlans created at the test site that can be used for DR testing? do these exist?
6. If we do re-ip do we have an existing mechanism/solution/set of scripts/dhcp reservations that do this for us (if you do then these same techniques will almost certainly work unaltered against your VM's)
7. Does SRM offer any other options for re-ip if we don't have an existing mechanism (yes it does!)

Once you can answer the above you then will have a better idea of what your solution should look like. customers i am working with now that are deploying SRM and have answered this usually then come back with a simple addition to their network setup. they create the required vlans at the network layer, ensure they have the correct routing / ACL's set etc and then they present these down the same trunk ports that the recovery site ESX hosts are using. once this is done they can then create portgroups for these new test vlans and finally in their SRM recovery plans they now select these "test" portgroups as the ones to use for the test network rather than the "Auto" default.


Joshua said...

Hi Roy,

In your post you say:
"One very easy way to simplify this for SRM is to implement “stretched VLANs” where two geographically different locations appear to be on the same VLAN/subnet."

Have you run into any good papers discussing stretched VLAN's across the WAN?

This sounds like the way to go, I am just having a hard time wrapping my mind around the way you would do this over the WAN with BGP, etc.


Vikash Kumar Roy said...

Not good paper but myself and my N/W engineer has an idea to implement stretched VLAN.
I will be updating my blog once I have done with my testing. Till then keep watching my blog

Anonymous said...

I am sure a lot of people including myself are very interesting in reading your thoughts on a stretched VLAN.