Tuesday, August 27, 2013

vCenter Server Appliance Limits Improved

The current vCenter Server Linux Appliance has a limit of 5 hosts and 50 VMs because of its embedded database. With the release of vSphere 5.5, the vCenter Server Appliance uses a re-engineered embedded database that can support as many as 500 vSphere hosts or 5,000 virtual machines.


VMware vSphere 5.5 Announced

The 5.5 release offers several enhancements, including:

  • Greater Scalability – Configuration maximums have doubled from previous limits for physical CPUs, memory, and NUMA nodes. Virtual disk files also now scale up to 62 TB.
  • vSphere Customization for Low-Latency Applications – vSphere with Operations Management can be tuned to deliver the best performance for low-latency applications, such as in-memory databases.
  • vSphere Flash Read Cache – Server-side flash can now be virtualized to provide a high-performance read-cache layer that dramatically lowers application latency.
  • vSphere App HA – This new level of availability enables vSphere with Operations Management to detect and recover from application or operating system failures.
  • vSphere Big Data Extensions – Apache Hadoop workloads can now run on vSphere with Operations Management to achieve higher utilization, reliability, and agility.



Saturday, August 24, 2013

ESXi Host Migration from One vCenter to Another

Last week I had to migrate ESXi hosts from one vCenter to another. Here are the reasons why:

We had the vCenter Linux Appliance, which has a limit of 5 hosts and 50 VMs (don't ask why we went this route; the answer is the commercial aspect :))

We had nearly reached that limit, so we decided to move to a Windows-based vCenter with SQL Server.

Setup

VMware vCenter 5.1
VMware ESXi 5.1
Port Groups on Local vSwitches
PVLAN Configured on the Distributed Switch

Procedure

The following procedure was used for the migration:

  • Create a Windows VM
  • Install vCenter 5.1
  • Back up the Distributed Switch along with its Port Groups from the source vCenter
  • Restore the Distributed Switch along with its Port Groups to the destination (new) vCenter
  • Disconnect the hosts from the old vCenter and add them to the new one (see the sketch after this list)
  • Verify the servers are accessible
  • Verify the PVLAN configuration is intact
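
For the disconnect/re-add step, here is a minimal pyVmomi sketch of what we actually did through the vSphere Client; the hostnames, credentials, and cluster name below are placeholders, not our real setup.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

def get_obj(si, vimtype, name):
    # Find an inventory object of the given type by name.
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vimtype], True)
    try:
        return next(o for o in view.view if o.name == name)
    finally:
        view.Destroy()

ctx = ssl._create_unverified_context()  # lab only; verify certs in production
old_si = SmartConnect(host='old-vc.example.com', user='administrator',
                      pwd='***', sslContext=ctx)
new_si = SmartConnect(host='new-vc.example.com', user='administrator',
                      pwd='***', sslContext=ctx)

# Disconnect the host from the old vCenter, then remove it from inventory.
host = get_obj(old_si, vim.HostSystem, 'esxi01.example.com')
WaitForTask(host.DisconnectHost_Task())
WaitForTask(host.Destroy_Task())

# Add the host to a cluster in the new vCenter.
cluster = get_obj(new_si, vim.ClusterComputeResource, 'Cluster01')
spec = vim.host.ConnectSpec(hostName='esxi01.example.com',
                            userName='root', password='***', force=True)
WaitForTask(cluster.AddHost_Task(spec=spec, asConnected=True))

Disconnect(old_si)
Disconnect(new_si)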


Observations

  • We had to manually create the distributed switch and then restore the configuration, which included all port groups and their settings along with the PVLANs
  • Restoration is only possible via the Web Client
  • Post-migration, the ESXi hosts didn't map to the distributed switch automatically
    • Had to manually add the hosts to the VDS (Distributed Switch); a pyVmomi sketch of this step follows the list
  • Port Group mapping went into an Invalid Backing state
    • As soon as I added the host manually to the VDS, we only had connectivity to the PVLAN IPs, as the other port group mappings were in an Invalid Backing state; I had to map the port groups manually
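
For reference, this is roughly what that manual "add host to VDS" step looks like in pyVmomi; it reuses the get_obj() helper and new_si connection from the earlier sketch, the switch, host, and uplink names are placeholders, and the class paths are per my reading of the pyVmomi bindings.

# Describe the host being added to the VDS and which free pNIC to use
# as its uplink (placeholder device name).
member = vim.dvs.HostMember.ConfigSpec()
member.operation = vim.ConfigSpecOperation.add
member.host = get_obj(new_si, vim.HostSystem, 'esxi01.example.com')
member.backing = vim.dvs.HostMember.PnicBacking(
    pnicSpec=[vim.dvs.HostMember.PnicSpec(pnicDevice='vmnic1')])

# Reconfigure the switch to include the new host member.
dvs = get_obj(new_si, vim.DistributedVirtualSwitch, 'dvSwitch01')
spec = vim.DistributedVirtualSwitch.ConfigSpec()
spec.configVersion = dvs.config.configVersion  # required for reconfigure
spec.host = [member]
WaitForTask(dvs.ReconfigureDvs_Task(spec))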


Backup/Restore Distributed Switch Configuration In vSphere 5.1

VMware introduced a feature in vSphere 5.1 that allows you to back up the distributed switch configuration.
Following are my observations from testing it.

Observations

  • This is only possible from the Web Client
  • It gives you two options while exporting the configuration: the distributed switch only, or the distributed switch plus all port groups and their configuration
    • This was quite helpful for me: I had a PVLAN configuration that I wanted to replicate to a separate vCenter, so instead of doing it manually I preferred this option, and it saved me time
  • The restore option doesn't create the distributed switch; you need to manually create the switch and then restore the configuration
  • There is also a good video on this from VMware - Link



EMC RecoverPoint Direct Access Test

We came across an issue with SRM Test Failover with EMC RecoverPoint where the VMs became inaccessible due to a lack of Image Access space in the journal volumes.

They had followed the EMC best practice of sizing journal volumes at 20% of the total protected volume; however, since the data change rate was quite high, the journal ran out of space after the test had run for 24 hours.
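
To put numbers on that, here is a bit of illustrative arithmetic in Python; the protected capacity and change rate below are hypothetical, not the customer's actual figures.

# Hypothetical journal-sizing arithmetic -- none of these figures are
# from the actual environment.
protected_gb = 10 * 1024            # assume 10 TB of protected volumes
journal_gb = 0.20 * protected_gb    # 20% rule of thumb -> 2 TB of journal
change_rate_gb_per_hr = 120         # assumed change rate during the test
hours_until_full = journal_gb / change_rate_gb_per_hr
print('Journal fills after ~%.0f hours of Image Access' % hours_until_full)
# ~17 hours -- well short of a 24-hour test window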

EMC recommended using EMC RecoverPoint Direct Access instead of Image Access if the test is going to last more than 24-48 hours.

We also had a requirement to cut network connectivity from Production when performing the Test Failover (please refer to my earlier post for the issue faced and the workaround applied).
Following are the observations from the test we conducted.

Setup

VMware SRM 5.1
EMC RecoverPoint Appliance (RPA) version 3.5.SP1.P2 (o.175)
EMC RecoverPoint SRA 2.1

Discoveries

  1. Initiate Test Failover from SRM.
     Observation: All the VMs come up as expected.
  2. Wait until the Recovery Plan pauses.
  3. Break the network connection.
     Observation: Replication status changed to "Paused" in RPA.
  4. Resume the Recovery Plan and ensure it is successful and the VMs are accessible.
  5. Change the Consistency Group policy to "Group is Managed by RecoverPoint".
  6. Enable Direct Access for the Consistency Group.
     Observation: It threw a warning about the removal/deletion of journals.
  7. Change the Consistency Group policy to "Group is Managed by SRM".
     Observation: This is required for Cleanup.
  8. Continue with the test of the recovered VMs.
     Observation: Verified that there is no impact on the recovered VMs in terms of performance or access. I just noticed a bit of lag while RDPing to a VM; this is normal, as all the writes from Image Access are being committed to the actual disks on the DR side.
  9. Initiate Cleanup from SRM.
     Observation: Replication status changed from "Paused" to "Paused by System" in RPA, and it shows "Distributing Pre-replication Image".
  10. Enable the network connection.
      Observation: Replication status changed from "Paused by System" to "Ready", then "Initializing", and finally "Active" in RPA. It also created the remote copy automatically and started distributing the image; the first image was 22 MB in my lab. An interesting thing I found here is that the RPA doesn't do a complete initialization: it has a concept of short initialization, where it keeps internal bookmarks so it can track the changes that have happened. Need to read more on this though :)

VMware SRM Protected VMs Count

Setup

VMware vCenter 5.X
VMware SRM 5.X

Particulars

We manage customers' DR environments using VMware SRM. Currently VMware PowerCLI doesn't have any SRM-specific cmdlets, so we have written a script in Python to capture the following details and export them to an Excel file (a trimmed-down sketch follows the list):

Number of protected VMs
Protected VM list
Configuration of each protected VM, such as vCPU and vRAM
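
Here is a trimmed-down sketch of the idea rather than our production script: it assumes the protected VM names have already been pulled from SRM (the SRM SOAP query is omitted here), and it uses pyVmomi and xlwt with placeholder names and credentials.

import ssl
import xlwt
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

protected_vms = ['app01', 'db01', 'web01']  # result of the SRM query (assumed)

ctx = ssl._create_unverified_context()      # lab only; verify certs in production
si = SmartConnect(host='vcenter.example.com', user='administrator',
                  pwd='***', sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)

book = xlwt.Workbook()
sheet = book.add_sheet('Protected VMs')
for col, header in enumerate(['VM Name', 'vCPU', 'vRAM (MB)']):
    sheet.write(0, col, header)

# One row per protected VM with its vCPU and vRAM configuration.
row = 1
for vm in view.view:
    if vm.name in protected_vms:
        sheet.write(row, 0, vm.name)
        sheet.write(row, 1, vm.config.hardware.numCPU)
        sheet.write(row, 2, vm.config.hardware.memoryMB)
        row += 1

# Protected VM count at the bottom of the sheet.
sheet.write(row + 1, 0, 'Protected VM count')
sheet.write(row + 1, 1, row - 1)

view.Destroy()
Disconnect(si)
book.save('protected_vms.xls')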

Next steps

Schedule the script and send an email alert.



EMC RecoverPoint Test Failover

It's been a long time since I've had the time to write, but I decided to spare a couple of hours today. Here we go.

We came across an issue with SRM Test Failover with EMC RecoverPoint. Following are the details:

Setup

VMware SRM 5.1
EMC RecoverPoint Appliance (RPA) version 3.5.SP1.P2 (o.175)
EMC RecoverPoint SRA 2.1

Discoveries

  • When there is no communication between the Production and Recovery sites, the SRM recovery plan breaks at the storage snapshot layer and throws the following error:
    • “Error - Failed to create snapshots of replica devices. Failed to create snapshot of replica consistency group 16531CP. SRA command 'testFailoverStart' failed for consistency group '16531CP'. Replication is not active for some reason in group copy. Please see server logs for further details.”
  • We verified this with EMC, and they confirmed that this is normal behavior in EMC RPA 3.5; it has been fixed in 4.0 SP1


Resolution/Workaround

There are two options as a workaround:

  • Perform an SRM Recovery (an actual failover), since the Test Failover fails to execute with the network links down. After completion, discard the data and resync to get back to a normal state
  • Introduce a pause in the Recovery Plan after the storage is mounted, so that it mounts the storage and then pauses, which allows us to break the network connection
    • In our case this was the customer's requirement, so we had to follow this workaround