
VMware Scenario Based

The document describes the interviewee's experience supporting a VMware infrastructure for a large global customer. It includes details about: 1. The infrastructure consists of 6 vCenter servers across 2 data centers managing over 350 ESXi hosts and 11,000 virtual machines. 2. The interviewee's daily responsibilities include health checks, monitoring alerts, capacity planning, and troubleshooting incidents. 3. When asked how virtual machine issues are resolved, the interviewee provides a multi-step process to determine the affected host and use ESXi shell commands to troubleshoot and recover unresponsive VMs.

Uploaded by

Karam Shaik
Copyright
© © All Rights Reserved

Question: Can you describe the VMware infrastructure that you support? What are the daily
operations? Hint: The interviewer wants to know whether you have live experience. Based on your
answer, he or she will draw a couple of conclusions.

Answer: I am part of a VMware team (5 to 10 members) and work at L2 level, supporting a global
customer account. Tell the interviewer that customer details are confidential and can’t be disclosed.

Technical details: We have 6 vCenter servers configured for the customer’s two data center locations:
3 for production and 3 for DR. Each production vCenter server has multiple (say 5) clusters, and each
cluster has 10+ ESXi servers. We support 350+ ESXi servers running 2500+ Windows virtual machines
and 6000 Linux virtual machines. Cluster DRS automation is set to “Fully automated”.

We run ESXi 5.x and 6.x. The vCenter servers are at version 5.5 U2.

Daily tasks:

1. VMware Health Check Report – Running the script and sending the reports to Management
2. Checking vCenter server console for alarms and alerts
3. Scheduling changes required for VMware tasks
4. Attend meetings with Architects
5. Working on VMware-related incidents like backup failure, VM not pinging, ESXi host down,
vCenter service failure, etc.
6. Datastore usage
7. Resource capacity planning for CPU, RAM & Disk
8. Network-related issues
9. VM-related issues
10. Hardware-related issues for HP, Cisco & IBM servers
11. Coordinating with vendors like VMware, Microsoft, HP & Cisco

Question: A virtual machine cannot be reached by ping or RDP, and it appears hung in the vCenter
console. All power options for the VM are grayed out. How do you recover this virtual machine? Hint:
The interviewer wants to know your ESXi command-line skills for troubleshooting this scenario.

Answer: From the symptoms, it is clear that vCenter Server offers no recovery option. You can still
check Events & Tasks to see whether any action was performed before the VM hung.
For example: a backup job taking a snapshot, or an L1/L2 admin queuing multiple
shutdown/restart/power-off tasks.
Given these symptoms, identify the ESXi host on which the VM is running and get the root password
from your team lead or from the tool that stores shared ID passwords.

Step 1: Determining the virtual machine’s location


Determine the host on which the virtual machine is running. This information is available on the
virtual machine’s Summary tab in the VI Client. Subsequent commands are performed on, or remotely
reference, the ESXi host where the virtual machine is running.
Step 2: Open a console session on the ESXi host, either in the ESXi Shell or through a PuTTY (SSH)
session, using the host name or IP
Step 3: Get a list of running virtual machines, identified by World ID, UUID, Display Name, and path to
the .vmx configuration file

esxcli vm process list

Step 4: Power off the affected virtual machine from the list using this command:

esxcli vm process kill --type=[soft,hard,force] --world-id=WorldNumber

Three power-off methods are available – Soft is the most graceful, hard performs an immediate
shutdown, and force should be used as a last resort.

Step 5: List the virtual machine processes again to make sure the VM is no longer running:

esxcli vm process list
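
Steps 3 and 4 can be sketched in Python for anyone who wants to script the lookup. This is only an illustration: the sample text is made-up output shaped like the fields named in Step 3 (World ID, UUID, Display Name, config file path), and the helper names parse_vm_process_list and kill_command are hypothetical. The sketch only builds the command string; it does not run anything on a host.

```python
# Illustrative sample only; real output comes from `esxcli vm process list`.
SAMPLE_OUTPUT = """\
app-vm-01
   World ID: 1268395
   Process ID: 0
   VMX Cartel ID: 1268394
   UUID: 42 1e 2b 5c 8a 3f 4d 11-9a 7e 0c 29 5b 6d 3e 21
   Display Name: app-vm-01
   Config File: /vmfs/volumes/datastore1/app-vm-01/app-vm-01.vmx

db-vm-02
   World ID: 1271002
   Process ID: 0
   VMX Cartel ID: 1271001
   UUID: 42 1e 77 10 55 aa 4e 02-b1 3c 0c 29 aa 10 9f 03
   Display Name: db-vm-02
   Config File: /vmfs/volumes/datastore1/db-vm-02/db-vm-02.vmx
"""


def parse_vm_process_list(text):
    """Map each Display Name to its World ID (hypothetical helper)."""
    vms = {}
    current = {}
    # The extra empty string flushes the last block of fields.
    for raw in text.splitlines() + [""]:
        line = raw.strip()
        if not line:
            if "Display Name" in current and "World ID" in current:
                vms[current["Display Name"]] = int(current["World ID"])
            current = {}
            continue
        if ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return vms


def kill_command(world_id, kill_type="soft"):
    """Build the Step 4 command string; try soft first, force last."""
    if kill_type not in ("soft", "hard", "force"):
        raise ValueError("kill_type must be soft, hard or force")
    return f"esxcli vm process kill --type={kill_type} --world-id={world_id}"


if __name__ == "__main__":
    world_ids = parse_vm_process_list(SAMPLE_OUTPUT)
    print(kill_command(world_ids["db-vm-02"]))
```

Starting with soft and escalating only when the VM process is still listed matches the order described above.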

Question: What steps will you take when the vCenter service fails to start in your
infrastructure? It is running version 5.1 or 5.5. Hint: The interviewer wants to assess your
troubleshooting skills and confirm that you have worked on this issue at least once.

Answer: Validate if each troubleshooting step below is true for your environment. Each step provides
instructions or a link to a document that helps eliminate possible causes and take corrective action as
necessary.

Note: If you perform a corrective action in any of the following steps, attempt to restart the VMware
VirtualCenter Server service.
1. Verify that the VMware VirtualCenter Server service cannot be restarted. Try to restart the service
once again and check the logs for error messages.
2. Verify that the configuration of the ODBC Data Source (DSN) used for connection to the database for
vCenter Server is correct.
Depending on your infrastructure, the SQL database runs either on the vCenter server or on a separate
production SQL cluster.
3. Verify there is enough free disk space on the vCenter Server. Also check the disk space on the server
hosting the SQL database, whether the database is configured with dynamic sizing, and whether the
database logs have grown.
Sometimes you need to contact the SQL team, who can perform advanced troubleshooting steps.
4. Verify that ports 902, 80, 8080, 8443 and 443 are not being used by any other application.
If another application, such as Microsoft Internet Information Server (IIS) (also known as Web Server
(IIS) on Windows 2008 Enterprise), Routing and Remote Access Service (RRAS), World Wide Web
Publishing Services (W3SVC), Windows Remote Management service (WS-Management) or the Citrix
Licensing Support service, is using any of the ports, vCenter Server cannot start.

If you see an error similar to one of the following when reviewing the logs, another application may be
using the ports:

Failed to create http proxy: Resource is already in use: Listen socket: :<port>
Failed to create http proxy: An attempt was made to access a socket in a way forbidden by its access permissions.
proxy failed on port <port>: Only one usage of each socket address (protocol/network address/port) is normally permitted
5. Verify the health of the database server that is being used for vCenter Server. If the hard drives are
out of space, the database transaction logs are full, or if the database is heavily fragmented, vCenter
Server may not start.
Sometimes you need to contact the SQL team, who can perform advanced troubleshooting steps.
6. Verify that the VMware VirtualCenter Server service is running with the proper credentials.
The vpxd.exe utility helps you update the database credentials (KB 1006482).
7. Verify that critical folders exist on the vCenter Server host.
8. Verify that no hardware or software changes have been made to the vCenter server that may have
caused the failure. If you have recently made any changes to the vCenter server, undo these changes
temporarily for testing purposes.
9. Before launching vCenter Server, ensure that the VMwareVCMSDS service is running.
10. Verify that vpxd.exe is present at C:\Program Files\VMware\Infrastructure\VirtualCenter Server\vpxd.exe.
If this file is not present, reinstall vCenter Server.
Use your troubleshooting skills to identify the error messages and search Google for the closest
matching solution. Logical thinking is always required.
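
For step 4, the log review can be sketched as a small Python scan for the three port-conflict messages quoted above. This is a minimal sketch under assumptions: the function name find_port_conflicts is hypothetical, the sample lines are invented for illustration, and you would feed it lines read from whichever vCenter log you are reviewing.

```python
# Substrings taken from the port-conflict error messages quoted in step 4.
PORT_CONFLICT_SIGNATURES = (
    "Resource is already in use: Listen socket",
    "An attempt was made to access a socket in a way forbidden by its access permissions",
    "Only one usage of each socket address",
)


def find_port_conflicts(log_lines):
    """Return (line_number, text) for every line that looks like a port conflict."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if any(signature in line for signature in PORT_CONFLICT_SIGNATURES):
            hits.append((number, line.rstrip()))
    return hits


if __name__ == "__main__":
    # Invented sample lines; a real check would read the log file instead.
    sample = [
        "Starting VMware VirtualCenter\n",
        "Failed to create http proxy: Resource is already in use: Listen socket: :443\n",
        "Shutting down\n",
    ]
    for number, text in find_port_conflicts(sample):
        print(number, text)
```

A hit tells you to look for the other application holding the port before retrying the service start.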

Question: How do you perform ESXi patching in your infrastructure? Hint: The interviewer wants to
understand your ITIL skills, such as change management, along with the technical answer.

Answer: Your answer will vary with the size of your infrastructure and its Internet connectivity. Let us
cover the generic information first, followed by each case.

ITIL Process:
Take outage approval from Customer.
Submit the Change record based on outage window.
Discuss with management and Customer via meetings to get approval.
Once approved – apply the patches at approved outage window.
Submit the artifacts about successful closure of patching and Change record.

Patching is performed once every 3 months or once every 6 months. Before we start patching the
hosts, we need to configure Update Manager.
Open the vCenter Server and, on the Home page, click the Update Manager icon, then select Network
Connectivity.
Network Connectivity – In this section you can change the ports on which clients and ESX/ESXi servers
communicate with the Update Manager server.

Download Settings: Direct connection to the Internet – If the Update Manager server has an Internet
connection, choose this option to download patches from the VMware repository.
Use a shared repository – This is for environments where the Update Manager server has no Internet
connection and an internal web server publishes the VMware patches.
Use proxy – Use this only if your Update Manager server needs to pass through a proxy server to
connect to the Internet.
When you are done with your configuration, click the Apply button to save the changes. To start
the download, click the Download Now button. This does not download the patches themselves, only
an index of them.

When the process is done, you can see all the available updates on the Patch Repository tab.

The next step is to create a Baseline, where we tell Update Manager what updates to download, and
what type of updates to use for patching. Usually the default baselines are sufficient, but we can
customize it based on requirement. Go to the Baseline and Groups tab and click the Create link.

Give the baseline a name and leave the default baseline type, which is Host Patch.

If you go with the first option (a fixed baseline), future updates will not be included in this baseline
and you will need to create a new baseline, or edit this one, to include those updates.

Choose the patch types you want to include in this baseline based on your ESX/ESXi hosts, then click
the Finish button to create the baseline.

Click the data center object in the Inventory pane. If you want to patch one server only, click the server
object. Patching all the hosts in the data center at once is not recommended, especially in
a production environment, because your VMs will stop and the customer will be unhappy.

Click the Attach link in the upper right corner.

In the Attach Baseline window select the baseline which is created earlier then click the Attach button.

Remember, the VMs on a host will stop unless you move them, because ESX/ESXi hosts need to be in
maintenance mode before the actual patching begins. Move the VMs to another host if you are in a
production environment.

At the Ready to Complete screen click the Finish button to start the patching process.

This is going to take a while, because the patches need to be downloaded from the VMware
repository.

Your ESXi hosts may reboot a couple of times, depending on the updates.

Question: How do you troubleshoot virtual machine backup failures in your infrastructure?
Hint: The interviewer wants to understand your skills with the various backup solutions used with vCenter.
Answer: Many vendors provide backup solutions, such as Symantec NetBackup, Veeam, IBM
Tivoli, VMware Data Protector, Avamar, Commvault, etc. Vendors have started releasing backup solutions
built exclusively for hypervisors like VMware vSphere, Microsoft Hyper-V & Citrix XenServer.

Depending on your customer’s infrastructure architecture, you will fall into one of the cases below:

1) Run backup clients from within a virtual machine performing file‐level or image‐level backups.
2) Run backup clients from the ESX service console, backing up virtual machines in their entirety as
files residing in the VMFS file system.
3) Back up virtual machine data by running a backup server within a virtual machine that is connected
to a tape drive or other SCSI‐based backup media attached to the physical system.
4) When virtual machine files reside on shared storage, use storage‐based imaging on storage such as
SAN, NAS, or iSCSI, or an independent backup server (a proxy backup server or NDMP) to back up virtual
machine files

Sample Error messages:


The virtual machine identified by <unset> is unknown. Exit Code: 16
status: 156: snapshot error encountered
status: 13: file read failed
the backup failed to back up the requested files (6)

Troubleshooting a backup failure depends entirely on the error message received, such as
“Cannot create a quiesced snapshot”.

Separate the failure scenarios logically: if all backups are failing, the likely cause is the
vCenter Server service.

If only one VM is failing with a specific error, check the status of the services below in the Windows guest:
VSS: Volume Shadow Copy
SWPRV: Microsoft Software Shadow Copy Provider

Check the output of the command “VSSADMIN LIST WRITERS” and make sure no errors are reported.

Take a manual snapshot from vCenter and check for errors (if any).
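
The VSSADMIN LIST WRITERS check can be sketched in Python as a filter that reports only the writers needing attention. Treat this as an illustration: the sample text and its GUIDs are invented placeholders shaped like typical output, and the function name unhealthy_writers is hypothetical. It flags a writer whose last error is anything other than "No error" or whose state is not Stable.

```python
# Invented sample shaped like `vssadmin list writers` output (placeholder GUIDs).
SAMPLE_WRITERS = """\
Writer name: 'System Writer'
   Writer Id: {00000000-0000-0000-0000-000000000001}
   Writer Instance Id: {00000000-0000-0000-0000-0000000000a1}
   State: [1] Stable
   Last error: No error

Writer name: 'SqlServerWriter'
   Writer Id: {00000000-0000-0000-0000-000000000002}
   Writer Instance Id: {00000000-0000-0000-0000-0000000000a2}
   State: [8] Failed
   Last error: Non-retryable error
"""


def unhealthy_writers(text):
    """Return (name, state, last_error) for writers that report a problem."""
    problems = []
    name = None
    state = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Writer name:"):
            name = line.split(":", 1)[1].strip().strip("'")
            state = None
        elif line.startswith("State:"):
            state = line.split(":", 1)[1].strip()
        elif line.startswith("Last error:") and name is not None:
            error = line.split(":", 1)[1].strip()
            not_stable = state is not None and "Stable" not in state
            if error != "No error" or not_stable:
                problems.append((name, state, error))
    return problems


if __name__ == "__main__":
    for writer in unhealthy_writers(SAMPLE_WRITERS):
        print(writer)
```

An empty result means no writer reported an error, so the next thing to check is the manual snapshot.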

VMware vSphere backup best practices:


1) Don’t back up virtual machines at the guest OS layer
2) Leverage the vStorage APIs
3) Know how quiescing and VSS works
4) Don’t skimp on backup resources
5) Snapshots are not backups
6) Schedule backups carefully
7) Know your Fault Tolerance backup alternatives
8) Don’t forget to back up host and vCenter Server configs

Question: How do you troubleshoot an ESXi/ESX host that is disconnected or not responding in your
infrastructure? Hint: The interviewer wants to understand your skills with common ESX/ESXi issues
in VMware.
Answer: This is a common problem that every VMware administrator faces in their infrastructure. Most
of them get stuck on network troubleshooting because they assume the disconnect is due to connectivity
problems, and that approach is correct for only a few scenarios.

Let us discuss the various reasons for a host disconnect that the interviewer expects to hear.

1) Verify that the ESXi host is in a powered-on state – it sounds silly, but most people forget to
check the server status via remote management cards like iLO/DRAC/RIB, etc.

2) Verify that network connectivity exists from vCenter Server to the ESXi host with both the IP and FQDN –
this is the first thing VMware administrators usually suspect

3) Verify that the ESXi host can be reconnected, or if reconnecting the ESXi host resolves the issue –
simple, and sometimes it resolves the problem

4) Verify that the ESXi host is able to respond back to vCenter Server at the correct IP address. If
vCenter Server does not receive heartbeats from the ESXi host, it goes into a not responding state. To
verify that the correct Managed IP Address is set, see “Verifying the vCenter Server Managed IP Address”
and “ESXi 5.0 hosts are marked as Not Responding 60 seconds after being added to vCenter Server”
(known issue).

5) ESXi/ESX host disconnects from vCenter Server after adding or connecting it to the inventory
(VMware KB2040630)

6) Verify that you can connect from vCenter Server to the ESXi host on TCP/UDP port 902. If the host
was upgraded from version 2.x and you cannot connect on port 902, then verify that you can connect on
port 905.

Use a simple Telnet command to check the port status.

7) Verify if restarting the ESXi management agents resolves the issue – you have probably run these
commands many times

Run these commands:


/etc/init.d/hostd restart
/etc/init.d/vpxa restart

To restart all management agents on the host, run the command:

services.sh restart

8) ESXi hosts can disconnect from vCenter Server due to underlying storage issues – a complex one to
explain, but to fix these issues you should know the pain points along the path from the HBA card of
the ESXi server to the LUN/disk on the storage box.
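
The Telnet check from step 6 can also be sketched in Python when Telnet is not installed on the vCenter server. This is a minimal sketch: the function name tcp_port_open and the host name in the example are hypothetical, and it tests TCP only, so it says nothing about UDP 902.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical host name; replace with the ESXi host's FQDN or IP.
    print(tcp_port_open("esxi01.example.com", 902))
```

Running it once with the FQDN and once with the IP address also covers the name resolution part of step 2.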

Question: How do you troubleshoot a virtual machine that is unable or fails to power on?

Hint: The interviewer wants to understand your skills with common virtual machine issues in VMware.
Answer: Let me ask you this question – what happens if a production SQL DB virtual machine that hosts
critical databases for the HR-Payroll department fails to power on? No salary slips are generated
and your company can’t pay your salary (just kidding). Now you can easily understand the impact on
the customer in this real-world scenario when their virtual machines are powered off. Let us focus on
the troubleshooting methodology for this scenario.

Symptoms:
Cannot power on the virtual machine
Powering on the virtual machine fails
You see errors similar to:
Failed to power on VM
Could not power on VM: Admission check failed for memory resource See the VMware ESX Resource Management Guide for information on resource management settings.
Group vm.3582: Invalid memory allocation parameters for VM vmm0:New_Virtual_Machine. (min: 524288, max: -1, minLimit: -1, shares: -1, units: pages)
Group vm.13327: Cannot admit VM: Memory admission check failed. Requested reservation: 311199 pages
You see the error: Unsupported and/or invalid disk type
In the Events tab on the ESXi hosts or vCenter Server, you see the error:
Module DevicePowerOn power on failed.
Unable to create virtual SCSI device for scsi0:0, ‘/vmfs/volumes/datastorename/VirtualMachineHome/VirtualMachineDisk.vmdk’
Failed to open disk scsi0:0: Unsupported and/or invalid disk type 7. Did you forget to import the disk first?

Troubleshooting Steps:
A couple of straight answers for the above symptoms:

The disk type error occurs if a virtual machine that is meant for VMware hosted products such as VMware
Workstation, VMware Player or VMware Fusion is powered on on an ESX/ESXi host. The underlying
format used to store virtual machines on VMware hosted products differs from the format used to store
virtual machines on ESX/ESXi hosts.
The user or Distributed Resource Scheduler (DRS) has assigned limited resources to a resource pool.
The virtual machine’s host does not have enough memory for the required reservation.
The virtual machine’s resource usage does not match its resource settings.

Detailed steps are:


1. The virtual machine monitor may be asking a question that must be answered during startup. A
virtual machine may pause the power-on task at 95% and require manual intervention to bring the VM
online. Check the Summary tab of the VM and answer the question, for example by selecting “I copied it”.

2. Creating a new power-on task may fail if another task for the virtual machine or other
component is already in progress, and multiple concurrent tasks on the object are not
permitted.

3. A virtual machine may fail to power on if licensing requirements are not met and you can see
Error Message “Cannot Power on Virtual Machines, not enough licenses installed to perform the
operation”. It is simple to fix by adding proper licenses for ESXi & vCenter server.
4. The virtual machine may be configured to reserve physical memory on the host, but the host
memory is over-committed and the required memory is unavailable
5. The virtual machine may be starting in a VMware High Availability cluster with strict admission
control enabled, and there are insufficient resources to guarantee failover for all virtual
machines.

6. A file required for starting the virtual machine, such as a virtual disk or swap file, may be
unavailable or missing.
7. The virtual machine may have been previously suspended while making use of CPU features
which are unavailable or incompatible with the CPU features available on this host. The virtual
machine cannot be started without the required features.

8. The virtual machine may require both a VT-capable CPU and the VT feature to be enabled in the
host system’s BIOS. This is true for all 64-bit virtual machines. If the VT feature is unavailable,
the virtual machine may produce the message msg.cpuid.noLongmode.

9. The virtual machine may require another CPU feature which is unavailable on this host. The
virtual machine may produce a message similar to msg.cpuid.<FeatureName>, identifying the
specific feature it has been configured to require. Move the virtual machine back to the host
which has the required CPU features, or edit the virtual machine’s configuration to remove the
requirement.

10. The virtual machine may start, but quickly fail with an error during startup. Review the contents
of the log file in the virtual machine’s directory for any errors or warnings, and search the
Knowledge Base for the error or warning. Base your troubleshooting on the specific messages
seen in the logs – this is the best method to fix this problem

11. If the virtual machine does successfully power on, but the guest OS doesn’t start correctly, there
may be an incompatibility between the virtual hardware and drivers within the guest OS. For
example, a missing SCSI driver may be required for booting.

12. If the guest OS, or a driver or application within the virtual machine experiences a problem
during startup, the guest OS may become unresponsive.

Question: How do you troubleshoot vMotion failures for a virtual machine?
Hint: The interviewer wants to understand your skills with common virtual machine issues in VMware.

Answer: How many times have you seen the messages below in your vCenter Server? vMotion failing at
10%, 14%, 82%, or 90 to 95%.

You are attempting a vMotion migration between two ESX/ESXi hosts, and the vMotion task reaches
14%, then times out with this error message:

The vMotion migrations failed because the ESX hosts were not able to connect over the vMotion
network. Check the vMotion network settings and physical network configuration.

vMotion fails at 82%: You cannot migrate a virtual machine with vMotion.
In the /var/log/vmware/hostd.log file on the source ESX host, you see the error:

ResolveCb: Failed with fault: (vmodl.fault.SystemError) {
reason = "Source detected that destination failed to resume."
msg = ""
}

vMotion fails at 90-95%: You cannot perform a vMotion and the vMotion times out. In vCenter Server, you
see the error:

Operation timed out

vMotion stops at 90% and then fails with the error:

A general system error occurred: failed to resume on destination

vMotion fails at 10%: The vMotion times out. The VirtualCenter/vCenter Server reports these errors:

A general system error occurred: Failed waiting for data. Error 16. Invalid argument
A general system error occurred: failed to look up VMotion destination resource pool object
Enough error messages; let us see how to answer this question in the interview.

The rule of thumb for vMotion failures is that if the operation fails below 15%, you can assume
a network/configuration issue.

You can tell the interviewer that the settings below should be verified for any vMotion failure:
a) Ensure that vMotion is enabled on all ESX/ESXi hosts.
b) Determine if resetting the Migrate.Enabled setting on both the source and destination ESX or
ESXi hosts addresses the vMotion failure
c) Verify that VMkernel network connectivity exists using vmkping
d) Verify that VMkernel networking configuration is valid
e) Verify that the virtual machine is not configured to use a device that is not valid on the target
host
f) If Jumbo Frames are enabled (MTU of 9000), ensure that vmkping is run with a payload of 8972
bytes (9000 - 20 bytes (IP header) - 8 bytes (ICMP header)), like vmkping -d -s 8972 <destinationIPaddress>.
You may experience problems if the trunk between two physical switches has been
misconfigured with an MTU of 1500
g) Verify that Name Resolution is valid on the host
h) Verify that Console OS network connectivity exists
i) Verify if the ESXi/ESX host can be reconnected or if reconnecting the ESX/ESXi host resolves the
issue
j) Verify that the required disk space is available
k) Verify that time is synchronized across environment
l) Verify that valid limits are set for the virtual machine being vMotioned
m) Verify that hostd is not spiking the console
n) This issue may be caused by SAN configuration. Specifically, this issue may occur if zoning is set
up differently on different servers in the same cluster
o) Verify and ensure that the log.rotateSize parameter in the virtual machine’s configuration file is
not set to a very low value
p) If the virtual machine being vMotioned is a 64-bit virtual machine, verify that the VT option is
enabled on both of your ESX hosts.
q) Restart the host management agents
r) Verify that host management agents are not spiking the Service Console (ESX only)
s) Verify that there are no issues with the shared storage.
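
The payload arithmetic from item f) can be sketched in Python so the same calculation works for any MTU. This is only an illustration: the function names and the destination IP are hypothetical, and the sketch just builds the vmkping command string with the -d and -s options shown above.

```python
IP_HEADER_BYTES = 20    # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def vmkping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented frame for this MTU."""
    payload = mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload <= 0:
        raise ValueError("MTU is too small to carry an ICMP echo")
    return payload


def vmkping_command(destination_ip, mtu=9000):
    """Build the vmkping test command (hypothetical helper)."""
    return f"vmkping -d -s {vmkping_payload(mtu)} {destination_ip}"


if __name__ == "__main__":
    print(vmkping_command("10.0.0.12"))            # jumbo frames
    print(vmkping_command("10.0.0.12", mtu=1500))  # standard frames
```

If the 8972-byte test fails while the 1472-byte test succeeds, some device on the path is not passing jumbo frames.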

To check the health of the vMotion network:

a) Check for IP address conflicts on the vMotion network. Each host in the cluster should have a
vMotion vmknic, assigned a unique IP address

b) Check for packet loss over the vMotion network. Try having the source host ping (vmkping) the
destination host’s vMotion vmknic IP address for the duration of the vMotion.

c) Check that each vMotion vmkernel port group has the same security settings. A security
mismatch causes a vMotion operation to fail. For example, a failure occurs if a source vmkernel
portgroup is set to allow promiscuous mode and the destination vmkernel portgroup is set to
disallow promiscuous mode.

d) Check for connectivity between the two hosts. Try having the source host ping (vmkping) the
destination host’s vMotion vmknic IP address

Question: How do you troubleshoot ESXi host PSOD problems? Most Windows administrators
are familiar with the Blue Screen of Death, and it is time to learn a new term: PSOD – Purple Screen of
Death (VMware seems to be trying to be unique by not using the color blue).

The interviewer may pose this as a scenario-based question:

I have ESXi 5.5 installed on my HP ProLiant DL180 G6 with a configuration of 8 x Intel(R) Xeon(R) CPU
E5540 @ 2.53GHz and 24 GB RAM. Recently the server has crashed four times, showing the Purple Screen of
Death. Once this happens, all of the virtual machines on the server stop and stay down until I restart
the server.
Answer: You can start with the definition of a PSOD to impress the interviewer, followed by the
troubleshooting steps.

“A Purple Screen of Death (PSOD) is a diagnostic screen with white type on a purple background that is
displayed when the VMkernel of an ESX/ESXi host experiences a critical error, becomes inoperative and
terminates any virtual machines that are running”

You need to highlight the important step of capturing the log file information after the PSOD has occurred.

To resolve this issue, extract the log file from a vmkernel-zdump file using a command line utility on the
ESX or ESXi host. This utility differs for different versions of ESX or ESXi.

For ESXi 3.5, ESXi/ESX 4.x and ESXi 5.x, use the esxcfg-dumppart utility:

# esxcfg-dumppart -L vmkernel-zdump-filename

To extract the log file from a vmkernel-zdump file:

Find the vmkernel-zdump file in the /root/ or /var/core/ directory:

# ls /root/vmkernel* /var/core/vmkernel*
/var/core/vmkernel-zdump-073108.09.16.1

Use the vmkdump or esxcfg-dumppart utility to extract the log. For example:

# vmkdump -l /var/core/vmkernel-zdump-073108.09.16.1
created file vmkernel-log.1

# esxcfg-dumppart -L /var/core/vmkernel-zdump-073108.09.16.1
created file vmkernel-log.1

The vmkernel-log.1 file is plain text, though it may start with null characters. Focus on the end of the
log, which is similar to:
VMware ESX Server [Releasebuild-98103]

PCPU 1 locked up. Failed to ack TLB invalidate.


frame=0x3a37d98 ip=0x625e94 cr2=0x0 cr3=0x40c66000 cr4=0x16c
es=0xffffffff ds=0xffffffff fs=0xffffffff gs=0xffffffff
eax=0xffffffff ebx=0xffffffff ecx=0xffffffff edx=0xffffffff

Note: The file name created for the log in this example is vmkernel-log.1. If another file with the same
name already exists, the new file is created with the number suffix incremented.
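
Reading the end of the extracted log can be sketched in Python for cases where the file has been copied off the host. This is a minimal sketch: the function names are hypothetical, and it simply drops the leading null characters mentioned above and returns the last few lines.

```python
def tail_of_log(raw, lines=20):
    """Return the last `lines` lines of a log given as bytes, ignoring leading NULs."""
    if lines <= 0:
        return []
    text = raw.lstrip(b"\x00").decode("utf-8", errors="replace")
    return text.splitlines()[-lines:]


def tail_of_log_file(path, lines=20):
    """Same as tail_of_log, but reads the bytes from a file such as vmkernel-log.1."""
    with open(path, "rb") as handle:
        return tail_of_log(handle.read(), lines)


if __name__ == "__main__":
    sample = b"\x00\x00\x00VMware ESX Server [Releasebuild-98103]\nPCPU 1 locked up. Failed to ack TLB invalidate.\n"
    for line in tail_of_log(sample, lines=2):
        print(line)
```

The last lines are usually what the hardware vendor or VMware support asks for first when you open a case.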
Most of the time it is a hardware issue and you need to open a case with the hardware vendor, in this
case HP. Based on the findings, you need to replace the hardware devices or upgrade the firmware as
suggested by the hardware vendor, following the ITIL Change Management process.

In some cases, the problem may be with software installed on the ESXi server, such as additional agents
for monitoring software and hardware, additional VIBs added for storage, etc.

Finally, if you want to become an expert at analyzing the logs on your own, there is a good KB article
from VMware. It is rare for an interviewer to ask about debugging this issue, but he or she wants to
check your understanding of the procedure followed in case of a PSOD.

VMware KB1004250

Question: How do you troubleshoot P2V failure issues in your infrastructure? (P2V = Physical to Virtual)

Answer: There is a lot of discussion about which physical servers, such as Exchange, SQL or cluster
servers, are good candidates for VMware infrastructure.

The interviewer is also interested in hearing how you judge whether a physical server is a good
candidate for virtualization.

The answer is that we first need to analyze 3 months of data from a performance reporting tool.
If the server’s CPU and memory utilization is around 80%, then that physical server is most likely not
suitable for VMware infrastructure.

If the server utilization is less than 70%, then you can recommend it for VMware infrastructure. Suppose
the server is selected for P2V, you start the process (hopefully with pre- and post-P2V checklists) and
you run into an issue. Here is a good checklist for fixing P2V problems.
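
The 80% and 70% thresholds above can be sketched as a small Python rule. Two things here are assumptions, not part of the answer above: the sketch judges a server by the higher of its CPU and memory averages, and it returns "review" for the range between 70% and 80%, which the answer does not cover. The function name is hypothetical.

```python
def p2v_recommendation(avg_cpu_pct, avg_mem_pct):
    """Classify a physical server from its 3-month average utilization.

    Assumption: the higher of the two averages decides the outcome.
    """
    busiest = max(avg_cpu_pct, avg_mem_pct)
    if busiest >= 80:
        return "not suitable"
    if busiest < 70:
        return "recommend"
    # Assumption: 70-80% is not covered by the rule of thumb, so flag it.
    return "review"


if __name__ == "__main__":
    print(p2v_recommendation(35, 50))
    print(p2v_recommendation(85, 40))
    print(p2v_recommendation(75, 60))
```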

✔ To eliminate permission issues, always use the local administrator account instead of a domain
account.
✔ Note: Disable UAC for Windows Vista, Windows 7, or Windows 8 prior to converting.

✔ To eliminate DNS problems, use IP addresses instead of host names.


✔ Ensure that you do not choose partitions that contain any vendor specific Diagnostic Partitions
before proceeding with a conversion.
✔ To reduce network obstructions, convert directly to an ESX host instead of vCenter Server as the
destination.
✔ Note: This is only an option in VMware vCenter Converter Standalone

✔ If you are unable to convert directly to an ESX host in vCenter Server 5.0, see “vCenter
Converter Standalone 5.0 errors when an ESXi 5.0 host is selected as a destination”
(KB2012310).

✔ VMware vCenter Converter Standalone has many more options available to customize your
conversion. If you are having issues using the Converter Plug-in inside vCenter Server, consider
trying the Standalone version.
✔ If a conversion fails using the exact size of hard disks, decrease the size of the disks by at least
1MB. This forces VMware Converter to do a file level copy instead of a block level copy, which
can be more successful if there are errors with the volume or if there are file-locking issues.
✔ Make sure there is at least 500MB of free space on the machine being converted. VMware
Converter requires this space to copy data.
✔ Shut down any unnecessary services, such as SQL, antivirus programs, and firewalls. These
services can cause issues during conversion.
✔ Run a check disk on the volume before running a conversion as errors on disk volumes can cause
VMware Converter to fail.
✔ Do not install VMware Tools during the conversion. Install VMware Tools after you confirm that
the conversion was successful.
✔ Do not customize the new virtual machine before conversion.
✔ Ensure that these services are enabled:
    ✔ Workstation Service
    ✔ Server Service
    ✔ TCP/IP NetBIOS Helper Service
    ✔ Volume Shadow Copy Service

✔ Check that the appropriate firewall ports are opened


✔ Check that boot.ini is not looking for a Diagnostic/Utility Partition that no longer exists.
✔ If you are unable to see some or all of the data disks on the source system, ensure that you are
not using GPT on the disk partitions.
✔ In Windows XP, disable Windows Simple File Sharing. This service has been known to cause
issues during conversion.
✔ Unplug any USB, serial/parallel port devices from the source system. VMware Converter may
interpret these as additional devices, such as external hard drives which may cause the
conversion to fail.
✔ If the source machine contains multiple drives or partitions and the conversion fails on
certain drives, consider converting one drive or partition at a time.
✔ Verify that there are no host NICs or network devices in the environment that have been
statically configured to be at a different speed or duplex. This includes settings on the source
operating system, switches and networking devices between the source and destination server.
If this is the case, Converter sees the C: drive but not the D: drive.
✔ If you are using a security firewall or Stateful Packet Inspecting (SPI) firewall, check firewall
alerts and logs to make sure the connection is not being blocked as malicious traffic.
✔ If you have static IP addresses assigned, assign the interfaces DHCP addresses prior to
conversion.
✔ If the source server contains a hard drive or partition larger than 256GB, ensure that the
destination datastore’s block size is 2MB, 4MB, or 8MB, and not the default 1MB size (this applies
to VMFS-3 datastores). The 1MB default block size cannot accommodate a file larger than 256GB.
✔ Clear any third-party software from the physical machine that could be using the Volume
Shadow Copy Service (VSS). VMware Converter relies on VSS, and other programs can cause
contention.
✔ Disable mirrored or striped volumes. Mirrored or striped volumes cannot be converted.
✔ Verify that the VMware Converter agent is installed on the source machine. If the conversion
fails right away, the agent may not be installed.
✔ Verify that DNS and reverse DNS lookups are working. It may be necessary to make entries into
the local hosts file on source machine. Use IP addresses, if possible.
✔ Run msconfig on the source server to reduce the number of services and applications running at
startup. Only Microsoft services and the VMware Converter Service should be running.
✔ Inject VMware SCSI drivers into the machine before conversion. Windows tries to Plug-n-Play
the new SCSI Controller, and Windows may fail if the proper drivers are not installed.
✔ If you customized permissions in your environment, ensure that local administrator has rights to
all files, directories, or registry permissions before conversion.
✔ Uninstall any UPS software. This has been known to cause issues after conversion.
✔ Ensure that you do not have any virtual mounted media through an ILO- or DRAC-type
connection. Converter can misinterpret these as convertible drives, and fails upon detecting
them. As a precaution, disconnect your ILO or DRAC to prevent this issue.
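The block-size item in the checklist above comes down to simple arithmetic on VMFS-3 limits. The sketch below is only an illustration: the function name is hypothetical, and the table uses the commonly documented VMFS-3 file-size limits (each limit is in fact slightly under the round figure, so the comparison is strict).

```python
# Approximate maximum file size (GB) for each VMFS-3 block size (MB).
# The real limits are slightly below these round numbers.
VMFS3_MAX_FILE_GB = {1: 256, 2: 512, 4: 1024, 8: 2048}


def min_vmfs3_block_size_mb(disk_size_gb):
    """Return the smallest VMFS-3 block size (MB) that can hold a disk of this size."""
    for block_mb in sorted(VMFS3_MAX_FILE_GB):
        # Strict comparison because the true limit is just under the round value.
        if disk_size_gb < VMFS3_MAX_FILE_GB[block_mb]:
            return block_mb
    raise ValueError(f"{disk_size_gb} GB is too large for a single VMFS-3 file")
```

For example, a 300GB source disk needs a destination datastore formatted with at least a 2MB block size, which matches the checklist item.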

✔ Your answer should also cover log information, which demonstrates your real-time experience.

VMware Converter logs: There are also several ways to diagnose issues by viewing the VMware
Converter logs. The logs can contain information that is not apparent from error messages. In newer
versions of VMware Converter, you can use the Export Log Data button. Otherwise, logs are typically
stored in these directories:

Windows NT, 2000, XP, and 2003:


C:\Documents and Settings\All Users\Application Data\VMware\VMware Converter Enterprise\Logs
C:\WINDOWS\Temp\vmware-converter
C:\WINDOWS\Temp\vmware-temp

Windows Vista, 7, and 2008:


C:\Users\All Users\Application Data\VMware\VMware Converter Enterprise\Logs

Windows 8 and Windows 2012:


C:\ProgramData\VMware\VMware vCenter Converter Standalone\logs

Note: In order to access this location in Windows Vista, 7, or 2008, you may need to go into the folder
options and ensure that Show Hidden Files is enabled and that Hide Protected Operating System Files is
disabled.

C:\WINDOWS\Temp\vmware-converter
C:\WINDOWS\Temp\vmware-temp

Windows NT and 2000:


C:\WINNT\Temp\vmware-converter
C:\WINNT\Temp\vmware-temp

Linux:
$HOME/.vmware/VMware vCenter Converter Standalone/Logs
/var/log/vmware-vcenter-converter-standalone
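When you are unsure which of these locations applies to a given machine, a small script can simply test each one. This is a minimal sketch, assuming the directories listed above; the function name is hypothetical and the list should be adjusted for your Converter version.

```python
import os

# Candidate locations copied from the list above.
CANDIDATE_LOG_DIRS = [
    r"C:\Documents and Settings\All Users\Application Data\VMware\VMware Converter Enterprise\Logs",
    r"C:\Users\All Users\Application Data\VMware\VMware Converter Enterprise\Logs",
    r"C:\ProgramData\VMware\VMware vCenter Converter Standalone\logs",
    r"C:\WINDOWS\Temp\vmware-converter",
    r"C:\WINDOWS\Temp\vmware-temp",
    r"C:\WINNT\Temp\vmware-converter",
    r"C:\WINNT\Temp\vmware-temp",
    os.path.expanduser("~/.vmware/VMware vCenter Converter Standalone/Logs"),
    "/var/log/vmware-vcenter-converter-standalone",
]


def existing_log_dirs(candidates=CANDIDATE_LOG_DIRS):
    """Return the candidate directories that actually exist on this machine."""
    return [path for path in candidates if os.path.isdir(path)]


if __name__ == "__main__":
    # Print every log directory found, one per line.
    for path in existing_log_dirs():
        print(path)
```

Run it on the source machine or the Converter server; whatever it prints is where to start looking for the conversion logs.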

Question: As part of a data center network device upgrade/change, someone changed the vCenter IP
address. How do you tackle this scenario as a VMware administrator? What technical plan will you
follow for this change record (ITIL process)? Hint: The interviewer is looking at your technical
direction/plan along with ITIL change management procedures.

Answer: We may think that changing the IP address is an easy job: open the vCenter VM console (in
most cases) or the remote console for a physical server, and modify the network adapter settings. But
what happens to your ESXi servers, NSX VMs, and Update Manager? Will they communicate with your
vCenter server on the new IP directly, without any modification? Here is a detailed technical plan to
answer this question.

1. Create backups of the vCenter Server and the underlying SQL database as a backup plan.
2. Set DRS to manual mode to avoid anything moving around.
3. Identify the ESXi host running the vCenter VM and connect directly to that host with the
vSphere Client. Do not forget that your vCenter is going to disconnect and you can’t manage it
via a vSphere Client session to vCenter anymore.
4. Close any sessions you have open to the vCenter Server (Web Client, vSphere Client sessions).
5. Open a console window to the vCenter Server by way of the ESXi host.
6. Stop all VMware-related services.
7. Change the IPv4 address and IPv4 gateway as per the new networking configuration.
8. Put DRS back to fully automated (optional, based on your setup).
9. Uninstall the Update Manager software from the VM (sometimes it is installed on a server other
than vCenter).
10. Install Update Manager and point it to the new vCenter Server IP address.
a. NOTE: There is an easier method to update the vCenter IP address in Update Manager via the
command line (we will discuss it in future posts).
11. Update the vCenter Managed IP Address (found under Runtime Settings in the vCenter Server settings).

12. NSX requires your attention, as vCenter re-registration is a complex procedure. Leave this to the
network specialists to provide a technical plan.
13. Disconnect each host from vCenter to flush out the database entry.
14. Reconnect the ESXi host so that it uses the new vCenter IP address for communication and agent
installation.
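Two quick sanity checks fit naturally around step 7: confirming that the new gateway really belongs to the new subnet before you apply it, and confirming that vCenter answers on HTTPS afterwards. The sketch below uses only the Python standard library; the function names are hypothetical, and any addresses you pass in are your own.

```python
import ipaddress
import socket


def gateway_in_subnet(ip_with_prefix, gateway):
    """True if the gateway lies in the same subnet as the new vCenter address.

    ip_with_prefix is the new address in CIDR form, for example "10.20.30.15/24".
    """
    network = ipaddress.ip_interface(ip_with_prefix).network
    return ipaddress.ip_address(gateway) in network


def port_open(host, port=443, timeout=3.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A typical use is to call gateway_in_subnet() with the values from the change record before touching the adapter, then call port_open() against the new address once the VMware services are back up. Neither check replaces the validation steps in the change record; they only catch simple typos early.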

Question: Can you explain a major issue that you fixed in a VMware infrastructure? How did you
handle that situation? Can you list the steps that you followed to fix the problem?

Answer: It’s always good to pick a storage-related issue, since these have a high impact on both
physical and VMware infrastructures. I’m going to take the scenario of a couple of VMs not responding,
including the vCenter Server, which is also running as a virtual machine.
You can start the narration like this: you got a call from the helpdesk that some VMs are inaccessible
in the vCenter server. When you connected to the vCenter server, you couldn’t access the consoles of
the VMs as they were stuck at a black screen. From the screenshot it’s clear that CPU and memory usage
is high for a couple of hosts in the cluster. You wanted to understand the reason for the CPU and
memory utilization on the ESXi hosts and ran the “esxtop” command to find out which VM/process was
driving it. If you are new to esxtop, then follow this article to learn how to use the command. As the
issue was limited to a couple of virtual machines, you downloaded the vmware.log file to understand
the issue from the VM’s point of view. You noticed some latency while downloading the log files.
While this issue was ongoing, the vCenter VM went into an unresponsive state, which made the situation
worse. The esxtop output and vmware.log did not yield much information, and management wanted you to
fix the vCenter VM issue first. In addition, the SQL DB server holding the vCenter and Update Manager
DBs was also running as a virtual machine, and you had installed PernixData (acquired by Nutanix) to
improve storage I/O performance for the VMs. These factors make the situation complex, and the diagram
below helps you understand the scenario better.

Now let’s talk about the resolution step by step and how you found the issue with storage.
The help desk reported that another set of VMs was not reachable on the network and that they hung at
a black screen when the VM console was opened through a direct connection to the ESXi server. This
means the issue was growing and impacting more business functions such as SharePoint, Exchange,
Citrix, finance applications, etc. Management declared this an MI (Major Incident) and requested you
to join the bridge call along with other technical teams such as storage, network, and the incident
escalation managers. This is going to be a collaborative effort rather than the VMware team fixing the
issue alone. The latency while downloading the log file and the black screen on the VM consoles bring
your attention to storage-related problems. esxtop has specific switches for investigating
storage-related problems; check this article for more details. For this escalation you need to know
the KAVG and DAVG values and thresholds as listed below.
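To make the reasoning about these counters concrete, here is a minimal sketch that classifies one esxtop sample. DAVG is the latency seen at the device (array, HBA, fabric), KAVG is the time spent in the VMkernel, and the guest-observed latency GAVG is their sum. The threshold values in the code are commonly quoted rules of thumb and are an assumption here, not figures taken from this document; the function name is hypothetical.

```python
# Rule-of-thumb thresholds in milliseconds. These are assumptions, not values
# taken from this document; tune them for your storage platform.
DAVG_WARN_MS = 25.0  # device latency: time spent at the array, HBA, or fabric
KAVG_WARN_MS = 2.0   # kernel latency: time spent inside the VMkernel


def classify_latency(davg_ms, kavg_ms, davg_warn=DAVG_WARN_MS, kavg_warn=KAVG_WARN_MS):
    """Return (gavg_ms, findings) for one esxtop device sample."""
    gavg_ms = davg_ms + kavg_ms  # guest-observed latency is the sum of both
    findings = []
    if davg_ms > davg_warn:
        findings.append("high DAVG: look at the array, HBA, or fabric")
    if kavg_ms > kavg_warn:
        findings.append("high KAVG: look at queuing inside the ESXi host")
    return gavg_ms, findings
```

A sample with low DAVG but high KAVG points toward the host side rather than the array, which is the kind of evidence that helps on a bridge call when the array statistics look clean.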

So from your troubleshooting it’s clear that there is an ongoing storage-related problem in the
infrastructure, which shows in esxtop and in the VM consoles stuck at a black screen. You requested
the storage team to validate the VNX configuration and performance and to check whether they were
seeing any alerts or issues on their side. They came back quickly with an update that all stats looked
good for read and write operations. Management was asking what the problem was, and you were confident
that there was a storage issue even though it could not be seen on the VNX side. So you requested the
storage team to give some recommendations, and they came back with a plan to switch the storage
processor from A to B. They executed this plan with management approval, but there was no change in
the situation. Next they wanted to reboot the storage processors one by one, which carries a major
impact if a processor does not come back online after the reboot. Management agreed, as all the
production VMs were down at that time. The storage team finished rebooting both processors
successfully in 1 hour, but that didn’t change anything in the VMs’ status. At this point the storage
team had performed all of its troubleshooting, including rebooting SPA and SPB. Now management wanted
to know the next action plan from the VMware infrastructure side.

You were confident that there was a storage problem, but since the VNX looked good, the problem had to
be in another storage layer, which is PernixData. Upon investigating the PernixData process further,
you found a lot of pending I/O jobs waiting on one of the ESXi hosts. You killed the Pernix process on
that specific host, which brought the services back online to a normal state. Once the VMs became
normal, you recommended that management perform a clean ESXi reboot to avoid any immediate potential
outage in production. They agreed, a clean reboot was performed for each ESXi host, and the
conference call was closed after validating the applications with the end users (UAT, User
Acceptance Test).
Resolution: PernixData helps to improve storage I/O performance by offloading that service to the ESXi
host’s local cache; however, in this situation it was holding a lot of jobs, which created the
situation above. Killing the Pernix process from the ESXi host console and performing a clean ESXi
reboot brought the situation back to a normal state. Hope this helps you answer interview questions in
a better way, and be social and share the knowledge.
