Category: VMware vSphere

vSphere 6.5 – Top KB Articles for Upgrade Prep

Here are the top KB articles for upgrading to vSphere 6.5:

KB 2146420 – Estimating vCenter Server 5.5 to vCenter Server Appliance 6.x migration time
KB 2147711 – Estimating vCenter Server 5.5 or 6.0 to vCenter Server Appliance 6.5 migration time
KB 2112283 – Regenerate vCenter Certificates
KB 2147548 – Important information before upgrading to vSphere 6.5
KB 2147686 – vCenter Server 6.5 upgrade best practices
KB 2147824 – Migrating VMFS 5 datastore to VMFS 6 datastore
KB 2147289 – Update Sequence for vSphere 6.5 and its compatible VMware products
KB 2147929 – vSphere Client (HTML5) and vSphere Web Client 6.5 FAQ
KB 2113917 – Repointing VC 6.x to a PSC
KB 2113115 – VMware PSC 6.x FAQs
KB 2147454 – Linux VMware Tools update fails to complete
KB 2147672 – Supported and deprecated topologies for VMware vSphere 6.5

Monitoring Compute Performance of a VM

Even when using vC Ops from VMware, vCenter metrics, vCloud metrics, and Hyperic monitoring, there are things that can be missed.

What if the datastore shows 1 ms latency on the vSphere host, CPU usage is low, and memory is not being swapped, but a user insists the VM is just ‘slow’? What would you check?

In our cloud deployment (25,000 VMs in a single vCloud instance, and we have 4 instances), I have deployed 4 to 6 VMs per vCenter across different datastores and on different hosts. Each of these VMs has 512 MB of RAM, a 16 GB hard drive, and 1 vCPU. On these VMs I installed CentOS, MySQL, and sysbench.

sysbench, for those who do not know, runs artificial tests against the HDD, RAM, CPU, and MySQL. Since we have the CPU, RAM, and HDD monitored by other products, I configured it to only point at the MySQL server. The MySQL test exercises every portion of the compute stack. While the RAM and CPU may each report within acceptable ranges, they may be high in those ranges, and combined that may cause a performance impact.

My script runs every 5 minutes via a cron job, and logs the data to a SQL database, for later reporting and analysis.

I won’t cover how to install sysbench or MySQL here, since that can be found elsewhere on the internet (sysbench & MySQL).

You will also need to install iSQL for CentOS to write your data to a SQL server (if that is your target); those instructions can be found here.

After you have the above installed and ready, you can run the following to get a report on your VM’s performance:

# Run the complex OLTP test against the local MySQL instance and save the report
sysbench --num-threads=16 --max-requests=10000 --test=oltp --oltp-table-size=500000 --mysql-socket=/var/lib/mysql/mysql.sock --oltp-test-mode=complex --mysql-user=root --mysql-password='VMware1!' run > mysql.sysbench

# Transactions per second, taken from the "transactions:" line of the report
testvalue=$(grep 'transactions:' mysql.sysbench | awk '{ print substr($3,2) }')

date=$(date -u "+%F %R")

# IPv4 address of eth0 (ifconfig prints it as "inet addr:x.x.x.x")
machine=$(ifconfig eth0 | grep 'inet ' | awk '{ print substr($2,6) }')

hostname=$(hostname)

inssql="insert into TABLENAME VALUES ('$date', '$machine', '$testvalue', '$hostname')"

echo "$inssql" | isql HOSTNAME USERNAME PASSWORD

The above will insert the data into a SQL table for later reporting.
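If you would rather do the extraction step in Python than in awk, the following is a minimal sketch. It assumes the legacy (0.4.x-style) sysbench OLTP report, where the line of interest looks like "transactions: 10000 (415.27 per sec.)". The function name and the sample numbers are made up for illustration.

```python
import re
from typing import Tuple

# Matches the "transactions:" line of a legacy sysbench OLTP report, e.g.
# "    transactions:                        10000  (415.27 per sec.)"
_TX_LINE = re.compile(
    r"^\s*transactions:\s+(\d+)\s+\(([\d.]+) per sec\.\)", re.MULTILINE
)


def parse_transactions(report: str) -> Tuple[int, float]:
    """Return (total transactions, transactions per second) from a report."""
    match = _TX_LINE.search(report)
    if match is None:
        raise ValueError("no 'transactions:' line found in sysbench output")
    return int(match.group(1)), float(match.group(2))


if __name__ == "__main__":
    # Illustrative sample only, not real measurements
    sample = (
        "OLTP test statistics:\n"
        "    transactions:                        10000  (415.27 per sec.)\n"
    )
    total, per_sec = parse_transactions(sample)
    print(total, per_sec)
```

The per-second figure is the same value the awk expression pulls out of the third field, so it can be written to the same table column.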

Keep in mind that you do not want to make a ‘monster’ VM. The smaller the better, because you want to tax all of the components. If you give it 4 GB of RAM, MySQL will cache all of the disk I/O in RAM and not report an accurate number. If you give it 4 vCPUs, you may artificially lower your score by making the hypervisor schedule 4 vCPUs’ worth of tasks. In this case smaller is most definitely better.

This will not replace the other monitoring solutions you may have, but it will help to augment those solutions.

Host Isolation Response Settings

Over the past few weeks, several people have asked what I would recommend for host isolation addresses and responses when using vSphere x.x with high availability (HA) enabled.

First off, as with most everything in computers: ‘It depends’.

Items to consider with host isolation response include: What type of storage is being utilized? Is it considered an outage if the vSphere hosts can talk to each other but the virtual machines (VMs) cannot? How stable is the management network? Is datastore heartbeating set up correctly? The list of things to consider can be quite long.

Here is what I generally recommend…

IP-Based Storage Options (iSCSI or NFS)

First, set one of the host isolation addresses to the virtual IP (VIP) or the IP of the storage array. This will tell you whether you can at least talk to the array; chances are that if you cannot talk to the array, your machines have already failed.

That brings me to the isolation response. Since the VMs should already be dead (remember, the storage array and host are no longer able to communicate), I recommend ‘Power Off’. If you leave the response at ‘Leave Powered On’, you run the risk of corruption on the guest disks. Think about it: the disk goes away for 60 seconds, then comes back. Do you think that VM is going to be able to recover? What are the odds of it corrupting that vital disk? What about ‘Shutdown Guest’? That has a chance of the exact same thing happening.

Power Off is the safest setting here: not only does it allow the VM to be brought up elsewhere, it also lowers the chance of corruption or VM failure.

Fibre Channel Based Storage (SAN)

With Fibre Channel, the network isolation address does not matter as much, because the storage does not rely on the IP network for communication.

General Host Isolation Techniques

If it is considered downtime when the VM network fails but the management network does not, is there a way we can mitigate this?

Of course: put a vmkernel port on the virtual machine network, enable it for HA traffic, then add that network’s gateway as an isolation address in the HA configuration.
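As a rough illustration of that last step: vSphere HA reads additional isolation addresses from cluster advanced options named das.isolationaddress0 through das.isolationaddress9. The sketch below only builds that option map. The helper name and the sample IPs are hypothetical, and actually applying the options to a cluster (through the vSphere Client or an API) is left out.

```python
from ipaddress import ip_address
from typing import Dict, List

MAX_ISOLATION_ADDRESSES = 10  # das.isolationaddress0 .. das.isolationaddress9


def isolation_address_options(addresses: List[str]) -> Dict[str, str]:
    """Map isolation addresses to HA advanced option names (hypothetical helper)."""
    if len(addresses) > MAX_ISOLATION_ADDRESSES:
        raise ValueError("vSphere HA supports at most 10 isolation addresses")
    options = {}
    for index, address in enumerate(addresses):
        ip_address(address)  # raises ValueError if the string is not a valid IP
        options[f"das.isolationaddress{index}"] = address
    return options


if __name__ == "__main__":
    # Made-up values: a storage array VIP and the VM network gateway
    sample = ["10.0.10.50", "10.0.20.1"]
    for name, value in isolation_address_options(sample).items():
        print(f"{name} = {value}")
```

Listing the storage VIP first and the VM network gateway second mirrors the two recommendations in this post; the order itself is only a convention here.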

vCenter Server Appliance Database

The vCenter Server Appliance comes with a built-in DB2 database for use as an all-in-one solution for small lab environments. The sizing is supposed to be fewer than 5 hosts and 50 VMs, but it can work with many more than that if given enough resources :-).

One issue that occurs when using this appliance is that the DB2 log settings are too small for any extended operation. This is because stat roll-up jobs take more log space to complete than the DB2 instance is configured for. The symptom is your vCenter service resetting itself every so often; you can verify this by watching the vpxd.log file for the line “Transaction log full”, with the service dying shortly after. I found it because my vSphere Client would disconnect and ask me to log back in every once in a while.

The easy(-ish) fix for this is to do the following:

  1. SSH to the vCenter Appliance, if you left the default the username is ‘root’ and the password is ‘vmware’. Once you are SSH’ed to the box you will need to change to the DB2 user:

     su -l db2inst1

  2. Check your current log setting by running:

     db2 get db cfg for vCDB | grep log

     This will show you something like this:

     User exit for logging status = NO
     Catalog cache size (4KB) (CATALOGCACHE_SZ) = 300
     Number of primary log files (LOGPRIMARY) = 128
     Number of secondary log files (LOGSECOND) = 16
     Changed path to log files (NEWLOGPATH) =
     Path to log files = /storage/db/db2/home/db2inst1/db2inst1/NODE0000/SQL00001/SQLOGDIR/
     Overflow log path (OVERFLOWLOGPATH) =
     Mirror log path (MIRRORLOGPATH) =
     First active log file =
     Block log on disk full (BLK_LOG_DSK_FUL) = NO
     Block non logged operations (BLOCKNONLOGGED) = NO
     Percent max primary log space by transaction (MAX_LOG) = 0
     Num. of active log files for 1 active UOW(NUM_LOG_SPAN) = 0
     Percent log file reclaimed before soft chckpt (SOFTMAX) = 520
     User exit for logging enabled (USEREXIT) = OFF
     HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC
     First log archive method (LOGARCHMETH1) = OFF
     Options for logarchmeth1 (LOGARCHOPT1) =
     Second log archive method (LOGARCHMETH2) = OFF
     Options for logarchmeth2 (LOGARCHOPT2) =
     Failover log archive path (FAILARCHPATH) =
     Number of log archive retries on error (NUMARCHRETRY) = 5

  3. To change the log file sizing run the following:

     db2 UPDATE DB CFG FOR VCDB USING logprimary 128 logsecond 16 logfilsiz 8192

  4. For the setting to take effect you will need to reboot the appliance. Once the vCenter Appliance is running again you can check/watch the log files being created:

     ls -lah /storage/db/db2/home/db2inst1/db2inst1/NODE0000/SQL00001/SQLOGDIR/
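To get a feel for how much disk the UPDATE DB CFG values above can consume, here is a small back-of-the-envelope sketch. It relies on DB2 expressing logfilsiz in 4 KB pages; the function name is my own.

```python
PAGE_KIB = 4  # DB2 expresses logfilsiz in 4 KB pages


def log_space_mib(logprimary: int, logsecond: int, logfilsiz: int) -> float:
    """Maximum transaction log space in MiB for the given DB2 settings."""
    files = logprimary + logsecond
    return files * logfilsiz * PAGE_KIB / 1024


if __name__ == "__main__":
    per_file = log_space_mib(1, 0, 8192)
    total = log_space_mib(128, 16, 8192)
    print(f"per log file: {per_file:.0f} MiB")
    print(f"total: {total:.0f} MiB ({total / 1024:.1f} GiB)")
```

With the values used in this post that works out to 32 MiB per log file and about 4.5 GiB in total, so it is worth checking that the log path has that much free space before rebooting.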

netstat for ESXi

With the removal of the service console in vSphere 5 (now ESXi only), netstat went by the wayside. To get the same information, run the following either over SSH (tech support mode) or at the console:

esxcli network ip connection list

Nesting Resource Pools; Good Idea or Bad?

I was recently asked for my opinion on nesting resource pools with VMs intermixed. So what should you do?

Here is an easy example: if I have a Resource Pool with 9 VMs and 1 nested Resource Pool, all with equal shares, what level of resources is each given? 10% each.

Think about that: each of the VMs has 10% of the resources, and the ENTIRE nested Resource Pool has 10% of the resources.

How many VMs are in that nested Resource Pool? Let’s assume 10 VMs.

What entitlement to the resources does a single VM in the nested Resource Pool have? 10% of the nested Resource Pool’s resources, which is itself 10% of the parent Resource Pool’s. So this one VM will get 1% of the parent Resource Pool’s resources.

How can this problem be fixed? A better practice is to not have Resource Pools and VMs exist at the same level of the tree. The above example would be changed to have 2 Resource Pools under the parent Resource Pool. One could be named ‘prod’ and the other ‘development’. The ‘prod’ Resource Pool would be given normal shares and the ‘development’ Resource Pool would be given low shares.

So now what is the resource allocation to each?

The ‘prod’ Resource Pool will have 2/3 of the total parent Resource Pool, and the ‘development’ Resource Pool will have 1/3. This still does not guarantee that ‘prod’ will have access to resources; shares only guarantee that, in a contention scenario, ‘prod’ will be given higher priority.

How can you guarantee resources to ‘prod’? Set a reservation on the ‘prod’ Resource Pool or set a limit on the ‘development’ Resource Pool.
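The share arithmetic in this post can be sketched in a few lines. This is a simplified model that assumes full contention, equal shares for the siblings in the first example, and share values of 4000 for Normal and 2000 for Low in the second (only the 2:1 ratio matters). The function and object names are made up.

```python
from fractions import Fraction
from typing import Dict


def split_by_shares(shares: Dict[str, int]) -> Dict[str, Fraction]:
    """Fraction of the parent's resources each sibling gets under full contention."""
    total = sum(shares.values())
    return {name: Fraction(value, total) for name, value in shares.items()}


if __name__ == "__main__":
    # Example 1: 9 VMs and 1 nested pool as siblings, all with equal shares (assumed)
    siblings = {f"vm{i}": 1000 for i in range(1, 10)}
    siblings["nested-pool"] = 1000
    pool_fraction = split_by_shares(siblings)["nested-pool"]

    # 10 equal-share VMs inside the nested pool divide that fraction again
    nested = split_by_shares({f"nested-vm{i}": 1000 for i in range(1, 11)})
    print("one nested VM:", pool_fraction * nested["nested-vm1"])  # 1/100

    # Example 2: two sibling pools with Normal (4000) and Low (2000) shares
    pools = split_by_shares({"prod": 4000, "development": 2000})
    print("prod:", pools["prod"], "development:", pools["development"])  # 2/3 and 1/3
```

Shares only describe how contended resources are divided; reservations and limits are not modeled here.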

Importing vApp from vCenter Error

When importing a VM or vApp from vCenter into vCloud you may see an error like this:

    A specified parameter was not correct.

    spec.deviceChange.device.port.switchUuid

This can happen when coming from a Cisco Nexus 1000v portgroup. To solve this, either remove the network adapter before importing into the cloud, or move the network card to a portgroup that is not a Nexus 1000v portgroup.

If you are using a dVS and see this error, move the NIC to a standard vSwitch, or delete the NIC and re-add it once the VM is in the cloud.

vSphere 5 vRAM Licensing Changes

When VMware released the vRAM pricing models and limits last week, there was quite an uproar over the limits being set too low. It appears that VMware was listening and has responded with a hefty increase in vRAM allocations.

vSphere edition            Previous vRAM entitlement   New vRAM entitlement
vSphere Enterprise+        48 GB                       96 GB
vSphere Enterprise         32 GB                       64 GB
vSphere Standard           24 GB                       32 GB
vSphere Essentials+        24 GB                       32 GB
vSphere Essentials         24 GB                       32 GB
Free vSphere Hypervisor    8 GB                        32 GB*

That doubles the entitlement for Enterprise and Enterprise+, and raises the Standard and Essentials editions by a third.

Also changing is how usage is calculated: from a high-water mark (think bursting for a day) to a rolling 12-month average!

Finally, the last and biggest news is that a single VM can consume at most 96 GB of vRAM, no matter how you set the VM up! (For example, give a VM 1 TB of RAM and only 96 GB counts against the vRAM pool.)
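To see what the per-VM cap means in practice, here is a rough sketch using the new entitlements from the table. It assumes that only powered-on VMs are counted and that entitlements of the same edition are pooled, and it ignores the requirement to license every physical CPU. The function names and VM sizes are made up.

```python
import math
from typing import Dict, List

# New per-license vRAM entitlements in GB, from the table in this post
ENTITLEMENT_GB: Dict[str, int] = {
    "Enterprise+": 96,
    "Enterprise": 64,
    "Standard": 32,
    "Essentials+": 32,
    "Essentials": 32,
}
PER_VM_CAP_GB = 96  # a single VM never counts for more than this


def vram_consumed_gb(vm_ram_gb: List[int]) -> int:
    """vRAM counted against the pool for a set of powered-on VMs."""
    return sum(min(ram, PER_VM_CAP_GB) for ram in vm_ram_gb)


def licenses_for_vram(consumed_gb: int, edition: str) -> int:
    """Licenses needed to cover the consumed vRAM (ignores the per-CPU minimum)."""
    return math.ceil(consumed_gb / ENTITLEMENT_GB[edition])


if __name__ == "__main__":
    vms = [1024, 64, 32]  # made-up sizes in GB; the 1 TB VM counts as 96 GB
    used = vram_consumed_gb(vms)
    print(used, licenses_for_vram(used, "Enterprise+"))
```

In this example the three VMs count as 192 GB, which two Enterprise+ licenses would cover under the stated assumptions.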


How to change the UUID of a VM via PowerCLI

In vCloud Director it may be necessary to deploy an identical vApp. When this happens everything is identical (including UUIDs); changing the UUID may help vSM differentiate the VMs.

# Use the current day, hour, minute, and second as the last four bytes of the UUID
$date = Get-Date -Format "dd hh mm ss"

$newUuid = "56 4d 50 2e 9e df e5 e4-a7 f4 21 3b " + $date

$spec = New-Object VMware.Vim.VirtualMachineConfigSpec

$spec.uuid = $newUuid

$vm = Get-VM -Name "<VM Name>"

echo "VM: " $vm.Name "New UUID: " $newUuid

# Apply the new UUID through the vSphere API
$vm.ExtensionData.ReconfigVM_Task($spec)
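To make the shape of the generated UUID easier to see, here is a small sketch that mirrors the string building in the script above and checks the result. The 16-byte layout check is my own addition; the prefix and the timestamp fields follow the script, with "hh" treated as the 12-hour clock.

```python
import re
from datetime import datetime

# First 12 bytes, as in the script above
UUID_PREFIX = "56 4d 50 2e 9e df e5 e4-a7 f4 21 3b "
# 16 two-digit hex bytes: 8 space-separated, a dash, then 8 more
UUID_PATTERN = re.compile(
    r"^([0-9a-f]{2} ){7}[0-9a-f]{2}-([0-9a-f]{2} ){7}[0-9a-f]{2}$"
)


def build_uuid(now: datetime) -> str:
    """Append day, 12-hour clock hour, minute, and second as the last four bytes."""
    return UUID_PREFIX + now.strftime("%d %I %M %S")


if __name__ == "__main__":
    uuid = build_uuid(datetime(2011, 8, 15, 14, 5, 9))
    print(uuid)
    print(bool(UUID_PATTERN.match(uuid)))
```

Because only the day, hour, minute, and second vary, two VMs changed within the same second would end up with the same UUID, so space the runs out or vary the prefix.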