Host Isolation Response Settings

Over the past few weeks I have had some people ask what I would recommend for host isolation addresses and responses, when using vSphere x.x with high availability (HA) enabled.

First off, as with most everything in computers, ‘it depends’.

Things to consider with host isolation response: What type of storage is being utilized? Is it considered an outage if the vSphere hosts can talk to each other but the virtual machines (VMs) cannot? How stable is my management network? Is datastore heartbeating set up correctly? The list of things to consider can be quite long.

Here is what I generally recommend…

IP Based Storage Options (iSCSI or NFS)

First, set one of the host isolation response addresses to the virtual IP (VIP) or the IP of the storage array. This will tell you if you can at least talk to the array; chances are that if you cannot talk to the array, your machines have already failed.

That brings me to the isolation response. Since the VMs should already be dead (remember, the array and host are no longer able to communicate), I recommend ‘Power Off’. If you were to set the response to ‘Leave Powered On’, you run the risk of corruption on the guest disks. Think about it: the disk goes away for 60 seconds, then comes back. Do you think that VM is going to be able to recover? What are the odds of it corrupting that vital disk? What about ‘Shutdown Guest’? Well, that has a chance of the exact same thing happening.

‘Power Off’ is the safest setting here: not only does it allow the VM to be brought up elsewhere, you also run less chance of corruption or VM failure.

Fibre Channel Based Storage (SAN)

With Fibre Channel the network isolation address does not matter as much, the reason being that the storage does not rely on the IP network for communication.

General Host Isolation Techniques

If it is considered downtime when the VM network fails but the management network does not, is there a way we can mitigate this?

Of course: put a vmkernel port on the virtual machine network, enable it for HA traffic, then add that network’s gateway as an isolation address in the HA configuration.

netstat for ESXi

With the removal of the service console in vSphere 5 (which ships only as ESXi), netstat went by the wayside. To get that information, run the following either over SSH (tech support mode) or at the console:

esxcli network ip connection list

Nesting Resource Pools; Good Idea or Bad?

I was recently asked for my opinion on nesting resource pools with VMs intermixed. Well, what should you do?

Well, here is an easy example: if I have a Resource Pool with 9 VMs and 1 nested Resource Pool, what level of resources is each given? Assuming every sibling holds the same number of shares, 10% each.

Think about that: each of the VMs has 10% of the resources and the ENTIRE nested Resource Pool has 10% of the resources.

How many VMs are in that nested Resource Pool? Let’s assume 10 VMs.

What entitlement to the resources does a single VM in the nested Resource Pool have? 10% of the nested Resource Pool’s resources, which is itself 10% of the parent Resource Pool’s. So this one VM will get 1% of the parent Resource Pool’s resources.
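
The arithmetic above can be sketched in a few lines of Python. This is only an illustration under the same simplifying assumption as the example (every sibling at a given level holds equal shares, and there is contention); the function name is made up for this sketch.

```python
from fractions import Fraction

def equal_share(siblings: int) -> Fraction:
    """Fraction of the parent each sibling gets when all siblings hold equal shares."""
    return Fraction(1, siblings)

# Parent pool: 9 VMs plus 1 nested Resource Pool = 10 siblings.
top_level = equal_share(9 + 1)

# Nested pool: 10 VMs splitting the nested pool's slice.
inside_nested = equal_share(10)

# A VM in the nested pool gets its slice of the nested pool's slice.
nested_vm = top_level * inside_nested

print(f"VM directly in the parent pool: {float(top_level):.0%}")
print(f"VM inside the nested pool:      {float(nested_vm):.0%}")
```

The point of the multiplication is that every extra level of nesting divides the entitlement again, which is why mixing VMs and pools at the same level surprises people.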

How can this problem be fixed? A better practice is to not have Resource Pools and VMs exist at the same level of the tree. The above example would be changed to have 2 Resource Pools under the parent Resource Pool. One could be named ‘prod’ and the other ‘development’. The ‘prod’ Resource Pool would be given normal shares and the ‘development’ Resource Pool would be given low shares.

So now what is the resource allocation to each?

The ‘prod’ Resource Pool will have 2/3 of the total parent Resource Pool, and the ‘development’ Resource Pool will have 1/3, because normal shares carry twice the weight of low shares. This still does not guarantee that ‘prod’ will have access to resources. Shares only ensure that, in a contention scenario, ‘prod’ will be given higher priority to resources.

How can you guarantee resources to ‘prod’? Set a reservation on the ‘prod’ Resource Pool or set a limit on the ‘development’ Resource Pool.
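
Here is a rough sketch of where the 2/3 and 1/3 come from. It uses relative weights rather than real share counts, on the assumption that the High/Normal/Low share levels follow a 4:2:1 ratio; the `entitlement` helper is hypothetical and only models sibling pools under contention.

```python
from fractions import Fraction

# Relative weights only (assumed 4:2:1 ratio between High, Normal and Low).
SHARE_WEIGHT = {"high": 4, "normal": 2, "low": 1}

def entitlement(pools: dict) -> dict:
    """Split the parent between sibling pools in proportion to their share level."""
    total = sum(SHARE_WEIGHT[level] for level in pools.values())
    return {name: Fraction(SHARE_WEIGHT[level], total) for name, level in pools.items()}

split = entitlement({"prod": "normal", "development": "low"})
for name, fraction in split.items():
    print(f"{name}: {fraction} of the parent Resource Pool under contention")
```

Changing ‘development’ to normal shares would put both pools at 1/2, which shows that shares are always relative to the siblings, never absolute.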

How to become a VCDX… Or at least have a less painful defense

I have been asked many times about the VCDX defense process, especially what to do and what not to do. What is the defense really like? Are the panelists really mean? Do they throw flaming daggers at candidates? And so on… So here are some pointers, tips and other information.

Design Defense

First off, the panelists do NOT have flaming daggers that get thrown at the candidates. The panelists use soft squishy Nerf balls. Now, kidding aside, what is a defense like?

You are given 75 minutes to defend the vSphere design that you submitted a few months prior. During these 75 minutes it is generally okay (and recommended) to give a short 15 minute presentation on your design and who you did the design for. The key point here is SHORT: do not plan on taking 50 minutes to present your design to the panelists. There are questions and items that the panelists need answered in order to complete the scoring guide. If you run out of time and do not cover a section (e.g. networking), then the panelists will be unable to give you a score for that section, and your score will suffer.

At the end of your slide deck it is very helpful to have all of your Visio diagrams pasted into a few slides, so that they can be referenced as needed during the defense. Yes, the panelists have your application and diagrams in front of them, but it is really handy if you can point to an item on a large screen as opposed to pointing at a piece of paper or wasting time recreating the diagram on the whiteboard.

Make sure that the question you are answering is the one asked; if the question is about the technical merits of the storage design, do not answer with something about the logical setup of the storage.

Focus on the question. Keep the answers detailed but short so that you can get to more questions and get more of the scoring guide filled out.

Remember the panelists did the defense once as well, and just like you they experienced the pain and anxiety. It is fine to be nervous; just do not let it become a distraction from your great work.

Before you come into the room to complete your defense you should ask to bring a glass of water or a can of soda (liquor is highly discouraged until after completing the entire defense).  75 minutes of talking can be taxing, so feel free to take sips as needed.  Just remember the clock does not stop for your water breaks.

The design you submitted is scored on both the technical merits and the logical merits of the design. Just because you can talk circles around the technical aspects does not mean you will pass; you must be able to talk circles around the logical portions as well. I will say that again: being an expert in design does not make a VCDX, and neither does being an expert in technical troubleshooting. A VCDX must possess a high level of knowledge, confidence, and the ability to convey both of these attributes.

Break Time

Now you get a 20 minute break… use the break to hit the bathroom, get a new glass of water, and RELAX.  The hardest part is over (well… hardest in my opinion).  Up next are the design scenario and the design troubleshooting.  The panelists at this point have magically transformed into a customer that needs a new design and a design fixed.

Design Scenario

This portion is all about your thought process; the panelists do not expect a completed design or a solution to the troubleshooting scenario.

The design scenario is 30 minutes with a customer that needs a vSphere design with certain restrictions, constraints, requirements and assumptions. No, you will not be given all of the information needed; you have to ask for it. Remember this is about your thought process, and no, the panelists can NOT read your mind, so make sure to think aloud. There is always a whiteboard, so feel free to use it. Fill it up, draw pictures, anything to help show your thought process.

The design scenario is scored much like the defense of the design that you submitted.  Make sure that you cover as many topics as you can (technically and logically).

Design Troubleshooting

This section, like the first, is much like the previous design scenario. The panelists want to see and hear your thought process on how to fix the problem. Anything you want or need to know, the panelists can and will get for you: if you need the CPU RDY or WAIT times, the panelists have that information. If you want to know the serial number of the CPU on the 3rd blade server… well, the panelists do not have that, nor should you need it.

Is there a solution to the design troubleshooting? Yes, there is. No, you will not be given the answer at the end, and you will not find out if you actually solved it.

After that, you get a few minutes to comment on the process, and then you are done.  About a week after the defense you will get an email with your results.

I wish you all good luck! Remember the panelists are not there to beat you up or throw items at you. All the panelists have been where you are and know the stress you are under.

vCloud 1.5 BIOS.UUID Changes

With vCloud 1.x, and even back in Lab Manager 3.x and 4.x, the default behavior for capturing and deploying VMs to the Library (LM term) or Catalog (vCloud term) has been to keep the BIOS.UUID the same for all subsequent VMs. This allows products that tie their license to the machine serial number to keep working when re-deployed on vCloud or LM.

Let’s back up: what is the BIOS.UUID? The BIOS.UUID is randomly generated when the VM is created on vSphere. This value can be duplicated within a vSphere or vCenter instance, meaning it is not a unique identifier. The BIOS.UUID is what the VM or guest queries when it looks for the ‘machine serial number’. Hence, when a software product (Microsoft Windows, SQL, and other vendors’ products) looks for a way to identify what machine it is installed on, it uses the machine serial or BIOS.UUID. If we go changing this value we can break software licenses or installations, since the value is no longer the same as when the software was installed.

Problems arise from this when software products report back to a central database with the machine name and machine serial number for compliance checks or other needs. For example, say I have a domain controller and a SQL server with the same BIOS.UUID (they may have been deployed from the same vCloud Catalog entry), both with Trend Micro (or most other A/V products) installed. When that product reports the status of the installed A/V back to the control center, there is a conflict: I have two machines with different names but the same serial number reporting back. Most products will only show the status of the last machine to report in, so I could have a system that has very old A/V definitions but never know it due to the conflict in serial numbers.

With vCloud 1.5 there is a database entry that can be changed so that vCloud changes the BIOS.UUID of each VM when deploying from catalog entries. This will cause problems with software products that then require re-activation or re-entering the license key, since the product may decide that the underlying machine is different.
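
To make the conflict concrete, here is a small Python model of a management console that keys its records by serial number. It is purely illustrative: the machine names, the UUID and the ‘last report wins’ behavior are assumptions for the sketch, not how any particular A/V product is implemented.

```python
def latest_status_by_serial(reports):
    """Model a console that stores one status record per serial number.

    A later report with the same serial overwrites the earlier one,
    so only the last machine to check in stays visible.
    """
    console = {}
    for machine, serial, definitions_age_days in reports:
        console[serial] = (machine, definitions_age_days)
    return console

# Hypothetical machines deployed from the same catalog entry (same BIOS.UUID).
SHARED_UUID = "4231c0de-0000-0000-0000-000000000001"
reports = [
    ("dc01", SHARED_UUID, 90),   # stale definitions, reports first
    ("sql01", SHARED_UUID, 1),   # current definitions, reports last
]

console = latest_status_by_serial(reports)
visible = {machine for machine, _ in console.values()}
hidden = {machine for machine, _, _ in reports} - visible
print("visible:", visible, "hidden:", hidden)
```

In this model the stale domain controller simply disappears from the console, which is exactly the situation you never want to find out about during an audit.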

How to make the change:

  • In the vCloud database (either SQL or Oracle), in the CONFIG table there is an entry named: backend.cloneBiosUuidOnVmCopy
  • Change the default value of 1 to 0 to force vCloud to start changing BIOS.UUIDs
  • Restart the VMware vCloud Service on all vCloud Director cells

This will NOT change currently deployed VMs, and it is a cloud-wide setting affecting ALL VMs/vApps deployed from now on (or until changed back to 1).
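
As a rough sketch of the database step, the snippet below runs the kind of UPDATE involved against an in-memory SQLite table standing in for the vCloud CONFIG table. The real database is SQL Server or Oracle, and the column names `name` and `value` are assumptions here, so check the actual table layout before running anything against a real vCloud database.

```python
import sqlite3

SETTING = "backend.cloneBiosUuidOnVmCopy"

# Stand-in for the vCloud database; column names are assumed for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE config (name TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO config (name, value) VALUES (?, ?)", (SETTING, "1"))

def set_clone_bios_uuid(conn, keep_same_uuid: bool) -> str:
    """Set the flag to 1 (keep the BIOS.UUID) or 0 (generate a new one) and return the stored value."""
    conn.execute(
        "UPDATE config SET value = ? WHERE name = ?",
        ("1" if keep_same_uuid else "0", SETTING),
    )
    conn.commit()
    row = conn.execute("SELECT value FROM config WHERE name = ?", (SETTING,)).fetchone()
    return row[0]

new_value = set_clone_bios_uuid(conn, keep_same_uuid=False)
print(SETTING, "=", new_value)
```

Reading the value back after the update is a cheap way to confirm the change landed before restarting the cells.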

vSphere Disk Write Process

Recently I have seen quite a few questions and a lot of discussion about how vSphere handles disk writes from the guest OS, along with performance questions about limiting the writes.

The question started off innocently as “How can we guarantee vSphere disk writes happen and are not cached?”

First, let’s analyze where this may have started. SQL Solutions posted this article, in which they tested SQL Server on a VMware virtual machine and tested the write caching of the DB and the virtual machine. The fatal flaw with this test is that they used VMware Player, which is NOT vSphere and works nothing like vSphere does with disk writes. VMware Player (as well as Workstation and Fusion) does cache writes (which is what they have shown), whereas vSphere behaves completely differently.

vSphere handles writes differently since it is an enterprise server class software product. A write that a guest issues is not confirmed to the guest operating system until it has been confirmed by the underlying storage array. vSphere has behaved like this since ESX 3.x.

This is true for NFS and SCSI based storage.

Does that mean that the data actually made it to a spindle?  NO

vSphere only knows that the storage array has confirmed the write has happened. The data could still be in the cache of the storage, whether that is the cache of the RAID controller or of the SAN storage array. Now, most of these enterprise class storage arrays have built-in batteries to write all items in the cache to disk in the event of a power failure, but that is out of vSphere’s control.

So how do you maintain storage consistency and data integrity? The simple answer is to use enterprise storage and ensure that the battery backed cache or UPS for the array can outlast the power outage (i.e. until generator power comes up or utility power is restored).

Options like FUA (Force Unit Access) are not feasible since modern HBAs, RAID controllers, SANs and file systems strip this control bit from the I/O. Also, to use FUA for this, each and every write I/O would require the FUA bit to be set.

vSphere 5 vRAM Licensing Changes

When VMware released the vRAM pricing models and limits last week there was quite an uproar over the limits being set too low. It appears that VMware was listening and has responded with a hefty increase in vRAM allocations.

vSphere edition           Previous vRAM entitlement   New vRAM entitlement
vSphere Enterprise+       48 GB                       96 GB
vSphere Enterprise        32 GB                       64 GB
vSphere Standard          24 GB                       32 GB
vSphere Essentials+       24 GB                       32 GB
vSphere Essentials        24 GB                       32 GB
Free vSphere Hypervisor   8 GB                        32 GB*

That is double the entitlement for Enterprise and Enterprise+, and a third more for Standard and the Essentials editions.

Also changing is how the usage is calculated: from a high water mark (think bursting for a day) to a rolling 12 month average!
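
To see why that matters, here is a small Python comparison of the two ways of measuring. The daily figures are invented and the 365 day window is just my reading of ‘12 months’; it is a sketch of the idea, not VMware’s actual metering logic.

```python
def high_water_mark(daily_vram_gb):
    """Old style: the single highest day is what counts."""
    return max(daily_vram_gb)

def rolling_average(daily_vram_gb, window_days=365):
    """New style: average over the most recent window of days."""
    window = daily_vram_gb[-window_days:]
    return sum(window) / len(window)

# Hypothetical year: steady 100 GB of vRAM, with a one day burst to 200 GB.
usage = [100] * 364 + [200]

print("high water mark:", high_water_mark(usage), "GB")
print("rolling average:", round(rolling_average(usage), 1), "GB")
```

With the high water mark that one busy day sets the bill for the whole year, while the rolling average barely moves.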

Finally, the last news, and some major news at that, is that a single VM will count for at most 96 GB of vRAM, no matter how you set the VM up! (i.e. give a VM 1 TB of RAM but only have 96 GB count against the vRAM pool!)
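
Putting the cap and the new entitlements together, a quick back-of-the-envelope check might look like the sketch below. The VM sizes and license count are made up, and the helper names are hypothetical; it only models the per-VM cap and the pooled entitlement described above.

```python
VM_VRAM_CAP_GB = 96  # a single VM counts for at most this much

def pooled_vram_used(vm_configured_gb):
    """Sum configured vRAM across VMs, counting each VM at no more than the cap."""
    return sum(min(gb, VM_VRAM_CAP_GB) for gb in vm_configured_gb)

def pool_capacity(licenses, entitlement_gb):
    """Total vRAM pool provided by a number of licenses of one edition."""
    return licenses * entitlement_gb

# Hypothetical environment: one 1 TB monster VM plus two smaller VMs,
# covered by 2 Enterprise+ licenses at the new 96 GB entitlement.
vms = [1024, 64, 32]
used = pooled_vram_used(vms)
capacity = pool_capacity(licenses=2, entitlement_gb=96)

print(f"used {used} GB of {capacity} GB; fits: {used <= capacity}")
```

Without the cap the 1 TB VM alone would have needed eleven Enterprise+ licenses, so this change is a big deal for anyone running very large VMs.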