Thursday, July 9, 2015

Troubleshooting Performance Bottlenecks with Per-VM Monitoring

Nimble Storage has offered Per-VM monitoring within InfoSight since April, and it is included without the purchase of any additional option or component. To enable it, first register your Nimble Storage array with vCenter if you have not already done so (Administration > vCenter Plugin in the array management interface). Then enable Stream Data in InfoSight (Administration > Virtual Environment). Per-VM monitoring, listed as Virtual Environment, is found under the Manage menu in InfoSight.

The first thing you will notice is an inventory tree on the left with icons for Hosts and Clusters, Virtual Machines, and Storage.

Next, you will notice the content section with headers for Host Activity, Top VMs, Datastore Treemap, Inactive VMs, and Nimble Arrays.
  • Host Activity provides a list of your vSphere hosts and their recent performance metrics
  • Top VMs lists the ten busiest virtual machines over the past 24 hours by I/O and latency
  • Datastore Treemap displays heat maps that compare the performance of virtual machines within each datastore
  • Inactive VMs lists all virtual machines that have not generated any I/O in the past seven days
  • Nimble Arrays provides a list of Nimble Storage arrays registered with vCenter
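InfoSight computes these reports for you, but the selection rules behind Top VMs and Inactive VMs are simple enough to sketch. The Python below is a minimal illustration that assumes a hypothetical list of per-VM samples. The `Sample` class, its fields, and the `top_vms` / `inactive_vms` functions are invented for this example and are not part of InfoSight. It also assumes that "busiest" means highest total I/O over the past 24 hours, with average latency reported alongside. A VM counts as inactive when it shows no I/O in the past seven days.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Sample:
    # Hypothetical per-VM sample; NOT an InfoSight data format.
    vm: str
    time: datetime
    io_ops: int        # I/O operations observed in this sample interval
    latency_ms: float  # average latency during this interval


def top_vms(samples, now, n=10, window=timedelta(hours=24)):
    """Rank VMs by total I/O inside the window and report their average latency."""
    totals = defaultdict(int)
    latencies = defaultdict(list)
    for s in samples:
        if now - window <= s.time <= now:
            totals[s.vm] += s.io_ops
            latencies[s.vm].append(s.latency_ms)
    ranked = sorted(totals, key=lambda vm: totals[vm], reverse=True)[:n]
    return [(vm, totals[vm], sum(latencies[vm]) / len(latencies[vm])) for vm in ranked]


def inactive_vms(all_vms, samples, now, window=timedelta(days=7)):
    """Return VMs with no I/O inside the window, including VMs with no samples at all."""
    active = {s.vm for s in samples if now - window <= s.time <= now and s.io_ops > 0}
    return sorted(set(all_vms) - active)


# Usage with made-up data (hypothetical VM names).
now = datetime(2015, 7, 9, 12, 0)
samples = [
    Sample("web01", now - timedelta(hours=1), 5000, 2.0),
    Sample("db01", now - timedelta(hours=2), 9000, 6.5),
    Sample("db01", now - timedelta(hours=3), 1000, 3.5),
    Sample("old01", now - timedelta(days=10), 300, 1.0),  # outside both windows
]
print(top_vms(samples, now))
print(inactive_vms(["web01", "db01", "old01", "idle01"], samples, now))
```

Here `db01` ranks first with 10,000 operations at 5.0 ms average. `old01` and `idle01` are reported as inactive because neither has any I/O in the last seven days.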
All of the reports are fairly self-explanatory, but Datastore Treemap may be the most distinctive and useful of the bunch. The heat map groups virtual machines by datastore, sizes each virtual machine by its total I/O, and colors it according to its observed latency.

Each square represents a virtual machine. This lets us see which virtual machines are producing the most I/O and easily compare them with the other virtual machines that share their datastore. The redder the square, the higher the average latency. Hovering the cursor over a square displays a popover with the detailed figures, and clicking the virtual machine name in the popover opens the historical performance details for that virtual machine.
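To make the treemap idea concrete, here is a sketch of one way to lay out such a view. It uses the classic slice-and-dice treemap algorithm, which is not necessarily what InfoSight uses. Each datastore gets a horizontal band sized by its total I/O, and each virtual machine gets a slice of that band sized by its own I/O, so every tile's area is proportional to that VM's share of total I/O. The input tuple format, the function names, and the green-to-red color ramp with its 20 ms ceiling are all hypothetical choices for illustration.

```python
from collections import defaultdict


def latency_color(latency_ms, max_ms=20.0):
    """Map latency to (r, g, b): green at 0 ms, red at max_ms or above.
    The 20 ms ceiling is a hypothetical choice, not InfoSight's scale."""
    t = min(max(latency_ms / max_ms, 0.0), 1.0)
    return (round(255 * t), round(255 * (1 - t)), 0)


def slice_and_dice(vms, width=100.0, height=100.0):
    """Slice-and-dice treemap layout.

    vms: list of (vm_name, datastore, total_io, avg_latency_ms) tuples (hypothetical format).
    Returns tiles as (vm_name, datastore, x, y, w, h, rgb_color).
    """
    vms = [v for v in vms if v[2] > 0]  # zero-I/O VMs would have zero area
    if not vms:
        return []
    by_ds = defaultdict(list)
    for vm, ds, io, lat in vms:
        by_ds[ds].append((vm, io, lat))
    grand_total = sum(v[2] for v in vms)

    tiles = []
    y = 0.0
    for ds in sorted(by_ds):
        members = by_ds[ds]
        ds_total = sum(io for _, io, _ in members)
        band_h = height * ds_total / grand_total  # band height proportional to datastore I/O
        x = 0.0
        # Largest VMs first within each datastore band.
        for vm, io, lat in sorted(members, key=lambda m: m[1], reverse=True):
            w = width * io / ds_total  # tile width proportional to the VM's share of the datastore
            tiles.append((vm, ds, x, y, w, band_h, latency_color(lat)))
            x += w
        y += band_h
    return tiles


# Usage with made-up data: two datastores, three VMs.
for tile in slice_and_dice([
    ("db01", "DS-A", 6000, 12.0),
    ("web01", "DS-A", 2000, 1.0),
    ("app01", "DS-B", 2000, 25.0),
]):
    print(tile)
```

Because a tile's width is `width * io / ds_total` and its height is `height * ds_total / grand_total`, its area reduces to `width * height * io / grand_total`. A VM's size therefore depends only on its own I/O, while its position still shows which datastore it shares.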

From there, we can adjust the timeframe, for example to narrow it down to a period of reported slowness. In this example, the primary factor behind the latency spikes is network bottlenecks. Looking at the spikes more closely, we also notice that they always occur on a Saturday, which happens to be the day we perform full backups of our environment.
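The Saturday pattern above was spotted by eye, but the same check can be scripted if latency history has been exported. This minimal sketch assumes hypothetical (timestamp, latency in ms) pairs and a hypothetical spike threshold. The function name `spikes_by_weekday` is invented for this example. It counts above-threshold samples by day of week, which would surface a backup-day correlation like the one described. It does not identify the cause (network, host, or storage); in InfoSight, the latency breakdown is what points to the network.

```python
from collections import Counter
from datetime import datetime

# Fixed English day names so the output does not depend on the system locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def spikes_by_weekday(samples, threshold_ms):
    """Count samples whose latency exceeds threshold_ms, grouped by day of week.

    samples: iterable of (datetime, latency_ms) pairs (hypothetical export format).
    """
    counts = Counter()
    for when, latency_ms in samples:
        if latency_ms > threshold_ms:
            counts[DAY_NAMES[when.weekday()]] += 1  # weekday(): Monday == 0
    return counts


# Usage with made-up data: June 27 and July 4, 2015 were both Saturdays.
samples = [
    (datetime(2015, 6, 27, 2, 0), 45.0),
    (datetime(2015, 7, 1, 14, 0), 3.0),
    (datetime(2015, 7, 4, 1, 30), 52.0),
    (datetime(2015, 7, 6, 10, 0), 4.0),
]
print(spikes_by_weekday(samples, threshold_ms=20.0))
```

With this data, both above-threshold samples fall on Saturday, which is the kind of recurring pattern worth lining up against the backup schedule.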

Below the Virtual Machine Latency graph, we also see graphs for Host Performance, Datastore Analysis, and Active Neighbor Analysis.

Datastore Analysis

Active Neighbor Analysis
