Archive for December 10th, 2008

Virtual Instance Performance Revisited

Virtualization, Fine, Well Sort Of? – Chapter 08

This is a revisited article, not because of a correction or change of view, but to advance a topic that I have always intended to revisit and never seemed to have the time for, until now. A loyal reader of this blog reminded me of this fact recently, so I am honor bound to close the gap in the discussion. Oh, I am speaking of virtual instance performance of course, as the title notes. Unfortunately, virtual instance performance is a complex topic that gets into the swampy weeds, full of tangles and hidden snags, faster than water and dirt make mud.

In part I of this topic, I discussed the context of performance, starting with Peer-to-Peer or Inter Virtual Instance Performance, which is what the host infrastructure reports. I will not rehash that here, but it is important to note that only the host infrastructure can accurately report virtual instance performance. Part I also referenced Host to Instance Performance, Host to Host Performance, and Host to Cluster Performance. I will summarize each concept in turn; for more detail, refer to part I. In brief, Host to Instance Performance is the overhead of the host: what do your hardware and hypervisor cost you in performance? Host to Host Performance is about which host executes which virtual instances best, all things other than individual instance deltas being equal. Host to Cluster Performance is one step short of cloud or grid computing modeling, focusing on which host in a given cluster is the most efficient for a known set of virtual instances. This matters when you consider data center globalization, where hosts should be consistent, and so should clusters of hosts, across different datacenters.

Now if you are tracking all these This-Versus-That models above, then you will realize that one model is missing. Which one is it? Give up? Cluster to Cluster Performance! There is a good reason for this: I neglected it in part I, my bad. As life cycle and management tools have improved over the last year or so, this has become a viable and significant performance model, especially when you have heterogeneous hypervisor based environments. Consider VMware versus Hyper-V, or Xen versus VMware, or Xen versus Hyper-V. Obviously, if you offer classes of service in your virtualization, you need to be able to compare different virtualization infrastructures in real time, with little or no explicit normalization. I for one hate normalization; it is often abused and biased toward a specific or narrow set of criteria, which devalues the analysis and the results. But I digress. Cluster to Cluster Performance is beyond the scope of this specific article, but will be discussed in the future. Did someone say Virtual Instance Performance Part III?

But the title is Virtual Instance Performance Revisited, and all performance evaluation starts and ends with the virtual instance. This is the cornerstone of virtualization, be it application instance, virtualization container, or operating system isolation based. The vast majority of tools available for virtualization performance evaluation focus on the virtual instance of course, since the goal is always to have the fastest instances possible, given the constraints of the associated infrastructure. That last comment raises the question, what constraints? Well, these are discussed by virtualization gurus over and over: processor context switching or processor cycle loading, memory IO, disk IO and network IO. It is quite common for hardware vendors to focus on only one or two of these constraints and publish misleading or flat-out inaccurate statistics declaring that they have the best or fastest virtual instances in the known universe, only on their respective hardware of course. Bah Humbug! They even normalize their results against their competitors to prove the point that they have the best hardware. Bah Humbug, Again!

Unless you evaluate virtual instances under severe load for all four (4) constraints, each one explicitly and all of them together, you are not doing your clients or yourself right. A classic garbage-in gospel-out (GIGO) scenario if there ever was one. Virtualization abuses hardware. It always does, and this is by design; after all, virtualization is attempting to fully utilize resources that are often unused or wasted, no? So the single most important issue with virtual instance performance evaluation is the selection of the tools (note I said tools, plural) to do the evaluation. Ah Ha! Bet you did not see, or read, that one coming now did you? VMware VMmark, vCompute, IOMeter, etc., all have their weak points, and you must understand these limits before you design your evaluation criteria and methodology. Consider this: if your testing for virtual instance performance only looks at processor loading and memory IO, are your clients not going to be unhappy when their network IO and disk IO results are horrible? Did you analyze your environment right? Did you evaluate your proposed environment right? If the virtual instance performance evaluation is skewed, then your entire performance evaluation at Host or even Cluster scope will be horrible.
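As a rough illustration of the tools-plural point, the sketch below checks whether a chosen tool set, taken together, puts all four constraints under load. The tool names and their coverage are hypothetical placeholders, not statements about what any real product measures; fill them in from your own understanding of each tool's limits.

```python
# The four constraints named in the article.
CONSTRAINTS = {"processor", "memory_io", "disk_io", "network_io"}


def uncovered_constraints(tool_coverage: dict[str, set[str]]) -> set[str]:
    """Return the constraints that no tool in the set puts under load."""
    covered = set()
    for stressed in tool_coverage.values():
        # Ignore anything a tool claims that is not one of the four constraints.
        covered |= stressed & CONSTRAINTS
    return CONSTRAINTS - covered


# Hypothetical tools and coverage (assumptions for illustration only).
tool_set = {
    "tool_a": {"processor", "memory_io"},
    "tool_b": {"disk_io"},
}

missing = uncovered_constraints(tool_set)
print(sorted(missing))  # ['network_io']
```

An empty result only says that every constraint is touched by some tool; it says nothing about whether the load is severe enough, which still has to be judged per tool.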

Now it is time to get into the weeds, and get mud in between the toes. Now that we know that we must test all constraints explicitly and inclusively, and we must test at the virtual instance scope before all else, what do we do? The virtualization gurus will argue over this, but below is what works for me.

  1. Establish a control. Establish a performance history baseline. If you are testing virtual instances on a new hypervisor, or with a new hardware vendor, do the exact same test on an environment you already understand. If you have HP, and test on Dell, don’t normalize your results; just make sure you understand that HP and Dell are different, and make sound inferences based on the raw results. If you can test on 3 or more hardware vendors at the same time, or have historical data using the same tools and methods, you don’t need to normalize the data. Normalization is for management and others who do not know how to analyze the resulting data.
  2. Processor and memory differences, including changes in caching speed and size of buffers, are often a shifted scale comparison, so normalization is not needed. This is also true of power consumption curves. This is rational and logical, since network and disk sub-systems should remain consistent for a longer period, so the number of factors to be compared shrinks as long as those sub-systems stay the same, including, of all things, the PCI bus architecture per host. Production performance data always trumps lab data. So if you have HP and Dell in production, and are evaluating IBM, use the production data as the baseline or control, then test HP and Dell with the newer tools or methods, or processors and memory, etc., and then, and only then, test IBM. Bingo! No normalization is required. I can just hear the slick stylized marketing types for all the various vendors crying over their iced-mocha-lattes when they find out I always reject normalization based evaluations by default.
  3. Always run the same test, in the same environment, at the same time, with the same characteristics. This is just basic common sense. However, don’t be surprised when you see something that does not make sense. Iterations are key to the entire evaluation effort. Remember the basic statistical rule of thumb that a sample size of 30 or more is needed before standard deviation and variance estimates are reasonably accurate. Every time a change is made, the experiment itself has changed, and performance evaluation is an experiment. Think scientific method all the time when doing any performance evaluation, be it in the lab or otherwise.
  4. Make sure you understand where and when you can introduce error into the results. The only way to do this is through peer review; getting more eyes on the proposed test plan is the significant objective. Everyone sees the same process from a different perspective, whereas tunnel vision is evaluative death. Sometimes eating crow at the beginning is better than getting heart-burn while coughing up feathers at the end of an evaluation effort.
  5. Control expectations. Data often goes around the world faster than the executive summary. Expect that someone, somewhere, will take the evaluation tools and methods, as well as the results, out of context. Results will be challenged; be prepared for it. Don’t defend results, only explain how the results were generated and analyzed. Vendors hate this, and often forget this point when they sponsor or quote so-called independent analysis, presenting the resulting explanations as the authoritative, final qualitative statement when the raw data objectively discounts them or obviously points to other conclusions. Normalization often hides the true results.
  6. The developer of the given virtualization environment is the start of the process, not the end. Do not rely on the developer tool set, nor on what a given vendor demands as the only acceptable tool for analysis. Of course the vendor has tuned the given tool or methodology to illustrate the strengths of the platform in question. Would it not be a wonderful world if HP performance tools worked on Dell and IBM, and Dell tools worked on IBM and HP, etc., etc.? Would make for some interesting evaluations, no? Or if Fabric tools worked on FCoE infrastructure, and iSCSI tools worked on FC infrastructure? Sounds insane? Not so. Generic tool sets exist, independent tools exist, use them. Even if every vendor in the world has used VMmark, VMmark means nothing to Hyper-V.
  7. Repeat, repeat and repeat. Change only one thing at a time; for example, change the loading of only one constraint at a time, be it processor loading, memory IO, disk IO or network IO. Use the same dataset or streamed sequence for each test. Never change the dataset or streamed sequence between iterations of testing for a given factor. Complete an entire set of tests before mucking with variables beyond the planned test set. This could be considered a repeat of the point above about running the evaluation in a consistent manner, but it is so important that if it is a repeat, so be it.
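To make points 3 and 7 concrete, here is a minimal Python sketch of a one-change-at-a-time test plan and a raw, un-normalized summary of the samples. Everything in it is an assumption for illustration: the load values, the `dataset-v1` label, and the function names are hypothetical, and the 30-iteration floor is simply the rule of thumb from point 3, not a guarantee of statistical rigor.

```python
import statistics

CONSTRAINTS = ("processor", "memory_io", "disk_io", "network_io")
MIN_ITERATIONS = 30  # rule-of-thumb sample size from point 3


def build_plan(baseline_load, stress_load, dataset):
    """One run definition per constraint: only that constraint moves off baseline."""
    plan = []
    for target in CONSTRAINTS:
        loads = {c: baseline_load for c in CONSTRAINTS}
        loads[target] = stress_load
        plan.append({
            "varied": target,
            "loads": loads,
            "dataset": dataset,            # same dataset for every run
            "iterations": MIN_ITERATIONS,  # same iteration count for every run
        })
    return plan


def summarize(samples):
    """Raw mean and sample standard deviation; refuse undersized samples."""
    if len(samples) < MIN_ITERATIONS:
        raise ValueError(
            f"need at least {MIN_ITERATIONS} samples, got {len(samples)}"
        )
    return {
        "n": len(samples),
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),
    }


# Hypothetical load fractions and dataset label.
plan = build_plan(baseline_load=0.25, stress_load=0.95, dataset="dataset-v1")
for run in plan:
    print(run["varied"], run["loads"])
```

The same plan would be executed unchanged on the control environment and on the environment under evaluation, and the `summarize` output for each compared side by side as raw numbers.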

Well, at this point, I am sure someone is yelling… But he has not told us anything useful yet! What the Heck?! Not true. It is true that I have not spelled out an explicit methodology for evaluation, as in a do this, do that, then do this scenario. To do that, cough, would be to create a bias that should, no, must be avoided. But to be fair, I will summarize things a bit, and recommend a best practices approach.

  • Analyze the virtualization environment. Focus on the virtual instances first; look at processor, memory, network, and disk IO loading; create and execute tests that stress all constraints as applicable to your expected needs, and well beyond your expected needs. If the majority of your virtual instances are encoding unique video data and writing out the results, expect lots of disk IO; if your virtual instances are web servers, expect lots of network IO, etc. Be smart in your evaluation design for performance.
  • Remember, virtual instances are the cornerstone of all evaluation. Hosts and Clusters have their own performance characteristics, but those are driven by the virtual instances. Dynamic resource sharing, high-availability, etc., are wonderful features, but mean nothing if individual and grouped virtual instance performance is not understood. The goal is to have most of the virtual instances perform well, most of the time, nothing more. The number of instances, the number of hosts, the number of clusters, even the number of virtualized datacenters, if it comes to that scale or scope of evaluation, will be obvious and straight-forward if the methodology and tools used are sound at the virtual instance level.
  • Performance evaluation is a living breathing animal, and should be viewed as dynamic and experience based, no pun intended. Nothing in virtualization is static, so allow and expect the methods and tools to be flexible and adaptive to the effort at hand. This is not to say that change is good for the sake of change. Only change tools and methods when it makes sense to do so. Never change technique in the middle of an evaluation effort. To do so is statistical evaluation suicide.
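One way to picture "expected needs, and well beyond" is a load ladder per constraint, sketched below in Python. The step multipliers and the web-server numbers are assumptions made up for illustration; the units are whatever your chosen tools report for each constraint.

```python
CONSTRAINTS = ("processor", "memory_io", "disk_io", "network_io")
# Assumed multipliers: half of expected, expected, and well beyond expected.
LOAD_STEPS = (0.5, 1.0, 1.5, 2.0)


def load_ladder(expected_peak):
    """Map each constraint to the load levels to test, in that constraint's own units."""
    missing = [c for c in CONSTRAINTS if c not in expected_peak]
    if missing:
        # Every constraint must be stressed, even the ones you expect to be quiet.
        raise ValueError(f"no expected peak given for: {missing}")
    return {c: [expected_peak[c] * step for step in LOAD_STEPS] for c in CONSTRAINTS}


# Hypothetical web-server profile: network heavy, lighter on disk.
web_profile = {"processor": 40, "memory_io": 2000, "disk_io": 500, "network_io": 8000}
ladder = load_ladder(web_profile)
print(ladder["network_io"])  # [4000.0, 8000.0, 12000.0, 16000.0]
```

Requiring a peak for all four constraints is deliberate: a workload profile tells you where to expect the most load, not which constraints you are allowed to skip.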

December 10th, 2008



