Okay, so let's go ahead and get started. Others can join and participate as they want. I'm thinking, depending on the Q and A that comes out of this today, this may not take the full half hour. So, I might be able to give you a few minutes back at the end of this. So welcome, everyone, to DPA: Beyond the Basics - Episode One. And for today's installment, we're going to be looking at the virtualized view deep dive. So, just so everyone's aware, this is actually the first session in a deep dive series that we're going to offer for different features of DPA. And the idea is to give you a better idea of how to get the full value out of the product, and I personally hope this helps you achieve that potential. So, that's the idea.
A little bit of housekeeping up front. We are— I think it makes sense to kind of focus everything, or put all questions, into the Q and A section. I know there's a chat section as well, but I'd rather just monitor one of them. So let's just use Q and A, and that way, everybody else can for sure see it and maybe get— have the advantage of seeing the question and hopefully the answer as well. I will try to pay attention to it, so I want to— if I see that anybody has a question, then I will certainly try to answer in real time. If I don't get to it real time, I'll certainly follow up after the webinar here.
Okay, so real quick before we start the actual webinar, there's a question in the polling that I'd like everybody to look at, and if you wouldn't mind just letting me know if you have database infrastructure on anywhere other than, let's say, physical servers. So could be VMware, Hyper-V, Oracle Virtual Machine, and then also cloud-based. We'll throw those into virtualized infrastructure as well. So I'll give you a minute to vote on those, and please select all that apply. Okay, so hopefully everybody's voted or is voting on those.
So first of all, thank you very much for stepping away for a few minutes to take a deeper dive into the virtualization view that DPA provides into your infrastructure. And really, in this case, specifically we're talking about the VMware hypervisor. So, my name is Rob Mandeville, and I'm currently the PMM, the Product Marketing Manager, for Database Performance Analyzer, DPA for short. So even though I recently moved over to marketing, I've been a DBA doing production support, development, data modeling, you name it, for over 15 years, probably closer to 17 at this point. For the most recent five years, I've really been focused on database performance. So that's what's kind of near and dear to my heart right now.
So as I mentioned, if anybody has any questions, go ahead and throw them into the Q and A section, and I'll try to get to them real time. So if you are like most DBAs in today's world, you're likely running your databases on a virtual machine. Now there might still be some legacy, like physical servers out there and stuff. But actually, when I sat down and started thinking about this, I asked myself, "When was the last time I actually went and touched a physical server?" So for me, it's been--I think it's over 12 years now. So it's definitely been a while. Even though they might have been running on a physical server, the last time me, as a DBA, had to walk down and actually interact with it or install software or do anything like that— yeah, long, long time.
So, I recently was doing some research on this and found some statistics that showed when IT shops did an audit of their physical servers, many were actually running at an average of less than 10% CPU utilization. Many of these were also way underutilized when it came to active memory. The article mentioned server sprawl was a real thing. It was getting crazy there for a while, and it was really driving up cost. So along comes VMware and other hypervisors, and due to many factors, it took off like crazy, right, just great success. And some of the primary reasons for that were the underutilization of physical resources, as I mentioned, the speed to provision new VMs, the commoditization of physical hardware beneath the hypervisors. So, it really got a lot cheaper to purchase that physical hardware and scale up. The centralization and ease of administration.
So, talking about that server sprawl and stuff, it was way easier now to get your arms around kind of what all's out there in the wild. You get built-in high availability and fault tolerance. So, just a side benefit. And you get some raised floor consolidation. So, for any of us that are kind of green-minded and stuff, there's definitely a benefit to the carbon footprint. But probably, and this is just a guess, but probably the biggest reason is cost, right? Through all these other reasons, you could actually get your arms around costs and lower them for the total cost of ownership of your data center infrastructure. Going back, really quickly, to the speed to provision.
So I've been in the industry for a while, and I remember on the physical host, you could ask for— well, first of all, it took forever, and like an act of God to get new hardware approved just because it was expensive and took a while and stuff. But once you put your requisition in for new hardware, it could take the vendor two weeks to actually deliver on that and have it shipped to you. Then when you had it shipped to you, anybody that's been around the industry for a while probably remembers, then you put in your change request. So, you've got to get this window of opportunity where you're going to take an outage to add this new physical hardware. In doing so, now that'd be the late night, early morning, and kind of change window, Friday or probably actually Saturday at like 1 AM was mine. Just made for a long, long weekend, and that had to be scheduled out way ahead of time to make sure that everybody got notice. Hey, we're going to be taking the systems offline and stuff. So really from the time of requisition to the time of implementation, could have been a month. And you think about that compared to provisioning additional hardware resources in a virtualized world, it's in a matter of seconds, right? So huge advantages to going virtual.
On the topic of virtualization, if anybody out there in our audience does not have access to see past the guest operating system or the virtual machine operating system, I strongly encourage you to either get access--and I'm going to talk in terms of VMware--but get access to vSphere or vCenter or some other method of seeing into that infrastructure. Because this is going to be really key in areas of making sure that you're not being impacted by, let's say, a noisy neighbor. Another guest operating system that's running on your physical host that's consuming more than its fair share of resources. You need that information.
Okay, so, that's a little background on why virtualization took off. And again, most if not all of our database infrastructure is likely living on a guest operating system. I believe that DBAs were likely the last holdouts to go P-to-V, so physical to virtual, mostly because when early versions of hypervisors came out, they really didn't do that well with IO-intensive applications, and well, guess what? Databases are just that. They're very IO intensive. So, the days of poor IO performance for hypervisors, that's really in the past for the most part, right? Now we're back to limitations of the disk architecture itself. So, you know, the different RAID levels and stuff like that. That's more likely to be the performance hit with regards to IO. And when I say that, I'm thinking of, like, the parity calcs that come into play with write operations on RAID 5 and 6. But with the advent of SSD technology, IO performance in general is becoming much less problematic. So, again, hypervisors just make sense.
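To put a rough number on the parity overhead mentioned above, here is a minimal Python sketch of the commonly quoted "write penalty" rule of thumb. The penalty values (4 back-end IOs per write on RAID 5, 6 on RAID 6) and the function name are assumptions for illustration; they are not something DPA reports, and real arrays with write caches will behave differently.

```python
# Rule-of-thumb back-end IOs generated per front-end write (assumed values).
WRITE_PENALTY = {"RAID0": 1, "RAID1": 2, "RAID10": 2, "RAID5": 4, "RAID6": 6}


def effective_iops(raw_iops: float, write_fraction: float, level: str) -> float:
    """Estimate front-end IOPS a RAID group can serve for a given read/write mix.

    raw_iops       -- combined back-end IOPS of all disks in the group
    write_fraction -- share of the workload that is writes, between 0.0 and 1.0
    level          -- key into WRITE_PENALTY, e.g. "RAID5"
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    penalty = WRITE_PENALTY[level]
    # Each read costs one back-end IO; each write costs `penalty` back-end IOs.
    return raw_iops / ((1.0 - write_fraction) + write_fraction * penalty)


if __name__ == "__main__":
    for level in ("RAID10", "RAID5", "RAID6"):
        print(level, round(effective_iops(1000, 0.5, level), 1))
```

The point of the sketch is only that the same disks deliver noticeably fewer usable IOPS as the write share grows on parity RAID, which is why write-heavy databases tend to feel it first.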
So if anyone out there has actually implemented that, and let's say you have had access or gotten access to look at vSphere or vCenter, one thing that I noticed when I started poking around in there is that there are a ton of counters and metrics to look at. So if anybody's been on a Windows box, open up, like, the WMI counters. On the Windows servers, you've got all the classes; you've got all the individual counters underneath all the classes. Now basically double that, because now you're going to have metrics for both the virtual machine, so the guest operating system, and the physical ESXi node. So there is just a lot of stuff to sort through.
So a while ago, a team of DBAs, partnering with VMware and some other DBAs out there in our customer base, so out there in the wild, looked at all these metrics and actually determined which ones would be key to focus on as a DBA. So thankfully and luckily, they've boiled the ocean a bit and helped to zero in on which counters would likely tell us, as DBAs, that we have a performance issue. And even further, what layer that issue exists at. Right. Is it at the database tier? Is it at the virtual? Is it at the physical, or is it at the storage tier? So all the way down to the data stores. Let me flip over my view here real quick and share my other screen.
Okay, so hopefully everybody can see my other screen here, and this is the virtualized view within Database Performance Analyzer. So, we still stick to the waits as our primary fact. You know, it's kind of our focus that's, that's the wait-based analytics, the response time analysis. And then under that though, we've implemented this layered, or tiered, correlation to database instance-specific metrics. The next tier down is the virtual machine or guest operating system metrics. Underneath that, we've got our physical host, so that's our ESXi node. And then last, but certainly not least since we are IO intensive, we've got our storage tier. So that represents all of our data stores. So the idea here being that we want to understand if I'm having performance issues, like let's say a real, you know, a significant increase in waits, like, let's say around April 23rd or 24th, what's the driving factors? What's causing it? Is it something at the database tier that's driving it? Is it something at the virtual machine operating system? Or am I being impacted by maybe whoever I'm sharing the physical host with? You know, are they, are they causing some of the other metrics or some of the other performance, basically, for my end-users of that database or that application to experience these issues?
You'll also note that we've got the different categories here. So we've got the summary, we've got CPU. And I can toggle from one to the other to clean it up a little bit and get a better idea of, you know, if I suspect it's actually a disk problem, I can just kind of clear up the other CPU and memory metrics and get the, get a good view of my disk metrics at all layers again. So some things to focus on, and I'm just going to talk through these a little bit, but when I mentioned this team of people working with VMware as a partner and getting some great education from them to come up with these metrics that I want to focus on, here's the ones that we came up with. So from a CPU perspective, you really want to be able to tell if you're experiencing CPU ready time. And some people may say, "Well what is CPU ready time?" CPU ready time is when your guest operating system, your VM, is asking for CPU cycles from the physical host and not getting it. Right, it's waiting in a queue to get actual CPU cycles. So that means your physical host is over-provisioned, right?
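For anyone who wants to sanity-check a CPU ready number by hand, here is a minimal Python sketch. It assumes the raw counter is a "summation" value in milliseconds over a sample interval, as the vSphere performance charts report it, with a 20-second interval for real-time charts; the function name and the per-vCPU averaging are illustrative assumptions, not how DPA computes its graph.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU ready summation (ms) into a percentage of the sample interval.

    ready_ms   -- milliseconds the VM spent ready-but-waiting during the interval
    interval_s -- length of the sample interval in seconds (assumed 20 s real-time)
    vcpus      -- number of vCPUs; dividing by it gives an average per vCPU
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms / (interval_s * 1000.0 * vcpus) * 100.0


if __name__ == "__main__":
    # 2,000 ms of ready time in a 20 s sample on a single-vCPU VM.
    print(cpu_ready_percent(2000))
    # The same 2,000 ms spread across 4 vCPUs.
    print(cpu_ready_percent(2000, vcpus=4))
```

Which percentage counts as "too high" is a judgment call for your environment; the sketch only does the unit conversion.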
Other metrics we can look at are CPU utilization as a whole both on the guest operating system and also at the physical host, the ESXi node. You want to understand how your CPU is configured and if there's any reservations. So in other words, this is more of an issue with memory. So I'll tell you what, I'll hold off and talk about reservations when we get down to memory. You also want to make sure that there aren't any artificial limits placed on your CPU. And there's this concept of shares. Now, shares actually represent-- you can think of it like voting shares or whatever. It basically represents the prioritization of your virtual machine to get access to specific physical resources. That's the best way to think of it. So we're not going to kick anybody off of CPU cycles if they're actually processing. But what we will do is if our CPU starts to back up and there starts to-- CPU requests or processing requests start to queue up or backlog, then if I have the highest number of shares or the highest prioritization, I get to cut to the front of the line. Right, so again, we're not going to actively kick anybody off, but we do get to go to the front of the line.
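One simplified way to picture shares is as a proportional split of whatever is being fought over once the host is under contention. The Python sketch below models only that idea; the VM names and numbers are hypothetical, and it is not a description of the hypervisor's actual scheduler.

```python
def split_by_shares(capacity: float, shares: dict[str, int]) -> dict[str, float]:
    """Divide a contended resource among VMs in proportion to their shares.

    capacity -- amount of the resource under contention (e.g. MHz of CPU)
    shares   -- mapping of VM name to its configured share value
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one VM must have a positive share value")
    return {vm: capacity * value / total for vm, value in shares.items()}


if __name__ == "__main__":
    # Hypothetical host: a database VM with twice the shares of an app VM.
    print(split_by_shares(6000, {"db-prod": 2000, "app-01": 1000}))
```

When there is no contention, shares do not come into play at all, which matches the point above that nobody gets kicked off while there are cycles to spare.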
So with memory, a few metrics that you really want to key in on, things that are important, are ballooning and swapping, right? Swapping is actually worse. ‘Ballooning’ is when you grow outside, probably just temporarily, but you grow outside your initial memory footprint, so you're requesting from the physical host to allocate more to your virtual machine. Swapping is a little bit worse, not if you're the one that's grabbing it, but if you're the one losing it. So swapping means the hypervisor will actually steal memory even if it's active to give to somebody who has a higher prioritization, higher shares.
Okay. The configured and reservations that I mentioned with CPU, this one's probably good from a perspective of making sure that... You want your reservation there because you don't want your database server to start up unless it's guaranteed a certain amount of memory. Right? So we've all probably configured our caches and our, our buffer sizes and stuff like that appropriate to our database to get maximum performance, but you want to make sure that your VM can actually grab that physical resource from the ESXi node when it instantiates. Limits are another good one. Again, if you artificially-- you could configure memory to any amount that you want, but if you place a limit on it, what you're actually telling your VM is you can't grow beyond this. So even though I have it configured at, let's say eight gig or something like that, if I have a memory limit on my virtual machine at two gig, it's kind of misleading, right? Because if I'm using 1.99 gigs, then if I thought that I had eight gigs to grow into because that's my configuration, I should only be like 25% utilized, but actually I'm real close to 100% utilized. So got to be careful with those limits. And again, you have the prioritization, the shares. From a disk perspective, we'll kind of... Whoops. Should've toggled on the memory there a little bit.
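The memory-limit arithmetic above can be written out as a short Python sketch. The function name is hypothetical; the numbers are the ones from the example (8 GB configured, 2 GB limit, 1.99 GB in use).

```python
from typing import Optional


def memory_utilization_percent(used_gb: float, configured_gb: float,
                               limit_gb: Optional[float] = None) -> float:
    """Return utilization against the real ceiling: the limit if set, else configured."""
    ceiling = configured_gb if limit_gb is None else min(configured_gb, limit_gb)
    if ceiling <= 0:
        raise ValueError("memory ceiling must be positive")
    return used_gb / ceiling * 100.0


if __name__ == "__main__":
    # What it looks like if you only know the configured size.
    print(round(memory_utilization_percent(1.99, 8.0), 1))
    # What it really is once the 2 GB limit is taken into account.
    print(round(memory_utilization_percent(1.99, 8.0, limit_gb=2.0), 1))
```

Judged against the configured 8 GB the VM looks about a quarter full, while against the 2 GB limit it is essentially out of headroom.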
Really, probably the biggest concern for DBAs is latency. Right? We could talk volume and usage, but unless we're hitting some kind of physical limitation or ceiling, the latency is probably the biggest one that I would want to focus on. You want to make sure that when you do have to do physical IO that it's performant. That when you're asking for that read or write operation to occur, that it's coming back to you in a decent amount of time. So, and really when you think of it, when we're talking about a virtualized platform, everything is a file; everything has to be a file on shared storage.
So, slow IO means slow VM. Okay. From a network perspective, this one's pretty simplistic, but probably the one that I would want to focus on most would be anything that's dropped. So any kind of dropped packets. Okay, so as you can see, we-- after boiling the ocean, this is how we kind of determined what would be important to bring into DPA so that a DBA could quickly and easily see any correlations that might be occurring between any kind of metrics regardless of the layer and performance issues. So again, if I go back to Summary here, we see that on April 23rd and 24th, our wait times increase dramatically, pretty significantly. And let's go ahead and maybe drill into the 23rd and get maybe a little closer view of the day view. And we can start to see where things were kind of spiky, right? We had some database metrics especially Round Trip Time. And if I click on that, it'll bring up the actual values. So here's where we were probably experiencing, you know, some internal network issues. I'm not sure exactly what the issue was, but you can see there that we are over a second in round trip time. So that's, that's purely network latency. And I don't know about you guys, but in computer terms, a second for network latency, that's an eternity. So probably a red flag there and one that would prompt me as a DBA to probably go have that discussion with my network admins.
Right? I know, I know, we don't get to blame the network guys that much anymore because, well, network bandwidth has grown huge over the past decade or so. GigE, 10 Gig. I mean it's ridiculous the amount of bandwidth that we get, but it does occur. You know, you've got all kinds of switch and router failures and ASIC card failures, flapping, link failures, stuff like that, so it could definitely still be an issue in the environment.
Okay. So again, we start with our instance level. We start with the waits at the top. Right? And you get to see--okay, there might be physical resource issues, but we might not be impacted by them. So it doesn't mean that I'm not going to go talk to the network guy or the storage guy or my sysadmin or anything like that. But if I am experiencing some kind of resource spike, in latency especially, or some kind of pressure, then yes, I'm going to go talk to them, even if my end-users aren't necessarily feeling the impact. And the reason I'm going to go talk to them is that I want to be proactive. Right? I want to make sure that my end-users don't start suffering as a result of this spiky behavior. Okay. And then under that we've got our DB instance kind of metrics here. So we've got Wait Time, we've got-- or sorry, IO Wait Time, Page Life Expectancy, CPU Utilization, Round Trip Time, stuff like that. That's all detected from the, the database tier-- so, from our RDBMS engine.
Under that, we've got our guest operating system. So a lot of the things that I mentioned earlier: the Swap Rate, the Active Memory Usage, CPU Ready Time, things like that. Underneath that, we've got our physical host, so we can tell how our ESXi node is performing, especially from, let's say, a network perspective, plus CPU, Memory, and Write Rates and Read Rates, so, kind of disk operations. And then, depending on what data stores you have attached to your virtual machine, to your ESXi node, how they're performing. Right? So you can do it even at the volume level. Okay, perfect.
So hang on one sec. I'm going to jump over to Q and A real quick. See if there were any questions. Oh, okay. So one question in Q and A was, 'What screen version is being shown?' And I think it's right at the very top here. So let me scroll up. Okay. So this is version 11.0.384. So DPA 11 just came out, it was a little over a month ago now. So this is GA, and it's available in everyone's portal if they want to download and upgrade. All right, thanks. Yeah, so this really allows you to see and determine, 'Am I being impacted from an end-user perspective by a noisy neighbor?' Right? Again, if I'm at the physical host, is there somebody else consuming maybe a lot of IO bandwidth or throughput? And is it something that I need to be concerned with? So this is a great way to tell at which layer I'm being impacted by.
So it looks like I've got another, let me extend this window. There might be another question here. My operations group refuses to give me access, read only access, to vCenter. Any pointers to good arguments I can use to convince them? Yes. So I've actually run into this myself personally, and one thing that I always said was you don't necessarily want me running to you every time there's a performance issue so you can check to see if you're at fault. Wouldn't it be much more efficient, much better, to be able to have me self-determine where my issues are, or if we even need to have a conversation, right? Because that can really streamline the workflow. Plus, there's another view here that I'm going to show you, and it's this VM Config. This is something that's really nice. It's one of the, my favorite views within DPA, the virtualization side of it, other than the layered stack. Because here I can see what my virtual machine has been provisioned with as far as resources go. So I can see my CPU, my memory, stuff like that. I can see what's been allocated, and I can also see what shares I have. I can see what's available, at my disposal, from a physical host perspective. But this Host VMs--this is my favorite. This shows me what all I'm sharing with. So all these other virtual machines are living on my physical ESXi node. So are they playing nice? Maybe, maybe not. How are they prioritized? So do they have a higher number of shares than I do? And if they do, if I've got a mission critical database server--I want to make sure I'm prioritized pretty high, right--both from a CPU and a memory perspective. It can show you how everything is allocated here, whether it's powered on or powered off.
And if you look here, if I go back to Host real quick, I can see that I've got, you know, 24 logical processors. So when I go back to Host VMs, you know what, if I add up all these vCPUs, I'm pretty sure they're going to tally up to more than 24. And that's the idea: a lot of times the host will be over-provisioned. You've got to be a little careful with that in production. But the idea being that not all VMs are going to max out at the same time. Right? There's going to be different peaks and valleys throughout the workday that hopefully won't push my physical hardware to some kind of contention or physical limit. Okay. But now, at this view, I get to see everything. Right? And if I do have a performance issue that is related to the hypervisor or some kind of physical resource, then this makes the conversation that I have to have with the VM admins so much more intelligent. Right? We can both look at metrics. We can both have a very meaningful conversation. And the problem becomes, or the proposition now becomes, less, 'Hey, where is the performance problem?' And it becomes much more, 'Hey, we know where the performance problem is. Now what are we going to do to address it?' So that's a very different conversation, and I'd much rather have the second one.
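The "add up all these vCPUs" check above is simple enough to sketch in Python. The VM names and vCPU counts below are made up for illustration; only the 24 logical processors comes from the host shown in the demo, and counting only powered-on VMs is an assumption about what matters for contention.

```python
def vcpu_overcommit_ratio(vms: list[tuple[str, int, bool]], logical_processors: int) -> float:
    """Return allocated vCPUs per logical processor, counting powered-on VMs only.

    vms                -- (name, vcpus, powered_on) for each VM on the host
    logical_processors -- logical processor count of the physical host
    """
    if logical_processors <= 0:
        raise ValueError("logical_processors must be positive")
    allocated = sum(vcpus for _, vcpus, powered_on in vms if powered_on)
    return allocated / logical_processors


if __name__ == "__main__":
    # Hypothetical neighbors on a 24-logical-processor host.
    host_vms = [
        ("db-prod", 8, True),
        ("db-test", 8, True),
        ("app-01", 4, True),
        ("app-02", 4, True),
        ("batch-01", 16, True),
        ("old-vm", 8, False),  # powered off, so it is not competing for CPU
    ]
    print(round(vcpu_overcommit_ratio(host_vms, 24), 2))
```

A ratio above 1.0 is not a problem by itself, for the peaks-and-valleys reason given above; it is a number to watch alongside CPU ready time.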
Okay. So another question that came in was, 'I noticed that Page Life Expectancy shows a certain setting in DPA graphs, but a different value in Windows.' So if I go back to the layered look here... So here if I look at Page Life Expectancy... If this is a question specifically for something within your environment, and there's any kind of discrepancy between what you're seeing in DPA and what you're seeing within the Windows operating system, like maybe a WMI counter or a PerfMon counter, we're viewing it from the standpoint of, let's say, and I'm going to go out on a limb here, but I'm assuming this is SQL Server since we're talking PLE... We actually query SQL Server for that metric. So what we report is only as good as the answer that comes back from the RDBMS. Okay, perfect. So it is SQL Server. So having said that, yes, there may be differences, but that's probably more of a Microsoft question because again, we're just pulling what SQL Server's telling us. Now, what I would say is go ahead and open a support case or even, you know, chime in out there on THWACK. I'm very active in the THWACK community, so if I see that, I'd be more than happy to share our PLE calculation or query with you, and that way you can go after the raw data to help support your case. And maybe take that back to Microsoft and say, 'Hey, I'm running your database on your operating system, and they don't agree.'
All right, cool. Next question: 'Because I'm using Hyper-V, does that mean I will not benefit from this part of the monitoring of the host?' Currently, that is correct. So currently, DPA only supports VMware as a hypervisor. Now we do have other products, specifically SAM, Server and Application Monitor, and Virtualization Manager, which do support Hyper-V. So there are other SolarWinds products that do support that hypervisor but not currently DPA, unfortunately.
Okay. 'So give me SA access, and I'll give you read only to vCenter.' Okay, I think that was kind of a joke, but... How about not SA access? How about just read only access to the database? Actually, even then if you're not familiar with writing SQL, that can even be a dangerous proposition. You know, a poorly constructed SQL with lots of joins and, or maybe not joins, right? Maybe the joins are not there, and it becomes a Cartesian product or maybe there's no WHERE clause. You could still do a lot of damage with even read only access.
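To make the missing-join point concrete, here is a small, self-contained Python sketch using an in-memory SQLite database. The table and column names are hypothetical and the row counts are tiny; on a production-sized database the same mistake scales the same way.

```python
import sqlite3


def build_demo(n: int = 1000) -> sqlite3.Connection:
    """Create two n-row tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(i, f"customer-{i}") for i in range(n)])
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(i, i) for i in range(n)])
    return conn


def count_rows(conn: sqlite3.Connection, sql: str) -> int:
    """Count how many rows a read-only SELECT statement would return."""
    return conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]


# A query with the join condition in place.
WITH_JOIN = "SELECT o.id FROM orders o JOIN customers c ON c.id = o.customer_id"
# The same query with the join condition forgotten: a Cartesian product.
WITHOUT_JOIN = "SELECT o.id FROM orders o, customers c"

if __name__ == "__main__":
    conn = build_demo(1000)
    print("with join:   ", count_rows(conn, WITH_JOIN))
    print("without join:", count_rows(conn, WITHOUT_JOIN))
```

Both statements are pure reads, yet the second one returns the row count of one table multiplied by the other, which is how a read-only login can still hurt a busy server.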
So another question: 'Can I change the scale of the graphs for Page Life Expectancy and other counters to match a realistic scale for my environment?' So... Okay, so, it looks like your PLE ranges from 500 to 2,000 milliseconds. Is that correct? That's probably seconds, right? I mean if it's 2,000 milliseconds for PLE, that's actually pretty bad. So just so everybody's on the same page, Page Life Expectancy is really how long a page is expected to, or does, live in cache. And... So the lower the number, the worse. Now, ideally, everything would live in cache 100% of the time. But since most of our companies won't just give us an unlimited credit card or unlimited expenditure... I know, it's silly, but since we don't have all the money in the world, we probably are limited to a certain size of memory. So we have to do some IO back to disk, right? We certainly have to do it with the transaction log, but even with the data files within the buffer pool. So having said that, the longer it lives in memory, the better. If we're constantly aging out pages to read new data in, you probably want to look at adding memory, or raising the max server memory setting, or increasing the buffer pools, depending on your database platform.
Okay, cool. So I think we've had... oh, you're welcome. Let's see. 'I want to tie DPA with SAM so I can see a fuller picture of what is going on in my environment. Do you have an example of how to do that?' Yeah, thank you for clarification on the seconds. That was correct. Yes, in fact... let me go out to our demo environment real quick. And if anybody does have other products on the Orion platform and wants to integrate the two-- now since this is a demo environment, I probably don't have admin access, so I might not be able to actually do this, but we can see.
So under Settings, if you go to All Settings, and then--just so everyone's aware, DPA is a standalone product currently. It is a Java application so we were an acquisition a few years ago, a couple years ago, by SolarWinds. And having said that, it's not-- it doesn't actually ride on the Orion framework. So here within the settings, we have a plugin module, or an integration module. And where's my database? There we go. So if I click on this, okay. So it's going to kick me out, but basically, once I click on that, it'll bring me into an Integration Wizard that allows me then to enter information like metadata about my DPA instance, including the IP or server name as well as the port. And we default to an SSL port, so an encrypted port, which is 8124. And then you do have to supply it with some admin credentials.
Okay? But once you do that, Orion or SAM will go ahead and make SOAP calls over to the DPA instance and pull back data. So you'll start to see database information integrated within your Orion platform. So I know that we have some great Success Center articles and how-tos on doing just that. And in fact, we currently have another one of these deep dives, another one of these Beyond the Basics sessions, scheduled for June 20th. So the topic for that one right now is going to be Orion integration. So we'll cover exactly how to do that and we'll show you the advantages of doing that.
So that brings me to a good point. I know we're kind of at the top of the hour here or top of the half hour, whatever it is. So I want to be respectful of all of your time for sure. But I do want to give our plug for our next sessions. So we're currently planning one in May that is going to be a deep dive on alerts. I'm going to have one of our existing customers join me and cohost it, which'll be pretty cool. So Ed is going to show us how he uses alerts within his environment and also give some stories about how he was able to really leverage them to move from more of a reactive team to a proactive team-- and really start to change the culture or how the DBA team was viewed within their company. So a very, very positive story. So we got that one coming in May. We've got another one coming in June that I just mentioned on Orion integration.
So I think at the end of this webinar, there's going to be a chance to register for those. But if not, or if you don't have time to do that now, we will be sending out emails also, so please look for those, and we'd love to have you back for another deep dive. So at that, I would like to thank you all for attending. There will be a brief survey to help us tailor these a little bit better in the future or see what you might be interested in hearing about, from a DPA perspective. I think one of the questions is redundant. It was also done in a poll, so sorry for that, you just get to fill it out twice. Okay, so at that, I want to thank you very much for stepping away with me for a few minutes and listening to this deep dive on the virtualization view. All right, thanks, everyone. Hope you have a great day.