Monitor your cloud infrastructure
With Cloud Infrastructure Monitoring configured and cloud instances selected, begin monitoring your cloud environment using the Cloud Summary page and instance details pages. The Cloud Summary page includes information to quickly pinpoint cloud instances encountering issues with alerts and events.
Resources on these views provide real-time and historical data for monitored instances.
- Monitor metrics through a Cloud Summary page.
- Drill down into instance-specific pages for system information, performance metrics, and troubleshooting issues.
Review attached volume health and general data displays for the environment.
Volume details pages are not available at this time.
- Create and maintain alerts to track issues and events in your environment.
- View your hybrid environment for cloud, virtual environment, and on-premises data in one place. In the Orion Web Console, AWS cloud instances display in the Cloud page and virtual environments (including VMware and Hyper-V) display in the Virtualization page.
Manage cloud instances as nodes to use additional Orion Platform features. Managed nodes consume VMAN licensed sockets and SAM licensed component monitors. Use Orion features to manage alerts, reports, and resources.
When managing cloud instances as nodes, SAM includes additional options to further manage cloud instances, applications, and OS:
- Manage cloud applications and OS as nodes on cloud instance nodes. These nodes also display in the AppStack for at-a-glance troubleshooting across your environment.
- Poll specific OS metrics beyond basic AWS metrics, including cloud instance memory (RAM) and additional metrics, using SAM application monitors.
- Use SAM application monitors and templates to poll applications deployed in the cloud.
- Develop and deploy custom script monitors for PowerShell, Nagios, Linux/Unix, and Windows.
For information on managing cloud instances as nodes, see Manage the cloud instance as a node.
Application and OS data displays seamlessly with health status, alerts, and metrics returned for SAM templates and component monitors. For details, see Create cloud application monitors and templates.
VMAN recommendations do not monitor or apply to cloud infrastructure. Recommendations trigger and provide actions for systems specific to virtual environments, including vCenter and Hyper-V.
Review and troubleshoot cloud metrics
Select My Dashboards > Cloud to review overall AWS infrastructure data and cloud details. Use the Cloud Instances Status Summary and Cloud Server Infrastructure resources to review status and health at a glance. To quickly review cloud status, metrics, and node management for a cloud instance, hover over any cloud instance name in the Orion Web Console.
The tooltip provides quick information for the cloud service and status.
When managed as a node, the tooltip provides enhanced data.
Any cloud instances encountering issues display in the following resources with linked instances and nodes to investigate:
- Active Cloud Alerts lists all active alerts affecting monitored and managed cloud instances.
- Cloud Applications with Problems lists all applications with issues on cloud instances managed as nodes in SAM.
Select a cloud instance to view the Cloud Instance Details page. This page displays for monitored nodes, or as a Cloud tab for a cloud instance managed as a node. Exceeded thresholds appear as warning and critical values, charts and graphs include hover-over points to compare all collected data, and related alerts are linked.
The following resources provide important data for determining issues and tracking performance and usage trends:
- Active Alerts lists all active alerts affecting the cloud instance.
- Min/Max/Average of Average CPU load displays average CPU load collected and calculated for the cloud instance.
- Min/Max/Average of Network Utilization provides a chart of the minimum, maximum, and average bits per second transmitted and received over a cloud instance for a custom period of time.
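The minimum, maximum, and average values charted in these resources are simple reductions over the samples polled during the selected period. The following is a minimal sketch of that arithmetic for network utilization, not the Orion implementation. The Sample structure, the field names, and the sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    """One polled network reading (hypothetical structure, not an Orion type)."""
    bytes_transferred: int  # bytes transmitted plus received during the interval
    interval_seconds: int   # length of the polling interval in seconds


def to_bits_per_second(sample: Sample) -> float:
    # Convert a byte count over an interval into a bits-per-second rate.
    return sample.bytes_transferred * 8 / sample.interval_seconds


def summarize(samples: list[Sample]) -> dict[str, float]:
    # Reduce the selected time period to minimum, maximum, and average rates.
    rates = [to_bits_per_second(s) for s in samples]
    return {"min": min(rates), "max": max(rates), "avg": mean(rates)}


# Illustrative values only: three five-minute polling intervals.
samples = [Sample(1_500_000, 300), Sample(3_000_000, 300), Sample(750_000, 300)]
summary = summarize(samples)
print(summary)
```

Widening or narrowing the custom time period changes which samples enter the reduction, which is why the same instance can show different minimum, maximum, and average values for different periods.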
To better manage and troubleshoot your cloud instances and volumes:
- Troubleshoot cloud monitoring with Performance Analysis by comparing metrics, data, and logs collected across nodes and Orion Platform products.
- Create cloud application monitors and templates for SAM managed applications, OS, and cloud instances as nodes with out-of-the-box component monitors and custom scripts.
Cloud service APIs, such as Amazon APIs, capture data for instances, volumes, and OS-specific metrics. These metrics differ from OS metrics due to the fluid nature of cloud computing.
- Cloud allows you to allocate resources as needed and on demand, such as partial CPU processing and disk space across multiple instances. These resources can change through direct interactions and automation. When EC2 reports data, it calculates the percentage of assigned resources shared between instances.
- OS metrics directly capture values from the core system, not the assigned amounts. This data does not calculate shared resources or other users attached to the instances and volumes. This data directly displays the actual usage at a polled point in time.
Both values provide insight into potential and actual issues with performance and resources. Cloud and OS metrics can report vastly different information because of how resources are allocated and how each metric is calculated.
CPU steal is an example of the difference between cloud and OS metrics. When CPU usage spikes in a cloud environment, multiple processes and instances in the cloud may be accessing the CPU as multiple owners. In OS metrics, these spikes typically look like noisy neighbors. Cloud metric data better represents the spikes as shared resource usage across multiple owners, with metrics broken down by owner.
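To make the OS side of this comparison concrete, the following sketch shows one way a custom script could derive a CPU steal percentage on a Linux guest from two snapshots of the aggregate cpu line in /proc/stat. It is an illustration under stated assumptions, not how SAM or AWS calculate their metrics. It assumes a kernel that reports the steal field, and the snapshot values are invented for the example.

```python
def parse_cpu_line(line: str) -> list[int]:
    # The aggregate "cpu" line of /proc/stat lists cumulative ticks in the order:
    # user nice system idle iowait irq softirq steal guest guest_nice
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def steal_percent(before: str, after: str) -> float:
    # Percentage of CPU ticks elapsed between two snapshots that were stolen,
    # meaning the hypervisor was running something else while this guest waited.
    first, second = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [new - old for old, new in zip(first, second)]
    # guest and guest_nice are already counted inside user and nice, so only
    # the first eight fields are totaled to avoid double counting.
    total = sum(deltas[:8])
    if total == 0:
        return 0.0
    return 100.0 * deltas[7] / total


# Invented snapshots for illustration; a real script would read /proc/stat twice.
before = "cpu 1000 0 500 8000 100 0 50 200 0 0"
after = "cpu 1300 0 600 8450 120 0 60 320 0 0"
steal = steal_percent(before, after)
print(f"steal: {steal:.1f}%")
```

A value like this shows only that the guest lost CPU time. It does not show which other owner consumed it, which is the gap the cloud-side metrics described above help fill.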
To better define resource usage and alerts, SAM and integrated VMAN display cloud instance metrics throughout all cloud resources in Orion Web Console views, resources, hover-over data, and reports. These metrics include calculated health status, CPU load, and IOPS data. Cloud metrics are also used when applying global cloud thresholds for triggering alerts and status.
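As a rough illustration of how a global threshold turns a cloud metric into a status, the sketch below compares a value against warning and critical limits. The metric name, the limit values, and the status names are hypothetical and are not taken from the product; actual thresholds are configured in the Orion Web Console.

```python
from enum import Enum


class Status(Enum):
    UP = "up"
    WARNING = "warning"
    CRITICAL = "critical"


# Hypothetical threshold values chosen for illustration only.
THRESHOLDS = {
    "cpu_load_percent": {"warning": 80.0, "critical": 90.0},
}


def evaluate(metric: str, value: float) -> Status:
    # Check the critical limit first so the most severe status wins.
    limits = THRESHOLDS[metric]
    if value >= limits["critical"]:
        return Status.CRITICAL
    if value >= limits["warning"]:
        return Status.WARNING
    return Status.UP


print(evaluate("cpu_load_percent", 85.0))
```

Because the same limits apply to every cloud instance, one change to a global threshold changes the status and alert behavior for all of them at once.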
When managed as a node, cloud infrastructure monitoring pulls specific OS data for memory and provides additional data through Orion agent, WMI, and SNMP polling methods.
Best practices for cloud monitoring
Migrating applications, resources, and data to the cloud can cause visibility gaps in your environment. Cloud Infrastructure Monitoring, integrated with Orion node management, alerting, and reporting, gives an extensive view into your hybrid environment, including cloud instances, volumes, and on-premises systems.
Manage your hybrid environment metrics and status through a single console
Displaying your on-premises, virtual, and cloud systems together helps you compare performance, locate bottlenecks, and better plan capacity and resource allocation.
Monitor applications in the cloud to track end user and business context for performance
Cloud instance and volume metrics do not give the depth of metrics and events needed to monitor the applications hosted on and consuming those resources. Monitoring applications with SAM provides visibility into end user experience and business transactions for cloud and on-premises systems.
Dynamically monitor cloud instances to better handle resource churn
Cloud environments can undergo significant churn when provisioning and removing instances and volumes as needed to support expanding environments or performance peaks.
Deploy agents to monitor all cloud instance types
Some cloud instances may not support agentless monitoring. Deploy Orion agents for Windows or Linux to monitor the cloud instance, its applications, and its OS. The Orion Platform also supports WMI and SNMP monitoring.
Monitor cloud resource consumption to determine usage trends and troubleshoot issues
Metrics captured over time provide historical references that not only track trends in resource consumption (such as space and CPU spikes and lulls) but also help determine when those trends become issues. With Cloud Infrastructure Monitoring data, Orion alerts, and the Performance Analysis dashboard, walk back through historical performance to pinpoint when significant usage changes began and when they started triggering issues.