Alerts and events for cloud monitoring
Alerts notify you when an issue or error occurs in the monitored environment. They trigger based on event data captured through the Orion Platform, trigger conditions, and threshold settings. SAM and integrated VMAN provide cloud infrastructure monitoring alerts for polling issues, AWS throttling, and exceeded polling request limits.
Actions that Orion user accounts complete on AWS cloud instances are tracked in Alerts & Activity > Events. These events include triggered alerts and the following management actions:
- Stop: ends all actions on and access to the instance until it is restarted. The cloud service erases data held on instance store volumes (EBS volume data is retained). No polling occurs while the instance is stopped. If the instance is managed as a node, it continues to consume SAM and integrated VMAN licenses.
- Reboot: restarts the cloud instance and preserves data.
- Terminate: permanently removes the instance from the cloud service. If the instance is managed as a node, select the option to remove the node from Orion. The cloud service deletes all stored instance data from its systems. The instance no longer appears in the list of available instances for the account in the Orion Web Console.
- Unmanage/Manage: toggles whether the instance is managed as a node. When unmanaged, the instance is no longer a managed node and releases its consumed licenses. Unmanage an instance when you need to perform maintenance on it. Unmanaging does not affect the instance or its data in the cloud service. When you manage an instance, you add it as a node, which consumes product licenses and activates a polling method.
- Poll Now: initiates an immediate poll of the CloudWatch APIs for data.
If you complete actions through the AWS Console, SAM and integrated VMAN do not record those actions as audit events or list them in the Events interface.
To review the available cloud alerts, select Alerts & Activity > Alerts and click Manage Alerts. Enter cloud in the search field to list the cloud alerts, including the following:
|Alert|Description|
|---|---|
|Cloud instance is in a warning or critical state|A cloud instance encounters polling or access issues that trigger a warning or critical state. The alert triggers based on global cloud thresholds.|
|Alert me when AWS throttling is applied for cloud account|This alert aggregates throttling issues for all instances and volumes into a single alert, and checks every minute for throttled instances or volumes. The email notification indicates the number of affected instances and volumes. Affected instances and volumes display in an Unknown - AWS Throttling Applied state.|
|Alert me when AWS throttling is applied for cloud instance|Disabled by default, this alert checks every minute whether throttling is applied to cloud instances. Its conditions check for an instance status of Unknown and for AWS throttling applied to EC2 API calls.|
|Alert me when AWS throttling is applied for cloud volume|Disabled by default, this alert checks every minute whether throttling is applied to attached cloud volumes. Its conditions check for an attached volume status of Unknown and for AWS throttling applied to EC2 API calls.|
|AWS CloudWatch polling limit threshold exceeded|AWS CloudWatch provides 1 million free API requests per month for all metric polling. This alert triggers when the polling limit threshold is exceeded. Amazon Web Services does not stop polling or CloudWatch metrics when the limit is exceeded; instead, it charges for the requests made beyond the limit for the rest of the month.|
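To judge whether a polling configuration is likely to approach the 1 million free requests per month, you can estimate the monthly request volume. The following is a rough sketch only: it assumes, hypothetically, that each poll of each instance issues one CloudWatch API request per metric, which may not match how SAM and VMAN actually batch requests. The function names and parameters are illustrative, not part of any SolarWinds or AWS API.

```python
# Free CloudWatch API request allowance per month, as stated in the table above.
FREE_REQUESTS_PER_MONTH = 1_000_000
MINUTES_PER_DAY = 24 * 60


def estimate_monthly_requests(instances: int, metrics_per_poll: int,
                              poll_interval_minutes: float, days: int = 30) -> int:
    """Estimate CloudWatch API requests for one month of polling.

    Hypothetical model: each poll of each instance issues one API request per metric.
    """
    if poll_interval_minutes <= 0:
        raise ValueError("poll_interval_minutes must be positive")
    polls_per_month = days * MINUTES_PER_DAY / poll_interval_minutes
    return round(instances * metrics_per_poll * polls_per_month)


def exceeds_free_limit(requests: int, limit: int = FREE_REQUESTS_PER_MONTH) -> bool:
    """Return True if the estimated requests exceed the free monthly allowance."""
    return requests > limit


def min_interval_for_budget(instances: int, metrics_per_poll: int, days: int = 30,
                            limit: int = FREE_REQUESTS_PER_MONTH) -> float:
    """Shortest polling interval (in minutes) that keeps the estimate within `limit`."""
    return instances * metrics_per_poll * days * MINUTES_PER_DAY / limit


if __name__ == "__main__":
    total = estimate_monthly_requests(instances=50, metrics_per_poll=8, poll_interval_minutes=5)
    print(total, exceeds_free_limit(total))  # 3456000 True
    print(min_interval_for_budget(instances=50, metrics_per_poll=8))  # 17.28
```

Under this model, 50 instances with 8 metrics polled every 5 minutes would need about 3.5 million requests in a 30-day month, well past the free allowance; the request count scales linearly with instances and metrics and inversely with the polling interval.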
Triggered alerts display in the Alert Manager and in resources on the Cloud Summary and Cloud Instance Details pages. For detailed information about the events that trigger alerts, review the event and cloud event resources on the same pages.
Events display with warning and critical indicators based on errors and exceeded thresholds. If multiple events trigger for an instance or volume, SAM and integrated VMAN aggregate them into a single event so they do not overwhelm the event list or obscure essential monitoring information. The following example displays an aggregated critical event:
For more information on troubleshooting issues and alerts with cloud infrastructure monitoring, see Troubleshoot Cloud Infrastructure Monitoring.
Create and modify an alert copy
Orion cloud infrastructure monitoring does not import alerts or alarms from your cloud service. Through SAM and integrated VMAN, you can modify the existing alerts or create new alerts with custom triggers, conditions, and actions.
To quickly create a new alert, duplicate an existing cloud alert and customize the copy. You cannot edit the existing conditions and actions of out-of-the-box alerts; you can only enable or disable these alerts and add triggers and actions. Use the out-of-the-box alerts as examples when defining your own triggers and actions.
- Click Alerts & Activity > Alerts, and then click Manage Alerts.
- In the search field, enter Cloud.
- Select an alert and click Duplicate & Edit.
  A copy of the alert is created and opens with the copied configuration for you to modify. Change the name of the alert and add a description that explains its intent and how to troubleshoot it when it triggers.
  For example, duplicate the alert for a cloud instance in a warning or critical state, and then add conditions for specific polled metrics and actions that stop the instance and send notifications.
- Follow the alert wizard prompts to set conditions and actions.
- Review and save the alert when complete.
For detailed information on creating and editing conditions and actions, see Create new alerts to monitor your environment.
Conditions: set the triggers for alerts. Create as many conditions as needed to cover multiple scenarios, triggering when any or all of the conditions are met. Conditions can include custom properties.
Example: Trigger the alert when CPU load stays above 90% for more than 5 minutes.
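The logic of a sustained-threshold condition such as CPU load above 90% for more than 5 minutes can be sketched as follows. This is a hypothetical illustration of the condition's meaning, not how the Orion alert engine evaluates it: it assumes time-ordered `(timestamp, cpu_percent)` samples and measures the breach from the first consecutive sample above the threshold to the newest sample. The function name `sustained_above` is illustrative.

```python
from datetime import datetime, timedelta


def sustained_above(samples, threshold=90.0, duration=timedelta(minutes=5)):
    """Return True if the newest samples stay above `threshold` for longer than `duration`.

    `samples` is a hypothetical list of (timestamp, cpu_percent) tuples sorted by time.
    """
    if not samples:
        return False
    latest_time = samples[-1][0]
    breach_start = None
    # Walk backward from the newest sample to find where the current breach began.
    for ts, value in reversed(samples):
        if value > threshold:
            breach_start = ts
        else:
            break
    if breach_start is None:
        return False  # the newest sample is not above the threshold
    return latest_time - breach_start > duration


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0)
    samples = [(start + timedelta(minutes=m), 95.0) for m in range(7)]
    print(sustained_above(samples))  # True: above 90% from 12:00 through 12:06
```

A single dip below the threshold restarts the measurement, which is what distinguishes a sustained condition from a momentary spike.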
Reset conditions: configure the event that resets the alert.
Example: If an alert triggers when the power state is off, set it to reset when the cloud power state is on.
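When a reset condition is defined separately from the trigger, as in this example, the alert stays active until the reset condition is met, not merely until the trigger condition stops being true. The minimal sketch below models that behavior as a two-state machine. The power state strings (`"on"`, `"off"`, `"unknown"`) and the names `AlertState`, `next_state`, and `replay` are hypothetical and only illustrate the idea.

```python
from enum import Enum


class AlertState(Enum):
    CLEAR = "clear"
    ACTIVE = "active"


def next_state(state: AlertState, power_state: str) -> AlertState:
    """Trigger when the power state is "off"; reset only when it is "on".

    Any other power state (for example "unknown") leaves the alert unchanged.
    """
    if state is AlertState.CLEAR and power_state == "off":
        return AlertState.ACTIVE
    if state is AlertState.ACTIVE and power_state == "on":
        return AlertState.CLEAR
    return state


def replay(power_states):
    """Apply a sequence of polled power states and record the alert state after each."""
    state = AlertState.CLEAR
    history = []
    for power_state in power_states:
        state = next_state(state, power_state)
        history.append(state.value)
    return history


if __name__ == "__main__":
    print(replay(["on", "off", "unknown", "on"]))  # ['clear', 'active', 'active', 'clear']
```

In this model, the intermediate "unknown" poll does not reset the alert; only a return to "on" does.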
Actions: set the actions and escalation steps that the Orion Web Console performs when an alert triggers. Create as many actions and escalations as needed.
Example: Send an email notification every 10 minutes until the alert is acknowledged. If the alert is not acknowledged within 10 minutes, send an escalation email to management. Use a management action, such as stop or reboot, as needed.
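The timing in this example can be modeled as a simple schedule, which helps when checking how many notifications a team might receive before someone acknowledges the alert. This is a hypothetical sketch of the example's timing only, not how the Orion Web Console runs escalation levels; it assumes the alert triggers at minute 0 and that acknowledging at or before minute 10 prevents the escalation. The function name and parameters are illustrative.

```python
def notification_schedule(ack_minute=None, interval=10, escalate_after=10, horizon=60):
    """Return sorted (minute, action) pairs for an alert that triggers at minute 0.

    Hypothetical model of the example: email the team every `interval` minutes
    until the alert is acknowledged, and send one escalation email if it is not
    acknowledged within `escalate_after` minutes. `ack_minute=None` means the
    alert is never acknowledged during the `horizon` window.
    """
    events = []
    for minute in range(0, horizon + 1, interval):
        # Stop repeating notifications once the alert has been acknowledged.
        if ack_minute is not None and minute >= ack_minute:
            break
        events.append((minute, "email team"))
    # Escalate only if nobody acknowledged the alert within the escalation window.
    if ack_minute is None or ack_minute > escalate_after:
        events.append((escalate_after, "escalate to management"))
    return sorted(events)


if __name__ == "__main__":
    for entry in notification_schedule(ack_minute=25):
        print(entry)
```

With an acknowledgment at minute 25, this model sends team emails at minutes 0, 10, and 20 and one escalation email at minute 10.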
Reset actions: configure the actions performed when the alert is reset.
Example: Write an event and data to the log when the alert actions complete.
Alert Integration: sends the alert to other SolarWinds products integrated with the Orion Platform, including ServiceNow Integration and Web Help Desk.