
Microsoft Windows Server 2012 Failover Cluster

This template assesses the status and overall performance of a Microsoft Windows Server 2012 Failover Cluster by retrieving information from performance counters and the Windows System Event Log. For more information, refer to the following Microsoft article: http://technet.microsoft.com/en-us/library/cc720058%28WS.10%29.aspx.

Prerequisites

WMI access to the target server.

Credentials

Windows Administrator on the target server.

All Windows Event Log monitors should return zero values. A returned value other than zero indicates an abnormality. Examining the Windows system log files should provide information pertaining to the issue. Detailed information about these events can be found here: http://technet.microsoft.com/en-us/library/dd353290(WS.10).aspx.
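The zero-expected rule above can be sketched in a few lines of Python. This is only an illustration, assuming the event counts have already been collected into a dictionary keyed by monitor name. The function name and the monitor names below are hypothetical and are not taken from the template.

```python
def nonzero_event_monitors(event_counts):
    """Return the names of monitors whose event count is not zero, sorted by name."""
    return sorted(name for name, count in event_counts.items() if count != 0)


# Hypothetical monitor names and counts, for illustration only.
sample_counts = {
    "Cluster service failed to start": 0,
    "Node removed from cluster membership": 2,
    "Quorum resource failed": 0,
}

# Any name listed here points to an abnormality worth checking in the system log.
abnormal = nonzero_event_monitors(sample_counts)
print(abnormal)
```

An empty list means every Event Log monitor returned zero, which is the expected healthy state.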

Monitored Components

For details on monitors, see SAM Component Monitor Types.

Set counter thresholds according to your environment. Monitor the counters for a period of time to learn their typical value ranges, and then set the thresholds accordingly.
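One possible way to turn an observation period into starting thresholds is sketched below. It assumes you have exported the observed counter values as a list of numbers. The function name and the choice of two and three standard deviations above the mean are illustrative assumptions, not values recommended by the template, and should be adjusted to your environment.

```python
import statistics


def suggest_thresholds(samples, warn_sigmas=2.0, crit_sigmas=3.0):
    """Suggest (warning, critical) thresholds from baseline counter samples.

    The sigma multipliers are assumptions chosen for illustration only.
    """
    if len(samples) < 2:
        raise ValueError("need at least two baseline samples")
    mean = statistics.fmean(samples)
    spread = statistics.pstdev(samples)  # population standard deviation
    return mean + warn_sigmas * spread, mean + crit_sigmas * spread


# Hypothetical baseline readings collected during the observation period.
baseline = [2, 4, 4, 4, 5, 5, 7, 9]
warning, critical = suggest_thresholds(baseline)
print(f"warning above {warning:.1f}, critical above {critical:.1f}")
```

A mean-plus-deviation rule is only one option. For counters with occasional large spikes, a percentile of the observed values may be a better starting point.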

Service: Windows Time

This monitor returns the CPU and memory usage of the Windows Time service. This service maintains date and time synchronization on all clients and servers in the network. If this service is stopped, date and time synchronization will be unavailable. If this service is disabled, any services that explicitly depend on it will fail to start.

Service: Cluster Service

This monitor returns the CPU and memory usage of the Cluster service. This service enables servers to work together as a cluster to keep server-based applications highly available, regardless of individual component failures. If this service is stopped, clustering will be unavailable. If this service is disabled, any services that explicitly depend on it will fail to start.

Network Reconnections: Reconnect Count

This monitor returns the number of times the nodes have reconnected.

The instance field is installation-specific. You need to specify the hostname of your cluster node (for example: node1). By default, this component monitor is disabled and should only be enabled for troubleshooting purposes.

Network Reconnections: Normal Message Queue Length

This monitor returns the number of normal messages that are in the queue waiting to be sent. Normally this number is 0, but if the TCP connection breaks, you might observe it going up until the TCP connection is re-established, thereby allowing all messages to be sent.

The instance field is installation-specific. You need to specify the hostname of your cluster node (for example: node1). By default, this component monitor is disabled and should only be enabled for troubleshooting purposes.

Network Reconnections: Urgent Message Queue Length

This monitor returns the number of urgent messages that are in the queue waiting to be sent. Normally this number is 0, but if the TCP connection breaks, you might observe it going up until the TCP connection is re-established, thereby allowing all messages to be sent.

The instance field is installation-specific. You need to specify the hostname of your cluster node (for example: node1). By default, this component monitor is disabled and should only be enabled for troubleshooting purposes.
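When troubleshooting with the queue length monitors, it can help to distinguish a brief spike from a queue that stays non-zero across many polls. The sketch below assumes the polled queue lengths are available as a list in time order. The function name and the sample values are hypothetical.

```python
def longest_nonzero_run(queue_lengths):
    """Return the longest run of consecutive non-zero samples."""
    longest = current = 0
    for length in queue_lengths:
        if length != 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # the queue drained, so the run ends
    return longest


# Hypothetical samples: the queue grows while the connection is down,
# then drains once the connection is re-established.
samples = [0, 0, 3, 7, 12, 0, 0, 1, 0]
print(longest_nonzero_run(samples))
```

A long run suggests the TCP connection stayed broken for several polling intervals, rather than dropping and recovering quickly.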

Messages Outstanding

This monitor returns the number of cluster MRR outstanding messages. The returned value should be near zero.

Resource Control Manager: Groups Online

This monitor returns the number of online cluster resource groups on this node. The returned value should be above zero at all times.

Resource Control Manager: RHS Processes

This monitor returns the number of running resource host subsystem processes (rhs.exe). The returned value should be above zero at all times.
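The two Resource Control Manager monitors above share the same expectation: the value should never drop to zero. A minimal sketch of that check follows, assuming the latest readings are held in a dictionary keyed by monitor name. The function name and the readings are illustrative.

```python
def below_floor(readings, floor=1):
    """Return the names of monitors whose latest value is below the floor."""
    return [name for name, value in readings.items() if value < floor]


# Hypothetical latest readings for the two monitors described above.
latest = {"Groups Online": 3, "RHS Processes": 0}
print(below_floor(latest))
```

With the default floor of 1, any monitor reporting zero is flagged.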

Resource Control Manager: RHS Restarts

This monitor returns the number of resource host subsystem process (rhs.exe) restarts.

By default, this component monitor is disabled and should only be enabled for troubleshooting purposes.

Resources: Resource Failure

This monitor returns the number of resource failures. The returned value should be as low as possible.

Resources: Resource Failure Access Violation

This monitor returns the number of resource failures caused by access violations. The returned value should be as low as possible.

By default, this component monitor is disabled and should only be enabled for troubleshooting purposes.

Resources: Resource Failure Deadlock

This monitor returns the number of resource failures caused by deadlocks. Deadlocks are usually caused by the resource taking too long to execute certain operations. The returned value should be as low as possible.

By default, this component monitor is disabled and should only be enabled for troubleshooting purposes.