WSFC (Windows Server Failover Clustering), administered through FCM (Failover Cluster Manager), provides the failover cluster that a clustered SQL Server instance or SQL Server Always On runs on. If you need high availability with SQL Server through either of these features, you will need a cluster. If you are a DBA, hopefully the Sys Admin creates the cluster for you. If you have to create it yourself, there are many blogs that walk through the process. A good starting point is the Microsoft page on Creating Clusters.
What do you do when the cluster fails
When the cluster fails and you (the DBA) need to troubleshoot the issue, work through these steps:
- Open Failover Cluster Manager and note the node names and server names.
- In PowerShell, generate the cluster log for each node (server). This log is not written automatically; you have to create it manually. The command is "Get-ClusterLog -Node NodeName".
- Review the generated log (C:\Windows\Cluster\Reports\Cluster.log) for the time of the failover and the reason the cluster failed over. The log is written on the actual server for the node you generated it for. Timestamps in the log are in UTC by default.
- In Failover Cluster Manager, review Cluster Events. This view is the same on all nodes.
- In Administrative Tools\Event Viewer on each node, review the events around the time of the cluster failover in these logs:
  - Windows Logs\Application
  - Windows Logs\System
  - Applications and Services Logs\Microsoft\Windows\Hyper-V-High-Availability\Admin
  - Applications and Services Logs\Microsoft\Windows\FailoverClustering-Manager
  - Applications and Services Logs\Microsoft\Windows\FailoverClustering\Operational
- Optional: create an Excel spreadsheet with each meaningful event from the log files. This helps me see the events and put them in order. I also like to send it to management as documentation. I use these columns:
  - Cluster Node
  - Cluster Server
  - Where found (example: Event Viewer\Hyper-V-High-Availability\Admin)
  - Level (Error, Information, Warning, or Critical)
  - Event source
  - Task Category
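- On each node, open the policy file (C:\Program Files (x86)\Microsoft SQL Server\110\Tools\Policies\DatabaseEngine\1033\Windows Event Log Cluster Disk Resource Corruption Error.XML) and review it. The 110 folder corresponds to SQL Server 2012; the number differs for other versions.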
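The cluster log and Event Viewer steps above can be scripted. What follows is only a minimal sketch, not a definitive tool. It assumes the FailoverClusters PowerShell module is installed, that you run it from an elevated session on a cluster node, and that remote event log access is allowed between the nodes. The function name Get-FailoverEvidence, the folder C:\Temp\ClusterLogs, and the 24-hour window are all made up for illustration.

```powershell
# Sketch only: Get-FailoverEvidence is a hypothetical helper name, not a built-in cmdlet.
function Get-FailoverEvidence {
    param(
        [string]$Destination = 'C:\Temp\ClusterLogs',  # hypothetical output folder
        [int]$HoursBack = 24                           # assumed look-back window
    )

    Import-Module FailoverClusters -ErrorAction Stop
    New-Item -ItemType Directory -Path $Destination -Force | Out-Null

    # Generate Cluster.log on every node and copy the files to $Destination.
    # -TimeSpan is in minutes; -UseLocalTime writes local time instead of UTC.
    Get-ClusterLog -Destination $Destination -TimeSpan ($HoursBack * 60) -UseLocalTime | Out-Null

    # Event channels that match the Event Viewer folders reviewed by hand.
    $logs  = 'Application', 'System', 'Microsoft-Windows-FailoverClustering/Operational'
    $start = (Get-Date).AddHours(-$HoursBack)

    foreach ($node in (Get-ClusterNode).Name) {
        # Level 1 = Critical, 2 = Error, 3 = Warning.
        # Errors are silenced because a node with no matching events raises one.
        Get-WinEvent -ComputerName $node -ErrorAction SilentlyContinue -FilterHashtable @{
            LogName   = $logs
            Level     = 1, 2, 3
            StartTime = $start
        }
    }
}
```

A call such as "Get-FailoverEvidence -HoursBack 6 | Sort-Object TimeCreated" would then return the warnings, errors, and critical events from all nodes in time order. Informational events are left out here to keep the output short; drop the Level filter if you want them too.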
Gather all your data, review it, and send out an email explaining why the cluster failed. If you find the error is related to Always On or SQL Server, more research is needed in the SQL Server logs.
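If you build the optional spreadsheet, the rows can be produced from event records instead of being typed by hand. This is a rough sketch under a few assumptions: the input objects have the properties that Get-WinEvent records expose (TimeCreated, MachineName, LogName, and so on), the function name ConvertTo-FailoverTimeline is hypothetical, and the Time, EventId, and Message columns are my additions to the column list above so the rows can be ordered and read.

```powershell
# Sketch only: ConvertTo-FailoverTimeline is a hypothetical helper name.
function ConvertTo-FailoverTimeline {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object]$Record   # an event record, e.g. from Get-WinEvent
    )
    process {
        # One spreadsheet row per event, using the columns from the checklist.
        [pscustomobject]@{
            Time         = $Record.TimeCreated
            Server       = $Record.MachineName
            WhereFound   = "Event Viewer\$($Record.LogName)"
            Level        = $Record.LevelDisplayName
            EventSource  = $Record.ProviderName
            TaskCategory = $Record.TaskDisplayName
            EventId      = $Record.Id
            Message      = $Record.Message
        }
    }
}
```

For example, piping the output of Get-WinEvent through "ConvertTo-FailoverTimeline | Sort-Object Time | Export-Csv -Path C:\Temp\FailoverTimeline.csv -NoTypeInformation" (the path is again just a placeholder) gives a CSV file that opens directly in Excel.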