This is the second of four articles covering how to create, test and maintain a SharePoint DR plan. The articles are extracted from a chapter of the book Microsoft SharePoint 2013 Disaster Recovery Guide.
Identifying Threats to your SharePoint Environment
Now that you have an inventory of each component in your SharePoint environment, the next step is to identify threats to the key components that your SharePoint DR plan should cover. The primary threats that could affect your SharePoint environment and put you in a DR situation are typically related to your physical architecture rather than your logical architecture. This section focuses on threats to the physical architecture, although you should be aware that threats to the logical architecture, such as issues with web applications, service applications and apps, could also affect your SharePoint environment and put you in a DR situation.
Any disruption or failure in the physical architecture of your SharePoint environment can cause downtime, which would require activating the SharePoint DR plan if normal troubleshooting cannot resolve the issue. One example is a natural disaster, such as a flood, hurricane or earthquake, that knocks your primary datacenter offline.
The following identifies the key components of the physical architecture and the threats to each that should be considered in your SharePoint DR plan.
A SharePoint farm can consist of any number of servers, from a single-server farm to a large-scale, multi-server enterprise farm. A failure of any of these servers can have a dramatic effect on the farm, ranging from degraded performance to complete failure. Typically, the biggest source of failure at the server level is hardware. Whether it is a bad NIC, a failed drive, a drive that has run out of space or some other hardware issue, a failure of any one of these key components can cause downtime in your SharePoint farm and should be accounted for in your DR plan.
Ongoing monitoring and periodic testing of your hardware health can go a long way toward preventing the kind of hardware failure that could cause a disaster in your SharePoint farm. Items that should be monitored and tested include:
- Hard Drives
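Drive space is one of the simplest of these items to check automatically. As a minimal sketch, assuming Python 3 is available on the server, the following uses the standard library's `shutil.disk_usage` to report how much of a volume is free and to flag it when free space drops below a threshold. The `check_drive_space` helper and the 15% threshold are hypothetical choices for illustration, not values taken from SharePoint guidance.

```python
import shutil

# Hypothetical threshold: flag a volume when less than 15% of it is free.
# Adjust this to match your own capacity planning.
FREE_SPACE_WARNING_RATIO = 0.15


def check_drive_space(path, warning_ratio=FREE_SPACE_WARNING_RATIO):
    """Return (free_ratio, is_low) for the volume that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple of total, used, free (bytes)
    free_ratio = usage.free / usage.total
    return free_ratio, free_ratio < warning_ratio


if __name__ == "__main__":
    # On a Windows SharePoint server you might list drives such as "C:\\"
    # and "D:\\" here; "/" is used so the sketch runs anywhere.
    for volume in ["/"]:
        ratio, low = check_drive_space(volume)
        print(f"{volume}: {ratio:.1%} free [{'LOW' if low else 'OK'}]")
```

Running a check like this on a schedule (for example, with Windows Task Scheduler) and alerting on the `LOW` status is one way to catch a filling drive before it takes a server down.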
The heart of any SharePoint implementation is the database. A failure at the database level would significantly affect the performance and/or availability of the related SharePoint farm. Typical database-level failures include a drive failure (local or SAN), a corrupted database or transaction log file, a full transaction log, a hung database transaction, a hung database lock, or a stopped SQL Server service.
Database administrators should set up monitors and jobs that identify and eliminate issues that put the health of the database at risk and could cause a disaster in your SharePoint farm. Items that should be monitored include:
- Drive Space
- Log Size
- Disk I/O
- Database Locks
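Once the raw numbers are being collected, a monitoring job still has to decide when a reading deserves an alert. The following Python is a minimal sketch of that decision step for the four items listed above. It assumes the readings have already been gathered from SQL Server by some other means (for example, `DBCC SQLPERF(LOGSPACE)` reports the percentage of each transaction log in use); the `Reading` fields, the `evaluate` function and every threshold value are hypothetical and should be tuned for your own environment.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One snapshot of health metrics for a single database (hypothetical fields)."""
    database: str
    data_drive_free_pct: float   # free space on the drive holding the data/log files
    log_space_used_pct: float    # percent of the transaction log in use
    avg_disk_latency_ms: float   # average I/O latency on the database drive
    blocked_requests: int        # requests currently waiting on database locks


# Hypothetical alert thresholds for illustration only.
THRESHOLDS = {
    "data_drive_free_pct": 15.0,   # alert when free space drops below this
    "log_space_used_pct": 80.0,    # alert when the log is at least this full
    "avg_disk_latency_ms": 20.0,   # alert when latency exceeds this
    "blocked_requests": 5,         # alert when at least this many are blocked
}


def evaluate(reading, thresholds=THRESHOLDS):
    """Return a list of human-readable alerts for one reading (empty if healthy)."""
    alerts = []
    name = reading.database
    if reading.data_drive_free_pct < thresholds["data_drive_free_pct"]:
        alerts.append(f"{name}: drive space low ({reading.data_drive_free_pct:.0f}% free)")
    if reading.log_space_used_pct >= thresholds["log_space_used_pct"]:
        alerts.append(f"{name}: transaction log {reading.log_space_used_pct:.0f}% full")
    if reading.avg_disk_latency_ms > thresholds["avg_disk_latency_ms"]:
        alerts.append(f"{name}: disk I/O latency {reading.avg_disk_latency_ms:.0f} ms")
    if reading.blocked_requests >= thresholds["blocked_requests"]:
        alerts.append(f"{name}: {reading.blocked_requests} requests blocked on locks")
    return alerts


if __name__ == "__main__":
    # Example values only; "WSS_Content" stands in for a content database name.
    sample = Reading("WSS_Content", 40.0, 93.0, 8.0, 0)
    for alert in evaluate(sample):
        print(alert)  # prints "WSS_Content: transaction log 93% full"
```

Keeping the thresholds in one table makes them easy to review alongside the DR plan itself; the comparison direction (below the limit for free space, above it for everything else) is the only per-metric logic.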
If your SharePoint servers cannot communicate with each other, or end users cannot reach them, this is a certain recipe for disaster. Preparing for a network failure, whether of network hardware such as switches, routers and load balancers, or of network services such as DNS and directory services like Active Directory, will be a key piece of your SharePoint DR plan.
Monitoring your network and the key components of your network infrastructure can help you identify potential disasters in your SharePoint environment before they occur. Items to monitor include:
- Network Latency
- Network Speed
- Server Load
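Latency between farm servers, and between users and the farm, can be spot-checked by timing how long a TCP connection takes to open. The following Python is a minimal sketch using only the standard `socket` and `time` modules. The host names in `TARGETS` are hypothetical placeholders, paired with common ports (443 for HTTPS to a web front end, 1433 for a default SQL Server instance, 389 for LDAP to a domain controller). It measures connection setup time only, not throughput, so it addresses the latency item rather than network speed; because the timing includes name resolution, a slow DNS server also shows up as extra latency.

```python
import socket
import time

# Hypothetical endpoints; replace with the servers in your own farm.
TARGETS = [
    ("sp-wfe01.contoso.local", 443),   # web front end over HTTPS
    ("sp-sql01.contoso.local", 1433),  # SQL Server default instance
    ("dc01.contoso.local", 389),       # Active Directory domain controller (LDAP)
]


def tcp_connect_latency_ms(host, port, timeout=3.0):
    """Time a TCP connection to host:port in milliseconds, or None if it fails.

    The measurement includes DNS resolution of `host`, so slow name
    resolution shows up as extra latency.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # the connection is opened and immediately closed
    except OSError:  # refused, timed out, or host name not resolvable
        return None
    return (time.perf_counter() - start) * 1000.0


def probe_all(targets=TARGETS, timeout=3.0):
    """Return {(host, port): latency_ms_or_None} for each target."""
    return {(h, p): tcp_connect_latency_ms(h, p, timeout) for h, p in targets}
```

Calling `probe_all()` from a scheduled job and recording the results over time gives you a baseline, so a sudden rise in latency or an unreachable endpoint stands out.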
For more information please refer to: “Plan for monitoring in SharePoint 2013” http://technet.microsoft.com/en-us/library/jj219701.aspx