SharePoint 2013 Disaster Recovery - Creating, Testing and Maintaining the DR Plan - Part 4

By Danielle Arad - December 15, 2013

This article is the fourth and final article in a series that covers creating, testing and maintaining a SharePoint DR plan. These articles are extracted from a chapter of the book: Microsoft SharePoint 2013 Disaster Recovery Guide.

 

Testing your Disaster Recovery Plan

After taking the time to create a SharePoint DR plan, the last thing you want to do is file it away (outside of your SharePoint environment) and hope you never have to use it. You need to establish that the plan works as expected so that, if or when the time comes to activate it, you and the stakeholders have confidence that the plan, if executed properly, will deliver the expected results.

As a best practice, you should test your DR plan on an ongoing basis. It is recommended that larger organizations test their DR plan at least twice a year. Smaller organizations should test their DR plan annually.

Planning your Test

Testing your SharePoint DR plan will help identify any missing steps, potential problems with existing steps, missing dependencies, and potential bottlenecks. Testing will also help you determine the timings associated with each step of the recovery plan so you know whether the plan will meet your RTO and RPO goals.

It is important to determine when and where your tests will be conducted. You should try to conduct your tests in an environment that resembles your normal production environment so you can get a realistic feel for the plan and how it will work if your production environment goes down.

Determine your Test Scopes

In order to test your SharePoint DR plan you need to define the scope of the tests you will be conducting. Begin by identifying the types of outages your SharePoint environment may experience. Some examples of common types of outages include:

  • Configuration Database Corruption
  • Content Database Corruption
  • Server Failure
    • Application Server
    • Database Server
    • Web Front End
  • Virtual Host Failure
  • Datacenter Failure

For each test, the appropriate resources will be needed to conduct the test and determine whether it was a success. For example, when testing the scenario of a failed application server, suppose your plan calls for the relevant services that were running on the failed server to be moved to a server that is still up and running in the farm. Once the services have been moved and configured, the appropriate resources would validate that the services are up and running and that the SharePoint farm is behaving as expected.

Performing the Test

Once you have finished planning your test and your test scopes have been defined, you need to perform a full test of your SharePoint DR plan.

Your tests should be conducted in the context of the overall business continuity plan (BCP) so you get a feel for how the DR plan fits in and works with your company’s BCP. Involve your key stakeholders from both IT and the business, and be sure to include the communications plan.

All tests should be thoroughly documented by creating a checklist to record the following information for each test and each step within a test:

  • Test ID (sequentially numbered, e.g. 001, 002, 003, etc.)
  • Test Name
  • Test Description
  • DR Plan Reference
    • Step ID
    • Step Description
    • Expected Results
    • Actual Results
    • Expected Duration
    • Actual Duration
    • Pass/Fail
    • Comments
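The checklist above could be captured as a small structured record so that results stay comparable from one test run to the next. The following is a minimal sketch in Python and is not taken from the book; the class names `StepResult` and `DRTestRecord`, the field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import List


@dataclass
class StepResult:
    """One step of a DR test, mirroring the per-step checklist fields."""
    step_id: str
    description: str
    expected_result: str
    actual_result: str
    expected_duration: timedelta
    actual_duration: timedelta
    passed: bool
    comments: str = ""


@dataclass
class DRTestRecord:
    """One DR test, mirroring the per-test checklist fields."""
    test_id: str              # sequentially numbered, e.g. "001"
    name: str
    description: str
    dr_plan_reference: str
    steps: List[StepResult] = field(default_factory=list)

    def passed(self) -> bool:
        # A test with no recorded steps is not treated as a pass.
        return bool(self.steps) and all(step.passed for step in self.steps)

    def total_actual_duration(self) -> timedelta:
        # Assumes the steps ran one after another, so durations add up.
        return sum((s.actual_duration for s in self.steps), timedelta())


# Hypothetical sample data for illustration only.
record = DRTestRecord(
    test_id="001",
    name="Application server failure",
    description="Move services from a failed application server to a surviving server.",
    dr_plan_reference="(relevant section of the DR plan)",
)
record.steps.append(
    StepResult(
        step_id="1",
        description="Start the required services on a surviving server",
        expected_result="Services report as started",
        actual_result="Services report as started",
        expected_duration=timedelta(minutes=30),
        actual_duration=timedelta(minutes=40),
        passed=True,
        comments="Took longer than expected",
    )
)
print(record.passed(), record.total_actual_duration())
```

Keeping expected and actual values side by side in each step makes it straightforward to spot where the plan's timing assumptions were off, even when the step itself passed.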

Analyzing the Results

Once you have completed all of the tests in your test scope, it is time to go back and analyze the test results. In all cases, you will be measuring the results of the test against the defined success criteria, including the RTO and RPO goals.
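As an illustration of measuring results against RTO and RPO goals, the comparison can be reduced to a few duration checks. This is a sketch under stated assumptions rather than a prescribed method: the function names are hypothetical, recovery steps are assumed to run sequentially (so their durations are summed), and the data-loss window is assumed to be the gap between the last recoverable point and the start of the outage.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def meets_rto(actual_step_durations: List[timedelta], rto: timedelta) -> bool:
    """True if the summed recovery time is within the recovery time objective."""
    return sum(actual_step_durations, timedelta()) <= rto


def meets_rpo(last_recoverable_point: datetime,
              outage_start: datetime,
              rpo: timedelta) -> bool:
    """True if the window of lost data is within the recovery point objective."""
    return outage_start - last_recoverable_point <= rpo


def slow_steps(steps: Dict[str, Tuple[timedelta, timedelta]]) -> List[str]:
    """Return the IDs of steps whose actual duration exceeded the expected one.

    `steps` maps a step ID to an (expected, actual) pair of durations.
    """
    return [step_id for step_id, (expected, actual) in steps.items()
            if actual > expected]


# Hypothetical values for illustration only.
timings = {
    "1": (timedelta(minutes=30), timedelta(minutes=45)),
    "2": (timedelta(minutes=20), timedelta(minutes=15)),
}
actuals = [actual for _, actual in timings.values()]

within_rto = meets_rto(actuals, rto=timedelta(hours=4))
within_rpo = meets_rpo(
    last_recoverable_point=datetime(2013, 12, 1, 2, 0),
    outage_start=datetime(2013, 12, 1, 9, 30),
    rpo=timedelta(hours=24),
)
overdue = slow_steps(timings)
```

A test can meet the overall RTO while individual steps still run over their expected duration, so it is worth reporting both the overall result and the list of slow steps to the stakeholders.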

Regardless of whether the test passed or failed, all test results should be well documented and shared with the key stakeholders so everyone has an understanding of what worked and what did not.

Maintaining your Disaster Recovery Plan

Your SharePoint DR plan should be considered a living document, meaning it will continue to evolve over time as your farm continues to grow and new technologies are introduced into the environment.

Over time you may find that budget constraints that once limited your RTO and RPO goals are no longer a concern, and the goals can be adjusted accordingly as new funding is made available for things such as a standby datacenter.

You should schedule periodic reviews of the plan and adjust the plan as necessary. It is important to continue to test your SharePoint DR plan as it continues to evolve over time.

For example, you should plan to review your SharePoint DR plan at least once a quarter. You should review all aspects of the SharePoint DR plan including your recovery targets, RTO, and RPO.

You should plan on performing a full test of your SharePoint DR plan at least once or twice a year depending upon the size of your organization, number of systems to be tested, amount of data to be recovered and the complexity of your SharePoint DR plan.

After each test, you will need to update your SharePoint DR plan according to the results of your test. This will ensure your SharePoint DR plan is properly maintained and ready in the event of a disaster.

Summary

This article and the previous related ones have defined what it takes to create, test, and maintain a SharePoint DR plan. As the reader can see, it is not a 30-minute exercise where one person creates, tests and maintains the DR plan alone, but an ongoing activity involving a number of stakeholders and key individuals who will be responsible for ensuring your organization is continually working to have the best plan in place in case of a SharePoint disaster.