SQL Server Recovery Point Objectives and DBA Tools

dba challenges newsletter Jan 20, 2024
DBA Challenges

Edition: Saturday, January 20th, 2024

Welcome to this week's edition (actually, this is the very first edition).  We'll break down the most popular posts ranked by two categories: Most Comments and Most Impressions.

Most Comments - 39

Day 2

Let's take a look at the first challenge.

Challenge: Define a solution, or set of solutions, using features available in SQL Server to meet the RTO (recovery time objective) of 15 minutes if the following events occur.

  • User error (i.e., someone ran an update but forgot to provide a where clause).
  • Drive hosting the database data files becomes unavailable.
  • The database becomes corrupted and is unavailable.
  • Server hosting the SQL Server instance crashes (motherboard is toast).

How can your solution be tested to ensure the RTO can be met

Bonus:  What edition(s) of SQL Server are the proposed features available in?

Solution: Many of you provided excellent solutions to each scenario.  Here's how I'd tackle these.  

User Error

 

Sometimes, this is unavoidable.  People will make mistakes, and it will happen unless you are not allowing anyone to make data changes directly to the database.  I've done it before.  It's an uneasy feeling hitting F5, in SSMS, and almost immediately knowing you messed up.  In this situation, I'd take the following steps:

  1. Let the business stakeholders know about the mistake.  They need to know data isn't correct now, but you're working on a plan to fix it.  There's no need to try to hide it.
  2. Verify the database is using the full recovery model.  If it's not, you'll need to rely on the latest differential or full backup likely not meeting the RTO.  As a DBA, make sure all databases requiring point in time recovery are using the full recovery model and the proper backups are being taken to support your organizations RTO.
  3. Restore the database as a different name, up to the point in time I made the mistake.  This would require restoring the latest full backup, without recovery, the latest differential and all transaction log backups up to just before the error occurred.
  4. Compare the data between the two tables (original and restored copy) and write an update script to revert the changes.

Loss of a drive, server hosting the SQL Server instance crashes, or the database becomes corrupted.

I'd suggest using an Always On Availability Group here.  Here's why.

Disks:

A Failover Clustered Instance would typically utilize shared disks, which is one of the limitations of FCIs.  Now, this should be a rare scenario given that most production servers should be using storage that provides redundant disks, allowing for hot-swapping failed disks out for new ones without any impact on the server.

Be sure to setup alerts MSSQLSERVER errors 823, 824, and 825.  These can provide as an early warning indicator for impending hardware failures.

Server crash and unavailable:

 

An FCI would have worked to recover from a failed server.  However, adding in the loss of a drive and database corruption requirements could rule this option out. 

Some may say you can use a file share, storage spaces direct, or a similar method that doesn't require shared storage to host your database files. 

However, I don't have enough experience with those solutions to suggest that.

Database corruption:

 

Database corruption on an FCI (using shared storage or not) would require a fast restore to get up and running within 15 minutes.  Depending on the database size and storage performance, this could be doable.  An Availability Group would be ideal here. 

Check out the topic "Automatic Page Repair (Availability Groups: Database Mirroring).

FCIs are available in the Standard edition, whereas Availability Groups are only available in the Enterprise edition.  Basic AGs in SQL Server 2016 and later are available in the standard edition.  Check the list of limitations here.

Before we move on to the most popular post by impressions, take a look at a few of these links that provide additional information and how to recover if you ever find yourself in these scenarios.

Recovery Models

Restore

Always On Failover Cluster Instances (SQL Server)

What is an Always On availability group?

Most Impressions: 16,032

Day 6

 

Challenge:  DBAs use many tools and scripts to do their job effectively.  What are some of your favorite tools?  Why?

There were a lot of great suggestions here and several that I use personally.  Here are just a few of the top tools.  

  1. DBATOOLS - Learning Powershell has been almost as crucial as learning T-SQL.  These tools provide a fantastic way to perform migrations, maintenance, patching etc.  Too much to list here.  If you're a DBA, I'd highly recommend becoming familiar with this toolset.
  2. Plan Explorer - This tool makes it much easier to review query execution plans vs SQL Server Management Studio.
  3. sp_whoisactive
  4. Brent Ozar's First Responders Kit
  5. Ola's SQL Server Maintenance Solution

 

 

That's it for this week.  Thanks a lot for reading!

 

Whenever you're ready, there is one way I can help you gain hands-on experience:Ā 

Automated Sandbox Fundamentals: I teach how to build a virtual lab using automation in this course. Learn how toĀ create golden images, using both Windows and Linux, to easily spin up and add additional machines to your sandbox.Ā  It's packed with 8 modules and the scripts you'll need to build your environment.Ā  Start small, andĀ scale as needed by easily changing the configuration file included with the course.

Give Me the Details