Edition: Saturday, February 17th, 2024
I recently had a conversation about the placement of SQL Server native backups. Backups were stored on the same drive as the database log files, and I mentioned that they needed to reside elsewhere.
Today, we will look at what happens when you lose a drive that hosts your transaction log and backup files. One of my main goals is to present topics in a way that you can recreate in your sandbox. For this issue, you'll only need SQL Server installed and a method for taking a disk offline (I'm using an iSCSI disk). I have installed SQL Server 2022 Developer Edition, but it doesn't matter your version. The results will be the same if you lose a drive hosting your transaction log files (disaster).
Knowing where to store your backup files isn't just best practice—it's your first line of defense against data loss. Ignoring this can be as detrimental as neglecting backups altogether.
Here's what I'll cover today:
- How SQL Server utilizes a transaction log file
- Errors encountered when you lose a drive
- How to ensure your backups aren't on the same drive as your data or log files
Table of Contents
How SQL Server utilizes a transaction log file:
SQL Server uses a transaction log file as a critical component of the database architecture to ensure data integrity and support database recovery. Here's a simplified explanation of how it works:
-
Recording Transactions: Every transaction made in a database—such as insert, update, or delete operations—is first recorded in the transaction log. This ensures that all modifications are tracked sequentially.
-
Recovery Mechanism: The log file plays a crucial role in database recovery. In case of a system failure, SQL Server can use the transaction log to replay transactions that were committed but not yet written to the data files, or to undo transactions that were in progress but not completed, ensuring the database is returned to a consistent state.
-
Support for Transactional Integrity: SQL Server uses a write-ahead logging (WAL) strategy, where changes are first written to the log before they are applied to the database. This guarantees that all the steps in a transaction are completed successfully and committed to the database, or none of them are, maintaining the atomicity and integrity of transactions.
-
Log Truncation: To manage the size of the transaction log and ensure it does not grow indefinitely, SQL Server periodically truncates the log (dependent on the database recovery model). This process involves removing inactive portions of the log that are no longer needed for recovery processes, such as after a successful database backup.
-
Support for High Availability and Disaster Recovery: The transaction log also supports various high availability and disaster recovery solutions, such as log shipping, database mirroring, and Always On Availability Groups. These technologies rely on the transaction log to synchronize data across databases or instances, ensuring data is up-to-date and consistent across environments.
-
Point-in-Time Recovery: With the help of the transaction log, SQL Server can perform point-in-time recoveries, allowing a database to be restored to a specific moment. This is particularly useful in scenarios where you need to recover data just before an erroneous operation was performed.
To dive deep into transaction log management, check out Tony Davis's "Stairway to Transaction Log Management In SQL Server" series on SQLServerCentral.
Errors encountered when you lose a drive hosting a transaction log file
SQL Server will begin displaying, or writing to the error log, the following event IDs when the disk hosting a transaction log file has become unavailable.
- Event ID 9001 - Indicates a serious failure where the database log file becomes unavailable. This situation requires either restarting the database or restoring from a backup to resolve. The cause could range from hardware failures, a full transaction log, to issues with file access due to encryption errors. Resolving preceding errors and possibly restarting the SQL Server instance are recommended steps for recovery. However, in the case of a disk being lost, restarting SQL Server won't help. Worse, if your backups are stored on the same drive, you've lost your ability to restore the database to a different one.
- Event ID 3314 - Indicates an error during the undoing of a logged operation, which may place the database into a SUSPECT state due to failure in rolling back uncommitted transactions. This error suggests potential damage to the transaction log file, the primary filegroup, or other filegroups, making the database unavailable. For resolution, reviewing the Windows System Event log and SQL Server error log preceding errors is recommended to identify and address the underlying cause.
When you encounter these errors, you may also see databases in a Recovery Pending state.
How to ensure your backups aren't on the same drive as your data or log files
The situation described above is not something you want to see. It's only exacerbated when you realize the disk lost also stored your only copy of backups. Fortunately, there's an easy method for checking for this condition. I prefer to use sp_Blitz, part of the SQL Server First Responders Kit. However, some environments may not allow for installing the kit. If you can, check for ID 93. If you can't, a query similar to the one below can be used.
SELECT
bs.database_name,
bmf.physical_device_name AS BackupLocation,
mf.physical_name AS DatabaseFileLocation
FROM
msdb.dbo.backupset bs
JOIN
msdb.dbo.backupmediafamily bmf ON bs.media_set_id = bmf.media_set_id
JOIN
master.sys.master_files mf ON bs.database_name = DB_NAME(mf.database_id)
WHERE
bs.backup_finish_date > DATEADD(day, -7, GETDATE()) -- Checks backups from the last 7 days
AND LEFT(bmf.physical_device_name, 1) = LEFT(mf.physical_name, 1) -- Compares if backups are on the same drive as data or log files
If you find backups are being written to the same drive, I'd recommend moving these to another location. Preferably to a remote share or, at the very least, to a separate drive.
Conclusion:
Losing a critical drive is rough. Losing a disk and your backups at the same time is even worse. Make sure you're not in this situation; if you are, take steps to remedy it as soon as possible.
If you haven't done so, I invite you to join the weekly DBA Challenges newsletter - DBA Challenges - Newsletter (automatesql.com)
That's it for this week. Have a great week!