Beyond Manual Patching: Automating SQL Server AG Updates with Ansible
Apr 15, 2025
Over the past few weeks in the AutomateSQL Insiders Community, we've been discussing SQL Server patching. More specifically patching Availability Groups in a rolling fashion. This technique keeps your replicas up to date while maintaining minimal downtime.
Automated patching is nothing new. You've probably heard of Microsoft's own Windows Server Update Services tool, Altera, or even set up VMs in Azure, which hosts SQL Server and looked at using Automated Patching or Azure Update Manager. However, none of these are designed to handle your unique needs of performing failovers, failbacks, or rolling updates for Availability Groups.
Let's look at rolling updates, how we might achieve these using Ansible, and what prerequisites may need to be in place before patching.
What are Rolling Updates?
Rolling upgrades or updates to the latest CU version are typically performed to reduce downtime (notice I didn't say "no downtime" or "zero downtime"). Usually, the following steps are performed (after your prechecks to ensure all databases have been backed up, there are no long-running transactions in flight, integrity checks are clean, and you've tested the update thoroughly in a non-prod environment). This is a very high-level overview. We'll get into a specific scenario in a second.
- Users should be notified that maintenance is going to be performed.
- Place any monitoring in maintenance mode to avoid sending alerts to support personnel.
- Validate the state of the Availability Group. You'll want to ensure it's in synch and healthy before making any changes.
- Set synchronous secondary replicas to manual failover (vs automatic).
- Update secondary replicas first. Any replicas that reside in remote locations or other data centers separate from the primary replica should be updated first.
- Set the synchronous secondary replica to automatic failover (if it was set to automatic previously).
- Wait until the synchronization state is SYNCHRONIZED.
- Failover the AG to a secondary replica.
- Update the old primary replica instance.
- Return to the original primary (if required) once the synchronization state is SYNCHRONIZED.
- Validate the state of each SQL Server instance and of the AG.
- Re-enable monitoring
- Notify users that maintenance is complete.
This process is highly dependent on how your specific AG has been configured.
For instance, you may have multiple secondary replicas, some of which may be configured as synchronous with automatic failover or asynchronous with manual failover. The idea is to keep interruptions to a minimum. See here for more details.
Brent Ozar has a fantastic list of recommendations as well (for instance, he suggests not patching all replicas at the same time, allowing for an easy rollback plan if issues arise). Check out the list here. As we continue on in this series, we'll look at methods you can use within Ansible to achieve some of the recommendations he mentions.
Manual vs. Automated:
Performing each step can take an hour or more to perform manually. Not to mention the post-validation of reviewing state, etc. Let's look at the pros and cons of each method.
Manual Patching:
- Greater control for specific scenarios: In highly unique or complex environments, manual patching might offer more granular control over each step, allowing for immediate adjustments based on real-time observations that might not be easily captured in an automated script.
- Familiarity for experienced DBAs: DBAs who have performed manual patching for a long time may feel more comfortable with the process and have a deep understanding of potential issues and troubleshooting steps. This can also be a crutch. If you're the DBA who is always doing the patching, this limits your team's ability to function without you, and you end up having to always be the one doing the patching after hours (I don't know about you, but I'd rather be doing other things than patching).
- No initial setup overhead: Manual patching doesn't require the initial investment of time and effort to set up and configure an automation framework like Ansible, write playbooks, and test them.
- Direct observation and immediate feedback: During a manual upgrade or patch, administrators can directly observe the progress on each server and respond immediately to any issues that arise.
- Understanding the underlying processes: Performing a rolling upgrade manually, as described in this Microsoft article, provides a thorough understanding of the sequence of steps involved in minimizing downtime during updates.
However, it's important to note that while manual patching might offer these perceived benefits, it generally comes with a higher risk of errors, potential for longer downtimes, and less consistency compared to a well-implemented automation strategy, especially in complex environments with multiple AGs.
Automated Patching:
- Ensures smooth and consistent patching: By using automation tools like Ansible and PowerShell, you can establish a repeatable and predictable process for updating your SQL Server AGs and FCIs, as well as stand-alone instances. Add in the Ansible Automation Platform, and you can wrap a web UI around your automation.
- Minimizes downtime: The goal of automating the patching process is to reduce the interruption to your SQL Server services.
- Reduces the risk of errors: Manual processes are more prone to human error. Automation through Ansible allows for predefined steps and validations, lowering the chances of mistakes during the patching process.
- Provides a comprehensive checklist: Build your checklist into an Ansible role to complete pre-validation, installation, and post-validation steps. No more manual review of AG state, SQL Server builds, reboots, etc. and wondering if other DBAs on your team are doing these steps just as you would.
- Incorporates validation and safety checks: Automated playbooks can include tasks to verify the integrity of the patch file using checksums, assert the SQL Server version is valid for patching, and validate the AG health state before and after patching.
- Offers a step-by-step automated process: Ansible playbooks can be written to automate key steps such as identifying the primary replica, patching secondary replicas (including changing the failover mode to manual and back to automatic after patching is complete), rebooting, performing controlled failover, patching the old primary, and optionally failing back to the primary at the end.
- Facilitates the use of specific Ansible and PowerShell modules: Automation leverages specific Ansible modules and PowerShell modules designed for Windows and SQL Server to perform tasks efficiently. This saves a tremendous amount of time by enabling the use of existing code.
- Cleanup tasks: Automation can also handle post-patching cleanup, such as deleting temporary files and the patch exe from managed nodes.
- Enables idempotency: Ansible facts can be used to ensure idempotency, meaning that running the playbook multiple times will result in the same system state without unnecessary changes.
There's a lot more you could build into your Ansible playbook, as it's not just limited to orchestrating the installation. For example, you could build in tasks to perform log backups and system database backups, turn off / on monitoring, and send notifications when the patch starts and ends.
For now, I'm going to focus solely on the patching.
Rolling Updates - The Ansible Way:
There are numerous ways an AG can be configured, so I'll limit our scope to one specific scenario for now (start small with your automation, then expand, is my motto). I'll use this scenario for all examples and recommendations throughout this blog series.
Test Environment:
🧩 General Settings:
- SQL Server Version and Edition: SQL Server 2022 Developer Edition
- Availability Group Name: SANDBOXAG
- Cluster Type: Windows Server Failover Cluster
- Required synchronized secondaries to commit: 0
📚 Databases:
- Only one database (AGTest) is currently part of this availability group.
🖥️ Availability Replicas:
Goal:
To automate the installation of SQL Server cumulative updates on all replicas.
Checklist:
Before you start automating a process, it's paramount that you have an end goal in mind. I like to start with a checklist of everything I want to accomplish. Of course, this is likely to expand as you begin building the automation. We'll assume the Ansible control node has already been created, WinRM or SSH is enabled on the Windows-managed nodes, and you have an account with the required permissions to perform these steps.
Prerequisites and Validation:
- Verify the SHA256 checksum of the patch file: We need to ensure the exe isn't tampered with after downloading it to a file share.
- Ensure the target SQL Server version is compatible for patching: Idempotency is key. So, the first thing I'll do is make sure the target instances I'm patching meet the following criteria.
- We define the build version in a variable - desired_sql_version: "16.0.4185". In this example, I'm applying SQL Server 2022 CU18.
- Does the major version match? 16, in this case, represents the major version.
- Is the target instance at a lower patch version than what is requested? For example, if a target instance has already been patched, then skip it. I'll check to ensure the SQL Server version is less than 4185.
- If both of these checks pass, then we will move on to the next step.
- Identify the primary replica in the target AG: I'm limiting the number of failovers in the process. To do that, we first need to identify which replica is the primary. In Ansible, we can store that value in a fact to be used later and avoid patching the primary until the end.
- Validate the current health state of the Availability Group prior to installing updates: I'm using the Test-SqlAvailabilityGroup to perform this step. We'll cover how to utilize the SqlServer PowerShell module to avoid writing additional code.
- We define the build version in a variable - desired_sql_version: "16.0.4185". In this example, I'm applying SQL Server 2022 CU18.
Update Secondary Replicas:
The following process will be executed on each secondary. By default, Ansible will update each replica in parallel. You may not want that to happen. We'll review how you can change this default behavior in a later post.
- Change the Failover Mode from Automatic to Manual. Do nothing if the secondary is already using the manual failover mode.
- Check for a pending reboot (reboot if needed).
- Install the update.
- Reboot.
- Wait for SQL Server to become available.
- Change the Failover Mode from Manual to Automatic. Do nothing if the failover mode was originally set to manual.
Update Primary Replica:
If specified, the process will update the primary. You may want to hold off on this step for a week or two if that's part of your standard operating procedure now.
- Failover to one of the secondary replicas. The chosen replica must use synchronous commit and be in a healthy state. Both of these are checked prior to choosing the new primary.
- Capture the new primary instance name in an Ansible fact. This will be used to fail back to the original primary.
- Now that the old primary is a secondary, the same process above for updating the secondary is followed.
- Failback if required.
Post Validation
In this phase we'll validate the state of the availability group and the SQL Server version to ensure it matches our desired version.
- Validate the final state of AG.
- Verify versions are now up to date.
- Remove any files copied to the replicas - Such as the update exe and SqlServer PowerShell module (this could be left in place).
Moving Forward:
Throughout this post, we've explored the concept of rolling updates for SQL Server Availability Groups and how Ansible can transform this complex, manual process into a streamlined, repeatable operation. We've outlined the key steps and considerations that should be incorporated into your automation strategy, from pre-installation checks to post-validation tasks.
This approach serves as a foundational framework for your SQL Server patching procedures.
While the actual implementation examples—the Ansible playbooks, roles, and PowerShell commands—will be covered in subsequent posts in this series, understanding the overall process and benefits is the critical first step.
What about you? What would you add to your automation process?
Conclusion:
Automating SQL Server patching with Ansible transforms what was once a tedious, error-prone process into a streamlined, repeatable operation. By implementing the rolling update approach we've outlined, you can minimize downtime, reduce human error, and free up your DBA team for more strategic work.
Remember, start with the simple scenario we've discussed, test thoroughly in your non-production environment, and gradually expand to handle more complex configurations.
The initial time investment in setting up these Ansible playbooks will pay dividends with each patching cycle.
SQL Server environments deserve more than ad-hoc patching processes.
With tools like Ansible at your disposal, there's never been a better time to level up your database maintenance practices and avoid using tools that really weren't built for complex SQL Server environments. Whether you build your own solution or join our community for playbook examples, the most important step is moving from manual processes toward automation.
Get free access to my "SQL Server Automation: Your First Steps with Ansible" Guide
Get started with Ansible using this free guide. You'll discover how simple Ansible is to use, understand core concepts, and create two simple playbook examples.
When you signup, we'll send you periodic emails with additional free content.