Moving the ConfigMgr site database to an Always On Availability Group

This post has been republished via RSS; it originally appeared at: Device Management in Microsoft articles.

First published on TECHNET on Jun 10, 2017
In a previous post we started talking about how we're moving our workloads into Azure, this post continues that conversation. One important design decision is (or should be) High Availability. We've been running our SQL servers in failover clusters and using a SAN for storage, but this isn't possible in Azure. It is possible, however, to use Always On Availability Groups (I’ll refer to this as an “Availability Group” or an “AG” from now on) in Azure.

This post will explain how to efficiently move the CM DB to an Availability Group. This process will be the same whether the AG is in Azure or not. However, if you’ve got a small database this method could be overkill and perhaps not worth the effort (depends on how "small" it is), but if you’ve got a large database this should be quite helpful.

Prerequisites


Naturally, the SQL servers should be created and configured already, including the permissions. I highly recommend creating the AG ahead of time with a dummy database so you can ensure everything in the AG is working and setup correctly without having to do that troubleshooting when trying to move your production database. Thus, I’m going to assume that the AG has already been created. As for permissions, because the real work is performed by site recovery (a database move) check that the proper permissions are in place – just like you’d do for any recovery.

The last item before we get into the steps to move your CM database into an AG, is to make sure you have the server level settings (for SQL) properly configured for CM. Specifically, the server must allow CLRs and use a defined replication text size.* To do that run the following on each of the AG nodes.
USE [master];
GO
EXECUTE sp_configure 'show advanced options', 1;
RECONFIGURE WITH OVERRIDE;
GO
EXECUTE sp_configure 'clr enabled', 1;
RECONFIGURE WITH OVERRIDE;
GO
EXECUTE sp_configure 'max text repl size (B)', 2147483647;
RECONFIGURE WITH OVERRIDE;
GO

*If I get my way CM will also accept a value of “-1” for the “max text repl size (B)” since that’s much easier to tell people to use and is how to tell SQL to use the max size of each data type.

When I said we’d move the CM database into an AG “efficiently” what I meant was “with as little downtime as possible”. That means we’re going to do as much as possible before even starting our downtime – this could be days before but I don’t recommend waiting more than a couple days.

Pre-Downtime Activities


The first change we need to make on the CM database is to put it into the FULL recovery model. This is a requirement for being in an Availability Group. Something not for this blog post but worthy of mentioning is the need for SQL log backups for a database in the Full recovery model. If you’re going to use an Availability Group then you need to brush up on SQL backups (full and log backups specifically) and make sure to have a plan in place.
ALTER DATABASE [CM_xxx] SET RECOVERY FULL;

Because we’ve changed the recovery model we need to take a new backup (even if you had one right before updating to full recovery model. So, take a Full backup of the CM database at this point. If you have a job that does this you can kick this job off or run the backup manually (see sample code below). However, if you do have a backup job you’ll want to disable it. After we take this backup we don’t want any new full backups taken (at least until we’ve fully moved the CM database to the AG).
BACKUP DATABASE [CM_xxx] TO DISK = N'E:\**YOUR DESIRED BACKUP LOCATION**\CM_xxx_Full.bak'
WITH COMPRESSION, STATS = 1;

Now that we’ve got a new full backup we need to backup the log as well. Take a log backup of the CM database using the wizard, a job, or use something like this:
BACKUP LOG [CM_xxx] TO DISK = N'E:\**YOUR DESIRED BACKUP LOCATION**\CM_xxx_Log1.trn'
WITH COMPRESSION, STATS = 1;

In the next step you’ll be restoring the database using the backups which were just taken. The restore can be performed locally or from a network location. If you’re going to do it from local files, copy the backup files to the nodes. If you’re going to restore the database over the network (using a UNC path rather than a local path) you can skip this copy step.

Restore these backups to both of the AG nodes. In this step you have to restore the full backup first and then the log. And this is very important, you must use the NORECOVERY option! If you don’t use the NORECOVERY option in the restore then all of this is for naught. So make sure this is not forgotten. Trust me…I forgot one time and lost 6 hours of preparation.
RESTORE DATABASE [CM_xxx] FROM DISK = N'**THE LOCATION OF THE BACKUP FILES**\CM_xxx_Full.bak'
WITH NORECOVERY, STATS = 1;
RESTORE LOG [CM_xxx] FROM DISK = N'**THE LOCATION OF THE BACKUP FILES**\CM_xxx_Log1.trn'
WITH NORECOVERY, STATS = 1;

Now, and this is also very important, you must continue to take log backups until you’re ready to make the official move to the AG. So, if you’re going to wait a day or two it would be best to have a job scheduled to take a log backup every couple of hours unless you want to remember to do this yourself.
This is important to do for several reasons. The biggest reason being, if you don’t the log will continue to grow and perhaps fill up your disk, which means SQL stops working (for this DB at least). Another reason is because taking them more often will create smaller log backup files to copy and/or restore.

You will need to name each log backup something different than what was previously used. You’ll notice that the example log backup file has a “1” appended to the filename. That was intentional so that as you take additional backups you increment that number (or do something else to make the name unique).

These log backups will need to be restored to each of the AG nodes just like the previous backups were. So, you can either copy the backups locally or perform the restore over the network just as previously done (again, making sure to use the NORECOVERY option). The restores can either wait until right before taking the downtime for the move or can be done throughout the time so there is less to do when it comes time to take the downtime. Oh, and if it isn’t clear, the restores will need to be performed in ‘oldest to newest’ order.

Downtime Activities


If you’re familiar with SQL then you’re realizing that we’ve essentially just created our own log shipping routine for our CM database – from the production server to our new AG nodes. Now that we’ve got our ‘log shipping’ activities happening and have waited until our downtime window we’re ready to begin the real move. If you have a job running to perform the log backups, it’s time to disable that job. That’s because we want to make sure to control exactly when the last log backup is taken.
Turn off the SMS services and wait 10-15 minutes. We’ll let things stop processing and wind down before calling things ready.

Once you’re comfortable with where things are at, take one last log backup and restore it to both nodes (just like previously done). At this point both AG nodes will have the latest data from the CM database. Both nodes should show the database in the “RESTORING” state. If not…you’re not ready and should turn the services back on and start over…and don’t forget to use NORECOVERY in the restore statements next time :). You can check the status in Object Explorer or with the following query.
SELECT name
,state_desc
FROM sys.databases
WHERE name LIKE N'CM[_]___';

If things look correct at this point, then you’re ready to make one of the nodes the primary server for the CM database. To do this, on one node AND ONE NODE ONLY, run the following:
RESTORE DATABASE [CM_xxx] WITH RECOVERY;

Now this database should be in the “ONLINE” state and you can join the database to the Availability Group which you’ve already created (with the dummy database). Use the “Add Database…” wizard under “AlwaysOn High Availability --> Availability Groups --> [Your AG Name] --> Availability Databases”. In the screenshot below showing the location of this wizard you can see the dummy database I used to create the AG is named “CM_AAG” (for Azure Availability Group). I’ll be adding the CM database “CM_EA1” (for future screenshot reference).



When going through the “Add Database” wizard it is very important to choose “Join only” on the “Select Initial Data Synchronization” screen (should be the third screen to show up). If this option isn’t chosen then all the work we’ve done so far to minimize downtime is useless since SQL will have to take a backup and restore it to the other node. Since we’ve already done this via our ‘log shipping’, SQL does not need to perform a new backup and restore (a time consuming activity for larger dbs).
Finish going through the wizard and at the end your “Results” screen should look something like this:



At this point the database is part of the Availability Group on the new servers, and almost ready for the site recovery activity (database move). We’re not completely ready to perform this action because CM requires some database settings to be set for CM to work – and these settings aren’t kept in the backup/restore actions.

To ensure the required settings are set on the database, connect to the primary node and run the following SQL statements. We need to update these database settings on each database in the Availability Group because these settings aren’t replicated by SQL. Therefore, you’ll need to failover to each secondary (using SQL Server Management Studio, NOT the Failover Cluster Manager!) and run the same statements while the node is the “Primary” node. After each database has had these statements run against it while it was the “Primary” in the AG you can failover to whatever your preferred node is.
USE [master];
GO
ALTER DATABASE [CM_xxx] SET TRUSTWORTHY ON;
ALTER DATABASE [CM_xxx] SET HONOR_BROKER_PRIORITY ON;
ALTER DATABASE [CM_xxx] SET ENABLE_BROKER WITH ROLLBACK IMMEDIATE;
GO

The last step is to perform the database move via a Configuration Manager Recovery. You’ll specify the listener name for the SQL instance. ConfigMgr will take care of the rest and since we’ve configured all the database and server settings the site recovery process will not try to do something it can’t do to a database in an Availability Group – namely, put it into single user mode to change the configurations.

There you have it, nice and easy right?

One thing to note about running CM in an Availability Group: at the time of writing this blog post the AG must be in the “manual failover” mode when performing a CM upgrade, however, before and after you can (and should) run the AG in “automatic failover” mode.

Checklists


Looking for an easy checklist for your own activities? Well, look no further!
Pre-Downtime Checklist

  • Run the following on each node that will be in the Availability Group


USE [master];
GO
EXECUTE sp_configure 'show advanced options', 1;
RECONFIGURE WITH OVERRIDE;
GO
EXECUTE sp_configure 'clr enabled', 1;
RECONFIGURE WITH OVERRIDE;
GO
EXECUTE sp_configure 'max text repl size (B)', 2147483647;
RECONFIGURE WITH OVERRIDE;
GO


  • Set permissions as in any site recovery process

  • Create the Availability Group using a “dummy” database

  • Optional: On the current CM DB server install SQL backup jobs - full and log backup job

  • Change the CM DB to FULL recovery model

  • Take a Full database backup (then disable this job for now)

  • Take a Log backup (and ensure the job is running every couple of hours)

  • Restore the first full and log backups to both of the nodes in the Availability Group (“WITH NORECOVERY”!)

  • Continue to backup/restore logs until ready for downtime


Downtime Checklist

  • Disable the log backup job

  • Turn off SMS services

  • Wait 10-15 minutes

  • Take one last log backup and restore this on both the nodes as previously done

  • Check that the CM database is in a status of "Recovering" on both nodes

  • On one node (and one node only!) run the following statement


RESTORE DATABASE [CM_xxx] WITH RECOVERY;


  • Add the database to the Availability Group via the Availability Group “Add Database” wizard (JOIN ONLY!)

  • Run the following script on the primary node


USE [master];
GO
ALTER DATABASE [CM_xxx] SET TRUSTWORTHY ON;
ALTER DATABASE [CM_xxx] SET HONOR_BROKER_PRIORITY ON;
ALTER DATABASE [CM_xxx] SET ENABLE_BROKER WITH ROLLBACK IMMEDIATE;
GO


  • Failover to the other node (via SSMS) and run the previous script again

  • Perform a DB Move via Configuration Manager Recovery

  • Make sure you have jobs or at least a plan for SQL backups including log backups!

Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.