Supporting and maintaining a very large database can come with it’s challenges. One of them being the backup and restore strategy. The backup sizes are huge and the duration of the full backup is very long, often too long. A common approach to improving that is to split the database into multiple files and filegroups and setting the filegroups that have old or archived data to read-only, since the data is not changing anyway. Although this adds some complexity to the backup and restore process, it reduces the ongoing backup duration and sizes.
A database with this design will require one-off or very occasional filegroup backup for each of the read-only filegroups. The data isn’t changing for these read-only filegroups, so we don’t need to back it up regularly. Then the remainder of the database – the primary filegroup and all read-write filegroups – can be backed up by a regular partial backup. A partial backup by design, backs up the data in the primary filegroup and every read-write filegroup. If the database has no read-only filegroups, then it’ll just be a regular full backup.
Let me walk you through a demo of how to backup and then restore using a partial backups and filegroup backups as your backup / restore strategy.
A partial backup can be used on databases in all recovery models, including simple. For this demo though, my database will be in full recovery model and will include transaction log backups.
First, I create the database. The database I will be using is one I’ve called PartialBackupsDemo. It has a primary filegroup, a filegroup called PartialBackupsDemo_ReadWrite and a filegroup called PartialBackupsDemo_ReadOnly, which will eventually be set to read-only…
USE [master]; CREATE DATABASE PartialBackupsDemo ON PRIMARY (NAME = N'PartialBackupsDemo', FILENAME = N'F:\MSSQL\Test\PartialBackupsDemo.mdf', SIZE = 50MB , FILEGROWTH = 50MB), FILEGROUP ReadWriteFilegroup (NAME = N'PartialBackupsDemo_ReadWrite', FILENAME = N'F:\MSSQL\Test\PartialBackupsDemo_ReadWrite.ndf', SIZE = 50MB , FILEGROWTH = 50MB), FILEGROUP ReadOnlyFilegroup (NAME = N'PartialBackupsDemo_ReadOnly', FILENAME = N'F:\MSSQL\Test\PartialBackupsDemo_ReadOnly.ndf', SIZE = 50MB, FILEGROWTH = 50MB) LOG ON (NAME = N'PartialBackupsDemo_log', FILENAME = N'F:\MSSQL\Test\PartialBackupsDemo_log.ldf', SIZE = 50MB, FILEGROWTH = 50MB); ALTER DATABASE PartialBackupsDemo SET RECOVERY FULL;
Next, I will continue to prepare the database by creating three tables: 1. dbo.BigTable: this will hold most of the records and we want to archive some of it’s old data. 2. dbo.BigTableArchive: this table will hold the old data moved from dbo.BigTable. 3. dbo.OtherStuff: this table holds other data that is still critical to the database, but isn’t big – it’ll be stored on the primary filegroup…
USE PartialBackupsDemo; GO -- This is the big table which we want to archive old records CREATE TABLE dbo.BigTable ( ID INT NOT NULL, [Value] NVARCHAR(100) NOT NULL ) ON ReadWriteFilegroup; -- This is the archive table that stores the old records from dbo.BigTable CREATE TABLE dbo.BigTableArchive ( ID INT NOT NULL, [Value] NVARCHAR(100) NOT NULL ) ON ReadOnlyFilegroup; -- This is the table that stores other data we may need, like reference data for example. CREATE TABLE dbo.OtherStuff ( ID INT NOT NULL, [Value] NVARCHAR(100) NOT NULL ) ON [PRIMARY];
We will load some initial data. Then we will simulate an archiving process by copying some data into the dbo.BigTableArchive table, deleting from dbo.BigTable, then setting the ReadOnlyFilegroup to read-only. Setting it to read-only requires an exclusive lock, so note the “Rollback immediate” arguments added in. Be careful doing that in production, users may be unhappy! I’ve also added in some queries to show what has happened…
-- Insert initial data INSERT INTO dbo.BigTable VALUES (1, N'Initial load'), (2, N'Initial load'), (3, N'Initial load'); INSERT INTO dbo.OtherStuff VALUES (1, N'Initial load'); -- Now archive some old records in a typical insert/select into the archive table, then delete from the main table INSERT INTO dbo.BigTableArchive SELECT ID, [Value] FROM dbo.BigTable WHERE ID <= 2; DELETE FROM dbo.BigTable WHERE ID <= 2; SELECT * FROM dbo.BigTable; SELECT * FROM dbo.BigTableArchive; SELECT * FROM dbo.OtherStuff; -- Now set the filegroup where the archive table is on to read only. Needs an exclusive lock to do so ALTER DATABASE PartialBackupsDemo SET SINGLE_USER WITH ROLLBACK IMMEDIATE; ALTER DATABASE PartialBackupsDemo MODIFY FILEGROUP ReadOnlyFilegroup READONLY; ALTER DATABASE PartialBackupsDemo SET MULTI_USER; SELECT [name], is_read_only FROM sys.filegroups;
Which results in…
Now the database is setup how we need it. We can insert more data, and take a partial full backup. Note the READ_WRITE_FILEGROUPS argument in the partial full backup command. This is key. This tells SQL to backup all read-write filegroups. If that isn’t added, it’ll just be a regular full backup instead…
USE PartialBackupsDemo; -- Insert data before partial full backup INSERT INTO dbo.BigTable VALUES (4, N'Insert before partial full backup'); INSERT INTO dbo.OtherStuff VALUES (2, N'Insert before partial full backup'); SELECT * FROM dbo.BigTable ORDER BY ID; SELECT * FROM dbo.BigTableArchive ORDER BY ID; SELECT * FROM dbo.OtherStuff ORDER BY ID; -- Take a partial full backup, which will backup the dbo.BigTable and dbo.OtherStuff tables BACKUP DATABASE PartialBackupsDemo READ_WRITE_FILEGROUPS TO DISK = N'F:\MSSQL\Test\PARTIAL_FULL_PartialBackupsDemo.bak' WITH INIT, FORMAT;
In the Messages tab, it shows it’s backing up the primary, read-write filegroup and the log file. Not the read-only file.
So no we will take a filegroup backup of the read-only filegroup. See that it’s only taking a backup of the read-only filegroup, plus the log file.
-- Take a filegroup backup of the ReadOnlyFilegroup filegroup BACKUP DATABASE PartialBackupsDemo FILEGROUP = 'ReadOnlyFilegroup' TO DISK = 'F:\MSSQL\Test\FILEGROUP_PartialBackupsDemo.bak' WITH INIT, FORMAT;
Now that the partial full backup and filegroup backup is out of the way, we have the whole database backed up. But of course a database doesn’t stay one the same state for long, more data will be added to the read-write filegroups. So we can take a partial differential backup. A partial differential is the same as a regular differential backup, but only for the primary and read-write filegroups. The key parts to note here are again the READ_WRITE_FILEGROUPS and the DIFFERENTIAL arguments. Lets run that now:
-- Insert data before differential backup INSERT INTO dbo.BigTable VALUES (5, N'Insert before partial diff backup'); INSERT INTO dbo.OtherStuff VALUES (3, N'Insert before partial diff backup'); SELECT * FROM dbo.BigTable ORDER BY ID SELECT * FROM dbo.BigTableArchive ORDER BY ID SELECT * FROM dbo.OtherStuff ORDER BY ID -- Take a partial differential backup BACKUP DATABASE PartialBackupsDemo READ_WRITE_FILEGROUPS TO DISK = N'F:\MSSQL\Test\PARTIAL_DIFF_PartialBackupsDemo.bak' WITH DIFFERENTIAL, INIT, FORMAT;
Again, in the Messages tab, it shows it’s backing up the primary, read-write filegroup and the log file. Not the read-only file.
For these very large databases you may still want to have point in time recovery using transaction log backups. There is nothing special about these log backups. Just use the usual transaction log backup syntax as always. Lets show that now…
-- Insert data before log backup INSERT INTO dbo.BigTable VALUES (6, N'Insert before log backup 1'); INSERT INTO dbo.OtherStuff VALUES (4, N'Insert before log backup 1'); SELECT * FROM dbo.BigTable ORDER BY ID; SELECT * FROM dbo.BigTableArchive ORDER BY ID; SELECT * FROM dbo.OtherStuff ORDER BY ID; BACKUP LOG PartialBackupsDemo TO DISK = N'F:\MSSQL\Test\LOG_PartialBackupsDemo_1.trn' WITH INIT, FORMAT;
-- Insert data before log backup INSERT INTO dbo.BigTable VALUES (7, N'Insert before log backup 2'); INSERT INTO dbo.OtherStuff VALUES (5, N'Insert before log backup 2'); SELECT * FROM dbo.BigTable ORDER BY ID; SELECT * FROM dbo.BigTableArchive ORDER BY ID; SELECT * FROM dbo.OtherStuff ORDER BY ID; BACKUP LOG PartialBackupsDemo TO DISK = N'F:\MSSQL\Test\LOG_PartialBackupsDemo_2.trn' WITH INIT, FORMAT;
That’s the backups demonstrated. Not too complicated right? But a backup strategy is only as good as the restore strategy, so lets go into how to restore all these.
Lets get straight into doing a restore. I’m going to restore the database side by side, with the restored database being called PartialBackupsDemo_Restore.
The first step is to restore the partial full backup. The key thins sto note here again is the READ_WRITE_FILEGROUPS and also the PARTIAL argument. Also for this we want to restore with NORECOVERY so we can restore the diff and log backups after…
USE [master]; RESTORE DATABASE PartialBackupsDemo_Restore READ_WRITE_FILEGROUPS FROM DISK = N'F:\MSSQL\Test\PARTIAL_FULL_PartialBackupsDemo.bak' WITH NORECOVERY, PARTIAL, REPLACE, MOVE N'PartialBackupsDemo' TO N'F:\MSSQL\Test\PartialBackupsDemo_Restore.mdf', MOVE N'PartialBackupsDemo_ReadWrite' TO N'F:\MSSQL\Test\PartialBackupsDemo_Restore_ReadWrite.ndf', MOVE N'PartialBackupsDemo_log' TO N'F:\MSSQL\Test\PartialBackupsDemo_Restore_log.ldf'; GO
See it’s only restoring the primary, read-write filegroup and the log file. Not the read-only file. Just like the partial full backup from earlier.
The next backup to restore would be the partial differential backup. But before we do that, lets see what exactly has been restored. We can do that by putting the database into standby mode and query the tables like so…
RESTORE DATABASE PartialBackupsDemo_Restore WITH STANDBY = N'F:\MSSQL\Test\Standby.tuf'
Here is what the database looks like in Object Explorer…
SELECT * FROM PartialBackupsDemo_Restore.dbo.BigTable ORDER BY ID; GO SELECT * FROM PartialBackupsDemo_Restore.dbo.BigTableArchive ORDER BY ID; GO SELECT * FROM PartialBackupsDemo_Restore.dbo.OtherStuff ORDER BY ID; GO
This shows us the results of the dbo.BigTable and dbo.OtherStuff in the Results tab:
But because the ReadOnlyFilegroup filegroup hasn’t been restored yet, the query on the dbo.BigTableArchive table fails with this error…
It’s basically telling us that the backup we took of the ReadOnlyFilegroup filegroup needs to be restored before we can query it. Fair enough. I’m not going to do that yet. I’m going to continue restoring the partial differential and log backups first to get the database more up to date. After each restore, I’ll put it into standby and run the same queries to see how it’s looking.
RESTORE DATABASE PartialBackupsDemo_Restore READ_WRITE_FILEGROUPS FROM DISK = N'F:\MSSQL\Test\PARTIAL_DIFF_PartialBackupsDemo.bak' WITH NORECOVERY; GO
Cool, so we restored the partial differential backup and the queries show that the data we inserted before we took that partial diff backup is now restored. Note again the READ_WRITE_FILEGROUPS argument? Now lets move onto restoring the first log backup.
RESTORE LOG PartialBackupsDemo_Restore FROM DISK = N'F:\MSSQL\Test\LOG_PartialBackupsDemo_1.trn' WITH NORECOVERY; GO
Yep, that first log backup restore has restored exactly the data we expected. Now for that 2nd log backup…
RESTORE LOG PartialBackupsDemo_Restore FROM DISK = N'F:\MSSQL\Test\LOG_PartialBackupsDemo_2.trn' WITH NORECOVERY; GO
Restores looking good so far. So our read-write data is fully up to date. Lets restore the read-only filegroup now. Note in the below restore command the filegroup name is specified with the MOVE argument too so it knows where to restore that file.
RESTORE DATABASE PartialBackupsDemo_Restore FILEGROUP = 'ReadOnlyFilegroup' FROM DISK = N'F:\MSSQL\Test\FILEGROUP_PartialBackupsDemo.bak' WITH NORECOVERY, MOVE N'PartialBackupsDemo_ReadOnly' to N'F:\MSSQL\Test\PartialBackupsDemo_Restore_ReadOnly.ndf'; GO
Awesome! We got our read-only data back! But before we are finished, I might want to take a final/tail log backup of the PartialBackupsDemo database and restore that too. I’ll do that now:
INSERT INTO dbo.BigTable VALUES (8, N'Insert before log backup 3'); INSERT INTO dbo.OtherStuff VALUES (6, N'Insert before log backup 3'); SELECT * FROM dbo.BigTable ORDER BY ID; SELECT * FROM dbo.BigTableArchive ORDER BY ID; SELECT * FROM dbo.OtherStuff ORDER BY ID; BACKUP LOG PartialBackupsDemo TO DISK = N'F:\MSSQL\Test\LOG_PartialBackupsDemo_3.trn' WITH INIT, FORMAT;
See there is another row inserted into both of the read-write tables. Now I’m going to restore that last log backup and then recover the database.
RESTORE LOG PartialBackupsDemo_Restore FROM DISK = N'F:\MSSQL\Test\LOG_PartialBackupsDemo_3.trn' WITH NORECOVERY; GO
Lovely. Now to finally recover the database.
RESTORE DATABASE PartialBackupsDemo_Restore WITH RECOVERY; GO
The database looks online in Object Explorer.
So that is the process to restore using a partial backup and restore strategy. This is a great way to break down a very large database, where taking a frequent full backup would be too large and simply not practicle. But as Rocky once said, the world ain’t all sunshine and rainbows…employing a partial backup and restore strategy has it’s downsides.
Things to Look Out For
The major downfall of this strategy is the management of the filegroup backups. The partial full, partial diff and logs all tend to take care of themselves without any problems. It’s the filegroups backups which get taken much less frequently – maybe monthly, maybe biannually, maybe annually.
In this extended time various things can happen. One is the mismanagement of the backup files themselves: if files site around too long, accidents can happen with them like DBAs cleaning up occasionally and deleting them thinking they are old and surely no longer required… yeah.
Another thing that can happen is where a filegroup is switched to read-only, but then not backed up. Due to you potentially having loads of read-only filegroups and therefore filegroup backups, you may not realise that one of the read-only filegroups hasn’t been backed up yet! I’ve seen it before…
Your database may look a bit different to my demo database. You might have a table with both read-write and read-only filegroups, all in the same table. This adds a slight extra complexity, but is the same backup and restore strategy. If you do this, and then don’t/can’t restore some of the old read-only filegroups to the table, in many normal cases it’s fine because the users are only querying and inserting to the read-write filegroups. But badly written code will result in a table scan or clustered index scan in the execution plan and that is bad news, since a scan will require all filegroups to be present and your query will throw an error 😦
Automation To Make it Work
If you do employ this backup and restore strategy, there are a few things you could consider implementing to ensure these downsides don’t come back to bite you
- Create a daily alert to check all filegroup backups against all your read-only filegroups. If they match, then great. If there is a mismatch, then you’re missing a backup and should fire an alert
- If you can have small window on a regular basis where filegroup backups can be taken, then create a scheduled filegroup backup job to automatically backup one filegroup, or a few, or maybe all of them regularly. This way, if you have forgotten to take a filegroup backup when you switch the filegroups to read-only, then it’s ok. Hey, you could even use that job as your method to automatically backup the recently switched read-only filegroups. The less time you spend running manual backups, the more time you have for doing fun stuff!
- As with all backups, you should be testing these backups by restoring them frequently. A backup is only as good as the restore. Create a scheduled job to frequently automatically restore the partial full, partial diff, some log backups and all the filegroups. Ensure they all restore, together. (If you aren’t already doing this for all your other databases, then you really really should do so)