Category Archives: Maintenance

VSS and the SQL Writer Service

What is VSS?

Volume Shadow Copy Service (VSS) is a set of Microsoft APIs that allows backups or snapshots of files and volumes to be taken even while they are in use. VSS provides a consistent interface that coordinates the user applications that update data on disk (writers) with the applications that back that data up (requestors).

How it works

  1. Gather the writer metadata and prepare for shadow copy creation.
  2. Each writer creates an XML description of the components it can back up and provides it to the VSS. The writer also defines a restore method for each component. The VSS passes the writers’ descriptions to the requester, which then selects the components to back up.
  3. The VSS notifies all the writers to prepare their data for making a shadow copy.
  4. Each writer prepares the data as appropriate. When finished, the writer notifies the VSS.
  5. The VSS tells the writers to freeze application write I/O requests (read I/O requests are still possible) so it can create the shadow copy of the volume or volumes. The application freeze is not allowed to take longer than 60 seconds. The VSS then flushes the file system buffers and freezes the file system, which ensures that the file system metadata is written and the data is in a consistent order.
  6. The VSS tells the provider to create the shadow copy. This period lasts no more than 10 seconds, during which all write I/O requests to the file system remain frozen.
  7. The VSS releases file system write I/O requests.
  8. VSS tells the writers to thaw application write I/O requests. At this point applications are free to resume writing data to the disk.
    • The shadow copy creation can abort if the writers stay frozen for longer than 60 seconds or if the providers take longer than 10 seconds to commit the shadow copy.
  9. The requester can retry the process (go back to step 1) or notify the administrator to retry at a later time.
  10. Upon creation, the VSS returns location information for the shadow copy to the requester.

More detailed information here.

SQL Writer Service

When it comes to SQL Server, most backup errors are not caused by SQL itself. But when they are, it is likely that the SQL Writer service is not enabled. The service installs without user intervention when installing SQL Server.

This service provides added functionality for backup and restore of SQL Server through VSS.

SQL Writer supports:

  • Full database backup and restore including full-text catalogs
  • Differential backup and restore
  • Restore with move
  • Database rename
  • Copy-only backup
  • Auto-recovery of database snapshot

SQL Writer does not support:

  • Log backups
  • File and filegroup backup
  • Page restore

Limitations/Problems

  1. Point-in-time recovery – It is possible you will not meet your business’s RPO, because you can usually only recover to the point of the last backup, not to a specific time.
  2. Transaction Log Clearing – VSS backups will not clear the SQL log file. This needs to happen through native SQL commands (see the example after this list).
  3. Pauses in I/O – As mentioned in the “How it works” section, database I/O can pause for just under 70 seconds during a backup (the 60-second freeze plus the 10-second shadow copy commit). This can lead to users complaining about performance issues or kick-outs.
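For reference, in the FULL recovery model the log is cleared by a native log backup rather than by the VSS snapshot. A minimal sketch, using the same placeholder style as the later examples:

BACKUP LOG databasename
TO DISK = 'File path and extension (.TRN)';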

Common troubleshooting

A few things to check when getting errors:

  1. The service needs to be running at the time of the backup.
  2. This service also needs to be running as the Local System account.
  3. The NT Service\SQLWriter login needs to be active inside SQL Server. This account is designated as no login, which limits vulnerability (a quick check from inside SQL Server is sketched below).
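As a quick sanity check for item 3, you can confirm the login exists from inside SQL Server. A minimal sketch using the sys.server_principals catalog view:

-- Returns one row if the SQL Writer login is present on the instance
SELECT name, type_desc, is_disabled
FROM sys.server_principals
WHERE name = N'NT SERVICE\SQLWriter';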

Next steps

If none of these are the cause of your problem, it is a good idea to reach out to your backup solution vendor. You will want to include the following in the email:

  1. That you checked the three steps listed above
  2. The event viewer errors or warnings
  3. A detailed description of the problem you are having
  4. How often the problem occurs

Transaction log Q&A

This week I am going to take a break from the database modeling series. Instead, I want to discuss some of the confusion around the transaction log. The transaction log is one of the most important things to understand in SQL Server, especially when it comes to high availability or disaster recovery, where SQL uses the transaction log as a key component. After all, without your transaction log your database is unusable.

What is a transaction?

A transaction is a single unit of work that complies with A.C.I.D. standards. A.C.I.D stands for:

  • Atomic – It either all works or none of it does
  • Consistent – It complies with the “Rules”, meaning constraints, triggers, datatype adherence, etc.
  • Isolated – It will not affect other transactions that are running at the same time
  • Durable – Changes are permanent and will survive a power outage, crash, and/or memory dump

Additionally, SQL Server runs in autocommit mode by default (often loosely called implicit transactions), meaning it applies the begin and commit/rollback for you when you do not specify them. The line shown below is a single transaction.

DELETE FROM person WHERE lastname='Billy'

Whereas these two lines are two separate transactions.

DELETE FROM person WHERE lastname='Billy'
DELETE FROM person WHERE firstname='Timmy'

To make those two statements succeed or fail as a single unit of work, wrap them with a BEGIN TRANSACTION and end with COMMIT or ROLLBACK, otherwise known as an explicit transaction.
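For example, here are the same two deletes wrapped as one explicit transaction; either both deletes persist or neither does:

BEGIN TRANSACTION;
DELETE FROM person WHERE lastname='Billy';
DELETE FROM person WHERE firstname='Timmy';
COMMIT TRANSACTION; -- or ROLLBACK TRANSACTION to undo both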

Transaction log Operation

To outline the transaction log operation, I always found it easiest to walk through the steps:

  1. User executes a transaction
  2. Pages that are going to change are read into the SQL Server cache in RAM
  3. The log file records the details of the transaction. This process also assigns the transaction a Log Sequence Number (LSN).
  4. After storing the intent of the transaction, SQL Server then modifies the pages in RAM.
  5. The pages are, at some point, written back to the disk.

So now that we understand the basics of the transaction log, we can answer one of the first questions I receive.

What happens during a crash?

If your server crashes, it is the responsibility of the log file to help you recover. The dirty pages (pages that were in RAM that did not make it to disk) are now lost. But, the transaction log contains a full description of the intent of each transaction.

Upon restart, SQL Server checks integrity and consistency using the log, comparing the LSNs in the log to those on the data pages on disk. Committed changes that never made it to disk are replayed, known as the REDO phase. Changes from uncommitted transactions that did make it to disk are rolled back, known as the UNDO phase. These phases make sure that when the database comes online, it is consistent.

This segues into the next question I often receive.

Why does my log file keep growing?

To understand this, you first need to understand how the log file handles information. The transaction log is a circular file: it has a set amount of space and continues to create new log records sequentially until it reaches the end of the file. At that point SQL Server faces two options.

  1. Circle back to the beginning and overwrite older log records.
  2. Grow the bounds of the log file and continue to consume space.

Option 2 requires extra space and the overhead of growing the file. When the log file grows, it does so by creating new virtual log files (VLFs) within the log; you can think of these as groups of log records within the file. Thus option 1, reusing the existing space, is preferable.

So if reuse is preferable, why does SQL Server keep growing the log? It is simple: SQL Server needs to keep log records around until no purpose remains for their storage. These purposes vary, but the main one is that the records are critical to the REDO and UNDO phases. SQL Server tracks this by marking VLFs as “Active”, and it cannot reuse the space in a VLF until all the transactions in it are “Inactive”. Then and ONLY then can SQL Server reuse the space allocated to the VLF.

So in short, your SQL Server is leaving the transactions active. But don’t freak out, this is normal for the FULL recovery model. In this model you need to take transaction log backups so the space can be reused.
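If you want to confirm what is keeping the log active, the sys.databases catalog view reports the reason. A quick check (databasename is a placeholder):

-- LOG_BACKUP means the log is waiting on a transaction log backup before it can clear
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = 'databasename';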

Unfortunately, this conversation usually doesn’t happen right away. This comes up when the log file is using a massive amount of space on the drive. This leads to the next question…

How do I Manage the log file?

There are many different articles about this, but they do not always have the best or correct answer. Most people just recommend one of the following:

  • Switching the database to Simple/Shrink the file/Switch back to full
  • Truncating the log with NO_LOG/TRUNCATE_ONLY

Both of these solutions kill me. Luckily, starting with SQL Server 2008, option two is no longer possible. Yet option 1 is still TERRIBLE. When you look at what we talked about earlier, I hope this makes you cringe.

To explain:

When you shrink the file this way, you completely discard everything in the log file. That means all the work since the last backup is gone if your database crashes or fails for whatever reason. This defeats the whole point of the FULL/BULK_LOGGED recovery models, which is to preserve the transaction log so a database can be recovered to a specific point in time.

Besides, if you shrink the file, it will just grow again, and this growth is likely to create log fragmentation in the VLFs. This is a performance concern, much like disk fragmentation. It is not as much of a problem in SQL Server 2014 and above, but it should still be on your mind. Paul Randal has a good article about the creation of VLFs listed here.

Additionally, this will pause your workload while the log file is growing.
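If you are curious how many VLFs a log currently has, newer versions of SQL Server (2016 SP2 onward, if memory serves) expose a DMF for it; a minimal sketch, with DBCC LOGINFO as the older alternative:

-- One row per VLF; a count in the thousands usually points to log fragmentation
SELECT COUNT(*) AS vlf_count
FROM sys.dm_db_log_info(DB_ID('databasename'));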

There are only 2 solutions that are acceptable in my opinion:

  1. Run in Simple mode – If you don’t mind the possibility of losing data in the event of a disaster, this is a good option.
  2. Running regular log backups, especially if you are looking for point-in-time recovery.

Finally, after I explain this to them, I always get…

How do I get it back to a reasonable size?

It is always tricky to give the “reasonable” size answer. The truth is that it varies based on your workload and the size of your database. I would always suggest doing the following:

  • If it is your company’s proprietary database, speak to the developers. They should have a staging environment where you can track the log growth.
  • If you are using another company’s database, reach out to their support. They might have an answer, but they are more likely to give you the “it depends” response.
  • Guess… I have seen guesstimates ranging from 20-25% of the MDF down to 1-2% on multi-terabyte databases (a quick way to check your current log size and usage is shown after this list).
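Before settling on a number, it also helps to see how big the log is right now and how much of it is actually in use; DBCC SQLPERF gives a quick instance-wide view:

-- Reports log size (MB) and the percentage of the log in use for every database
DBCC SQLPERF(LOGSPACE);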

The good news is that it is easy to change. To resize the log file I always use the following steps:

Wait for an inactive time of day. It would be best to put the database into single user mode first, but it is not required.

ALTER DATABASE databasename
SET SINGLE_USER
WITH ROLLBACK IMMEDIATE;

Run a transaction log backup if you are in full recovery mode. Or, if you are using the simple recovery model, just clear the transaction log by running a checkpoint (shown after the backup command below).

BACKUP LOG databasename
TO DISK = 'File path and extension (.TRN)';
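Or, if the database is in the simple recovery model, issue a manual checkpoint in that database instead:

USE databasename;
CHECKPOINT;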

Shrink the log to the smallest possible amount.

DBCC SHRINKFILE(TlogLogicalFileName, TRUNCATEONLY);

You will need the logical filename of the log to complete this step. Run the following if you do not know it.

EXEC sp_helpfile;

Alter the database to change the transaction log file

ALTER DATABASE databasename
MODIFY FILE
(
NAME = TlogLogicalFileName
, SIZE = NewSize
);

If you set it to single user, change it back:

ALTER DATABASE databasename
SET MULTI_USER;

SQLskills suggests growing the log in 8 GB chunks, using 8000MB as the NewSize value. This creates VLFs that are 512MB each, and these smaller chunks make it easier to maintain a smaller log file size.

Maintenance Tasks

Every day in the IT field, people are thrown into being the accidental DBA. While this is a mistake for most companies to make, it happens, especially with the changes in the healthcare field and the adoption of Electronic Medical Record (EMR) software. I wanted to get some knowledge out there on how to maintain a database and avoid a Resume Producing Event (RPE).

Before getting started on the maintenance, it is important to understand and discuss your recovery point objectives (RPO) and recovery time objectives (RTO).

RTO & RPO By Paul Randal

Core Maintenance Tasks – SQL server Jobs

Backups – One of the key things to remember is that backups should ideally be performed to a local disk for performance AND then copied to another location for storage. Far too often, in my experience, these end up on the same disk, which is a terrible idea for disaster recovery.

Recommended (mission critical)

  • Full database backup once a week
  • Nightly differential
  • Transaction logs every 15 minutes

Remember to use the FULL recovery model for mission-critical databases where data loss is not acceptable, as this allows for point-in-time recovery and log backups. For more information feel free to check out the Previous Post Here.
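As a rough sketch of what those backup jobs run under the hood, the native commands look something like this (names and paths are placeholders; a backup tool or maintenance solution may wrap them differently):

-- Weekly full backup
BACKUP DATABASE databasename
TO DISK = 'File path and extension (.BAK)';

-- Nightly differential backup
BACKUP DATABASE databasename
TO DISK = 'File path and extension (.BAK)'
WITH DIFFERENTIAL;

-- Transaction log backup every 15 minutes (FULL recovery model only)
BACKUP LOG databasename
TO DISK = 'File path and extension (.TRN)';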

Consistency Checks – It is usually assumed that SQL Server performs consistency checks and identifies database corruption immediately. In reality, SQL Server does not notice corruption until the corrupt page is read from disk again. That could be weeks or even months away if you are unlucky, and at that point you may no longer have backups from before the corruption.

Recommended – (Based on maintenance windows)

  • Run at least one time a week

Please note that to remove the dependency on a maintenance window, it is possible to restore the database to another server and run the integrity check on it there. If corruption is found, you would then have to identify whether it is in your backup or in your live/production server, which in some circumstances would require calling a maintenance window.
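A minimal sketch of the weekly check itself (databasename is a placeholder):

-- Full consistency check; NO_INFOMSGS limits the output to actual errors
DBCC CHECKDB ('databasename') WITH NO_INFOMSGS;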

Index Maintenance – Rebuilding your indexes is a key part of maintaining performance in a database. This can generate a large amount of log information and heavy I/O on the environment, since each index is rebuilt in its entirety and its statistics are updated with the equivalent of a full scan.

Recommended –

  • Run one time a week

As a note, there is an option to “Reorganize” indexes; this should NOT be scheduled at the same time as the rebuild. Reorganizing defragments the indexes but does not update the statistics, so to accomplish the same thing as a rebuild you would also need to run an update statistics task. Still, reorganizing indexes is a fully online operation and can be a good interim step between regular executions of the rebuild index task (a sketch of both approaches follows the recommendation below).

Recommended –

  • Run one time a week – Halfway between your executions of the Rebuild.
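A sketch of both approaches against a single table (the person table is borrowed from the earlier examples; most maintenance solutions loop over every table like this):

-- Weekly: rebuild every index on the table (this also updates their statistics)
ALTER INDEX ALL ON person REBUILD;

-- Mid-week: reorganize instead, then update statistics separately
ALTER INDEX ALL ON person REORGANIZE;
UPDATE STATISTICS person;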

For additional information, I would always recommend looking into the Accidental DBA series from SQLskills.com.

Recovery Model Behavior

Full Recovery Model – This is the most common recovery model used for SQL Server, and it is required for database mirroring and availability groups.

  • Allows log backups and zero-to-minimal data loss after a disaster
  • All changes to the database are fully logged to the transaction log file
  • The transaction log will not clear data until a log backup has been performed, meaning that the log backup is what clears the transaction log
    • This means that log backups are required, otherwise the transaction log will continually grow until your drive is full

Simple Recovery Model – Commonly used when log backups and point-in-time recovery are not required.

  • Some operations can use minimal logging
    • All other operations are fully logged, as in the FULL recovery model
  • The transaction log clears itself automatically at checkpoints, so no log backup is needed (or possible) to manage log space
  • Log backups are NOT possible with this recovery model

Important – Disaster recovery is only possible using full and differential backups, so data loss will occur back to the most recent backup. This recovery model is not an appropriate choice for production systems where loss of recent changes is unacceptable.

Bulk_Logged Recovery Model – Commonly used to minimize log growth during bulk operations, while preserving the ability to take log backups.

  • Some operations can use minimal logging
    • All other operations are fully logged, as in the FULL recovery model
  • The transaction log will not clear data until a log backup has been performed, meaning that the log backup is what clears the transaction log
    • This means that log backups are required, otherwise the transaction log will continually grow until your drive is full
  • The time running in this mode should be minimized to reduce potential loss of data

Important – As mentioned above, the Bulk_Logged recovery model should not be used if there is the possibility of losing user transactions.
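Finally, checking and switching the recovery model is straightforward. A minimal sketch (databasename is a placeholder; remember to take a new full backup after switching to FULL so the log backup chain has a base):

-- See the current recovery model of every database
SELECT name, recovery_model_desc
FROM sys.databases;

-- Switch a database to the FULL recovery model
ALTER DATABASE databasename
SET RECOVERY FULL;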