DBCC CHECKDB WITH DATA_PURITY
What it is. How to fix the errors.

By Richard Fryar

In this article I will describe what the data purity check means, how to run it, how to interpret the output, how to fix the errors, and how to tell whether or not the data purity has ever been checked for a given database.

What is meant by data purity?

In versions of SQL Server up to 2000, it was possible for data stored in columns of certain datatypes (see below) to contain out-of-range values.

The most common cause was the use of bcp to bulk load data, which didn't carry out comprehensive range checks on the source data.

This issue was resolved in SQL Server 2005, and an option was added to the DBCC CHECKDB command so that this out-of-range data can be detected.

In fact, if a database is created in a version from 2005 onwards the data purity is checked automatically without the need to specify it as an option. However, if a database is migrated from a version of SQL Server before 2005, there is a chance that it could contain out-of-range data, and so to prevent unexpected errors when CHECKDB is run it is necessary to specify the option explicitly. Once a successful check has run, it will thereafter be performed automatically.

Which data types are affected?

This table shows the datatypes that may be affected, along with the range of valid values. Values outside these ranges will be detected by the data purity check.

Datatype - Valid range of values

nvarchar, nchar (Unicode types) - each Unicode character is stored in 2 bytes, so a valid value always occupies an even number of bytes (DATALENGTH returns a multiple of 2).

float - zero; positive values between 2.23E-308 and 1.79E+308; negative values between -1.79E+308 and -2.23E-308.

real - zero; positive values between 1.18E-38 and 3.40E+38; negative values between -3.40E+38 and -1.18E-38.

decimal and numeric - the valid range depends on the precision and scale specified. For example, a decimal(5,2) has a valid range of -999.99 to +999.99.

datetime - Jan 1st 1753 to Dec 31st 9999, with the time portion between 00:00:00.000 and 23:59:59.997 (datetime is only accurate to about 3.33 milliseconds, so .997 is the largest fractional part it can store).

How do I run the data purity check?

To run a data purity check against a database, without displaying informational messages, use the following syntax:

DBCC CHECKDB ('YourDatabase') WITH DATA_PURITY, NO_INFOMSGS

What does the output mean?

Well, most of the time the output is clear and all future runs of DBCC CHECKDB will perform data purity checks automatically (unless the PHYSICAL_ONLY option is specified).

But if you are unlucky, here is a snippet of what you might see:

Msg 2570, Level 16, State 3, Line 1
Page (3:1802329), slot 10 in object ID 2048726351, index ID 1, partition ID 72057597199843328, alloc unit ID 72057597264396288 (type "In-row data"). Column "Quantity" value is out of range for data type "decimal". Update column to a legal value.
CHECKDB found 0 allocation errors and 92 consistency errors in table 'OrderLine' (object ID 2048726351)

This shows that there are 92 rows in the OrderLine table in which the Quantity column is out of range for a decimal.

In this case, let's say that the column has been defined as decimal(10,4), so the invalid values are less than -999999.9999 or greater than 999999.9999. Actually it is possible for an invalid value to lie within the allowed range, and I'll explain that in more detail in a moment.

How do I identify the rows that need fixing? (the easy way)

In most of the cases I've seen, it has been fairly easy to identify the rows that need fixing, as they are outside the allowable range. This means that a simple select statement will find the rows:

SELECT Id, Quantity
FROM OrderLine
WHERE Quantity < -999999.9999 OR Quantity > 999999.9999

For other datatypes you will have to modify this SELECT statement - refer to the table of valid ranges, above, or look at this Microsoft KB article for more examples.
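As a sketch of what those statements might look like for a couple of the other datatypes (SomeTable, FloatCol and NvarcharCol are illustrative names, not objects from the example database):

```sql
-- float: non-zero values whose magnitude is below the smallest
-- valid positive float.
SELECT Id, FloatCol
FROM SomeTable
WHERE FloatCol <> 0
  AND ABS(FloatCol) < 2.23E-308;

-- Unicode: values whose stored byte length is odd (each character
-- should occupy 2 bytes).
SELECT Id, NvarcharCol
FROM SomeTable
WHERE DATALENGTH(NvarcharCol) % 2 <> 0;
```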

How do I identify the rows that need fixing? (the harder way, but sometimes necessary)

From time to time you will come across cases where SELECT statements like the above return no rows.

How can this be? The DBCC CHECKDB WITH DATA_PURITY has told you there are 92 rows with invalid values yet you can find none that are outside the allowable range!

Well, I don't pretend to understand the exact reason, but I've seen it happen. So here is how you go about identifying these pesky rows:

Have another look at the output from the CHECKDB command. You can see at the beginning of the second line "Page (3:1802329), slot 10". This is all the information you need - the file ID (3), the page ID (1802329) and the slot number (10). Run the following command:

DBCC PAGE ('YourDatabase', 3, 1802329, 3) WITH TABLERESULTS

There is a common misconception that you have to enable trace flag 3604 to use DBCC PAGE; that is not true, provided you specify WITH TABLERESULTS.
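If you do prefer the text output, the trace flag version looks like this (same page as in the example above):

```sql
-- Alternative: enable trace flag 3604 to send DBCC PAGE's text
-- output to the client instead of using WITH TABLERESULTS.
DBCC TRACEON (3604);
DBCC PAGE ('YourDatabase', 3, 1802329, 3);
DBCC TRACEOFF (3604);
```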

Now look for the row containing the slot number from the CHECKDB output and the columns you are interested in (the primary key and the out-of-range column).

Slot 0 Offset 0x60 Length 187 Slot 0 Column 1 Offset 0x4 Length 8 Length (physical) 8 ID 6373902
...
Slot 0 Offset 0x60 Length 187 Slot 0 Column 11 Offset 0x20 Length 13 Length (physical) 13 Quantity INVALID COLUMN VALUE

Mmm... well, that tells us the primary key (6373902), so we can now tell which row is affected, but all we have for the value is "INVALID COLUMN VALUE". If we are to correct this, it would be good to have some idea of the value that should be there.

Fortunately, another SELECT gives us this:

SELECT Quantity
FROM OrderLine
WHERE Id = 6373902

And from this we get 73.4400

How do I fix the data?

A couple of important things first:

Firstly, don't make any changes without first performing a backup.

Secondly, don't make any assumptions about what value should be used to fix the data. Just because the above investigation showed a value of 73.4400, it doesn't mean that's right. And what about values outside the allowable range, or unicode values with an odd number of bytes?

The answer is to use one of the methods described above to get the current values for each invalid row, and then get together with someone who knows the system or application well (perhaps a developer or experienced user) and agree exactly what values to replace them with.

Performing the data update is, of course, obvious. You have the primary key and the name of the column to update, so:

UPDATE OrderLine
SET Quantity = 73.4400
WHERE Id = 6373902
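If you are fixing rows by hand, it does no harm to wrap the update in an explicit transaction so you can check the row count before committing. A minimal sketch, using the values from the example above:

```sql
-- Sketch: apply the fix, but only commit if exactly one row changed.
BEGIN TRANSACTION;

UPDATE OrderLine
SET Quantity = 73.4400
WHERE Id = 6373902;

IF @@ROWCOUNT = 1
    COMMIT TRANSACTION;
ELSE
    ROLLBACK TRANSACTION;
```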

This is fairly quick and painless if only a few rows are affected, but you may come across databases with several thousand rows of out-of-range data. I have submitted a script to SQL Server Central to automate the retrieval of the primary keys, affected columns and current values: www.sqlservercentral.com/scripts/DBCC+CHECKDB/100463

How can I tell if a database has previously been checked for data purity?

If you run DBCC CHECKDB regularly against all your databases (and of course you do), the easiest option is to add WITH DATA_PURITY and wait to see if any of them fail.
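If you want to run the check against every database in one go, a simple cursor over sys.databases works. This is only a sketch - it skips the system databases and anything that isn't online:

```sql
-- Sketch: run a data purity check against each online user database.
DECLARE @db sysname, @sql nvarchar(max);

DECLARE dbs CURSOR LOCAL FAST_FORWARD FOR
    SELECT name
    FROM sys.databases
    WHERE database_id > 4          -- skip system databases
      AND state_desc = 'ONLINE';

OPEN dbs;
FETCH NEXT FROM dbs INTO @db;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'DBCC CHECKDB (' + QUOTENAME(@db)
             + N') WITH DATA_PURITY, NO_INFOMSGS;';
    EXEC (@sql);
    FETCH NEXT FROM dbs INTO @db;
END

CLOSE dbs;
DEALLOCATE dbs;
```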

SQL CoPilot also highlights databases that have not been checked...

[Screenshot: CHECKDB status in SQL CoPilot]

...this is a section of the Database Overview report, showing that the database has not had a successful data purity check, or even a successful CHECKDB since December 2012.

But if you don't want to purchase SQL CoPilot, you can do it manually. Here's how, using a command called DBCC DBINFO.

DBCC DBINFO ('YourDatabase') WITH TABLERESULTS

Look for the rows containing dbi_dbccFlags and dbi_dbccLastKnownGood. The latter is the date and time of the last successful CHECKDB, and dbi_dbccFlags contains 0 if no data purity check has ever successfully completed (it changes to 2 once one has).
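If you would rather query those two fields than eyeball the full output, the TABLERESULTS output can be captured with INSERT ... EXEC. This is a sketch - the column list below matches the output I've seen on recent versions, but check it on yours:

```sql
-- Sketch: capture DBCC DBINFO output and pick out the two fields.
CREATE TABLE #dbinfo
(
    ParentObject nvarchar(255),
    Object       nvarchar(255),
    Field        nvarchar(255),
    Value        sql_variant
);

INSERT INTO #dbinfo
    EXEC ('DBCC DBINFO (''YourDatabase'') WITH TABLERESULTS');

SELECT Field, Value
FROM #dbinfo
WHERE Field IN ('dbi_dbccFlags', 'dbi_dbccLastKnownGood');

DROP TABLE #dbinfo;
```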

Here is the output for the SQL CoPilot snippet I've just shown you:

[Screenshot: DBCC DBINFO output for the unchecked database]

And here's a good one:

[Screenshot: DBCC DBINFO output for a database that has passed the check]

OK?

I hope this description of DBCC CHECKDB WITH DATA_PURITY has been useful.
If you have any questions, please feel free to contact me.