Category Archives: Corruption

SQL Corruption Post-Mortem

“No matter what method you use to recover from corruption, you should always determine why it happened to avoid future problems.” (Paul Randal) The most likely causes are outlined below, in order of likelihood:

  1. Run I/O Subsystem and server memory diagnostics (Almost always this is the cause)
    • Windows OS
    • File system filter drivers
      • EG Antivirus, defraggers, encryption
    • Network cards, switches, cables
    • Memory Corruption
      • Bad memory chips
      • Scribblers
    • SAN controllers
    • RAID controllers
    • Disks
  2. Examine the SQL Server error log and windows event logs for clues
    • SQL Error logs
      • 823:A hard I/O error (When windows can’t return the data to SQL)
      • 824: A soft I/O error (When SQL detects there is a problem with the data it was given by windows)
      • 825: A read-retry error (These show as information alerts however they are critical because they lead to impending doom warning signs)
  3. Check that the firmware is up-to-date
  4. Investigate NTFS filter drivers

It is important to remember that corruptions are not caused by the following:

  • Anything an Application can do
  • Anything you can do in SQL Server with supported, documented commands
  • Interrupting a Database shrink, index rebuild, or long-running batch
  • Shutting down SQL Server

If this happened, it is likely that certain steps are not being done inside your organization to protect your data to the best of SQL Server’s abilities. To help, I made a list of what can be done to help detect early signs.

  1. Implement Error Alerts for Severity 19 and above errors
  2. Implement Page Verify – Checksum option
  3. Implement Backups with Checksum option
    • All databases INCLUDING system databases
    • Allows you to use a restore verifyonly with CHECKSUM to validate your backups
  4. Either Implement the Integrity check script (Widely Used) or at minimum a — DBCC CHECKDB with no_infomsgs — job to check integrity