How do I know if my SQL database is corrupted?

If you suspect your SQL database may be corrupted, there are several signs you can check for. Here are the top ways to identify SQL database corruption:

Unexpected Errors

One of the most common signs of database corruption is getting unusual errors when trying to access or modify data in the database. These errors often include cryptic messages that don’t clearly explain the problem. Some examples include “database page invalid” errors, IO errors indicating the database file itself is corrupted, and generic “database corruption” errors.

If you start seeing errors when running queries that normally succeed, that’s a red flag something is wrong with the database. The specific error message can sometimes hint at the type of corruption, e.g. an IO error points to a damaged database file on disk.

Incorrect Query Results

Another sign of corruption is getting obviously incorrect results from queries that should be returning proper data. For example, a query may leave out some rows, return duplicate rows, or show data that doesn’t match what’s actually in the database tables.

This can happen when the database’s indexes, used to optimize queries, become corrupted. The index data structure no longer accurately points to the right table rows. Running the same query repeatedly may return different results each time as the corruption causes instability.

Database Consistency Issues

SQL databases enforce strict consistency rules through transactions, foreign key constraints, and other safeguards. Database corruption can cause these consistency mechanisms to fail.

You may notice database inconsistencies like orphaned child records where the parent is missing, duplicate primary keys, missing indexes, or referential integrity violations without a clear cause. The corruption has caused invalid data to enter the database that bypasses the consistency protections.

Database File Errors

SQL database programs rely on being able to read and write database files on disk reliably. File I/O errors can be a sign of corruption issues:

  • Errors opening existing database files
  • Errors reading or writing data within a database file
  • Database file checksum mismatches
  • Zeroed out or unreadable portions of a database file

This can happen due to bad sectors on a disk, unexpected system shutdowns during writes, or buggy database software. The database management system itself may report file-related errors that indicate corruption.

Database Validation Failures

Most database systems include command-line tools you can run to check the structural validity of the database. For example, SQL Server offers DBCC CHECKDB, MySQL has mysqlcheck, and PostgreSQL provides pg_validate.

Running these commands will check database objects like tables and indexes for correct formatting and consistency. Any corruption issues will be reported, like a malformed b-tree index structure.

DBAs should periodically validate their databases proactively to find potential corruption early. But running validation after seeing other suspicious symptoms can confirm if corruption is the cause.

Log File Errors

The database log files record all transactions and database modifications. They are essential for crash recovery and restoring consistency after a failure. Errors writing to the log files, unexpected file growth, or checksum failures indicate potential corruption:

  • Log checksum failures
  • Errors or stall writing transaction log entries
  • Log file growing excessively large
  • Corrupted log entries that crash recovery fails on

This type of corruption is serious since the log is needed to roll back incomplete transactions and recover the database after a crash. Log file corruption also signals issues that may exist outside the log itself.

Crashes and Hardware Issues

Database corruption can result from underlying system problems that affect database files stored on disk:

  • Power failures or unexpected system crashes during writes
  • Faulty memory or storage drivers buggy code
  • Hard disk failures and bad sectors
  • Network problems interrupting replication

If you observe database corruption patterns after a specific type of system crash, buggy driver, or hardware malfunction, that points to the root cause. For example, corruption after power failures likely means the database isn’t properly transactional or the underlying disk system fails to write data properly on power loss.

Manual Inspection of Database Files

As a last resort, DBAs can dig into the database files themselves to inspect for corruption:

  • Scanning files for readable database objects like tables and indexes
  • Checking file headers and “magic numbers” used by the database engine
  • Looking for obviously malformed or nonsensical data entries

This requires expertise on the internals of the specific database system. It also runs the risk of exacerbating any corruption problems if faulty inspection code damages the files further!

But manual inspection can find obvious errors like texts files concatenated onto a database file by mistake. It should be done with care as a final deeper investigation when other corruption signs appear.

Mitigating Database Corruption Issues

Once database corruption is identified, there are steps DBAs can take to mitigate and prevent future occurrences:

  • Restore clean backups made before the corruption – the most direct fix.
  • Isolate/discard faulty hardware, drivers, etc. triggering the corruption.
  • Update database software to patch bugs causing the issues.
  • Improve transactional integrity and crash recovery procedures.
  • Increase database validation checks through cron jobs or monitoring.
  • Strengthen infrastructure redundancy and failover capabilities.

For safety, sensitive databases should be replicated. Corruption on replicas can be discarded and sync repaired from a known good master. Deploying database corruption detection aids like ECC memory can also help nip issues in the bud.

No database system is immune to corruption problems. But being proactive about prevention, detection, and recovery methods can minimize downtime and data loss when they inevitably do occur.

Example Scenarios of SQL Database Corruption

To better understand how database corruption happens, here are some real-world examples:

Index Corruption from Power Failure

A busy e-commerce store’s product database lost power during peak holiday shopping traffic. When powered back on, queries for products weren’t returning results correctly. The primary key index on the Products table was found to be corrupted after the outage.

The likely cause was the B-tree index structure being modified when the outage hit, leaving it in an inconsistent state. Running DBCC CHECKDB reported index corruption errors. Restoring indexes from a backup resolved it.

Torn Writes from Driver Bug

A SQL Server installation using a faulty SAN system would periodically return I/O errors and database inconsistencies. The application vendor found torn page writes were silently failing.

The SAN vendor confirmed a bug in the driver could lead to partial page writes being read back corrupted. Upgrading the drivers fixed the torn page errors. The app vendor added checksums and logging to detect reoccurrences.

Replication Sync Corruption

A MySQL master-slave replicated pair suffered replication halts and performance issues. Investigation found the slave had corrupt indexes that prevented application of the replication log.

Network connectivity issues between the master and slave were likely causing partial writes of the log to disk on the slave. Checksum mismatches then caused replication to halt. Network improvements solved the corrupt log writes.

Bit Flip Memory Errors

A PostgreSQL database had sporadic crashes and reports of data inconsistencies. Diagnosis found memory bit flip errors in the server RAM were corrupting data structures.

Upgrading to ECC RAM with automatic error correction resolved the bit errors before they could corrupt in-memory buffers. Reliability improved and further corruption was prevented.

Storage Media Decay

An aging SQL Server database started showing IO errors and checksum failures on one of its data files. Attempts to repair the file failed and returned inaccessible data errors.

Further investigation revealed physical bad sectors had developed on the magnetic hard disks holding the database files. Migrating the database to new SSD media resolved the physical media issues before total failure.

Conclusion

SQL database corruption can happen for many reasons, but the symptoms typically include unexplained errors when reading or writing data, checksum failures, incorrect query results, and failures during recovery or validation operations. Being able to quickly detect and diagnose corruption issues is crucial to restore database integrity and availability. Leveraging transactional capabilities, replication, backups, logging, and sound infrastructure goes a long way to mitigating impacts and improving recovery time.