Friday, February 7, 2003

Microsoft Access Corruption

I had a great question come up today on Microsoft Access Databases and corruption. Why does an access database become corrupted, how do you fix it, and how do you prevent the issue from happening in the future?

In order to answer these questions, first there needs to be an explanation of how MS Access works. MS Access at its core is a single user database with lots of features for reporting, form design, and automation. However, the database is still just a file, and not an application. That means there are very few checks in the JET Database Engine that keep users from running over each other when more than one user is using the same database. This makes Access cheaper to purchase (<$200) than something like MS SQL Server for the backend and Visual Studio for a front end (>$2000). MS Access is also much more user friendly, and has a great IDE for designing forms, and linking the form to underlying data. These all make the cost of entry into making an application in MS Access very low, which is why MS Access is so prolific in some environments.

Long story short, MS Access is a great tool for doing adhoc reporting, short term data recording, or in some instances huge applications where only 1 user is using the application at a time. The ugly corruption issue appears when lots of folks are using MS Access in ways that work, but where a multi-user database application is the better solution. In other words, if more than 1 person is using the database at a given time, MS Access is not the correct solution.

If you already have an MS access application that keeps getting corrupted, you can do some things to remediate the database itself in the short term. Access databases corrupt in 3 major ways, though some happen more often than others. All the below examples assume that it is imperative that the database stop corrupting, and the correct solution (a multi-user application) is not yet available.

Most Often: Index Corruption
The most often corruption cause is when 2 machines try rebuilding the same index (clustered or not) at the same time. This breaks the index for that table in access, and the table becomes unreadable. The way to tell if this is the root cause is when the database becomes corrupt, you can recover all data, forms, & modules except for 1 table. This can be solved by eliminating all indexes on the table, and limiting the indexes to only one for the primary key. This will obviously slow the database down, but it will be able to support many users over a network on the same table. Composite keys are the 'best' candidates for becoming corrupted and destroying a table.

2nd Most Often: System Table Corruption
The next most often is when the system tables become corrupted. Objects (tables, forms, modules, reports) are stored in system tables in the access database. Sometimes the indexes or data in those tables can become corrupted and unreadable. You can tell if this happens because all objects after a certain alphanumeric point are unrecoverable. For example, all tables that start with 'H' or a later letter are unrecoverable. To remediate this, split the database between the data (tables) and 'non-data' (queries, modules, forms, reports, etc). Use links from the 'non-data' to the 'data' database. If corruption of this type still occurs, split the 'data' database further into pieces. If you have to, the 'data' database can be split so that there is only 1 table in each 'data' database, where the 'non-data' database links to multiple 'data' databases.

As part of this solution, each user should have a separate Access database on their machine that contains the queries, forms, reports and modules that links to the 'data' database(s). That way multiple people are not trying to open and modify the data in the system tables.


3rd Most Often: Row Corruption
The last type of corruption is when a single data page gets corrupted. This is manifested where 1 table has a few rows that can not be accessed, 'phantom' rows that have blank primary keys, or a table that shows '#error' in every cell. This is caused by 2 users trying to update the same row at the same time, and the JET engine not keeping the 2 users separate. This one takes a lot more work to solve, and it is much better to just move to some other solution besides access. Solving this requires large changes to how data is written to the tables that the JET engine can't handle. I ran into the problem several years ago with a application that was being used by about 70 people simultaneously. Solved this by writing an 'intent' log similar to how any multi-user databases work.

First, the forms were divorced from the underlying data. The form the user is working with is populated entirely though DAO calls. I used DAO because DAO is actually faster when utilizing the JET database Engine, which is what MS Access runs on. It is about the only thing DAO does better than ADO.

Then, the user tells an Access form that a row needs an update or addition though a save button or the like. That command is then processed though a module and a row added to a flat transaction file that is kept on the network somewhere. That row in the transaction log states that a user wants to update or add a row in the database table. The user's system waits 250ms and checks the flat file again to make sure no one else is trying to update the same data. If all is ok, the row is written. If there is a contention, then the client that asks last removes the intent and retries it 500ms later. The reason 250ms was chosen as the wait time is because the network latency was about 25ms, and I wanted to be darn sure that network latency wasn't an issue.

Recap
Obviously a better solution is to use a multi-user database if more than 1 user will be in a database at a time, and the system is being created or there is time for a rewrite. But using the above methods, I was able to almost entirely eliminate MS Access database corruption on a 70 concurrent user application.