meta data for this page
Database housekeeping and garbage collection
Garbage
Garbage, for want of a better name, is the detritus that Firebird leaves around in the database after a rollback has been carried out. This is basically a copy of the row(s) from the table(s) that were being updated (or deleted) by the transaction prior to the rollback.
Because Firebird uses multi-generational architecture, every time a row is updated or deleted, Firebird keeps a copy in the database. These copies use space in the pages and can remain in the database for some time.
In addition to taking up space in the database, these old copies can lead to increased transaction startup times.
There are two types of garbage:
- Remnants from a committed transaction.
- Remnants from an aborted (rolled-back) transaction.
These remnants are simply older copies of the rows that were being updated by the respective transactions.
The differences are that:
- Whenever a subsequent transaction reaches garbage from a committed transaction, that garbage is automatically cleared out.
- Rolled-back garbage is never automatically cleared out. This means that on a database with a lot of rolled-back transactions, there could be a large build up of old copies of the rows that were updated and then rolled back.
Firebird will automatically sweep through the database and remove the remnants of rolled-back transactions and this has two effects:
- The database size is reduced as the old copies of rows are deleted.
- The performance of the database may be affected while the sweep is in progress.
Note: One other method of clearing out old rolled-back transactions' garbage is simply to carry out a database backup.
In the Superserver version of Firebird 2.0, garbage collection has been vastly improved. There are now three different ways of operation and these are configurable by setting the GCPOLICY parameter in the firebird.conf configuration file. By default, Superserver uses combined' while Classic Server uses cooperative. The other option is background.
Note: Classic Server ignores the setting and always uses cooperative garbage collection.
Cooperative garbage collection
This is the default setting, indeed the only setting, that Classic Server uses. In this mode, the normal operation - as described above - takes place. When a full scan is performed (perhaps during a backup) old versions of the rows are deleted at that point in time.
Background garbage collection
Superserver has, even since before version 1.0, performed background garbage collection where the server informs the garbage collector about old versions of updated and deleted rows when they are ready to be cleaned up. This helps avoid the need to force a full scan of each record in the database tables to get the garbage collector to remove these old versions.
When all rows in a table are read by the server, any old record versions are flagged to the garbage collector as being ready to be cleared out. They are not deleted by the scanning process as in the cooperative method. The garbage collector runs as a separate background thread and it will, at some point, remove these older record versions from the database.
Combined garbage collection
This is the default garbage collection method for Superserver installations. In this mode, both the above methods are used together.
Setting the sweep interval
The default sweep interval for a new database is 20,000. The sweep interval is the difference between the oldest interesting transaction or OIT and the next transaction number.
Note: This doesn't mean that every 20,000 transaction a sweep will take place. It will take place when the difference between the OIT and the next transaction is greater than the sweep interval.
An interesting transaction is one which has not yet committed. It may be still active, in limbo or may have been rolled back.
The sweep facility runs through the database and gets rid of old rows in tables that are out of date. This prevents the database from growing too big and helps reduce the time it takes to start a new transaction on the database.
Note: If you find that starting a new transaction takes a long time, it may be a good idea to run a manual sweep of the database in case the need for a sweep is causing the hold-up.
You can check if a manual sweep may be required by running the gstat utility to check the database header page and extract the oldest and next transaction numbers from the output. If the gap is small (less than the sweep interval) then a manual sweep may be in order. Alternatively, the SHOW DATABASE command in isql will also show the details you need.
A manual sweep can be run by using the -s[weep] command. (See below.)
To alter the database's automatic sweep interval, use the following command:
gfix -h[ousekeeping] INTERVAL database_name
The INTERVAL parameter is the new value for the sweep interval. The database_name parameter is the database upon which you wish to alter the setting for automatic sweeping. The following example shows the setting being changed from the default to a new value of 1,000:
linux> gfix -h 1000 my_employee linux> gstat -header my_employee | grep Sweep Sweep interval: 1000
Manual garbage collection
If automatic sweeping has been turned off, or only runs rarely because of the sweep interval setting, the DBA can manually force a sweep to be performed. The command to carry out this task is:
gfix -s[weep] [-i[gnore]] database_name
This command will force the garbage left over from old rolled-back transactions to be removed, reducing the database size and improving the performance of new transactions.
The -i[gnore] option may be supplied. This forces Firebird to ignore checksum errors on database pages. This is not a good idea and should rarely need to be used, however, if your database has suffered some problems it might be necessary to use it.
The following example shows a manual database sweep being implemented:
linux> gfix -sweep my_employee
Disabling automatic sweeping
If you set the sweep interval to zero then automatic sweeping will be disabled. This implies that there will be no automatic housekeeping done so your database performance will not suffer as a result of the processing requirements of the automatic sweep.
If you disable sweeping you are advised to run a manual sweep at regular intervals when the database is quiet. Alternatively, simply make sure that you take regular backups of the database and as this is something you should be doing anyway, it shouldn't be a problem.