TSM Database reloading

Summary

The database at the core of a TSM instance is prone to fragmentation, which increases its size. As of March 2005, there are no online utilities available to correct this problem. The increased size and fragmentation show up as longer expiration times and slower backups, and eventually become an obstacle to normal operations.

This document describes in detail the currently recommended solution to this problem: unloading and reloading the TSM database.

Evaluating the benefits of a reload

Before you take down the server for an unload and reload, it would be wise to estimate whether the size reduction that will follow is worth the effort. The unload and reload can take a long time, so a small reduction is probably not worth it.

There is a query recommended by the TSM listserv which purports to estimate the degree of fragmentation which your database is experiencing.

      SELECT CAST((100 - (CAST(MAX_REDUCTION_MB AS FLOAT) * 256) / -
        (CAST(USABLE_PAGES AS FLOAT) - CAST(USED_PAGES AS FLOAT)) * 100) AS -
        DECIMAL(4,2)) AS PERCENT_FRAG FROM DB

This query should generate a number from which you can estimate how much benefit would accrue from your unload/reload.
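To see what the query is actually computing, here is a minimal sketch of the same arithmetic in shell. It assumes 4 KB database pages, which is what the factor of 256 pages per MB implies. The function name percent_frag and the sample figures are made up for illustration and are not taken from a real server.

```shell
#!/bin/sh
# Reproduce the PERCENT_FRAG arithmetic from the SELECT above.
# Arguments: MAX_REDUCTION_MB USABLE_PAGES USED_PAGES (columns of the DB table).
percent_frag() {
    awk -v max_mb="$1" -v usable="$2" -v used="$3" 'BEGIN {
        free_pages = usable - used        # pages not holding data
        if (free_pages <= 0) { print "n/a"; exit }
        reclaimable = max_mb * 256        # MB to pages, assuming 4 KB pages
        printf "%.2f\n", 100 - reclaimable / free_pages * 100
    }'
}

# Hypothetical figures: 1000 MB reducible, 512000 free pages.
percent_frag 1000 1024000 512000
```

Read this way, the number is the share of free pages that a plain database reduction could not give back. That is my reading of the formula rather than anything documented, so treat it as a rough guide.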

FIXME: In this paragraph I will calibrate the returns from the query and suggest when is a good time.

Performing the unload-reload

  1. Prepare your environment for recovery

    You will essentially destroy your TSM database as you perform the unload, so you would be well advised to prepare for a smooth disaster recovery before you begin. You should, at least:

    • Identify the device class to which you intend to unload the DB. In this example I am going to call it DBUNLOAD.
    • Ensure that the device class in question has adequate capacity to receive the unload. If you have enough space to hold your total DB volume, plus 10-20 percent, you should be fine. You expect, of course, that the unload will be substantially smaller than the live DB.
    • Back up your VOLUMEHISTORY and DEVCONFIG.
    • Perform a database backup, full or incremental.
    • Locate and read the TSM documentation on DSMSERV LOADFORMAT, DSMSERV AUDITDB, DSMSERV UNLOADDB and DSMSERV LOADDB. Links to the IBM and Tivoli documentation on the web are on the Administrator Documentation page of this site.
    • You might wish to disable schedules by setting the disablescheds option in dsmserv.opt. This will avoid interference as you bring the server up again, just in case.
    • Double-check the characteristics of your database and server. Are you in rollforward mode or normal? Are your volumes mirrored as you expect? Are the volumes in the locations you expect? Do you use any server-to-server communications? You'll need to know these things at the end of your reload, if you are to ensure that they are all working properly again.
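    The capacity rule of thumb above is simple arithmetic. A minimal sketch, assuming you already know your total DB volume size in MB; the function name required_unload_mb and the sample size are hypothetical:

```shell
#!/bin/sh
# required_unload_mb DB_MB [HEADROOM_PERCENT]
# Space to have free in the unload device class: DB size plus headroom.
# Headroom defaults to 20 percent, the upper end of the 10-20 percent rule.
required_unload_mb() {
    awk -v db="$1" -v pct="${2:-20}" 'BEGIN {
        need = db * (100 + pct) / 100
        whole = int(need)
        if (need > whole) whole++         # round up to a whole MB
        print whole
    }'
}

# Hypothetical 43000 MB database with 20 percent headroom.
required_unload_mb 43000 20
```

    Since the unload should come out smaller than the live DB, this figure is deliberately pessimistic.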
  2. Halt the server.

    When you stop the TSM server for this process, you will want to do so with the 'quiesce' parameter, which will make it possible to perform the unload and reload without auditing the database thereafter.

  3. Perform the unload

    This will probably be the longest of your steps. Some examples of how long it has taken others are in the list of real-world experiences at the end of this document. During the unload process, the TSM server takes all of the scattered data blocks and assembles them in order.

    Be sure to carefully read the documentation of the DSMSERV UNLOADDB command in the TSM docs. I use

           DSMSERV UNLOADDB devclass=DBUNLOAD \
               > /var/tmp/unloaddb.log 2>&1 < /dev/null &

    This formulation lets you watch the log (possibly from some location other than that from which you began the process) and removes some concerns about (say) the machine on which your terminal resides dying in the interim.
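    The redirection pattern can be tried out on a harmless command first. This is a minimal sketch, not part of the TSM procedure itself: the helper name run_detached and the demo log path are hypothetical, and the nohup is my addition, so that the process also ignores the hangup signal sent when a terminal goes away.

```shell
#!/bin/sh
# run_detached LOGFILE COMMAND [ARGS...]
# Start COMMAND in the background with stdout and stderr sent to LOGFILE
# and stdin detached from the terminal. The PID is left in $detached_pid.
run_detached() {
    log=$1
    shift
    # nohup is an addition to the formulation in the text.
    nohup "$@" > "$log" 2>&1 < /dev/null &
    detached_pid=$!
}

demo_log="${TMPDIR:-/tmp}/run_detached_demo.log"

# Stand-in command; in practice this would be the DSMSERV UNLOADDB invocation.
run_detached "$demo_log" sh -c 'echo started; echo problem >&2'
wait "$detached_pid"
cat "$demo_log"
```

    Both the normal output and the error output end up in the one log file, which you can follow from any other session with tail -f.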

    This command ought to result in a consistent database image. No audit ought to be necessary.

    At the end of the log output of the unload process, you will see a recap of the list of volumes used. This list will be necessary at reload-time.

     

  4. Format the DB containers.

    You must prepare the DB containers to receive the load. This process overwrites the recovery log, but you had already blown away the database in the unload. You did do a database backup in step 1, right?

    Be sure to carefully read the documentation of the DSMSERV LOADFORMAT command in the TSM docs. This command will be different for every installation. One of mine is

           DSMSERV LOADFORMAT 2 /dev/rtwebctlglv01a /dev/rtwebctlglv02a \
               4 /dev/rtwebctdblv01a /dev/rtwebctdblv02a \
               /dev/rtwebctdblv03a /dev/rtwebctdblv04a \
               > /var/tmp/loadformat.log 2>&1 < /dev/null &

    As in step 3, sending the output to a log file and putting the process in the background lets you watch progress from elsewhere and protects you if the machine on which your terminal resides dies in the interim.

    You may wish to use an alternate log volume for this process, one which is very small. The majority of the time taken by the LOADFORMAT is the initialization of the log. Once your server is up and running, you can add the production log volumes back to the log scheme, and re-extend the log.

    This format process formats ALL the database volumes supplied as a single database. If your database is mirrored, you should not supply both sets of volumes, only one. You'll re-mirror the database once the process is complete.

    The loadformat process is fairly quick. Expect minutes, rather than tens of minutes.
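    Judging by my example above, the arguments are a count of log volumes, the log volumes themselves, a count of DB volumes, and then the DB volumes. A minimal sketch that assembles the command line from two lists follows, so the counts cannot drift out of step with the names. The function names and volume paths are hypothetical, and the script only prints the command rather than running it. Check the result against the DSMSERV LOADFORMAT documentation before use.

```shell
#!/bin/sh
# count_words "A B C": print how many whitespace-separated words were given.
count_words() {
    set -- $1
    echo $#
}

# loadformat_cmd "LOGVOL..." "DBVOL...": print a LOADFORMAT command line.
loadformat_cmd() {
    log_vols=$1
    db_vols=$2
    echo "DSMSERV LOADFORMAT $(count_words "$log_vols") $log_vols $(count_words "$db_vols") $db_vols"
}

# Hypothetical volumes: one small temporary log, one copy of the DB volumes.
loadformat_cmd "/dev/rtmplog01" "/dev/rdb01 /dev/rdb02 /dev/rdb03"
```

    Passing only one copy of a mirrored set, as in the sample call, matches the advice above about not supplying both sets of volumes.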

  5. Perform the load

    The load process is usually substantially shorter than the unload; less than half the time is quite common. During this process, the TSM server feeds the well-ordered data blocks back onto your server DB volumes.

    Be sure to carefully read the documentation of the DSMSERV LOADDB command in the TSM docs.

           DSMSERV LOADDB devclass=DBUNLOAD \
               VOLumenames=vola,volb[,...] \
               > /var/tmp/loaddb.log 2>&1 < /dev/null &

    As in step 3, sending the output to a log file and putting the process in the background lets you watch progress from elsewhere and protects you if the machine on which your terminal resides dies in the interim.

    This command ought to result in a consistent database image. No audit ought to be necessary.
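    The volume list is a single comma-separated value with no spaces. A minimal sketch for building it from the volume names recapped at the end of the unload log is below; the function names are hypothetical, vola and volb are the placeholders from the command above, and the script only prints the command. I would keep the volumes in the order the unload reported them.

```shell
#!/bin/sh
# join_volumes VOL...: join the arguments with commas and no spaces.
join_volumes() {
    ( IFS=,; echo "$*" )
}

# loaddb_cmd DEVCLASS VOL...: print a LOADDB command line.
loaddb_cmd() {
    devclass=$1
    shift
    echo "DSMSERV LOADDB devclass=$devclass VOLumenames=$(join_volumes "$@")"
}

loaddb_cmd DBUNLOAD vola volb
```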

  6. Clean up the detritus

    Now, you are ready to restart the server and check that all is well. Some things you should expect, or expect to do:

    • Your DB will have its assigned capacity set to the complete capacity of all available volumes. I prefer to run with somewhat less; according to local conventions, you might want to shrink it some.

    • If your database was mirrored before, re-define the mirror copies. If you accidentally formatted both sets of volumes, blow away the empty ones (there should be plenty of empty ones) and redefine them in a manner that permits the re-mirroring.

    • If you used a temporary log volume to shorten loadformat time, then put your production volumes in place.
    • Do a full DB backup. You want to safeguard this new, more organized DB state.
    • For each of the servers with which you have set up server-to-server communications, perform an UPDATE SERVER FORCESYNC=YES so that the server identification token can be updated.
    • Back up your VOLUMEHISTORY and DEVCONFIG.
    • If you disabled schedules in your dsmserv.opt, then re-enable them now, and halt and restart the server.

Real-world experiences:

Platform  Disk tech  Original size  Final size           Unload time  Load time             Comments
Win2K     [unknown]  100GB          50GB (50% decrease)  22 hours     8 hours (64% faster)  Inventory expiration went from 21 hours to 7
AIX       SSA        43GB           28GB (34% decrease)  11 hours     3 hours (72% faster)
AIX       SSA        16GB           12GB (25% decrease)  4 hours      1 hour (75% faster)
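
The percentages in the table appear to be the difference between the two figures, taken as a share of the larger one. If you want to add your own results in the same form, here is a minimal sketch of that arithmetic; the function name percent_saved is hypothetical, and the published figures may differ from its output by a point of rounding.

```shell
#!/bin/sh
# percent_saved BEFORE AFTER: reduction from BEFORE to AFTER, as a percentage.
percent_saved() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        printf "%.1f\n", (before - after) / before * 100
    }'
}

percent_saved 100 50    # DB size in GB, first row
percent_saved 4 1       # unload hours vs load hours, last row
```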