Daily Tasks

Introduction

This document is designed to be an introduction to daily TSM care-and-feeding, and a brief reference guide for on-call personnel.

Conventions used in this document:

tsm: CTRL> [command]

A command executed on the "CTRL" TSM server. See the system server-list page for a list of the available TSM servers. The admin client should be able to reach any of these servers from, for example, NERSP, with an invocation like dsmadmc -se=[servername]. Most of the time, in these instructions, you'll be logging into the 'ctrl' server.

Some of these commands will be shown with all of their output, but output that runs long will be truncated.
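If you'd rather script a one-off command than sit in an interactive session, something like the following sketch may help. It only builds the argument list for dsmadmc; the admin id and password values are placeholders (assumptions, not anything defined in this document), and you'd hand the list to subprocess.run yourself.

```python
def dsmadmc_argv(server, command, admin_id, password):
    """Build an argument list for a single, non-interactive dsmadmc call.

    admin_id and password are hypothetical placeholders; supply whatever
    credentials your site actually uses.
    """
    return [
        "dsmadmc",
        f"-se={server}",          # same -se=[servername] form used above
        f"-id={admin_id}",
        f"-password={password}",
        command,                  # e.g. "q mount" or "all: q proc"
    ]


if __name__ == "__main__":
    # Print the command line rather than running it.
    print(" ".join(dsmadmc_argv("ctrl", "q mount", "ADMINID", "SECRET")))
```

Passing the list to subprocess.run(argv, capture_output=True, text=True) would give you the output as text for further processing.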

Command redirection

We have more than 10 TSM servers; logging in to each of them individually to manipulate them is a royal pain. Consequently, we've defined redirection aliases to permit the servers to be dealt with in groups. Conventionally, an admin would log into 'CTRL', and then forward commands to other nodes in the cluster.

For example, if one wants to see what processes are running across the cluster, one could use the alias 'all', like so:

tsm: CTRL>all: q proc
ANR1699I Resolved ALL to 10 server(s) - issuing command Q PROC against server(s).
ANR1687I Output for command 'Q PROC ' issued against server GLMAIL02 follows:
ANR0944E QUERY PROCESS: No active processes found.
ANR1688I Output for command 'Q PROC ' issued against server GLMAIL02 completed.
ANR1687I Output for command 'Q PROC ' issued against server ERP follows:
ANR0944E QUERY PROCESS: No active processes found.
ANR1688I Output for command 'Q PROC ' issued against server ERP completed.
ANR1687I Output for command 'Q PROC ' issued against server COPIES follows:

Process Process Description Status
Number
-------- -------------------- -------------------------------------------------
2,276 Migration Disk Storage Pool COPIES-DISK, Moved Files: 59,
Moved Bytes: 31,939,899,392, Unreadable Files:
0, Unreadable Bytes: 0. Current Physical File
(bytes): 2,157,436,928 Current output volume:
T00755.
ANR1688I Output for command 'Q PROC ' issued against server COPIES completed.
ANR1687I Output for command 'Q PROC ' issued against server GLMAIL03 follows:
ANR0944E QUERY PROCESS: No active processes found.
ANR1688I Output for command 'Q PROC ' issued against server GLMAIL03 completed.
ANR1687I Output for command 'Q PROC ' issued against server GLMAIL04 follows:
ANR0944E QUERY PROCESS: No active processes found.
ANR1688I Output for command 'Q PROC ' issued against server GLMAIL04 completed.
ANR1687I Output for command 'Q PROC ' issued against server INT follows:

Process Process Description Status
Number
-------- -------------------- -------------------------------------------------
5,147 Space Reclamation Volume ATLCOPY.BFS.147057370 (storage pool
INT-C1), Moved Files: 1376252, Moved Bytes:
329,046,406,596, Unreadable Files: 0, Unreadable
Bytes: 0. Current Physical File (bytes):
44,096,335 Current input volume:
ATLCOPY.BFS.147057370. Current output volume:
ATLCOPY.BFS.151574152.
5,165 Migration Disk Storage Pool C-DISK, Moved Files: 34856,
Moved Bytes: 136,889,999,360, Unreadable Files:
0, Unreadable Bytes: 0. Current Physical File
(bytes): 419,561,472 Current output volume:
T00800.
5,167 Space Reclamation Volume T01772 (storage pool C-3592), Moved Files:
0, Moved Bytes: 0, Unreadable Files: 0,
Unreadable Bytes: 0. Current Physical File
(bytes): 260,341,315 Waiting for mount point in
device class (5416 seconds).
ANR1688I Output for command 'Q PROC ' issued against server INT completed.
ANR1687I Output for command 'Q PROC ' issued against server EXT follows:

Process Process Description Status
Number
-------- -------------------- -------------------------------------------------
298 Expiration Examined 4213308 objects, deleting 496663 backup
objects, 61 archive objects, 0 DB backup
volumes, 0 recovery plan files; 0 errors
encountered.
306 Migration Disk Storage Pool C-DISK, Moved Files: 330675,
Moved Bytes: 291,970,093,056, Unreadable Files:
0, Unreadable Bytes: 0. Current Physical File
(bytes): 20,480 Waiting for mount of output
volume T00215 (135 seconds).
ANR1688I Output for command 'Q PROC ' issued against server EXT completed.
ANR1687I Output for command 'Q PROC ' issued against server VI follows:
ANR0944E QUERY PROCESS: No active processes found.
ANR1688I Output for command 'Q PROC ' issued against server VI completed.
ANR1687I Output for command 'Q PROC ' issued against server WEBCT follows:
ANR0944E QUERY PROCESS: No active processes found.
ANR1688I Output for command 'Q PROC ' issued against server WEBCT completed.
ANR1687I Output for command 'Q PROC ' issued against server GLMAIL01 follows:
ANR0944E QUERY PROCESS: No active processes found.
ANR1688I Output for command 'Q PROC ' issued against server GLMAIL01 completed.
ANR1694I Server COPIES processed command 'Q PROC ' and completed successfully.
ANR1695W Server ERP processed command 'Q PROC ' but completed with warnings.
ANR1694I Server EXT processed command 'Q PROC ' and completed successfully.
ANR1695W Server GLMAIL01 processed command 'Q PROC ' but completed with warnings.
ANR1695W Server GLMAIL02 processed command 'Q PROC ' but completed with warnings.
ANR1695W Server GLMAIL03 processed command 'Q PROC ' but completed with warnings.
ANR1695W Server GLMAIL04 processed command 'Q PROC ' but completed with warnings.
ANR1694I Server INT processed command 'Q PROC ' and completed successfully.
ANR1695W Server VI processed command 'Q PROC ' but completed with warnings.
ANR1695W Server WEBCT processed command 'Q PROC ' but completed with warnings.
ANR1697I Command 'Q PROC ' processed by 10 server(s): 3 successful, 7 with warnings, and 0 with errors.
ANS8001I Return code 11.

There are several features of this output worth noting.

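  • The output from each server is bracketed by a pair of messages (ANR1687I and ANR1688I) naming the server.
  • There is a trailing summary (ANR1697I) counting how many servers completed successfully, with warnings, or with errors.
  • A query that returns no data (here, ANR0944E "No active processes found") is reported as a warning. Obviously, this isn't usually a problem.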
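Because the bracketing is so regular, routed output is easy to pick apart in a script. The sketch below is one way to do it, assuming the message layout matches the sample above exactly (other server versions may word things differently); the function names are made up for illustration.

```python
import re

# Patterns copied from the sample output above; treat them as assumptions.
START = re.compile(
    r"^ANR1687I Output for command '.*' issued against server (\S+) follows:"
)
END = re.compile(
    r"^ANR1688I Output for command '.*' issued against server (\S+) completed\."
)
SUMMARY = re.compile(
    r"ANR1697I Command '.*?' processed by (\d+) server\(s\): "
    r"(\d+) successful, (\d+) with warnings, and (\d+) with errors\."
)


def split_by_server(text):
    """Return {server name: [output lines]} for a routed command."""
    sections = {}
    current = None
    for line in text.splitlines():
        start = START.match(line)
        if start:
            current = start.group(1)
            sections[current] = []
        elif END.match(line):
            current = None
        elif current is not None:
            sections[current].append(line)
    return sections


def parse_summary(text):
    """Return the ANR1697I counts as a dict, or None if there is no summary."""
    # Collapse whitespace first, since the terminal may wrap the summary line.
    flat = " ".join(text.split())
    found = SUMMARY.search(flat)
    if not found:
        return None
    servers, ok, warn, err = (int(n) for n in found.groups())
    return {"servers": servers, "successful": ok, "warnings": warn, "errors": err}
```

A nonzero "errors" count is the thing to look at; warnings are mostly servers with nothing to report.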

Early evening: ERP Backup window.

The ERP database backup window goes from 1800 to [mumble], and their workflow depends on the database backup completing before other nightly batch processing begins. Consequently, we've got to make sure that the tape drives are available. They need two 3592 tape drives to complete their usual processes in good order.

You should usually check the state of the system somewhere between 1630 and 1700, and double check in the vicinity of 1800. If there's a problem, Operations usually gets the call by 1815 or so, but if the erpies have some other disruption in their evening schedule, the call might come later.

To start with, see what is currently mounted. This command is "Query mount":

tsm: CTRL>q mount
ANR8330I 3592 volume T00013 is mounted R/W in drive DRIVE1 (/dev/rmt5), status: IN USE.
ANR8331I 3592 volume T01643 is mounted R/W in drive DRIVE0 (/dev/rmt4), status: DISMOUNTING.
ANR8331I 3592 volume T01467 is mounted R/W in drive DRIVE2 (/dev/rmt6), status: DISMOUNTING.
ANR8330I 3592 volume T00192 is mounted R/W in drive DRIVE3 (/dev/rmt7), status: IN USE.
ANR8379I Mount point in device class 3592DEV is waiting for the volume mount to complete, status: WAITING FOR VOLUME.
ANR8379I Mount point in device class 3592DEV is waiting for the volume mount to complete, status: WAITING FOR VOLUME.
ANR8379I Mount point in device class 3590DEV is waiting for the volume mount to complete, status: WAITING FOR VOLUME.
ANR8330I 3590 volume KA1130 is mounted R/W in drive DRIVE_B (/dev/rmt2), status: IN USE.
ANR8331I 3590 volume KA0874 is mounted R/W in drive DRIVE_A (/dev/rmt1), status: DISMOUNTING.
ANR8331I 3590 volume KA0955 is mounted R/O in drive DRIVE_C (/dev/rmt3), status: DISMOUNTING.
ANR8334I 10 matches found.

This is a rather busy snapshot, but it shows most of the drives we've got. Importantly for the evening's processes, there are four '3592' volumes (the 'T' tapes) currently in play. We only have four 3592 drives, so this means all of them are busy. The 3590 drives are not used by the database backups, so we don't care much about them.
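If you have the "q mount" output in a file or a variable, counting the occupied 3592 drives can be scripted. This is a rough sketch that assumes the ANR8330I/ANR8331I lines look exactly like the ones above; the function name is invented for the example.

```python
import re

# Matches the "volume ... is mounted ... in drive ..." lines shown above.
MOUNTED = re.compile(
    r"^ANR833[01]I (\S+) volume (\S+) is mounted (R/W|R/O) "
    r"in drive (\S+) \((\S+)\), status: (.+?)\.?$"
)


def drives_in_use(text, device_type="3592"):
    """List the mounted volumes of one device type from 'q mount' output."""
    found = []
    for line in text.splitlines():
        match = MOUNTED.match(line.strip())
        if match and match.group(1) == device_type:
            found.append({
                "volume": match.group(2),
                "mode": match.group(3),
                "drive": match.group(4),
                "device": match.group(5),
                "status": match.group(6),
            })
    return found
```

With the snapshot above, this would report all four 3592 drives as occupied, two of them in the DISMOUNTING state and so about to come free.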

The next step is to locate where they are mounted. You can scan the various servers with a command-redirection shortcut "all:", as in:

tsm: CTRL>all: q mount
[....]
ANR1687I Output for command 'Q MOUNT ' issued against server INT follows:
ANR8333I SERVER volume ATLCOPY.BFS.151438527 is mounted R/W, status: IN USE.
ANR8330I 3592 volume T01000 is mounted R/W in drive DRIVE2 (/dev/rmt6), status: IN USE.
ANR8334I 2 matches found.
ANR1688I Output for command 'Q MOUNT ' issued against server INT completed.
[....]

The colon must immediately follow the redirection shortcut, with no space between them. This will generate a listing for each of the servers, with subsets of the mount display from CTRL possibly present on each of them.
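Scanning ten servers' worth of output by eye gets old. A small script can pull out which servers are holding 3592 drives; the sketch below assumes the same message layout as the sample above, and the function name is hypothetical.

```python
import re

SERVER_START = re.compile(
    r"^ANR1687I Output for command '.*' issued against server (\S+) follows:"
)
# Only tape-drive mounts of 3592 volumes; SERVER volumes (ANR8333I) are ignored.
DRIVE_MOUNT = re.compile(
    r"^ANR833[01]I 3592 volume (\S+) is mounted \S+ in drive (\S+)"
)


def servers_holding_3592(text):
    """Map server name -> [(drive, volume), ...] from 'all: q mount' output."""
    holders = {}
    server = None
    for line in text.splitlines():
        start = SERVER_START.match(line)
        if start:
            server = start.group(1)
            continue
        if line.startswith("ANR1688I"):
            server = None  # end of this server's section
            continue
        mount = DRIVE_MOUNT.match(line)
        if mount and server is not None:
            holders.setdefault(server, []).append((mount.group(2), mount.group(1)))
    return holders
```

Each key in the result is a server worth a "q proc"; for the excerpt above that would be INT, holding T01000 in DRIVE2.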

For each of the servers which reported a 3592 drive occupied, issue a "q proc" to determine what's using the drive.

tsm: CTRL>int: q proc
ANR1699I Resolved INT to 1 server(s) - issuing command Q PROC against server(s).
ANR1687I Output for command 'Q PROC ' issued against server INT follows:

Process Process Description Status
Number
-------- -------------------- -------------------------------------------------
5,122 Space Reclamation Offsite Volume(s) (storage pool INT-C1), Moved
Files: 315, Moved Bytes: 8,413,034,406,
Unreadable Files: 0, Unreadable Bytes: 0.
Current Physical File (bytes): 69,783,774
Current input volume: T01000. Current output
volume: ATLCOPY.BFS.151438527.
ANR1688I Output for command 'Q PROC ' issued against server INT completed.
ANR1694I Server INT processed command 'Q PROC ' and completed successfully.
ANR1697I Command 'Q PROC ' processed by 1 server(s): 1 successful, 0 with warnings, and 0 with errors.

Then start cancelling them, routing each cancel to the server that owns the process:

tsm: CTRL>ext: cancel proc 251
ANR1699I Resolved EXT to 1 server(s) - issuing command CANCEL PROC 251 against server(s).
ANR1687I Output for command 'CANCEL PROC 251 ' issued against server EXT follows:
ANR0940I Cancel request accepted for process 251.
ANR1688I Output for command 'CANCEL PROC 251 ' issued against server EXT completed.
ANR1694I Server EXT processed command 'CANCEL PROC 251 ' and completed successfully.
ANR1697I Command 'CANCEL PROC 251 ' processed by 1 server(s): 1 successful, 0 with warnings, and 0 with errors.

While the cancel command itself does not block, there can be some delay before it takes effect. This is especially so if the file currently being processed is large; processes can only be interrupted between files. It's possible that some other pending process will snap up the tape drive as soon as it comes free. If there are lots of processes in a state of 'Waiting for mount point', it's worthwhile to just cancel all of them.

This check should be repeated just before 1800. Then, at or just after 1800, run q sess on erp.

tsm: CTRL>erp: q sess
ANR1699I Resolved ERP to 1 server(s) - issuing command Q SESS against
server(s).
ANR1687I Output for command 'Q SESS ' issued against server ERP follows:

Sess Comm. Sess Wait Bytes Bytes Sess Platform Client Name
Number Method State Time Sent Recvd Type
------- ------ ------ ------ ------- ------- ----- -------- -------------------
265,261 Tcp/Ip IdleW 4 S 1.2 G 24.0 K Node WinNT CHARGER.ERP.UFL.EDU
268,382 Tcp/Ip IdleW 39.0 M 270 206 Node DB2 HRQAT
268,428 Tcp/Ip IdleW 9.2 M 13.2 K 648 Node AIX ERP01-GE1.CNS.UFL.-
EDU
268,462 Tcp/Ip IdleW 31 S 9.8 K 6.6 K Node WinNT UFAD-ALPHA-WKS2.AD-
.UFL.EDU
268,469 Tcp/Ip Run 0 S 156 230 Admin AIX-RS/- ASR
6000
ANR1688I Output for command 'Q SESS ' issued against server ERP completed.
ANR1694I Server ERP processed command 'Q SESS ' and completed successfully.
ANR1697I Command 'Q SESS ' processed by 1 server(s): 1 successful, 0 with
warnings, and 0 with errors.

There will be a variety of sessions; you're interested in the database sessions, which will usually be nodes FIPRD or HRPRD, and will usually have something about DB2 in the "Platform" column.

Once the DB backups have their tape drives, we don't need to worry about it further: the backups are not interrupted by any normal processing.

Sessions initially appear in "IdleW" (Idle Wait) state, and then go into "MediaW" (Media Wait) state waiting to get access to a tape drive; during this time the wait time counter increments. When a session gets a tape drive, it goes back into MediaW until the desired tape is mounted in the drive (2-3 minutes, usually). Then the session enters RUN state, and after sufficient time to seek the tape to the correct