50 Ways

Summary

Allen S. Rout, session 6701

For any TSM admin, a day arrives when it is interesting or necessary to move some backup work from one TSM server to another. This could be preparation for a decommissioning, an architecture change, or a simple reorganization; but for one reason or another, the node has to move.

Since TSM is so flexible, an admin is faced with an embarrassment of choices when picking how to move a given node. Many different methods are available, and each has its own tradeoffs, advantages, and liabilities.

This whitepaper is the beginning of a catalog of the possible methods, and some important ways they differ. It is not intended to be exhaustive, but I hope that it will grow to be fairly complete.

I am focusing on delicate shades of difference between one method and another. We TSM admins tend to focus on the long-term reliability of our facilities; transfers, however, are such a brief use of resources that if we can shake our minds free of our very protective patterns of thought, we have many more options than we'd ordinarily use.

All of the transfer methods discussed here are subject to all sorts of problems if the correct domain structure is not present on the receiving server. Ensuring this is beyond the scope of this paper.

Some terms used in the remainder of this document:

SERVER_A

The source server. It is attached to LIB_A.

SERVER_B

The target server, accepting some nodes of the export. It is attached to LIB_B.

SERVER_C

Another target server, accepting some nodes of the export. It is attached to LIB_C.

REMOTE

Another machine on the network.

LIB_ABC

A library physically and logically connected to all servers.

LIBMGR

A TSM server configured to be able to manage libraries for other TSM servers.

NODE_1

A node on SERVER_A, export desired.

NODE_2

A node on SERVER_A, export desired.

Methods involving serial media

One of the more common move-a-node scenarios occurs when a TSM server machine is replaced; servers seem to be replaced quite a bit more often than tape infrastructure is. In this case, and in any other case where the two servers are within SAN range, it is possible to exploit the SAN connection to simplify the transfer.

  1. Conventional export; physical move of tapes. This is the most common, perhaps the 'default' method for transferring data from one server to another. On the source instance, one performs an

    SERVER_A> EXPORT NODE-
    NODE_1-
    filedata=all-
    devclass=LIB_A

    One then ejects the tapes from the library LIB_A, and carts the tapes to location B. After checking the tapes into LIB_B, one can

    SERVER_B> IMPORT NODE-
    NODE_1-
    filedata=all-
    devclass=LIB_B-
    VOL=X,Y,Z[...]

    After this process, the data will be available through the auspices of SERVER_B.

  2. Conventional export; share a physical library. The simplest of these methods is almost identical to the physical move scenario: First, perform the export as before.

    SERVER_A> EXPORT NODE-
    NODE_1-
    filedata=all-
    devclass=LIB_AB

    Then, rather than ejecting the tapes from the library (in this case LIB_AB), simply check them out without removing them, and check them into the other instance's view of the same library; a sketch of this check-out/check-in dance follows the import below. Then, on SERVER_B, you can

    SERVER_B> IMPORT NODE-
    NODE_1-
    filedata=all-
    devclass=LIB_AB-
    VOL=X,Y,Z[...]

    After this process, the data will be available through the auspices of SERVER_B.
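
    For reference, the check-out/check-in dance itself might look like the following sketch, with X standing in for each export volume; the exact CHECKLABEL and STATUS choices depend on your site's labeling habits, so treat this as an assumption rather than a recipe.

    SERVER_A> CHECKOUT LIBVOL LIB_AB X remove=no
    SERVER_B> CHECKIN LIBVOL LIB_AB X status=private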

  3. Conventional export; share a virtual library. Checking volumes in and out of libraries can be time-consuming and error-prone. If you have the facilities to do so, you can erect a shared library on one of your servers (in this case I'm positing an additional server for clarity's sake) and perform the transition entirely within its demesne. First, perform the export as before. In this case, however, LIB_AB is presumed to be a library of type SHARED, managed by server LIBMGR; a sketch of the definitions involved appears after the import below. This library should be available to both SERVER_A and SERVER_B; in fact, they may both have been using the shared library for their own purposes beforehand.

    SERVER_A> EXPORT NODE-
    NODE_1-
    filedata=all-
    devclass=LIB_AB

    Then, on LIBMGR, we change the OWNER of each tape in question, one UPDATE LIBVOLUME per tape:

    LIBMGR> UPDATE LIBVOL-
    LIB_AB-
    X-
    owner=SERVER_B

    Once this is accomplished, we have done, virtually, the same work which we did physically in the first scenario. The tapes are now available to SERVER_B and we can continue the import.

    SERVER_B> IMPORT NODE-
    NODE_1-
    filedata=all-
    devclass=LIB_AB-
    VOL=X,Y,Z[...]

    After this process, the data will be available through the auspices of SERVER_B.
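
    If LIB_AB is not already shared, the definitions involved look roughly like the sketch below. LIBTYPE=SCSI is an assumption about the hardware, and the drive, path, and device class definitions are omitted for brevity.

    LIBMGR>   DEFINE LIBRARY LIB_AB libtype=scsi shared=yes
    SERVER_A> DEFINE LIBRARY LIB_AB libtype=shared primarylibmanager=LIBMGR
    SERVER_B> DEFINE LIBRARY LIB_AB libtype=shared primarylibmanager=LIBMGR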

Methods involving the network

Transferring one filespace at a time preserves consistency, at the cost of substantial configuration complexity. For a large system which requires several days to complete in this manner, many reconfigurations will have to happen, and the chance for error will grow.

  1. Conventional export, one pass. This method is probably the second-most-common. It takes somewhat more preparation than the default export with physical transport of media, and it requires network connectivity. However, it is simple and straightforward. Once the servers are defined to each other (a sketch of those definitions follows below):

    SERVER_A> EXPORT NODE-
    NODE_1-
    filedata=all-
    toserver=SERVER_B

    Time passes, tapes move, and some time later the export will be complete.
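
    The server definitions themselves are a one-time setup. A minimal sketch, with the password, addresses, and port as placeholders (SET CROSSDEFINE can generate the mirror-image definition for you):

    SERVER_A> SET SERVERNAME SERVER_A
    SERVER_A> SET SERVERPASSWORD secret
    SERVER_A> DEFINE SERVER SERVER_B-
    serverpassword=secret-
    hladdress=server_b.example.edu-
    lladdress=1500

    SERVER_B> SET SERVERNAME SERVER_B
    SERVER_B> SET SERVERPASSWORD secret
    SERVER_B> DEFINE SERVER SERVER_A-
    serverpassword=secret-
    hladdress=server_a.example.edu-
    lladdress=1500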

    The most prominent problem with this method is that it is a large chunk, taken all at once, which impedes access to the data while it runs. There will be many nodes for which this is not a problem; for others, however, it will be a Very Big Deal. I've got a node supporting a Content Manager instance whose export looked to take 60 hours or more. This is clearly too long a duration to keep a production service down. There are several options for addressing this.

  2. Export, one pass per FS. So long as the vast majority of your node's data is not on a single filespace, you might gain substantial advantage from exporting the node one chunk at a time.

    SERVER_A> EXPORT NODE-
    NODE_1-
    /some/filespace-
    filedata=all-
    toserver=SERVER_B

    and in this manner limit the duration of your outage to the time necessary to transmit a single filespace. This process can extend over days, if necessary. All that is necessary to maintain the machine in this split state indefinitely is to run two sets of incrementals, one pointed at each server, and each limited by DOMAIN statements to the filespaces deemed current on that server.
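
    On a Unix client, the split state might be expressed in dsm.sys as two stanzas, along these lines (the stanza names old_a and new_b, and the filespace assignments, are illustrative):

    SERVERNAME old_a
       TCPSERVERADDRESS  server_a.example.edu
       DOMAIN            /not/yet/moved

    SERVERNAME new_b
       TCPSERVERADDRESS  server_b.example.edu
       DOMAIN            /some/filespace

    The two incrementals would then run as 'dsmc incremental -servername=old_a' and 'dsmc incremental -servername=new_b', each touching only the filespaces in its DOMAIN list.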

  3. Export, 'slide under' running system. In many cases, TSM admins deem a substantial cost in resource occupation well worth the resulting configuration simplicity. At the cost of making an extra copy of the target system, and doing a bunch of extra expiration work, the export can be arranged at leisure. The EXPORT NODE command includes a facility for "merging" the filespaces being exported with extant filespaces of the same name on the target server. When this happens, the file versions present on the source server are added to the list of versions present on the new server. The active version from the source server is made inactive if it is older than the active version on the target server. Then, on the next expiration pass, the retention directives active on the target server are applied to the new list of versions, with some versions likely being expired. With this tool available, it is feasible to move a machine to a new server, and feel comfortable about having lost nothing, by the following process:

    • On SERVER_B (the target server), define a default management class which retains one more 'extra' copy than the relevant management class on the source server (SERVER_B:RETEXTRA must be higher than SERVER_A:RETEXTRA).

    • Direct the node to the target server, and run the initial incremental. Of course, all the data will be copied.

    • Once backups are running stably against the target server, begin exporting filespaces like so:

      SERVER_A> EXPORT NODE-
      NODE_1-
      /some/filespace-
      filedata=all-
      mergefilespaces=yes-
      toserver=SERVER_B

      While the ongoing export will interfere with data retrieval from the source server, it will not interfere with incrementals on the target.

    • When all of the filespaces have been exported, you will have on the target server at least all the data you had on the source, modulo the retention characteristics and the aging of the data over the export duration.

    If your retention needs are exact, it will be important that the target management class retain one more 'extra' copy than the source management class. Since the active version on the source is injected into the list of inactive versions on the target, it will be possible for the oldest version from the source server to be 'pushed off the end' of the list. For most sites, for most purposes, this will be an irrelevancy. But you should keep it in mind.
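
    A sketch of the retention adjustment on the target, assuming the STANDARD domain, policy set, and management class names (the numbers are placeholders). Note that the count of retained versions is governed by VEREXISTS and the retention period of inactive versions by RETEXTRA, so raise whichever of them (or both) expresses "one more version" for your policies:

    SERVER_B> UPDATE COPYGROUP STANDARD STANDARD STANDARD-
    type=backup verexists=5 retextra=31
    SERVER_B> VALIDATE POLICYSET STANDARD STANDARD
    SERVER_B> ACTIVATE POLICYSET STANDARD STANDARD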

Methods involving disk

  1. FILE devclass; arbitrary transport. If you define a FILE device class, the volumes so written are tolerably portable from machine to machine. (It's possible that you could end up with endian problems moving from, e.g., MVS to an open-systems or Windows machine.) In the simplest case, you can just write your volumes on SERVER_A and then use your protocol of choice. If you erect a FILE device class on the target server, it is not necessary that the volumes in that devclass be resident in the directory in question: you can simply copy or mount them anywhere, and reference them in the IMPORT statement. There are a variety of ways to get the data from A to B; removable media such as CD and DVD are obvious choices, though transporting hard drives has its attraction also. I'll discuss some interesting corners of this space in detail, but here are some of the broad highways (a small sketch of one follows the list):

    1. FTP

    2. SCP

    3. HTTP

    4. RSYNC

    5. carry CD/DVD

    6. carry hard drive

    7. carry tape (but then, why not just write directly to it?)

    8. Move SSA wires

    9. Rezone Fiber fabric

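    As one concrete instance of these broad highways, a minimal rsync sketch; the directory /tsmfile and the volume name 0001111.exp (which also appears in the NFS example below) are placeholders:

    root@server_a# rsync -av /tsmfile/ server_b:/tsmfile/
    root@server_b# dsmadmc
    [...]
    SERVER_B> def devc transferfile devtype=file dir=/tsmfile
    SERVER_B> import node NODE_1 filed=all devc=transferfile vol=/tsmfile/0001111.exp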

  2. FILE devclass; NFS mount (SMB, etc.). Most TSM admins would shudder if you suggested storing their data on NFS mounts, or their cousin in the Windows world, SMB. The weaknesses of such an approach are legion, but the dramatic expansion in exposure is probably the single biggest problem: you've roughly doubled the circumstances in which your server could have serious problems.

    However, for the special case of temporary storage, especially for transport, there are some enticing aspects to the idea. For one, moving nodes tends to involve a lot of data; it may be complicated or time-consuming to arrange for adequate storage to be directly attached to the TSM server. Further, many types of storage addition and deletion require that the server be rebooted. At best this will constrain the speed of export/import to the rate of outage windows available. In some organizations, it may make the operation all but impossible.

    To make use of remote-mounted space for this purpose simply requires that your organization have enough space "somewhere". It's not even necessary that you have the same namespace. For example:

    root@server_a# mount REMOTE:/export/reallybig /mnt/a
    root@server_a# dsmadmc
    [...]
    SERVER_A> def devc reallybigfile devtype=file maxcap=100G dir=/mnt/a
    SERVER_A> export node NODE_1 devc=reallybigfile filed=all

    At this point, either on the command line or in the activity log, you will find a list of volumes (files) that were used for the export. Say it's one file, 0001111.exp.

    root@server_b# mount REMOTE:/export/reallybig /mnt/b
    root@server_b# mkdir /mnt/redherring
    root@server_b# dsmadmc
    [...]
    SERVER_B> def devc irrelevant devtype=file dir=/mnt/redherring
    SERVER_B> import node NODE_1 devc=irrelevant vol=/mnt/b/0001111.exp
  3. FILE devclass, NFS, library managed. Bookkeeping can be an issue for managing volumes rattling around on disk. It's very easy to lose track of what is owned by whom, and while the suffixes of the disk files help you convince yourself that you're not deleting (say) a database backup, it can be very difficult to determine when a given export file is really not needed any more.

    One aid to such bookkeeping can be defining your FILE volumes in a library. With the shared-library primitives, the library can be used more or less as a physical library might be. This process is discussed above; similar volume updates can be performed on the disk volumes (a sketch follows below). This provides a simple, TSM-ish method for recording where the volume might be needed.

    This would be especially relevant if your transfer is going to occur over a rather long period; keeping track of what volumes need to be with the old server, and which with the new, could get error-prone and complex.
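
    A sketch of that bookkeeping, assuming a shared FILE-type library FILELIB has been erected on LIBMGR and the export volume checked into it; the library and volume names are hypothetical, and support for shared FILE libraries varies by server level:

    LIBMGR> UPDATE LIBVOL FILELIB 0001111.exp owner=SERVER_B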

  4. Copy the whole thing. If your goal is to transplant a server from one place to another (as mine is), then there will come a time when you have already moved all the things that are straightforward, all the things that can be dodged around, and all the things you can punt. You will be left with the kernel of your problem-child nodes, all in one server, each representing unacceptable sacrifices. For me, this is the data repository standing behind my Content Manager installation.

    When you get to this point, it is important to recall that you can still treat the operation like a server move. You can simply ensure that all the data is housed on media available to both old and new servers, and then (after suitable testing) perform a database backup on SERVER_A and restore it onto SERVER_B. So long as your shortest REUSEDELAY has not elapsed, you will even be able to revert to SERVER_A without losing anything but the work done against SERVER_B.
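
    In outline, the cut-over might look like the sketch below. It assumes SERVER_A's volume history and device configuration files are available on SERVER_B's host, and the details of DSMSERV RESTORE DB vary by platform and level.

    SERVER_A> BACKUP DB devclass=LIB_ABC type=full
    SERVER_A> HALT

    root@server_b# dsmserv restore db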

  5. Copy the whole thing, blow some of it away. What if your problem-child nodes are mutually hostile? I was lucky enough to be able to move another problem node, which had a particularly nasty 17M-file filespace, out of my old server. Had my constraints been different, I might have had both it and my CM repository in the same server, still affecting each other, neither amenable to export.

    In this case, so long as you can completely collocate all of your data, you can even restore your SERVER_A database to multiple target instances. This is a rather dangerous maneuver, but it is possible to do it safely. NODE_1 and NODE_2 will be our problem children in this example, and will be the only two nodes still resident on SERVER_A. We want NODE_1 on SERVER_B and NODE_2 on SERVER_C.

    • Ensure that the SERVER_A data is completely (COMPLETELY, mind you) collocated. I will presume that all the data is resident in a library managed by LIBMGR, because it simplifies the procedure.

    • Back up the SERVER_A database to media available to SERVER_B and SERVER_C.

    • Restore the database to SERVER_B. Lock NODE_2 on SERVER_B.

    • Restore the database to SERVER_C. Lock NODE_1 on SERVER_C.

    At this point, you have all of your database artifacts in place. You can perform incrementals, but you can't access any of the old data because all of the tapes still belong to SERVER_A.

    • Assign all of the tapes associated with NODE_1 to SERVER_B. This can be done with the same UPDATE LIBVOLUME procedure used above to transfer ownership of export volumes.

    • Assign all of the tapes associated with NODE_2 to SERVER_C.

    Now, you have all of the old-server data for each node available to the server which is intended to host the node going forward. However, you -also- have the state of the other node, as of the time of the transfer.

    • DELETE FILESPACE * for NODE_2 on SERVER_B.

    • DELETE FILESPACE * for NODE_1 on SERVER_C.

    You've separated the Siamese twins: each node is now present on its desired server, alone, with access to all of the old server's data. While this procedure has used the abstraction of the TSM shared library to accomplish the movement of tapes, it is equally possible to move the tapes physically, or to alter library categories, if such a procedure is more convenient in your system environment.
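
    A compact sketch of the tape reassignment and cleanup, with X standing for each tape holding NODE_1 data and Y for each tape holding NODE_2 data:

    LIBMGR>   UPDATE LIBVOL LIB_ABC X owner=SERVER_B
    LIBMGR>   UPDATE LIBVOL LIB_ABC Y owner=SERVER_C
    SERVER_B> DELETE FILESPACE NODE_2 *
    SERVER_C> DELETE FILESPACE NODE_1 *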