
MCS Big Data FAQ


We are trying to discourage the use of the ADSM HSM servers in favor of the direct use of the ADSM Archive client. The HSM servers are less reliable and slower than the ADSM Archive client software on any given platform.

Please email the Systems Group with any questions you cannot find answered here, or with ideas for improving this FAQ.


Big Data

  1. What resources does MCS have for storing big data sets?
  2. What is ADSM?
  3. What can ADSM do for me?
  4. How fast is ADSM?
  5. How can I easily put data into ADSM, and retrieve it later?
  6. What if I stored data into ADSM on one machine, but want to retrieve it on a different machine?
  7. How do I use the ADSM HSM systems in MCS?
  8. How do I access my files in the MCS HSM?
  9. How do I access my files in the CCST HSM?
  10. ADSM documentation for  UNIX and Microsoft Windows client software.


What resources does MCS have for storing big data sets?

Here's a map:

MCS_Storage.GIF (45377 bytes)

In this picture there are several pieces of interest:

IBM 3494 Tape Robot and
IBM 3590 Tape Drives

The IBM 3494 tape robot can store nearly 4,000 tapes.  Each tape can store 10 gigabytes of data before hardware compression.  That is close to 40 terabytes of native capacity, and well over 40 terabytes of online storage once compression is taken into account.

The IBM 3590 tape drive can access any point on a 10GB tape in 30 seconds.   It can supply data to the ADSM server at speeds up to 12 megabytes per second.
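To put those numbers in perspective, here is a small back-of-the-envelope sketch.  It uses only the figures quoted above; the choice of decimal units (1 terabyte = 1000 gigabytes) is my assumption, and the 12 megabytes per second is a peak rate, so real transfers will take longer.

```python
# Figures quoted in the FAQ text above.
TAPES = 4_000            # "nearly 4,000 tapes", so this is an upper bound
GB_PER_TAPE = 10         # native capacity, before hardware compression
DRIVE_MB_PER_SEC = 12    # peak rate of one IBM 3590 drive

# Assumption: decimal units (1 TB = 1000 GB, 1 GB = 1000 MB).
native_tb = TAPES * GB_PER_TAPE / 1000
full_tape_minutes = GB_PER_TAPE * 1000 / DRIVE_MB_PER_SEC / 60

print(f"Native robot capacity: about {native_tb:.0f} TB")
print(f"Streaming one full tape at peak rate: about {full_tape_minutes:.0f} minutes")
```

So even at the drive's best speed, reading a whole tape takes on the order of a quarter of an hour, which is worth keeping in mind when planning large retrievals.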

ADSM Server

The ADSM Server is an RS/6000 model F50.  It has some local disk for caching and storing the ADSM database, but its primary task is to move data between the network and the five IBM 3590 tape drives it controls.

MCS Network

The glue that holds the MCS storage system together.  At some point all of the data has to go over the network.  The speed of the network can range from a few tens of kilobytes per second (using NFS) to well over 10 megabytes per second.  See below for more details on how your choice of storage method affects the performance you receive.

QuadADSM, UCATS, and MCSHSM Hierarchical Storage Manager (HSM) Front End Servers

Each of these computers has one or more filesystems that are configured to push their data off to tape when the filesystem starts getting full.  This is very convenient for occasional use, but turns out to be very inefficient for both small (less than 100 kilobyte) and large (more than 100 megabyte) files, and for data sets of more than about 5 files that have to be moved around as a set.

Depending on what kind of data you have, how often you access it, and other factors, any or all of the resources above may be useful to you.  Read on for more detail.



What is ADSM?

Here's what the Tivoli Storage Manager (formerly ADSM) marketing web page says:

Tivoli Storage Manager (formerly Tivoli ADSM) provides the only application-centric approach to information management by delivering a true end-to-end solution spanning the entire enterprise. Tivoli Storage Manager is an enterprise-wide solution integrating automated network backup, restore and archive, storage management and disaster recovery functions utilized by more than one million systems worldwide.

ADSM is software.  One piece of ADSM software runs on the ADSM Server in the picture above and handles moving bits from tape to the network.  The ADSM Server maintains a database of all the data stored on the tapes.  Another piece of ADSM software runs on each of the computers in the MCS Division that are concerned with data storage.  This client software can be used to do traditional backups of hard drives.  It can be configured to do Hierarchical Storage Management, where a filesystem "never fills up" because the actual data is migrated off to tape.   Finally, it can do explicit archiving of files under user control.



What can ADSM do for me?

ADSM is already protecting the data you have on the MCS Division UNIX and NT servers.  Beyond that, here are some things you can do with ADSM:

Hierarchical Storage Management

If you have just a few moderate-sized files, you might want to store them into one of the HSM Front Ends.  Some of them are accessible through NFS to the MCS Division UNIX systems.  But if you're doing serious data storage, you'll be disappointed in how fast this runs.

ADSM Archive

The most efficient way to deal with sets of data files is through the ADSM Archive process.  You tell ADSM to store a directory and it does so.  When you ask for the directory back, it restores the files in the order they're stored on tape--which is the quickest way possible.



How fast does ADSM go?

Here are some examples to give you a feel for ADSM performance:

Reading out of an HSM Client into a UNIX system using NFS:

21 files, 1 gigabyte of data in 25 minutes.  (Roughly 700 kilobytes/second overall)
I ran the UNIX program sum across 21 files.  The 21 files added up to about 1 gigabyte of data.  These files were all on tape, so each file access required the ADSM server to mount a tape and seek to the file before copying it to the HSM front end.

Storing PIOFS files into ADSM's Archive Client:

29 files, 1.15 gigabytes of data in 11 minutes (Roughly 1.8 megabytes/second overall)


All of these examples were carefully run to ensure no unfair advantage.  All of the retrieved data was on tape, and that tape was not mounted in a drive before the retrieve started.   The times for retrieval include 45-90 seconds each for the tape to be mounted and positioned.  Nightly backups were running on the ADSM server, so this shows the system under load.

Reading out of ADSM's Archive Client into PIOFS on Quad:

29 files, 1.15 gigabytes of data in 10 minutes (Roughly 2 megabytes/second overall)

Reading out of ADSM's Archive Client into Denali's /quicksand:

29 files, 1.15 gigabytes of data in 5 minutes (Roughly 4.2 megabytes/second overall)
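
If you want to compare your own transfers against these figures, the arithmetic is easy to script.  The sketch below recomputes the average rates from the sizes and times quoted above.  I'm assuming a "gigabyte" here means 2**30 bytes and a "megabyte" 2**20 bytes; the helper name rate_mb_per_sec is made up for this example.  The quoted times are rounded to whole minutes, so the results only roughly match the rates listed above; in particular, the last case comes out a little under the quoted 4.2 megabytes/second.

```python
GIB = 1024 ** 3  # assumption: the FAQ's "gigabyte" is 2**30 bytes
MIB = 1024 ** 2  # assumption: a "megabyte" is 2**20 bytes


def rate_mb_per_sec(gigabytes, minutes):
    """Average throughput in megabytes per second (hypothetical helper)."""
    return gigabytes * GIB / (minutes * 60) / MIB


# (label, gigabytes moved, elapsed minutes) as quoted in this section.
measurements = [
    ("HSM client read over NFS", 1.0, 25),
    ("Archive store from PIOFS", 1.15, 11),
    ("Archive retrieve into PIOFS on Quad", 1.15, 10),
    ("Archive retrieve into Denali's /quicksand", 1.15, 5),
]

for label, gigabytes, minutes in measurements:
    print(f"{label}: {rate_mb_per_sec(gigabytes, minutes):.2f} MB/s")
```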



How can I easily put data into ADSM?

By far the easiest way to put data into ADSM is with a Perl script that acts as a front end to the ADSM program dsm.  /mcs/bin/archive is the name of this Perl script.   It should be reachable from everywhere in MCS that has ADSM support. 

In fact, the quickest way to find out whether a given system has ADSM installed is to simply type

/mcs/bin/archive

and see if it complains about ADSM software not being installed.

With ADSM's Archive/Retrieve client, each operation can be individually named.   This makes it easy for you to find your data files later.  For the purposes of this FAQ, I'm creating an archive named "Big Data FAQ Example".

To store data on Denali, simply run /mcs/bin/archive with the name of the directory you want to archive, and a description of the files in that directory.  Here's an example:

denali % /mcs/bin/archive /quicksand/nickless/FAQ-test-dir \
         -description="Big Data FAQ Example"

A bunch of text follows, but eventually it finishes with a report like this:

Archive processing of '/disks/quicksand/nickless/FAQ-test-dir/' finished without failure.

Total number of objects inspected: 29
Total number of objects archived: 29
Total number of objects updated: 0
Total number of objects rebound: 0
Total number of objects deleted: 0
Total number of objects failed: 0
Total number of bytes transferred: 1.15 GB
Data transfer time: 374.62 sec
Network data transfer rate: 3,245.19 KB/sec
Aggregate data transfer rate: 2,889.26 KB/sec
Objects compressed by: 0%
Elapsed processing time: 00:07:00

That's all there is to it!
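
The report above gives two different rates, and it helps to know how they relate.  As far as I understand the client statistics, the network rate divides the bytes transferred by the data transfer time, while the aggregate rate divides the same bytes by the total elapsed time; treat that as an assumption rather than gospel.  The sketch below checks it against the numbers in the report.  The helper hms_to_seconds is made up for this example.

```python
def hms_to_seconds(hms):
    """Convert an 'HH:MM:SS' string to seconds (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values copied from the example report.
network_rate_kb = 3245.19        # Network data transfer rate, KB/sec
transfer_seconds = 374.62        # Data transfer time
reported_aggregate_kb = 2889.26  # Aggregate data transfer rate, KB/sec
elapsed_seconds = hms_to_seconds("00:07:00")

# Assumption: network rate = kilobytes moved / data transfer time.
total_kb = network_rate_kb * transfer_seconds

# Assumption: aggregate rate = kilobytes moved / elapsed processing time.
aggregate_kb = total_kb / elapsed_seconds

print(f"Computed aggregate rate: {aggregate_kb:,.2f} KB/sec")
print(f"Reported aggregate rate: {reported_aggregate_kb:,.2f} KB/sec")
```

The two values agree to within a fraction of a percent; the elapsed time in the report is rounded to whole seconds, which would account for the small gap.  The aggregate rate is the one to use when estimating how long a job will take.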

For more information on how to use the /mcs/bin/archive script, simply run it with no options to get a full usage statement.

There is also documentation available for the ADSM Archive/Retrieve client.
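
If you have several directories to archive, you may want to drive the script from a program.  Here is a minimal sketch in Python.  The script path and the -description option come from the example above; the helper names, and the idea that the script's exit status indicates success, are my assumptions, so check the usage statement before relying on it.

```python
import subprocess

ARCHIVE_SCRIPT = "/mcs/bin/archive"  # path given in this FAQ


def build_archive_command(directory, description):
    """Return the argument list for archiving one directory."""
    return [ARCHIVE_SCRIPT, directory, f"-description={description}"]


def archive_directory(directory, description):
    """Run the archive script and return its exit status (hypothetical helper).

    Passing a list instead of a single string avoids shell quoting
    problems when the description contains spaces.
    """
    return subprocess.run(build_archive_command(directory, description)).returncode


# Show the command that would be run for the example in this FAQ.
example = build_archive_command("/quicksand/nickless/FAQ-test-dir",
                                "Big Data FAQ Example")
print(example)
```

Giving each directory its own description keeps the archives easy to tell apart when you come back to retrieve them.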



How can I easily retrieve data from ADSM?

For this, I recommend using the ADSM Archive/Retrieve graphical user interface.   With your X $DISPLAY variable set properly (either manually or through SSH) you can run /usr/adsm/dsm (or simply dsm on an AIX or Linux system).  You'll get a window that looks like this:

adsm-1.gif (15006 bytes)

Click on the option that I've circled in yellow, and you'll get a window that looks like this:

adsm-2.gif (19421 bytes)

In order to open the menus, I clicked on five hotpoints, circled in yellow, from the upper left to the bottom right.

See the grey boxes?  You click on them to select individual files or directories.   In the following window, I've clicked on three of the grey boxes associated with files I would like to retrieve:

adsm-3.gif (19394 bytes)

The next step is to click on the Retrieve button (circled in yellow above).  When you do that, you'll get a window that looks like this:

adsm-4.gif (8543 bytes)

If I wanted to put the files somewhere other than the original location, this window is where I could choose the other destination.  But since the original location is fine, I simply click on Retrieve (circled in yellow) to go on.  While the retrieve goes on, I see a window that looks like this:

adsm-5.gif (11638 bytes)

The slider bar (circled in yellow) will move from the left to the right as each file is retrieved.  When it reaches the right-hand side, that file is done.  You can see from the number circled in red in this example that the network is running about as fast as possible: a 155 million bit-per-second network moving 14.3 million bytes per second is doing quite well.
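
To see why that rate counts as doing quite well, convert the bytes to bits and compare against the link speed.  This is just arithmetic on the two figures above; the variable names are mine.

```python
LINK_BITS_PER_SEC = 155e6        # "155 million bit-per-second network"
OBSERVED_BYTES_PER_SEC = 14.3e6  # rate shown in the retrieve window

observed_bits_per_sec = OBSERVED_BYTES_PER_SEC * 8
utilization = observed_bits_per_sec / LINK_BITS_PER_SEC

print(f"Observed: {observed_bits_per_sec / 1e6:.1f} million bits per second")
print(f"Share of the raw link rate: {utilization:.0%}")
```

Some of the raw link rate is always consumed by protocol overhead, so a file transfer cannot be expected to reach 100 percent.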

Once all of the files are retrieved you'll get a window with an OK button.  Click on the OK and you're done.  Once again, that's all there is to it.

You can also use the /mcs/bin/archive script to retrieve files.  Run it without any arguments to see how.  Or, you can use the /usr/adsm/dsmc (or /usr/bin/dsmc, depending on architecture) command directly; that's covered in the documentation.



What if I stored data into ADSM on one machine, but want to retrieve it on another?

ADSM treats each individual machine separately.  That's good and bad.  The bad news is that you have to explicitly tell ADSM that it's OK to give your Denali login access to files that are archived on Quad.  The good news is that you can do this for any user on any ADSM client machine, even if your login isn't the same on both machines or they don't share any common system administration.

There are two steps needed to make this work.  The first is telling ADSM what other systems can have access to the files archived on a given machine, and this only has to be done once.  The second is to actually retrieve files stored on a different machine.

In the examples below, we will use the Graphical User Interface to do the work.   You can do the same things with command line arguments to the /usr/adsm/dsmc command (or /usr/bin/dsmc depending on architecture); see the documentation for more information.

Telling ADSM what other systems can have access to archived files:

Let's do this using the ADSM Graphical User Interface.  When you start ADSM, you get this window:

adsm-6.gif (14489 bytes)

Click on the yellow-circled option in order to get the submenu.  Then click on the menu option circled in red to get this window.  (You may have to resize it to get everything you see here.)

adsm-7.gif (22272 bytes)

In this example, I've given the user "nickless" on the machine "QUAD.MCS.ANL.GOV" access to all of the files and directories I've archived on this machine.  I typed all the information into the area circled in yellow, clicked on the button circled in orange, and then clicked on the Add button circled in purple.  I'm now ready to click on the OK button circled in purple to complete the process.

You need to use the fully qualified domain name.  QUAD.MCS.ANL.GOV is NOT the same as QUAD, and ADSM will not treat the two names as equivalent.

Once you have done this, the setting stays in effect until you change it.

Getting ready to actually retrieve files stored on a different machine:

Before you go through this process to retrieve files that were stored on another machine, you have to tell ADSM what machine to use.  For the purposes of this example, let's say that we've stored a file in /piofs on Quad, and want to retrieve it on Denali.  Here is how we would go about it.   First, we start /usr/adsm/dsm on Denali and get this window:

adsm-8.gif (12738 bytes)

On this familiar window, choose the menu option circled in purple to bring up this new window:

adsm-9.gif (7103 bytes)

Change the Node name (circled in green) to read QUAD.MCS.ANL.GOV instead.  Then click Set (circled in red).  From then on, simply follow the procedure described under "How can I easily retrieve data from ADSM?" to retrieve your files.  The menus will appear as if you are running on Quad, but any file you retrieve will be written to Denali's filesystems.



How do I use the ADSM HSM systems in MCS?

So we can't convince you to use the ADSM Archive/Retrieve client.

There are two major ADSM HSM systems in MCS: the standard MCS HSM and the CCST HSM.



How do I access my files in the MCS HSM?

To access files in the MCS HSM server, you have three options:

  1. ftp to mcshsm.mcs.anl.gov :
    ftp mcshsm.mcs.anl.gov

    You can use FTP to get files from, and put files into, your space on the HSM server.  This is probably the best of the methods to use.  When you ftp, your session will start in your HSM directory on the HSM server.  Note that the Remote Access Policy does not permit the use of this mechanism from outside the MCS Division.

  2. Use scp to copy files to and from mcshsm :
    scp file.name mcshsm.mcs.anl.gov:file.name
  3. Log in directly to mcshsm and access the files by hand.  When you log in, your home directory will be your HSM directory.

In every case, the path to your HSM directory on mcshsm.mcs.anl.gov will be:

/hsm/mcs/<login>

... i.e. the same name as before, but only available on mcshsm.



How do I access my files in the CCST HSM?

To access files in the CCST HSM server, you have four options:

  1. On any of the Quad front ends, cd to /hsm/ccst/ and then into your subdirectory.
  2. Log in directly to mcshsm and access the files by hand.  When you log in, your home directory will be your HSM directory.
  3. Use scp to copy files to and from mcshsm :
    scp file.name mcshsm.mcs.anl.gov:file.name
  4. ftp to mcshsm.mcs.anl.gov :
    ftp mcshsm.mcs.anl.gov

NOTE: The Remote Access Policy does not permit the use of FTP from hosts outside of the MCS Division.

When you ftp or scp, your session will start in your HSM directory on the hsm server.  

Note that you cannot get to /hsm/ccst on any of Quad's nodes, or in fact on any system except the Quad front ends.




Last updated on June 01, 2004
systems@mcs.anl.gov
webmaster@mcs.anl.gov