Introduction
This guide contains configuration information for system administrators working with the Globus Usage Statistics Tools. It provides references to information on procedures typically performed by system administrators, including installation, configuration, deployment, and testing. It also describes additional prerequisites and host settings necessary for Usage Statistics Tools operation.
Table of Contents
Before installing the Usage Statistics Tools, first determine the directory into which you wish to install the tools. Set the environment variable GLOBUS_LOCATION to this directory.
The Usage Statistics Tools are written in Python, with the following prerequisites: Python, PostgreSQL, and psycopg2.
To install these prerequisites on Debian Linux, install the packages python, postgresql, and python-psycopg2 using apt-get:
# apt-get install python postgresql python-psycopg2
To install these prerequisites on Fedora Linux, install the packages python, postgresql-server, and python-psycopg2 using yum:
# yum install python postgresql-server python-psycopg2
For other systems, consult your operating system's documentation for package names, or install from the sources mentioned above.
Download the GPT tools from ftp://ftp.ncsa.uiuc.edu//aces/gpt/releases/gpt-3.2//gpt-3.2-src.tar.gz and the Usage Statistics Tools source bundle from globus_usagestats_server-1.2-src_bundle.tar.gz.
Choose a destination directory into which you will install the usage stats tools and set the GLOBUS_LOCATION environment variable to that path.
% GLOBUS_LOCATION=/opt/usage-stats
% export GLOBUS_LOCATION
Untar and build GPT:
% tar zxf gpt-3.2-src.tar.gz
% cd gpt-3.2
% ./build_gpt -prefix=$GLOBUS_LOCATION
% cd ..
Build and install the usage stats tools, passing the file name of the source bundle you downloaded:
% $GLOBUS_LOCATION/sbin/gpt-build usagestats_server-1.0-src_bundle.tar.gz
First, create a database user and database to contain the usage stats data. This and the following sections assume that the usagestats database and the service will be run on the same machine. If that is not the case, run the database configuration commands on the machine running the database, and use that machine's hostname in place of localhost in the uploader configuration below.
On Fedora, you'll need to configure the postgres service to allow password authentication, unless you will be running the globus-usage-uploader as the usagestats user. To do this, change the method used for IPv4 local connections in /var/lib/pgsql/data/pg_hba.conf from ident to md5.
# "local" is for Unix domain socket connections only
local all all ident
# IPv4 local connections:
host all all 127.0.0.1/32 md5
# IPv6 local connections:
host all all ::1/128 ident
Then, create the user and database:
# su postgres
% createuser --pwprompt usagestats
Enter password for the new role:
Enter it again:
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) n
Shall the new role be allowed to create more roles? (y/n) n
% createdb -O usagestats usagestats
% psql -h localhost --password -U usagestats usagestats < $GLOBUS_LOCATION/share/globus_usage_tools/usage-tables.sql
The usage stats package looks up the database connection information and database password in the file $GLOBUS_LOCATION/etc/globus-usage-tools.conf. The file contains one variable definition per line, with the value contained within quotation marks. Add the password value between the quotation marks in the line
password = ""
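As a quick sanity check on the edited file, the following minimal sketch reads definitions of the shape described above (one name = "value" pair per line) and reports whether the password is still empty. The function names are hypothetical, the exact syntax accepted by the real tools is an assumption, and the host key in the usage note is illustrative only; password is the only key taken from this guide.

```python
import re

# Assumed line shape, based on the description above: name = "value"
_DEFINITION = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*"(.*)"\s*$')


def read_usage_conf(text):
    """Return a dict of the name = "value" definitions found in text."""
    settings = {}
    for line in text.splitlines():
        match = _DEFINITION.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def password_is_missing(settings):
    """True when the password entry is absent or still empty."""
    return not settings.get("password")
```

For example, password_is_missing(read_usage_conf('host = "localhost"\npassword = ""\n')) is true until a value is placed between the quotation marks.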
The globus_usage_tools_test package contains test cases for all of the packet formats supported by the usage stats service. To run these tests, do the following:
% cd $GLOBUS_LOCATION/test/globus_usage_tools_test
% ./TESTS.pl
If everything is working, the test output should end with lines indicating that all tests were successful. If not, check the error messages from the tests and verify that all of the configuration steps above have been completed.
The usage stats tools consist of two programs: globus-usage-collector and globus-usage-uploader. The globus-usage-collector program acts as a network service to receive usage stats packets and store them to the filesystem. The globus-usage-uploader parses those packet files and uploads their contents into a PostgreSQL database.
The globus-usage-collector program stores the packets it receives to files named by the pattern $GLOBUS_LOCATION/var/usage/YYYYMMDD/HH.gup, where YYYYMMDD is the date that the packet was received and HH is the hour in which it was received. The globus-usage-collector program is typically run indefinitely in the background. Full usage information is available in the reference section of this manual.
To run globus-usage-collector, use the command:
% $GLOBUS_LOCATION/sbin/globus-usage-collector -b -f $GLOBUS_LOCATION/var/globus-usage-collector.pid
Starting usage collector on UDP port 4810
Running in background (7128)
The globus-usage-uploader program parses packet files created by globus-usage-collector and loads them into the database. The database contact information is stored in the configuration file referred to in the configuration section of this document. The globus-usage-uploader processes all files in the $GLOBUS_LOCATION/var/usage/YYYYMMDD directories that were created before the current hour and then exits. It is meant to be run periodically by a service such as cron.
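The "before the current hour" rule can be illustrated with a short sketch that lists the packet files an hourly run would consider complete. This is not the uploader's own code. It assumes that eligibility can be judged from the YYYYMMDD/HH.gup names alone and that those names are in UTC, as the reference section states; the function name is hypothetical.

```python
import datetime
import os
import re

_DAY_DIR = re.compile(r"^\d{8}$")          # YYYYMMDD
_HOUR_FILE = re.compile(r"^(\d{2})\.gup$")  # HH.gup


def files_before_current_hour(usage_dir, now=None):
    """List packet files whose date and hour precede the current UTC hour."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    # YYYYMMDDHH strings sort in chronological order, so plain string
    # comparison is enough to find files from earlier hours.
    cutoff = now.strftime("%Y%m%d%H")
    ready = []
    for day in sorted(os.listdir(usage_dir)):
        day_path = os.path.join(usage_dir, day)
        if not (_DAY_DIR.match(day) and os.path.isdir(day_path)):
            continue
        for name in sorted(os.listdir(day_path)):
            match = _HOUR_FILE.match(name)
            if match and day + match.group(1) < cutoff:
                ready.append(os.path.join(day_path, name))
    return ready
```

The file for the hour in progress is skipped because the collector may still be appending to it.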
The following example crontab will run the globus-usage-uploader program every hour and remove empty usage date directories every day. Replace GLOBUS_LOCATION with the path to where the usage stats code is installed.
GLOBUS_LOCATION=GLOBUS_LOCATION
PATH=GLOBUS_LOCATION/sbin
MAILTO="[email protected]"
# Every hour, upload packets to database
1 * * * * $GLOBUS_LOCATION/sbin/globus-usage-uploader
# Daily, remove empty usage directories
59 0 * * * rmdir $GLOBUS_LOCATION/var/usage/*
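The daily cleanup entry relies on rmdir refusing to remove a directory that still contains files. For sites that prefer to script the cleanup, the same behavior can be sketched in Python as below. The function name is hypothetical and this is only an illustration, not part of the usage stats tools.

```python
import os


def remove_empty_usage_dirs(usage_dir):
    """Remove empty subdirectories of usage_dir and return their paths."""
    removed = []
    for name in sorted(os.listdir(usage_dir)):
        path = os.path.join(usage_dir, name)
        if not os.path.isdir(path):
            continue
        try:
            # os.rmdir only succeeds on an empty directory, like rmdir(1)
            os.rmdir(path)
        except OSError:
            continue  # directory still holds packet files, so keep it
        removed.append(path)
    return removed
```

Unlike the crontab entry, this version stays silent about non-empty directories, so it produces no output for cron to mail.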
Usage Statistics Collector — Record usage statistics packets
globus-usage-collector [-h] [-p PORT] [-b] [-f FILE]
The globus-usage-collector command is a service which accepts usage packets on a UDP port and writes them to a file for later processing.
The full set of command-line options to globus-usage-collector consists of:
-h | Display a help message and exit |
-p PORT | Listen on UDP port PORT instead of the default port 4810 |
-d DIRECTORY | Write data to DIRECTORY instead of the configured path. |
-b | Run the globus-usage-collector process in the background |
-f FILE | Write the process ID of the backgrounded globus-usage-collector process to FILE. |
The files are written in a subdirectory of the current directory with its name derived from the current time in UTC. The form of this directory name is YYYYMMDD (e.g. the date July 20, 2009 would be 20090720). Within that directory, files are generated with names based on the hour (again in UTC) when the packet was processed. The form of the filename is HH.gup. That is, a packet processed at 3:20 a.m. on that same day would be stored in the file 20090720/03.gup.
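The naming rule above can be expressed as a small helper that maps a timestamp to the expected file path. This is a sketch of the rule as described here, not the collector's own code; the function name and the base_dir parameter are hypothetical.

```python
import datetime
import os


def packet_file_path(base_dir, processed):
    """Map a timezone-aware processing time to <base_dir>/YYYYMMDD/HH.gup."""
    # Directory and file names are derived from the time in UTC.
    utc = processed.astimezone(datetime.timezone.utc)
    return os.path.join(base_dir, utc.strftime("%Y%m%d"), utc.strftime("%H") + ".gup")
```

Pass a timezone-aware datetime. A naive datetime would be interpreted as local time by astimezone, which may select the wrong hour or date.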
Each usage packet file consists of simple records containing the binary UDP usage packet data. Each packet record consists of 4 fields:
Address Length (2 bytes) | Big-endian length of the Address |
Address | Big-endian packed binary address |
Packet Length (2 bytes) | Big-endian length of the packet |
Packet | Binary packet data |
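For inspecting a packet file by hand, the record layout in the table can be read with Python's struct module. The sketch below makes several assumptions: records follow one another with no file header or separator, both length fields are unsigned, and a 4-byte or 16-byte address is a packed IPv4 or IPv6 address. The function names are hypothetical and the packet payload is returned undecoded.

```python
import ipaddress
import struct


def read_packet_records(data):
    """Yield (address, packet) byte strings from the contents of a .gup file."""
    offset = 0
    while offset < len(data):
        # Address Length: 2 bytes, big-endian
        (addr_len,) = struct.unpack_from(">H", data, offset)
        offset += 2
        address = data[offset:offset + addr_len]
        offset += addr_len
        # Packet Length: 2 bytes, big-endian
        (packet_len,) = struct.unpack_from(">H", data, offset)
        offset += 2
        packet = data[offset:offset + packet_len]
        offset += packet_len
        if len(address) != addr_len or len(packet) != packet_len:
            raise ValueError("truncated record at end of data")
        yield address, packet


def format_address(address):
    """Render a packed 4-byte or 16-byte address as text (assumed IPv4/IPv6)."""
    return str(ipaddress.ip_address(address))
```

A file can be examined with open(path, "rb").read() passed to read_packet_records, printing format_address(address) and len(packet) for each record.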
Usage Statistics Database Uploader — Store usage statistics packets in a database
globus-usage-uploader [-h] [-d DIRECTORY] [-n]
The globus-usage-uploader command is a utility which parses usage packets in directories of the form created by globus-usage-collector and uploads them to a PostgreSQL database.
The full set of command-line options to globus-usage-uploader consists of:
-h | Display a help message and exit |
-d DIRECTORY | Read data from DIRECTORY instead of the configured path. |
-n | Don't commit usage packets to the database after processing files (for testing) |