Usage Statistics Tools Administrator's Guide

Introduction

This guide contains configuration information for system administrators working with the Globus Usage Statistics Tools. It provides references to information on procedures typically performed by system administrators, including installing, configuring, deploying, and testing the tools. It also describes additional prerequisites and host settings necessary for Usage Statistics Tools operation.

Table of Contents

1. Building and Installing

Preparation and Prerequisites

Downloading and Installing the Usage Statistics Tools

Configuring the Usage Stats Tools

Database configuration
Usage Statistics Tools Uploader Configuration
Testing the Usage Stats Tools
Running the Usage Stats Tools

2. Command Reference

Chapter 1. Building and Installing

Preparation and Prerequisites

Before installing the Usage Statistics Tools, first determine into which directory you wish to install the tools. Set the environment variable GLOBUS_LOCATION to this directory.

The Usage Statistics Tools are written in Python, with the following prerequisites:

Python 2.5.x or 2.6.x
PostgreSQL Server 8.3.x or 8.4.x
psycopg2 Python PostgreSQL module

To install these prerequisites on Debian Linux, install the packages python, postgresql, and python-psycopg2 using apt-get:

# apt-get install python postgresql python-psycopg2

To install these prerequisites on Fedora Linux, install the packages python, postgresql-server, and python-psycopg2 using yum:

# yum install python postgresql-server python-psycopg2

For other systems, consult your operating system's documentation for package names, or install the prerequisites listed above from source.

Downloading and Installing the Usage Statistics Tools

Download the GPT tools from ftp://ftp.ncsa.uiuc.edu//aces/gpt/releases/gpt-3.2//gpt-3.2-src.tar.gz and the Usage Statistics Tools source bundle, globus_usagestats_server-1.2-src_bundle.tar.gz.

Choose a destination directory into which you will install the usage stats tools and set the GLOBUS_LOCATION environment variable to that path.

% GLOBUS_LOCATION=/opt/usage-stats
% export GLOBUS_LOCATION

Untar and build GPT:

% tar zxf gpt-3.2-src.tar.gz
% cd gpt-3.2
% ./build_gpt -prefix=$GLOBUS_LOCATION
% cd ..

Build and install the usage stats tools:

% $GLOBUS_LOCATION/sbin/gpt-build globus_usagestats_server-1.2-src_bundle.tar.gz

Configuring the Usage Stats Tools

Database configuration

First, create a database user and database to contain the usage stats data. This and the following sections assume that the usagestats database and the service will be run on the same machine. If that is not the case, run the database configuration commands on the machine running the database, and use that machine's hostname in place of localhost in the uploader configuration below.

Debian-specific configuration

# su postgres
% createuser --pwprompt usagestats
Enter password for the new role:
Enter it again:
Shall the new role be a superuser? (y/n)  n
Shall the new role be allowed to create databases? (y/n)  n
Shall the new role be allowed to create more roles? (y/n)  n
% createdb -O usagestats usagestats
% psql -h localhost --password -U usagestats usagestats < $GLOBUS_LOCATION/share/globus_usage_tools/usage-tables.sql

Fedora-specific configuration

On Fedora, you'll need to configure the postgres service to allow password authentication, unless you will be running the globus-usage-uploader as the usagestats user. To do this, change the method used for IPv4 local connections in /var/lib/pgsql/data/pg_hba.conf from ident to md5.

# "local" is for Unix domain socket connections only
local   all         all                               ident
# IPv4 local connections:
host    all         all         127.0.0.1/32          md5
# IPv6 local connections:
host    all         all         ::1/128               ident

Then, create the user and database:

# su postgres
% createuser --pwprompt usagestats
Enter password for the new role:
Enter it again:
Shall the new role be a superuser? (y/n)  n
Shall the new role be allowed to create databases? (y/n)  n
Shall the new role be allowed to create more roles? (y/n)  n
% createdb -O usagestats usagestats
% psql -h localhost --password -U usagestats usagestats < $GLOBUS_LOCATION/share/globus_usage_tools/usage-tables.sql
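
After the database has been set up on either platform, the connection can be exercised from Python with psycopg2. The following is a minimal sketch and not part of the tools: the function names are hypothetical, the host, database, and user defaults mirror the commands above, and it is written for Python 3 rather than the Python 2 interpreter that the tools themselves require.

```python
def quote_dsn_value(value):
    """Quote a value for a libpq keyword/value connection string."""
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def build_dsn(password, host="localhost", dbname="usagestats", user="usagestats"):
    """Build a connection string matching the psql commands above."""
    fields = [("host", host), ("dbname", dbname), ("user", user), ("password", password)]
    return " ".join("%s=%s" % (key, quote_dsn_value(val)) for key, val in fields)


def check_connection(password):
    """Return True if a trivial query succeeds against the usagestats database."""
    import psycopg2  # imported lazily so build_dsn works without the module

    conn = psycopg2.connect(build_dsn(password))
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
        return cur.fetchone() == (1,)
    finally:
        conn.close()
```

If check_connection raises an authentication error on Fedora, the pg_hba.conf change described above is the first thing to verify.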

Usage Statistics Tools Uploader Configuration

The usage stats package looks up the database connection information and database password in the file $GLOBUS_LOCATION/etc/globus-usage-tools.conf. The file contains one variable definition per line, with the value enclosed in quotation marks. Add the database password between the quotation marks on the line password = "".
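
To illustrate the one-definition-per-line format, here is a minimal Python 3 sketch of a reader for such a file. It is not the tools' own parser: the function name is hypothetical, the handling of blank lines and # comment lines is an assumption, and of the variable names only password is given in this guide.

```python
import re

# One definition per line, in the form: name = "value"
_DEFINITION = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*"(.*)"\s*$')


def parse_usage_conf(text):
    """Return a dict of the name = "value" definitions found in text."""
    settings = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: blank lines and '#' comment lines are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        match = _DEFINITION.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings
```

A script could use this to confirm that the password line is no longer empty before the uploader is scheduled, for example by checking that parse_usage_conf(text).get("password") is a non-empty string.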

Testing the Usage Stats Tools

The globus_usage_tools_test package contains test cases for all of the packet formats supported by the usage stats service. To run these tests, do the following:

% cd $GLOBUS_LOCATION/test/globus_usage_tools_test
% ./TESTS.pl

If everything is working, the output of the tests should end with lines indicating that all tests were successful. If not, check the error messages from the tests and verify that every configuration step above has been completed.

Running the Usage Stats Tools

The usage stats tools consist of two programs: globus-usage-collector and globus-usage-uploader. The globus-usage-collector program acts as a network service to receive usage stats packets and store them to the filesystem. The globus-usage-uploader parses those packet files and uploads their contents into a PostgreSQL database.

The globus-usage-collector program stores the packets it receives to files named by the pattern $GLOBUS_LOCATION/var/usage/YYYYMMDD/HH.gup, where YYYYMMDD is the date that the packet was received and HH is the hour in which it was received. The globus-usage-collector program is typically run indefinitely in the background. Full usage information is available in the reference section of this manual.

To run globus-usage-collector, use the command:

% $GLOBUS_LOCATION/sbin/globus-usage-collector -b -f $GLOBUS_LOCATION/var/globus-usage-collector.pid
Starting usage collector on UDP port 4810
Running in background (7128)
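
A monitoring script can use the PID file written by the -f option to confirm that the collector is still running. The following is a minimal Python 3 sketch for POSIX systems. It assumes that the file holds the process ID as decimal text, which this guide does not specify, and the function name is hypothetical.

```python
import os


def collector_is_running(pid_file):
    """Return True if the process named in pid_file exists."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable, or malformed PID file
    try:
        os.kill(pid, 0)  # signal 0 tests for existence without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

Because process IDs can be reused, a True result shows only that some process has that ID, not that it is necessarily the collector.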

The globus-usage-uploader program parses packet files created by globus-usage-collector and loads them into the database. The database contact information is stored in the configuration file described in the configuration section of this document. The globus-usage-uploader processes all files that were created before the current hour in the $GLOBUS_LOCATION/var/usage/YYYYMMDD directories and then exits. It is meant to be run periodically by a service such as cron.
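
The "before the current hour" rule can be sketched in Python 3 as follows. This is an illustration under stated assumptions, not the uploader's actual logic: it judges a file's age from its YYYYMMDD/HH.gup name rather than from filesystem timestamps, it uses UTC as described in the command reference, and the function name is hypothetical.

```python
import os
import re
from datetime import datetime, timezone

_DATE_DIR = re.compile(r"^\d{8}$")
_HOUR_FILE = re.compile(r"^(\d{2})\.gup$")


def files_ready_for_upload(usage_dir, now=None):
    """List packet files whose hour is strictly before the current UTC hour."""
    now = now or datetime.now(timezone.utc)
    current = now.strftime("%Y%m%d%H")
    ready = []
    for date_name in sorted(os.listdir(usage_dir)):
        date_path = os.path.join(usage_dir, date_name)
        if not (_DATE_DIR.match(date_name) and os.path.isdir(date_path)):
            continue
        for file_name in sorted(os.listdir(date_path)):
            match = _HOUR_FILE.match(file_name)
            # Fixed-width YYYYMMDDHH strings sort chronologically.
            if match and date_name + match.group(1) < current:
                ready.append(os.path.join(date_path, file_name))
    return ready
```

Skipping the file for the current hour avoids reading a file that the collector may still be appending to.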

Note

If the globus-usage-uploader program is not run periodically, the globus-usage-collector program may fail if it reaches disk or directory limits.

The following example crontab will run the globus-usage-uploader program every hour and remove empty usage date directories every day. Replace GLOBUS_LOCATION with the path to where the usage stats code is installed.

GLOBUS_LOCATION=GLOBUS_LOCATION
PATH=GLOBUS_LOCATION/sbin:/usr/bin:/bin
MAILTO="[email protected]"
# Every hour, upload packets to database
1 * * * * $GLOBUS_LOCATION/sbin/globus-usage-uploader
# Daily, remove empty usage directories
59 0 * * * rmdir $GLOBUS_LOCATION/var/usage/*

Chapter 2. Command Reference

Table of Contents

globus-usage-collector — Record usage statistics packets
globus-usage-uploader — Store usage statistics packets in a database

Name

Usage Statistics Collector — Record usage statistics packets

Synopsis

globus-usage-collector [-h] [-p PORT] [-d DIRECTORY] [-b] [-f FILE]

Description

The globus-usage-collector command is a service which accepts usage packets on a UDP port and writes them to a file for later processing.

The full set of command-line options to globus-usage-collector consists of:

-h	Display a help message and exit
-p `PORT`	Listen on UDP port `PORT` instead of the default port 4810
-d `DIRECTORY`	Write data to `DIRECTORY` instead of the configured path.
-b	Run the globus-usage-collector process in the background
-f `FILE`	Write the process ID of the backgrounded globus-usage-collector process to `FILE`.

Usage Packet Files

The files are written in a subdirectory of the current directory, with the subdirectory name derived from the current time in UTC. The form of this directory name is YYYYMMDD (e.g. the date July 20, 2009 would be 20090720). Within that directory, each file is named for the hour (again in UTC) in which the packet was processed, in the form HH.gup. That is, a packet processed at 3:20 a.m. on that same day would be stored in the file 20090720/03.gup.
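
The naming rule can be expressed as a short Python 3 sketch. The function name is hypothetical, and the argument is assumed to be a timezone-aware datetime so that the conversion to UTC is unambiguous.

```python
from datetime import datetime, timezone


def packet_file_name(received):
    """Return the relative YYYYMMDD/HH.gup path for a receive time."""
    utc = received.astimezone(timezone.utc)
    return utc.strftime("%Y%m%d/%H.gup")
```

For the example above, packet_file_name(datetime(2009, 7, 20, 3, 20, tzinfo=timezone.utc)) returns "20090720/03.gup".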

Each usage packet file consists of simple records containing the binary UDP usage packet data. Each packet record consists of 4 fields:

Address Length (2 bytes)	Big-endian length of the Address
Address	Big-endian packed binary address
Packet Length (2 bytes)	Big-endian length of the packet
Packet	Binary packet data
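
The record layout above can be read with a few lines of Python 3. This is a minimal sketch rather than the uploader's own reader. The function names are hypothetical, and the interpretation of 4-byte addresses as IPv4 and 16-byte addresses as IPv6 is an assumption. The packet payload is returned undecoded.

```python
import socket
import struct


def _read_exact(stream, size):
    """Read exactly size bytes or raise ValueError on a short read."""
    data = stream.read(size)
    if len(data) != size:
        raise ValueError("truncated usage packet file")
    return data


def read_packet_records(stream):
    """Yield (address_bytes, packet_bytes) pairs from a binary stream."""
    while True:
        header = stream.read(2)
        if not header:
            return  # clean end of file
        if len(header) != 2:
            raise ValueError("truncated usage packet file")
        (address_length,) = struct.unpack(">H", header)  # 2-byte big-endian
        address = _read_exact(stream, address_length)
        (packet_length,) = struct.unpack(">H", _read_exact(stream, 2))
        packet = _read_exact(stream, packet_length)
        yield address, packet


def format_address(address):
    """Render a packed address; assumes 4 bytes is IPv4 and 16 bytes is IPv6."""
    if len(address) == 4:
        return socket.inet_ntop(socket.AF_INET, address)
    if len(address) == 16:
        return socket.inet_ntop(socket.AF_INET6, address)
    return address.hex()
```

A file would be opened in binary mode, for example with open("20090720/03.gup", "rb"), and passed to read_packet_records.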

Name

Usage Statistics Database Uploader — Store usage statistics packets in a database

Synopsis

globus-usage-uploader [-h] [-d DIRECTORY] [-n]

Description

The globus-usage-uploader command is a utility which parses usage packets in directories of the form created by globus-usage-collector and uploads them to a PostgreSQL database.

The full set of command-line options to globus-usage-uploader consists of:

-h	Display a help message and exit
-d `DIRECTORY`	Read data from `DIRECTORY` instead of the configured path.
-n	Don't commit usage packets to the database after processing files (for testing)

Examples

% globus-usage-uploader

Processing packets: 20090723
Processing packets: 20090724
14.gup... ok [2 packets]

Processed 1 file(s).
Processed 2 packet(s).