Scheduler Event Generator / Job Manager Integration

Introduction

This bundle contains a new method for the pre-WS GRAM Job Manager to monitor the jobs it submits to the local scheduler. After installing it, you can configure a job manager to use the new event-based method for monitoring jobs instead of the script-based polling implementation.

This change consists of a few parts:

Source bundles:

globus-job-manager-event-generator

The globus-job-manager-event-generator script creates a log of all scheduler events related to a particular scheduler instance. This script was created for two purposes

One instance of the globus-job-manager-event-generator must be running for each scheduler type whose job state changes are to be received through the Scheduler Event Generator interface. This program is located in the sbin subdirectory of the GLOBUS_LOCATION. The typical command line for this program is $GLOBUS_LOCATION/sbin/globus-job-manager-event-generator -s SCHEDULER_TYPE, where SCHEDULER_TYPE is the name of the Scheduler Event Generator module which should be used to generate events (lsf, condor, or pbs).

For example, to start the event generator program to monitor an LSF batch system:

    $GLOBUS_LOCATION/sbin/globus-job-manager-event-generator -s lsf

NOTE: If the globus-job-manager-event-generator is not running, no job state changes will be sent from any job manager program which is configured to use the Scheduler Event Generator.

Job Manager Configuration

By default, the job manager is configured to use the pre-WS GRAM script-based polling method. A new command line option (-seg) was added to the globus-job-manager program to enable job state change notifications driven by the Scheduler Event Generator.

There are two ways to configure the job manager to use the scheduler event generator: globally, in the $GLOBUS_LOCATION/etc/globus-job-manager.conf file, or on a per-service basis in the service entry file in the $GLOBUS_LOCATION/etc/grid-services directory.

Global Job Manager Configuration

To enable using the Scheduler Event Generator interface for all Job Managers started from a particular GLOBUS_LOCATION, add a line containing the string

-seg

to the file $GLOBUS_LOCATION/etc/globus-job-manager.conf.

EXAMPLE $GLOBUS_LOCATION/etc/globus-job-manager.conf:

        -home "/opt/globus"
        -globus-gatekeeper-host globus.yourdomain.org
        -globus-gatekeeper-port 2119
        -globus-gatekeeper-subject "/O=Grid/OU=Your Organization/CN=host/globus.yourdomain.org"
        -globus-host-cputype i686
        -globus-host-manufacturer pc
        -globus-host-osname Linux
        -globus-host-osversion 2.6.10
        -save-logfile on_error
        -state-file-dir /opt/globus/tmp/gram_job_state
        -machine-type unknown
        -seg
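The edit described above can be scripted. The following is a minimal sketch, not part of the Globus Toolkit: enable_seg_globally is a hypothetical helper name, and the sketch assumes the option sits on a line of its own, as in the example file. It appends -seg only when the option is not already present, so it is safe to run more than once.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Globus): add -seg to the global
# job manager configuration file unless it is already there.
enable_seg_globally() {
    conf="$1"   # e.g. $GLOBUS_LOCATION/etc/globus-job-manager.conf
    # The option counts as present when a line holds only -seg and optional whitespace.
    if grep -Eq '^[[:space:]]*-seg[[:space:]]*$' "$conf"; then
        echo "already enabled: $conf"
        return 0
    fi
    # Start a fresh line first if the file does not end with a newline.
    if [ -n "$(tail -c 1 "$conf")" ]; then
        echo >> "$conf"
    fi
    printf '%s\n' '-seg' >> "$conf"
    echo "enabled: $conf"
}

# Example call (assumes GLOBUS_LOCATION is set in the environment):
#   enable_seg_globally "$GLOBUS_LOCATION/etc/globus-job-manager.conf"
```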

Scheduler-specific Job Manager Configuration

To enable using the Scheduler Event Generator interface for a particular Job Manager, add the string -seg to the end of the line in the service's file in the $GLOBUS_LOCATION/etc/grid-services directory.

EXAMPLE $GLOBUS_LOCATION/etc/grid-services/jobmanager-lsf:

stderr_log,local_cred - /opt/globus/libexec/globus-job-manager globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf -type lsf -rdn jobmanager-lsf -machine-type unknown -publish-jobs -seg
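A per-service change could be automated in a similar way. The sketch below assumes the service entry file holds the job manager command on a single line, as in the example above; enable_seg_for_service is a hypothetical helper name, not a Globus command. It appends -seg to each non-blank line that does not already carry the option.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Globus): append -seg to the entry
# line of one service file in $GLOBUS_LOCATION/etc/grid-services.
enable_seg_for_service() {
    entry="$1"              # e.g. $GLOBUS_LOCATION/etc/grid-services/jobmanager-lsf
    tmp="$entry.tmp.$$"
    if awk '
        {
            line = $0
            gsub(/\t/, " ", line)      # treat tabs as spaces for the checks below
        }
        # Pass through blank lines and lines that already contain -seg as a separate word.
        line ~ /^ *$/ || index(" " line " ", " -seg ") { print; next }
        # Otherwise trim trailing whitespace and append the option.
        { sub(/[ \t]+$/, ""); print $0 " -seg" }
    ' "$entry" > "$tmp"; then
        cat "$tmp" > "$entry"          # rewrite in place so ownership and permissions are kept
    fi
    rm -f "$tmp"
}

# Example call (assumes GLOBUS_LOCATION is set in the environment):
#   enable_seg_for_service "$GLOBUS_LOCATION/etc/grid-services/jobmanager-lsf"
```

Because fork Job Managers ignore the option, there is no benefit in applying this helper to a fork service entry.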

NOTE: The Job Manager does not support using the Scheduler Event Generator with fork Job Managers. If the -seg option is passed to a fork Job Manager, it will be ignored.

globus-job-manager-event-generator Configuration

The globus-job-manager-event-generator program requires that the globus_job_manager_event_generator setup package be installed and run. This setup package creates the $GLOBUS_LOCATION/etc/globus-job-manager-seg.conf file and initializes a directory to use for the scheduler logs.

By default, this setup script will create a configuration entry and directory for each scheduler installed on the system. For each scheduler to be handled by the globus-job-manager-event-generator program, there must be an entry in the file in the pattern:

<SCHEDULER_TYPE>_log_path=<PATH>

The two variable substitutions for this pattern are

SCHEDULER_TYPE
Must match the name of the scheduler-event-generator module for the scheduler (GT 4.0 supports lsf, condor, and pbs).
PATH
A path to a directory which must be writable by the account which will run the globus-job-manager-event-generator program for the SCHEDULER_TYPE, and world-readable (or readable by a group which contains all users who will run jobs via GRAM on that system). Each directory specified in the configuration file must be unique, or behavior is undefined.

EXAMPLE $GLOBUS_LOCATION/etc/globus-job-manager-seg.conf:

    lsf_log_path=/opt/globus/var/globus-job-manager-seg-lsf
    pbs_log_path=/opt/globus/var/globus-job-manager-seg-pbs

In this example, pbs and lsf schedulers are configured to use distinct subdirectories of the /opt/globus/var/ directory.
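Since duplicate directories lead to undefined behavior, it can help to check the file before starting any event generator. The following sketch reads every <SCHEDULER_TYPE>_log_path entry, reports directories that are used more than once, and reports directories that do not exist. check_seg_conf is a hypothetical helper name, and the sketch assumes entries start at the beginning of the line; lines that do not match the pattern are ignored.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Globus): sanity-check the
# <SCHEDULER_TYPE>_log_path entries in globus-job-manager-seg.conf.
check_seg_conf() {
    conf="$1"
    status=0

    # Every entry must name a different directory.
    dups=$(sed -n 's/^[^=]*_log_path=//p' "$conf" | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "duplicate log path: $dups"
        status=1
    fi

    # Every configured directory must exist.
    while IFS='=' read -r key path; do
        case "$key" in
            *_log_path) ;;          # an entry of the expected form
            *) continue ;;          # anything else is skipped
        esac
        if [ ! -d "$path" ]; then
            echo "missing directory for ${key%_log_path}: $path"
            status=1
        fi
    done < "$conf"

    return "$status"
}

# Example call (assumes GLOBUS_LOCATION is set in the environment):
#   check_seg_conf "$GLOBUS_LOCATION/etc/globus-job-manager-seg.conf"
```

The function returns a non-zero status when it finds a problem, so it can be used as a guard in a startup script.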

NOTE: For best performance, the log paths should be persistent across system reboots and mounted locally (non-networked).

NOTE: If a scheduler is added after the configuration step is done, the administrator must rerun the setup package's script ($GLOBUS_LOCATION/setup/globus/setup-seg-job-manager.pl) or modify the configuration file and create the required directory with appropriate permissions.

Running the globus-job-manager-event-generator

The globus-job-manager-event-generator must be running when jobs are submitted to the Job Manager if job state changes are to be detected. One instance of the globus-job-manager-event-generator program must be running for each scheduler type which is handled by a Job Manager and configured to use the Scheduler Event Generator interface.

The command line for the globus-job-manager-event-generator program is globus-job-manager-event-generator -s SCHEDULER_TYPE. The SCHEDULER_TYPE must match the prefix of a <SCHEDULER_TYPE>_log_path entry in the $GLOBUS_LOCATION/etc/globus-job-manager-seg.conf file, as described above.

NOTE: Remember, if your scheduler logs have restrictive permissions, then this script must be run by an account which has privileges to read those files.

NOTE: Old log files created by the globus-job-manager-event-generator script may be deleted if the administrator is certain that there are no jobs which will restart and require the old information. The names of the log files correspond to the dates when the events occurred. If there is at least one log file in the directory, then when the globus-job-manager-event-generator is restarted, it will resume logging from the timestamp of the newest event in that log file.
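To see which instances a given configuration calls for, the scheduler names can be read back from the configuration file. This sketch prints one command line per <SCHEDULER_TYPE>_log_path entry and does not start anything itself. seg_start_commands is a hypothetical helper name, and the sketch assumes that every scheduler listed in the file is also served by a Job Manager configured with -seg, which may not hold on every system.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Globus): print the command line
# needed for each scheduler configured in globus-job-manager-seg.conf.
seg_start_commands() {
    conf="$1"
    # Extract the SCHEDULER_TYPE prefix of every *_log_path entry.
    sed -n 's/^\([^=]*\)_log_path=.*/\1/p' "$conf" |
    while read -r scheduler; do
        # $GLOBUS_LOCATION is printed literally so the output can be pasted into a shell.
        printf '%s -s %s\n' \
            '$GLOBUS_LOCATION/sbin/globus-job-manager-event-generator' "$scheduler"
    done
}

# Example call (assumes GLOBUS_LOCATION is set in the environment):
#   seg_start_commands "$GLOBUS_LOCATION/etc/globus-job-manager-seg.conf"
```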

Troubleshooting the globus-job-manager-event-generator

PROBLEM: The globus-job-manager-event-generator program terminates immediately with the output:
        Error: SCHEDULER not configured

SOLUTION 1: Make sure that you specified the correct name for the SCHEDULER module on the command line to the globus-job-manager-event-generator program.

SOLUTION 2: Make sure that there is an entry for the scheduler (for example, lsf_log_path for lsf) in the $GLOBUS_LOCATION/etc/globus-job-manager-seg.conf file. See the section on globus-job-manager-event-generator Configuration.

PROBLEM: The globus-job-manager-event-generator program terminates immediately with the output:
        Fault: globus_xio: Operation was canceled

SOLUTION: The scheduler module selected on the command line could not be loaded by the Globus Scheduler Event Generator. Check that the name is correct, the module is installed, and the setup script for that module has been run.

PROBLEM: The Job Manager never receives any events from the scheduler.

SOLUTION 1: Verify that the directory specified in the $GLOBUS_LOCATION/etc/globus-job-manager-seg.conf for the scheduler exists, is writable by the account running the globus-job-manager-event-generator and is readable by the user account running the job manager.

SOLUTION 2: Verify that the globus-job-manager-event-generator program is running.

SOLUTION 3: Verify that the globus-job-manager-event-generator program has permission to read the scheduler logs. To help diagnose this, run the following command as the account that will run the globus-job-manager-event-generator:

        $GLOBUS_LOCATION/libexec/globus-scheduler-event-generator -s <SCHEDULER_TYPE> -t 1

You should see events printed to the stdout of that process if it is working correctly.
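The configuration and directory checks from the solutions above can also be run from a small script. The sketch below looks up the log directory for one scheduler and tests it from the point of view of the current account. check_seg_dir and its role argument (generator or jobmanager) are hypothetical names introduced here for illustration; run it as the event generator account with the generator role and as a GRAM user with the jobmanager role.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Globus): check the log directory
# configured for one scheduler, as seen by the account running this script.
check_seg_dir() {
    conf="$1"; scheduler="$2"; role="$3"
    dir=$(sed -n "s/^${scheduler}_log_path=//p" "$conf")
    if [ -z "$dir" ]; then
        echo "no ${scheduler}_log_path entry in $conf"
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir"
        return 1
    fi
    case "$role" in
        generator)   # the event generator account must be able to write logs
            [ -w "$dir" ] || { echo "not writable: $dir"; return 1; } ;;
        jobmanager)  # job manager accounts must be able to list and read logs
            [ -r "$dir" ] && [ -x "$dir" ] || { echo "not readable: $dir"; return 1; } ;;
        *)
            echo "unknown role: $role"
            return 2 ;;
    esac
    echo "ok: $dir"
}

# Example call (assumes GLOBUS_LOCATION is set in the environment):
#   check_seg_dir "$GLOBUS_LOCATION/etc/globus-job-manager-seg.conf" lsf generator
```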