ALCF Swift/Turbine Tutorial

Introduction

This document describes a technique to run ordinary C codes at large scale on ALCF machines using Swift/T scripts. The scheme for integrating the C code into Swift/T described here is called main-function wrapping. It enables C main() programs to be defined as app() functions callable by Swift/T. Arguments to these wrapped main functions are passed as an array of strings (argv).

This approach enables C programs (and in fact, programs in almost any compiled language) to be called as functions rather than executed via a POSIX fork()/exec() invocation. This permits apps to be called from Swift/T on the Blue Gene/Q (whose compute node kernels do not support fork/exec), and is more efficient than fork/exec even on systems that do support that invocation mechanism.
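
A minimal sketch of the idea, not the code that the package generates: once a program's main() has been given another name, a caller in the same process can build an argv array and invoke it with an ordinary function call. The names wrapped_main_fn, call_wrapped, and count_args are hypothetical, and the sketch assumes that the caller supplies argv[0], since C programs conventionally expect the program name there.

```c
#include <assert.h>
#include <stdlib.h>

/* Signature shared by main() and a renamed ("wrapped") main function. */
typedef int (*wrapped_main_fn)(int argc, char** argv);

/* Hypothetical helper: call a wrapped main in-process, without fork()/exec().
 * 'args' holds the n user arguments; 'name' is placed in argv[0].
 * Returns the wrapped function's return value, or -1 if allocation fails. */
int call_wrapped(wrapped_main_fn fn, const char* name,
                 int n, const char** args)
{
  assert(fn != NULL && n >= 0);

  /* n user arguments + argv[0] + terminating NULL */
  char** argv = malloc((size_t)(n + 2) * sizeof(char*));
  if (argv == NULL)
    return -1;

  argv[0] = (char*) name;
  for (int i = 0; i < n; i++)
    argv[i + 1] = (char*) args[i];
  argv[n + 1] = NULL;

  int rc = fn(n + 1, argv);   /* an ordinary function call */
  free(argv);
  return rc;
}

/* Stand-in for a wrapped main: returns the number of user arguments. */
int count_args(int argc, char** argv)
{
  (void) argv;
  return argc - 1;
}
```

Calling call_wrapped(count_args, "mockdock", 3, args) with three argument strings hands count_args an argc of 4, the same count a shell would pass for a command with three arguments.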

A package that demonstrates this technique can be obtained from svn:

svn co https://svn.mcs.anl.gov/repos/exm/apps/main-wrap main-wrap
cd main-wrap

If you do not have an MCS login, download the latest snapshot via wget:

wget http://swift-lang.org/guides/T/main-wrap.tgz
tar zxvf main-wrap.tgz

This directory contains an example application and utilities to generate the required wrapped components (stubs and object files) from C source code.

Quickstart

A synopsis of the steps we’ll be describing here is:

 ./gendata 100 5
 ./genleaf vesta-gcc mockdock.[ch] user.swift
 ./run-cobalt.sh vesta-gcc 32
 cat work/output.txt

Definitions

Swift: The Swift language
Swift/T: The new, high-performance version of Swift that runs under MPI
Turbine: The run time system for Swift/T (hence /T)
Leaf function: An external foreign language function or application that is called from Swift

Sample application: mockdock

This package contains an example docking program called mockdock.c and an associated header file called mockdock.h. The program accepts two files as input (representing a protein file and a peptide file) and a run time argument. The files are read by mockdock, but their contents are ignored: they can contain anything. The program prints a single line containing an integer on standard output, computed from the number of bytes read from each input file. The code is as follows:

#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include "mockdock.h"

int main(int argc, char** argv){

    assert(argc == 4);
    char* protfile = argv[1];
    char* peptfile = argv[2];
    int   runtime  = atoi(argv[3]);

    int fd1 = open(protfile, O_RDONLY);
    if (fd1 < 0){
      printf("could not open: %s\n", protfile);
      exit(1);
    }

    int fd2 = open(peptfile, O_RDONLY);
    if (fd2 < 0){
      printf("could not open: %s\n", peptfile);
      exit(1);
    }

    printf("result number: %d\n", dock(fd1, fd2));

    close(fd1);
    close(fd2);

    return 0;
}

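size_t maxsize=10*1024*1024;

/* Reads both files and derives a number from how many bytes were read. */
int dock(int fd1, int fd2){

  char* buf = malloc(maxsize);
  assert(buf != NULL);

  int len1 = read(fd1, buf, maxsize);
  assert(len1 >= 0);

  int len2 = read(fd2, buf, maxsize);
  assert(len2 >= 0);

  /* Release the buffer: a wrapped main may be called many times per process. */
  free(buf);

  return(len1 * 1000000 + len2);
}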
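
As a worked example of the arithmetic in dock(): the result packs the two byte counts into one integer. The helper names below are hypothetical, and the sketch computes in long long (unlike the original, which uses int) so that it can also show where the original expression stops fitting in an int.

```c
#include <assert.h>
#include <limits.h>

/* Same formula as dock(), but computed in long long to avoid overflow. */
long long pack_lengths(long long len1, long long len2)
{
  assert(len1 >= 0 && len2 >= 0);
  return len1 * 1000000 + len2;
}

/* Recover the two lengths; unambiguous only when len2 < 1,000,000. */
long long unpack_len1(long long packed) { return packed / 1000000; }
long long unpack_len2(long long packed) { return packed % 1000000; }

/* Returns 1 if len1 * 1000000 + len2 fits in an int, 0 otherwise. */
int fits_in_int(long long len1, long long len2)
{
  return pack_lengths(len1, len2) <= INT_MAX;
}
```

For a 120-byte protein file and a 45-byte peptide file, the packed value is 120000045. On platforms where int is 32 bits, the value no longer fits once the first file exceeds 2,147 bytes, so small input files are advisable when experimenting with mockdock.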

Sample data for the above program can be generated using the provided gendata script:

./gendata <numpept> <numentries>

In the above command line, <numpept> is the number of peptide files to generate and <numentries> is the number of lines in each peptide file. The generated peptide files will be in a directory called peptides and the generated protein file will be prot.txt.

For instance, to generate 1000 peptide files with 50 lines in each file, run:

./gendata 1000 50

Generate and compile the wrapped-main() leaf function

The process of generating a leaf function involves the following steps, which the genleaf script automates for simple apps:

1. Create a copy of the user code and replace the main function with a Swift-accessible entry point, called leaf_main.
2. Add this leaf_main to the user’s header file.
3. Create a Tcl stub that enables the leaf function to be called by the Swift runtime.
4. Add the stub definition to the user’s Swift script source.
5. Compile the C source to build object code and integrate it into a single shared object library using the C compiler.
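
Replacing main() with leaf_main has one consequence worth illustrating: because the entry point is now called as a function inside a long-running process, a call to exit() or a failed assert() ends that whole process, not just one task, unless the wrapper intercepts it. The following hand-written sketch is not genleaf output; the function body and the return codes 1 and 2 are illustrative assumptions. It shows a main-style entry point that reports errors by returning.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Prototype as it might appear in the user's header after wrapping. */
int leaf_main(int argc, char** argv);

/* Illustrative entry point: same argument layout as mockdock's main(),
 * but every failure path returns a code instead of calling exit(). */
int leaf_main(int argc, char** argv)
{
  if (argc != 4) {
    fprintf(stderr, "usage: %s <protfile> <peptfile> <runtime>\n",
            argc > 0 ? argv[0] : "leaf_main");
    return 2;
  }

  int fd1 = open(argv[1], O_RDONLY);
  if (fd1 < 0) {
    fprintf(stderr, "could not open: %s\n", argv[1]);
    return 1;
  }

  int fd2 = open(argv[2], O_RDONLY);
  if (fd2 < 0) {
    fprintf(stderr, "could not open: %s\n", argv[2]);
    close(fd1);   /* release the descriptor opened above */
    return 1;
  }

  /* The real work on fd1 and fd2 would go here. */

  close(fd1);
  close(fd2);
  return 0;
}
```

A caller can then inspect the return value and decide how to handle a failed task without losing the process that hosts it.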

The genleaf script takes a machine type and the user’s C and header files as input and produces a Swift script called user-code.swift. On the BG/Q, the machine type specifies the system name and the compiler to use: (vesta|mira)-(gcc|xlc).

To run genleaf on a typical ("vanilla") Linux system:

./genleaf -v vanilla <csource.c> <cheader.h> <source.swift>

To run on Vesta with gcc:

./genleaf -v vesta-gcc <csource.c> <cheader.h> <source.swift>

To run on Mira with xlc:

./genleaf -v mira-xlc <csource.c> <cheader.h> <source.swift>

For example, on Mira, to generate the required shared objects and Swift code for the mockdock example:

./genleaf -v mira-gcc mockdock.c mockdock.h user.swift

This will generate multiple files, including the modified user-code.swift, some Tcl code, an object file (.o), and a shared-object file (.so). In this tutorial example, the user.swift script is:

/**
 * USER.SWIFT
 * The user may make arbitrary edits to this file
 * */

import io;
import string;

// This token will be substituted out for the linkage to the C code:
mainapp;

main{
  printf("running Swift...");
  foreach i in [1:10]{
    leaf_main(["prot.txt", sprintf("peptides/pept%i.txt", i), "2"]);
  }
}

The three strings in the array passed to leaf_main() will become, at the C level, the strings in argv.

Run the application

On a vanilla Linux system, run:

turbine user-code.tcl

For Vesta or Mira, this command will submit a Cobalt job to run the generated user-code.swift script:

./run-cobalt.sh vesta-gcc <PROCS>

…where the machine type is the same as was specified for genleaf - (vesta|mira)-(gcc|xlc), and PROCS is the number of processes. (PROCS must be at least 512 on Mira.)

The output appears in ./work/output.txt.

Appendix: Further options

Custom configuration can be set for Swift/Turbine or BG/Q via a configuration file. In our example, this file is named cf. It sets environment variables that control the queue, the project name, and Turbine settings:

export MODE=BGQ
export WALLTIME=00:10:00
export PROJECT=ExM
export PPN=16 # Processes per node
export QUEUE=default
export ADLB_PRINT_TIME=1
export TURBINE_LOG=0
export TURBINE_DEBUG=0
export TURBINE_ENGINES=1
export ADLB_SERVERS=1

To run the generated Turbine object code on ALCF machines, run-cobalt.sh adds Turbine to your PATH. To set this and related properties in your environment, run:

$TURBINE_SCRIPTS/turbine-cobalt-run.zsh -n <nproc> -s ./cf user-code.tcl

…where <nproc> is the number of MPI processes.

This will submit a Cobalt job on <nproc>/16 nodes, given the PPN=16 setting shown above. Outputs will be sent to the directory designated by the TURBINE_OUTPUT environment variable.
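
The node count follows from the process count and the processes-per-node setting. A small sketch of that arithmetic, assuming that a partially filled node still counts as a whole node (the function name nodes_needed is hypothetical):

```c
#include <assert.h>

/* Nodes needed for nproc MPI processes at ppn processes per node,
 * rounding up so that a partially used node is still counted. */
int nodes_needed(int nproc, int ppn)
{
  assert(nproc > 0 && ppn > 0);
  return (nproc + ppn - 1) / ppn;
}
```

With 16 processes per node, 512 processes occupy 32 nodes. The rounding assumption agrees with the benchmark appendix, where 62,756 cores correspond to 3923 nodes.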

The standard output and error from the Swift run will be located in $TURBINE_OUTPUT/output.txt.

More detailed documentation can be found here.

Appendix: Benchmark results on Mira

Using the genleaf technique with the mockdock example as described above, a benchmark study was done on Mira for up to 1 million wrapped-main application invocations. Each app task invocation ("Tasks" column) ran for 30 seconds of real time. The number of MPI processes (one process per CPU core) was chosen so that each run would process 16 waves of 30-second tasks. The number of load-balancing and task-distribution servers in each run ("Servers" column) was increased as the number of tasks increased. Performance results of this study are shown in Table 1:

Table 1. Performance figures

Tasks       Cores (Nodes)     Servers   Run time (sec)
256,000     16,032 (1,002)    32        492.150
512,000     32,064 (2,004)    64        501.608
768,000     48,128 (3,008)    128       495.333
1,000,000   62,756 (3,923)    256       556.021
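
The figures in Table 1 can be related to the ideal run time with a little arithmetic. The sketch below assumes, as an inference from the table and not a statement from the study, that every core not acting as a server runs tasks. Under that reading each row works out to exactly 16 waves, so the ideal run time is 16 × 30 = 480 seconds. The function names are hypothetical.

```c
#include <assert.h>

/* Waves of tasks per worker, assuming workers = cores - servers. */
double waves(long tasks, long cores, long servers)
{
  assert(cores > servers);
  return (double) tasks / (double) (cores - servers);
}

/* Fraction of the measured run time accounted for by task execution. */
double efficiency(double n_waves, double task_sec, double run_sec)
{
  assert(run_sec > 0.0);
  return n_waves * task_sec / run_sec;
}
```

For the 256,000-task run this gives an efficiency of about 0.975, and for the 1,000,000-task run about 0.863, which suggests that overhead grows at the largest scale measured.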