Introduction
This document describes a technique to run ordinary C codes at large scale on ALCF machines using Swift/T scripts. The scheme of integrating the C code into Swift/T described here is called main-function wrapping. It enables C main() programs to be defined as app() functions callable by Swift/T. Arguments to these wrapped main functions are passed as an array of strings (argv).
This approach enables C programs (and in fact, programs in almost any compiled language) to be called as a function rather than executed via a POSIX fork()/exec() invocation. This permits apps to be called from Swift/T on the Blue Gene/Q (whose compute node kernels do not support fork/exec), and is more efficient than fork/exec even on systems that do support that invocation mechanism.
A package that demonstrates this technique can be obtained from svn:
svn co https://svn.mcs.anl.gov/repos/exm/apps/main-wrap main-wrap
cd main-wrap
If you do not have an MCS login, download the latest snapshot via wget:
wget http://swift-lang.org/guides/T/main-wrap.tgz
tar zxvf main-wrap.tgz
This directory contains an example application and utilities to generate the required wrapped components (stubs and object files) from C source codes.
Quickstart
A synopsis of the steps we’ll be describing here is:
./gendata 100 5
./genleaf vesta-gcc mockdock.[ch] user.swift
./run-cobalt.sh vesta-gcc 32
cat work/output.txt
Definitions
- Swift: The Swift language
- Swift/T: The new, high-performance version of Swift that runs under MPI
- Turbine: The run time system for Swift/T (hence /T)
- Leaf function: An external foreign language function or application that is called from Swift
Sample application: mockdock
This package contains an example docking program called mockdock.c and an associated header file called mockdock.h. The program accepts two files as input (representing a protein file and a peptide file), followed by a third argument that is parsed as a run time. The files are read by mockdock, but their contents are ignored: they can contain anything. The program prints a single integer on standard output, based on the number of bytes read from each input file. The code is as follows:
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include "mockdock.h"

int main(int argc, char** argv){
  // Expected arguments: <protein file> <peptide file> <runtime>
  assert(argc == 4);
  char* protfile = argv[1];
  char* peptfile = argv[2];
  int runtime = atoi(argv[3]);  // parsed but not used in this listing

  int fd1 = open(protfile, O_RDONLY);
  if (fd1 < 0){
    printf("could not open: %s\n", protfile);
    exit(1);
  }
  int fd2 = open(peptfile, O_RDONLY);
  if (fd2 < 0){
    printf("could not open: %s\n", peptfile);
    exit(1);
  }
  printf("result number: %d\n", dock(fd1, fd2));
  close(fd1);
  close(fd2);
  return 0;
}

size_t maxsize=10*1024*1024;

// Read up to maxsize bytes from each file. The contents are discarded;
// only the two byte counts contribute to the result.
int dock(int fd1, int fd2){
  char* buf = malloc(maxsize);
  assert(buf != NULL);
  int len1 = read(fd1, buf, maxsize);
  assert(len1 >= 0);
  int len2 = read(fd2, buf, maxsize);
  assert(len2 >= 0);
  free(buf);  // a wrapped main may be called many times in one process
  return(len1 * 1000000 + len2);
}
Sample data for the above program can be generated using the provided gendata script:
./gendata <numpept> <numentries>
In the above command line, <numpept> is the number of generated peptide files and <numentries> is the number of lines in a single peptide file. The generated peptide files will be in a directory called peptides and the generated protein file will be prot.txt.
For instance, to generate 1000 peptide files with 50 lines in each file, run:
./gendata 1000 50
Generate and compile the wrapped-main() leaf function
The process of generating a leaf function involves the following steps, which are automated by the script genleaf for simple apps:
- Create a copy of the user code and replace the main function with a Swift-accessible entry point, called leaf_main.
- Add this leaf_main to the user’s header file.
- Create a Tcl stub that enables the leaf function to be called by the Swift runtime.
- Add the stub definition to the user’s Swift script source.
- Compile C source to build object code and integrate it into a single shared object library using the C compiler.
The genleaf script takes a machine type, the user’s C source and header files, and the user’s Swift script as input, and produces a Swift script called user-code.swift.
The machine type specifies the BG/Q system name and the compiler to use: (vesta|mira)-(gcc|xlc).
To run genleaf on a typical ("vanilla") Linux system:
./genleaf -v vanilla <csource.c> <cheader.h> <source.swift>
To run on Vesta with gcc:
./genleaf -v vesta-gcc <csource.c> <cheader.h> <source.swift>
To run on Mira with xlc:
./genleaf -v mira-xlc <csource.c> <cheader.h> <source.swift>
For example, on Mira, to generate the required shared objects and Swift code for the mockdock example:
./genleaf -v mira-gcc mockdock.c mockdock.h mockdock.swift
This will generate multiple files, including the modified user-code.swift, some Tcl code, object files (.o), and a shared-object file (.so). In this tutorial example, the user.swift script is:
/**
 * USER.SWIFT
 * The user may make arbitrary edits to this file
 * */
import io;
import string;

// This token will be substituted out for the linkage to the C code:
mainapp;

main{
  printf("running Swift...");
  foreach i in [1:10]{
    leaf_main(["prot.txt", sprintf("peptides/pept%i.txt", i), "2"]);
  }
}
The three strings in the array passed to leaf_main() will become, at the C level, the strings in argv.
Run the application
On a vanilla Linux system, run:
turbine user-code.tcl
For Vesta or Mira, this command will submit a Cobalt job to run the generated user-code.swift script:
./run-cobalt.sh vesta-gcc <PROCS>
…where the machine type is the same as was specified for genleaf, (vesta|mira)-(gcc|xlc), and PROCS is the number of processes. (PROCS must be at least 512 on Mira.)
The output appears in ./work/output.txt.
Appendix: Further options
Custom configuration can be set for Swift/Turbine or BG/Q via a configuration file. In our example, this file is named cf. It sets environment variables such as queues, project names, and Turbine settings:
export MODE=BGQ
export WALLTIME=00:10:00
export PROJECT=ExM
export PPN=16 # Processes per node
export QUEUE=default
export ADLB_PRINT_TIME=1
export TURBINE_LOG=0
export TURBINE_DEBUG=0
export TURBINE_ENGINES=1
export ADLB_SERVERS=1
To run the generated Turbine object code on ALCF machines, run-cobalt.sh adds Turbine to your PATH. To set this and related properties in your environment, do:
$TURBINE_SCRIPTS/turbine-cobalt-run.zsh -n <nproc> -s ./cf user-code.tcl
…where <nproc> is the number of MPI processes. This will submit an <nproc>/16-node Cobalt job (16 processes per node, as set by PPN in the cf file above). Outputs will be sent to the directory designated by the TURBINE_OUTPUT environment variable.
The standard output and error from the Swift run will be located in $TURBINE_OUTPUT/output.txt.
More detailed documentation can be found here.
Appendix: Benchmark results on Mira
Using the genleaf technique with the mockdock example as described above, a benchmark study was done on Mira for up to 1 million wrapped-main application invocations. Each app task invocation ("Tasks" column) ran for 30 seconds of real time. The number of MPI processes (one process per CPU core) was set to process 16 waves of 30-second tasks. The number of load-balancing and task-distribution servers in each run ("Servers" column) was incremented as the number of tasks increased. Performance results of this study are shown in Table 1:
Tasks | Cores (Nodes) | Servers | Run time (sec) |
---|---|---|---|
1,000,000 | 62,756 (3923) | 256 | 556.021 |
256,000 | 16,032 (1002) | 32 | 492.150 |
512,000 | 32,064 (2004) | 64 | 501.608 |
768,000 | 48,128 (3008) | 128 | 495.333 |