How to run a million jobs

December 3, 2008

At SC08, several experts organized an informal session to share information on up-and-coming solutions for expressing, managing, and executing "megajobs." They also discussed ways of repackaging work to avoid megajobs altogether.

Here iSGTW shares the latest ideas and developments about megajobs with its readers, and plans to follow up with articles on various mentioned technologies and trends in the coming months. Contributions are welcome.

Biting off a megajob: it's a lot to chew

As large systems surpass 200,000 processors, more scientists are running "megajobs": thousands to millions of identical or very similar, but independent, jobs executed on separate processors. From biology, physics, chemistry and mathematics to genetics, mechanical engineering, economics and computational finance, researchers want an easy way to specify and manage many jobs, arrange inputs, and aggregate outputs. They want to readily identify successful and failed jobs, repair failures, and get on with the business of research. System administrators need effective ways to process large numbers of jobs for multiple users.

Many small meals

As tools and resources change, people describe their computing jobs differently, says Ben Clifford of the University of Chicago. A decade ago, scientists submitted big jobs with big inputs and outputs. Now, due to the availability of massive processor farms, jobs tend to be numerous and small: seconds long, with only kilobyte-scale data files.

Some older, well-established job management systems are extremely feature-rich, but that richness carries high overhead, making them unsuitable for executing many short jobs on many processors. Others have been developed specifically for the data-intensive, loosely-coupled, high throughput computing (HTC) grid model. These newer systems work well up to many thousands of jobs, short or long. Still others, like Swift and Gracie, which aim to scale even higher, are in their infancy.

As job management systems change, so do applications. Ioan Raicu and Ian Foster, both of the University of Chicago and Argonne National Laboratory, have defined a class of applications called many-task computing (MTC). An MTC application is composed of many tasks, both independent and dependent, that are (in Foster's words) "communication-intensive but not naturally expressed in Message Passing Interface," referring to the standard for setting up communications between parallel jobs. The tasks can be individually scheduled on different computing resources to achieve some larger application goal that would otherwise require megajobs.
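The shape of an MTC workload can be sketched in a few lines of Python (an illustrative toy, not any particular MTC framework): many short independent tasks fan out across a worker pool, and a dependent task consumes their outputs toward the larger application goal.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical MTC-style workload: many small independent tasks plus
# one dependent task (names and structure are illustrative only).
def simulate(i):
    return i * i  # stand-in for a short, independent task

def aggregate(results):
    return sum(results)  # dependent task: runs after all simulations

with ThreadPoolExecutor(max_workers=8) as pool:
    # Independent tasks are scheduled individually across workers.
    partial = list(pool.map(simulate, range(100)))

# The dependent task consumes their outputs toward the larger goal.
total = aggregate(partial)
print(total)  # 328350
```

The point is structural: each task is a separately schedulable unit, so a framework like Falkon can dispatch them across whatever resources are available rather than submitting one monolithic job.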

Some computer systems are undergoing extensions to support megajobs and MTC; for example, IBM provides a new high throughput, grid-style mode on the Blue Gene/P supercomputer. Raicu and his colleagues have implemented MTC applications on this new platform via Falkon, a fast, scalable and lightweight task execution framework from the University of Chicago.

Ask about Swift

"Users don't start with the idea of running a million jobs," Clifford says. "They start with some high level application, and it's just convenient to describe it as a million jobs." If they can break an application into separately schedulable, restartable, relocatable "application procedures," he says, then they just need a tool to describe how the pieces connect. Then the jobs are easy to run.

Clifford has helped develop Swift, a highly scalable scripting language/engine to manage procedures composed of many loosely-coupled components that take the place of megajobs. In addition to describing jobs, Swift sports clever mechanisms to reward well-behaved, high-performing sites while penalizing slow or failing sites. It also throttles job submission as needed, and controls file transfers to ensure adequate performance.
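The site-scoring idea can be illustrated with a toy sketch (a hypothetical scoring rule, not Swift's actual algorithm): each site's score rises with successes and falls with failures, and new jobs prefer high-scoring sites.

```python
# Toy site-scoring sketch (illustrative; not Swift's real mechanism).
# Successful sites earn score and attract more jobs; failing sites are
# penalized but floored so a recovering site can earn its way back.
scores = {"siteA": 1.0, "siteB": 1.0, "siteC": 1.0}

def report(site, succeeded):
    # Multiplicative reward/penalty, floored to avoid banning a site forever.
    scores[site] = max(0.1, scores[site] * (1.2 if succeeded else 0.5))

def pick_site():
    # Send the next job to the best-scoring site.
    return max(scores, key=scores.get)

# Simulate a run where siteA and siteB behave while siteC keeps failing.
for _ in range(10):
    report("siteA", succeeded=True)
    report("siteB", succeeded=True)
    report("siteC", succeeded=False)

print(pick_site())  # a well-behaved site; siteC has been down-weighted
```

Combined with submission throttling, this kind of feedback loop steers the bulk of a megajob-scale workload toward the sites that are actually completing work.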

Also in the megajob medicine cabinet are Falkon and Gracie, a framework for executing massive independent tasks in parallel on grid resources.

Indigestion? Rethink, repackage …

Some experts think that many research questions can be answered using the same grid-type computing model, but using a different approach than megajobs.

David Abramson of Monash University wants the grid community to help scientists find smarter ways to specify problems by focusing on the scientific questions in the first place. He cites his Nimrod family of tools, which provides sophisticated methods to search for good solutions as opposed to all solutions. Users can ask complex questions such as "Which parameter values will minimize the output of my model?" This shields the user from the complexity of managing lots of independent jobs.
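The contrast between sweeping all parameter values and searching for good ones can be sketched as follows (a toy quadratic standing in for an expensive model run; Nimrod's real optimization methods are more sophisticated):

```python
# Toy contrast between an exhaustive parameter sweep and a directed
# search (illustrative only). Each call to model() stands in for one
# expensive grid job.
def model(x):
    return (x - 3.7) ** 2 + 1.0  # pretend minimum is unknown to the user

# Exhaustive sweep: evaluate every parameter value, keep the best.
sweep_points = [i / 100 for i in range(1001)]          # 1001 jobs
best_sweep = min(sweep_points, key=model)

# Directed search: repeatedly shrink the interval around the best
# point seen so far, spending far fewer evaluations.
lo, hi, evals = 0.0, 10.0, 0
for _ in range(8):
    candidates = [lo + (hi - lo) * i / 10 for i in range(11)]
    evals += len(candidates)
    best = min(candidates, key=model)
    step = (hi - lo) / 10
    lo, hi = best - step, best + step

print(evals)  # 88 evaluations instead of 1001
```

The search reaches essentially the same answer while submitting roughly a tenth of the jobs, which is the point of asking the scientific question ("which values minimize my model?") rather than enumerating the whole parameter space.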

John McGee of the Renaissance Computing Institute notes that a number of workload management systems on the Open Science Grid, including PanDA, the OSG MatchMaker, and Condor DAGMan, are used extensively to repackage the total computational workload into chunks optimized for the infrastructure and its scheduling overhead.
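The payoff of repackaging can be seen in a back-of-the-envelope sketch (the per-job overhead and task length are assumed numbers; this is not the actual mechanics of PanDA, OSG MatchMaker, or DAGMan):

```python
# Illustrative sketch of workload repackaging: bundle many short tasks
# into fewer batch jobs so per-job scheduling overhead is amortized.

TASK_SECONDS = 5          # each task is seconds long (assumed)
OVERHEAD_SECONDS = 30     # assumed per-job scheduling/startup cost
tasks = list(range(100_000))

def chunk(seq, size):
    """Split the task list into job-sized bundles."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def total_overhead(n_jobs):
    return n_jobs * OVERHEAD_SECONDS

jobs = chunk(tasks, 500)                  # 200 jobs of 500 tasks each
naive = total_overhead(len(tasks))        # one job per task
bundled = total_overhead(len(jobs))

print(len(jobs), naive, bundled)  # 200 3000000 6000
```

With these assumed numbers, scheduling one job per task would spend far more time on overhead than on the tasks themselves; bundling cuts the overhead by the chunking factor.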

According to Gregor von Laszewski of the Rochester Institute of Technology, many supercomputing applications exist that manage millions of "tasks" internally. The distinction between "tasks" and "jobs" is essential, he says, as many users do not care about the concept of a "job"; they just want to get their work done. He believes that in the future scientists will benefit from tools that let them bypass the technical difficulty of mapping their work to jobs: tools that transparently perform the mapping between millions of tasks and traditional queuing systems.

In the end, a balanced diet

"The answer will not be a single tool," von Laszewski says, "but a suite of tools addressing a variety of solutions needed to simplify access to millions of science services working in coordination." As an example, he cites the CoG Kit and its workflow engine, used internally by Swift to provide much of the functionality for scalable services, noting that it can also be used directly by scientists to simplify their job management.

To simplify access and avoid congestion, the session participants recognize that more work is needed to leverage tools and services that integrate megajobs with existing queuing systems and high performance interconnects, and to allow loosely-coupled applications to communicate directly between processors. In the meantime, researchers have some remedies to choose from.

See links to more information and presentations from the session.