Planning Spatial Workflows to Optimize Grid Performance

Title: Planning Spatial Workflows to Optimize Grid Performance
Publication Type: Report
Year of Publication: 2005
Authors: Meyer, L, Annis, J, Wilde, M, Mattoso, M, Foster, IT
Date Published: 11/2005
Other Numbers: ANL/MCS-P1308-1105

In many scientific workflows, particularly those that operate on spatially oriented data, jobs that process adjacent regions of space often reference large numbers of files in common. When such workflows are processed using workflow planning algorithms that are unaware of the application's file reference pattern, the result is a huge number of redundant file transfers between Grid sites and, consequently, poor performance. This work presents a generalized approach to planning spatial workflow schedules for Grid execution based on the spatial proximity of files and the spatial range of jobs. We evaluate our solution to this problem using the file access pattern of an astronomy application that performs co-addition of images from the Sloan Digital Sky Survey. We show that, in initial tests on grids of 5-25 sites, our spatial clustering approach eliminates 50-90% of the file transfers between Grid sites relative to the next-best planning algorithms we tested that were not "spatially aware." At moderate levels of concurrent file transfer, this reduction of redundant network I/O improves the application execution time by 30% to 70% and reduces Grid network and storage overhead. The approach is broadly applicable to a wide range of spatially oriented problems.
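
The effect described in the abstract can be illustrated with a toy model. The sketch below is not the planner from the report; it is a minimal illustration under stated assumptions: jobs lie along a single spatial axis, each job reads a fixed window of neighboring files, and a file must be transferred once to every site that runs a job needing it. The function names, the one-dimensional layout, and the parameter values are all hypothetical.

```python
from collections import defaultdict

def job_files(job_index, span=3):
    """Hypothetical access pattern: job i reads files i .. i+span-1 on one axis."""
    return set(range(job_index, job_index + span))

def count_transfers(assignment, span=3):
    """Count (site, file) pairs: each file is moved once per site that needs it."""
    files_at_site = defaultdict(set)
    for job, site in assignment.items():
        files_at_site[site] |= job_files(job, span)
    return sum(len(files) for files in files_at_site.values())

def round_robin(num_jobs, num_sites):
    """Spatially unaware placement: neighboring jobs land on different sites."""
    return {job: job % num_sites for job in range(num_jobs)}

def spatial_blocks(num_jobs, num_sites):
    """Spatially aware placement: each site gets one contiguous run of jobs."""
    block = -(-num_jobs // num_sites)  # ceiling division
    return {job: job // block for job in range(num_jobs)}

num_jobs, num_sites, span = 100, 5, 3
naive = count_transfers(round_robin(num_jobs, num_sites), span)
clustered = count_transfers(spatial_blocks(num_jobs, num_sites), span)
print(f"round-robin transfers: {naive}")
print(f"clustered transfers:   {clustered}")
```

In this toy setting, round-robin placement needs 300 transfers and contiguous placement needs 110, because neighboring jobs on the same site share most of their input files. These figures describe only the toy model and say nothing about the measurements reported above.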