The Fifth International Workshop on
Accelerators and Hybrid Exascale Systems (AsHES)
Join us on May 25th, 2015 in Hyderabad, India
To be held in conjunction with
IPDPS 2015: IEEE International Parallel and Distributed Processing Symposium
Opening remarks
8:30 - 8:45 am
Satoshi Matsuoka
Keynote by Michela Taufer -- The Numerical Reproducibility Fair Trade: Facing the Concurrency Challenges at the Extreme Scale
8:45 - 9:50 am
Abstract:
Trends in execution concurrency on accelerated platforms make a compelling case
for developing methods that can automatically and efficiently model and
mitigate numerical irreproducibility beyond petascale and into exascale.
High-performance accelerated computers at the extreme scale exhibit an enormous
level of concurrency—a factor of 10,000 greater than on traditional
platforms—that is moving computer simulations from bulk-synchronous executions
to nondeterministic multithreaded calculations and asynchronous I/O. As
concurrency levels in simulations increase, the impact of rounding errors on
numerical reproducibility is also exacerbated, ultimately affecting the ability
of scientific simulations to reproduce program executions and numerical
results. Under these circumstances, irreproducible results may not be trusted
by a scientific community that expects reproducible behavior, and any attempt
to enforce reproducibility may come at an unacceptably high performance cost.
In this talk we discuss the impact of rounding errors on result reproducibility
when concurrent executions burst and workflow determinism vanishes on
cutting-edge accelerated platforms. We unveil the power of mathematical methods
to model rounding errors in scientific applications and discuss how these
methods can mitigate error drifting on new generations of accelerators.
Specifically, we focus on floating-point error accumulation in global
summations for which enforcing a fixed reduction order from run to run is too
expensive, or even impossible, at the extreme scale. We model summations as reduction
trees and identify those parameters that can be used to estimate the
reduction's sensitivity to variability in a reduction tree. We assess the
impact of these parameters on the ability of different reduction methods based
on compensated summation (e.g., composite-precision summation) and
"distillation" algorithms (e.g., prerounding) to mitigate errors. Our results
illustrate the pressing need for intelligent runtime selection of reduction
operators that ensure a given degree of reproducibility.
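Since no fixed summation order can be enforced from run to run at this scale, a small illustrative sketch (not material from the talk; all values and orderings below are synthetic) may help make the effect concrete. The Python snippet sums the same operands in several shuffled orders, once with plain left-to-right accumulation and once with Kahan summation, one member of the compensated-summation family the abstract refers to; the shuffles stand in for nondeterministic reduction trees.

    # Illustrative sketch only: compares a naive floating-point sum with a
    # compensated (Kahan) sum under shuffled operand orders, which stand in
    # for nondeterministic reduction trees.
    import random

    def naive_sum(values):
        # Plain left-to-right accumulation; the result depends on operand order.
        total = 0.0
        for v in values:
            total += v
        return total

    def kahan_sum(values):
        # Compensated summation: carry the rounding error of each addition in a
        # correction term and fold it into the next addition.
        total = 0.0
        c = 0.0
        for v in values:
            y = v - c
            t = total + y
            c = (t - total) - y
            total = t
        return total

    random.seed(1234)
    # Synthetic operands spanning many orders of magnitude, where rounding shows up.
    values = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-8, 8)
              for _ in range(100_000)]

    naive_results, kahan_results = set(), set()
    for _ in range(5):
        order = values[:]
        random.shuffle(order)  # stand-in for a run-to-run change in reduction order
        naive_results.add(naive_sum(order))
        kahan_results.add(kahan_sum(order))

    print("distinct naive sums across orderings:", len(naive_results))  # usually > 1
    print("distinct Kahan sums across orderings:", len(kahan_results))  # usually 1

On typical IEEE double-precision hardware, the naive sums usually differ in their low-order digits across shuffles, while the compensated sums usually agree; this order-insensitivity, and its extra cost per addition, is the kind of trade-off the runtime selection of reduction operators described above would have to weigh.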
Bio:
Michela Taufer is the David L. and Beverly J.C. Mills Chair of Computer and
Information Sciences and an associate professor in the same department at the
University of Delaware. She earned her master’s degree in Computer
Engineering from the University of Padova (Italy) and her doctoral degree in
Computer Science from the Swiss Federal Institute of Technology (Switzerland).
From 2003 to 2004 she was a La Jolla Interfaces in Science Training Program
(LJIS) Postdoctoral Fellow at the University of California San Diego (UCSD) and
The Scripps Research Institute (TSRI), where she worked on interdisciplinary
projects in computer systems and computational chemistry. From 2005 to 2007,
she was an Assistant Professor at the Computer Science Department of the
University of Texas at El Paso (UTEP). She joined the University of Delaware
in 2007 as an Assistant Professor and was promoted to Associate Professor with
tenure in 2012.
Taufer's research interests include scientific applications and their advanced
programmability in heterogeneous computing (i.e., multi-core and many-core
platforms, GPUs); performance analysis, modeling, and optimization of
multi-scale applications on heterogeneous computing, cloud computing, and
volunteer computing; numerical reproducibility and stability of large-scale
simulations on multi-core platforms; big data analytics and MapReduce.
Break 9:50 - 10:30 am
Session 1: Accelerating Analytics
10:30 am - 12:00 pm
Chair: Sriram Krishnamoorthy
- Towards A Combined Grouping and Aggregation Algorithm for Fast Query Processing in Columnar Databases with GPUs
  Sina Meraji, Sunil Kamath, John Keenleyside and Bob Blainey [ slides ]
- Implementation of CG Method on GPU Cluster with Proprietary Interconnect TCA for GPU Direct Communication
  Kazuya Matsumoto, Toshihiro Hanawa, Yuetsu Kodama, Hisafumi Fujii and Taisuke Boku [ slides ]
Lunch 12:00 - 1:30 pm
Session 2: Algorithm Design for Heterogeneous Systems
1:30 - 3:30 pm
Chair: Min Si
- GPGPU-based Parallel R-tree Construction and Querying
  Sushil K. Prasad, Michael McDermott, Xi He and Satish Puri
- Fast Burrows Wheeler Compression Using All-Cores
  Aditya Deshpande and P J Narayanan [ slides ]
- A Novel Heterogeneous Algorithm for Multiplying Scale-Free Sparse Matrices
  Kiran Raj Ramamoorthy, Dip Sankar Banerjee, Kannan Srinathan and Kishore Kothapalli [ slides ]
- GraphReduce: Large-Scale Graph Analytics on Accelerator-Based HPC Systems
  Dipanjan Sengupta, Kapil Agarwal, Shuaiwen Song and Karsten Schwan [ slides ]
- Graph Coloring on the GPU and Some Techniques to Improve Load Imbalance
  Shuai Che, Gregory Rodgers, Brad Beckmann and Steve Reinhardt [ slides ]