Technical Program

Tuesday * Wednesday * Thursday * Friday

Tuesday

8:00 – 10:00 FTS 2015, HUCAA, HPCMASPA 2015, WRAp, HiPINEB
Coffee Break
10:30 – 12:30 FTS 2015, HUCAA, HPCMASPA 2015, WRAp, HiPINEB
Lunch Break
14:00 – 15:30 Student Mentoring Program (Part I)
Coffee Break
16:00 – 18:00 Student Mentoring Program (Part II)

Wednesday

8:45 – 9:00 Opening Session
9:00 – 10:00 Keynote: Al Gara (Intel)
10:00 – 10:30 Coffee Break
10:30 – 12:30 Session 1: Best Paper Candidates
12:30 – 14:00 Lunch Break; Student Mentoring Program (Part III): Master Class on Paper Presentation
14:00 – 15:00 Industry Session: Cray; Session 3: Big Data Processing
15:00 – 15:30
15:30 – 16:00 Coffee Break
16:00 – 18:00 Session 4: GPU Computing; Session 5: Machine Learning and Data Mining
18:15 – 20:00 Poster Reception

Thursday

8:45 – 9:00 Presentation of CLUSTER 2016
9:00 – 10:00 Keynote: Marc Snir (Argonne National Laboratory)
10:00 – 10:30 Coffee Break
10:30 – 12:30 Session 6: Resilience and Reliability; Session 7: High Performance I/O
12:30 – 14:00 Lunch Break
14:00 – 15:00 Session 8: MPI and OpenCL; Session 9: Distributed Data Processing
15:00 – 15:30
15:30 – 16:00 Coffee Break
16:00 – 17:00 Session 10: Energy Efficiency; Session 11: Graph Processing
17:00 – 18:00 Panel: Emerging Exascale Problems and Solutions
18:15 – 20:00 Awards Presentation and Gala Dinner

Friday

9:00 – 10:00 Keynote: William Harrod (DOE ASCR)
10:00 – 10:30 Coffee Break
10:30 – 12:30 Session 12: Application Acceleration; Session 13: Network and High Performance Communication
12:30 – 14:00 Lunch Break
14:00 – 15:30 Session 14: Parallel Algorithms; Session 15: Task and Process Scheduling
15:30 – 16:00 Coffee Break
16:00 – 17:30 Session 16: PGAS and Shared Memory Programming; Session 17: Cluster Tools

HUCAA: Heterogeneous and Unconventional Cluster Architectures and Applications. (8:15AM – 12:30PM, Room: Superior II)

HiPINEB: High-Performance Interconnection Networks Towards the Exascale and Big-Data Era. (8:15AM – 6:00PM, Room: Superior III)

FTS 2015: 1st International Workshop on Fault Tolerant Systems. (8:30AM – 12:30PM, Room: Superior I)

HPCMASPA 2015: 2nd Workshop on Monitoring and Analysis for High Performance Computing Systems Plus Applications. (8:30AM – 5:30PM, Room: Huron)

WRAp: Workshop on Representative Applications. (9:00AM – 5:00PM, Room: Michigan)

Campus Bridging: Reducing Obstacles on the Path to Big Answers. (1:30PM – 6:00PM, Room: Superior II)

Student Mentoring Program: Master Class on Career Challenges and Poster Presentation. (2:00PM – 5:30PM, Room: Superior I)


Session 1: Best Paper Candidates-(Room: LaSalle Ballroom)
Chair: Satoshi Matsuoka (Tokyo Inst. Technology)

  • Parallel Modularity-based Community Detection on Large-scale Graphs
    Jianping Zeng and Hongfeng Yu
  • Optimizing Explicit Hydrodynamics for Power, Energy, and Performance
    Edgar A. Leon, Ian Karlin and Ryan E. Grant (Best Paper)
  • Machines Tuning Machines: Configuring Distributed Stream Processors with Bayesian Optimization
    Lorenz Fischer, Shen Gao and Abraham Bernstein
  • Workload-Aware Resource Reservation for Multi-Tenant NoSQL
    Jiaan Zeng and Beth Plale

Session 3: Big Data Processing-(Room: LaSalle Ballroom II)
Chair: Gabriel Antoniu (INRIA)

  • Taming Non-local Stragglers using Efficient Prefetching in MapReduce
    Ze Yu, Min Li, Xin Yang, Han Zhao and Xiaolin Li
  • IOSIG+: on the Role of I/O Tracing and Analysis for Hadoop Systems (short paper)
    Bo Feng, Xi Yang, Kun Feng, Yanlong Yin and Xian-He Sun
  • Exploring Memory Hierarchy to Improve Scientific Data Read Performance (short paper)
    Wenzhao Zhang, Houjun Tang, Xiaocheng Zou, Steven Harenberg, Qing Liu, Scott Klasky and Nagiza Samatova
  • S-Shuffle: Optimizing Big Data Analytical Stacks on Large Clusters using Structured Data Shuffling (short paper)
    Dixin Tang, Taoying Liu, Rubao Lee, Hong Liu and Wei Li
  • An SSD-HDD Integrated Storage Architecture for Write-Once-Read-Once Applications on Clusters (short paper)
    Cailiang Xu, Wei Wang, Deng Zhou and Tao Xie

Session 4: GPU Computing-(Room: LaSalle Ballroom I)
Chair: Naoya Maruyama (Riken AICS)

  • Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters
    Khaled Hamidouche, Akshay Venkatesh, Ammar Ahmad Awan, Hari Subramoni, Ching-Hsiang Chu and Dhabaleswar Panda
  • Improving Strong-Scaling on GPU Cluster Based on Tightly Coupled Accelerators Architecture (short paper)
    Toshihiro Hanawa, Hisafumi Fujii, Norihisa Fujita, Tetsuya Odajima, Kazuya Matsumoto, Yuetsu Kodama and Taisuke Boku
  • Exploring the Suitability of Remote GPGPU Virtualization for the OpenACC Programming Model Using rCUDA (short paper)
    Adrian Castello, Antonio J. Pena, Rafael Mayo Gual, Pavan Balaji and Enrique S. Quintana-Orti
  • PLB-HeC: A Profile-based Load-Balancing algorithm for Heterogeneous CPU-GPU Clusters
    Luis Sant Ana, Daniel Cordeiro and Raphael Camargo
  • A TSQR based Krylov Basis Computation Method on Hybrid GPU Cluster (short paper)
    Langshi Chen and Serge Petiton
  • Detecting Thread-Safety Violations in Hybrid OpenMP/MPI Programs (short paper)
    Hongyi Ma and Liqiang Wang

Session 5: Machine Learning and Data Mining-(Room: LaSalle Ballroom II)
Chair: Osamu Tatebe (Univ. Tsukuba)

  • Fast and Accurate Support Vector Machines on Large Scale Systems
    Abhinav Vishnu, Jeyanthi Narasimhan, Lawrence Holder, Darren Kerbyson and Adolfy Hoisie
  • A Machine-Learning Approach for Communication Prediction of Large-Scale Applications (short paper)
    Nikela Papadopoulou, Georgios Goumas and Nectarios Koziris
  • An Efficient Parallel Approach of Parsing and Indexing for Large-scale XML Datasets (short paper)
    Song Kunfang and Lu Hongwei
  • Collective I/O Tuning Using Analytical and Machine Learning Models
    Florin Isaila, Prasanna Balaprakash, Stefan Wild, Robert Latham, Dries Kimpe, Robert Ross and Paul Hovland
  • Large Scale Frequent Pattern Mining using MPI One-Sided Model
    Abhinav Vishnu and Khushbu Agarwal


Session 6: Resilience and Reliability-(Room: LaSalle Ballroom I)
Chair: Bronis R. de Supinski (LLNL)

  • Fast Fault Injection and Sensitivity Analysis for Collective Communications
    Kun Feng, Manjunath Gorentla Venkata, Dong Li and Xian-He Sun
  • A Practical Approach for Handling Soft Errors in Iterative Applications (short paper)
    Jiaqi Liu, Mehmet Can Kurt and Gagan Agrawal
  • Ensuring Data Durability with Increasingly Interdependent Content (short paper)
    Veronica Del Carmen Estrada Galinanes and Pascal Felber
  • On the Need for Reproducible Numerical Accuracy through Intelligent Runtime Selection of Reduction Algorithms at the Extreme Scale
    Dylan Chapp, J. Travis Johnston and Michela Taufer
  • Towards Building Resilient Scientific Applications: Resilience Analysis on the Impact of Soft Error and Transient Error Tolerance With the CLAMR Hydrodynamics Mini-App (short paper)
    Qiang Guan, Nathan Debardeleben, Brian Atkinson, Robert Robey and William Jones
  • DINO: Divergent Node Cloning for Sustained Redundancy in HPC (short paper)
    Arash Rezaei and Frank Mueller

Session 7: High Performance I/O-(Room: LaSalle Ballroom II)
Chair: D.K. Panda (Ohio State U.)

  • Dynamic Model-driven Parallel I/O Performance Tuning
    Babak Behzad, Surendra Byna, Stefan Wild, Mr Prabhat and Marc Snir
  • TRIO: Burst Buffer Based I/O Orchestration
    Teng Wang, Sarp Oral, Michael Pritchard, Bin Wang and Weikuan Yu
  • BPS: A Balanced Partial Stripe Write Scheme to Improve the Performance of RAID-6
    Congjin Du, Chentao Wu, Jie Li, Minyi Guo and Xubin He
  • RDMA-based Direct Transfer of File Data to Remote Page Cache
    Shin Sasaki, Kazushi Takahashi, Yoshihiro Oyama and Osamu Tatebe

Session 8: MPI and OpenCL-(Room: LaSalle Ballroom I)
Chair: Pavan Balaji (ANL)

  • High Performance MPI Datatype Support with User-mode Memory Registration: Challenges, Designs and Benefits
    Mingzhe Li, Hari Subramoni, Khaled Hamidouche, Xiaoyi Lu and Dhabaleswar Panda
  • Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL
    Ashwin Aji, Antonio J. Pena, Pavan Balaji and Wu-Chun Feng

Session 9: Distributed Data Processing-(Room: LaSalle Ballroom II)
Chair: Zhiling Lan (IIT)

  • Overcoming Hadoop Scaling Limitations through Distributed Task Execution
    Ke Wang, Ning Liu, Iman Sadooghi, Xi Yang, Xiaobing Zhou, Michael Lang, Xian-He Sun and Ioan Raicu
  • SideWalk: A Facility of Lightweight Out-of-band Communications for Augmenting Distributed Data Processing Flows (short paper)
    Yin Huai, Yuan Yuan, Rubao Lee and Xiaodong Zhang
  • High-performance, Distributed Dictionary Encoding of RDF Datasets (short paper)
    Alessandro Morari, Jesse Weaver, Oreste Villa, David Haglin, Vito Giovanni Castellana, Antonino Tumeo and John Feo
  • I/O-Aware Batch Scheduling for Petascale Computing Systems
    Zhou Zhou, Xu Yang, Dongfang Zhao, Paul Rich, Wei Tang, Jia Wang and Zhiling Lan

Session 10: Energy Efficiency-(Room: LaSalle Ballroom I)
Chair: Masaaki Kondo (Univ. Tokyo)

  • Performance-to-Power Ratio Aware Virtual Machine (VM) Allocation in Energy-Efficient Clouds
    Xiaojun Ruan and Haiquan Chen
  • A Workload-Aware Energy Model for Virtual Machine Migration
    Vincenzo De Maio, Gabor Kecskemeti and Radu Prodan

Session 11: Graph Processing-(Room: LaSalle Ballroom II)
Chair: Seetharami Seelam (IBM)

  • GraphTrek: Asynchronous Graph Traversal for Property Graph Based Metadata Management
    Dong Dai, Philip Carns, Robert Ross, John Jenkins, Kyle Blauer and Yong Chen
  • Towards Multi-Site Metadata Management for Geographically Distributed Cloud Workflows
    Luis Pineda-Morales, Alexandru Costan and Gabriel Antoniu


Session 12: Application Acceleration-(Room: LaSalle Ballroom I)
Chair: Tom Peterka (ANL)

  • PaRSEC in Practice: Optimizing a legacy Chemistry application through distributed task-based execution
    Anthony Danalis, Heike McCraw, George Bosilca and Jack Dongarra
  • Optimizing Large Scale I/O for Petascale Seismic Simulations on Unstructured Meshes (short paper)
    Sebastian Rettenberger and Michael Bader
  • LBM-HPC - An Open-Source Tool for Fluid Simulations. Case Study: Unified Parallel C (UPC-PGAS) (short paper)
    Pedro Valero-Lara and Johan Jansson
  • Scaling Data Intensive Physics Applications to 10k Cores on Non-Dedicated Clusters with Lobster
    Anna Woodard, Matthias Wolf, Charles Mueller, Nil Valls, Ben Tovar, Patrick Donnelly, Peter Ivie, Kenyi Hurtado Anampa, Paul Brenner, Doug Thain, Kevin Lannon and Michael Hildreth
  • RE-PAGE: Domain-Specific REplication and PArallel Processing of GEnomic Data
    Mucahid Kutlu and Gagan Agrawal

Session 13: Network and High Performance Communication-(Room: LaSalle Ballroom II)
Chair: Ron Brightwell (Sandia National Laboratories)

  • Re-evaluating Network Onload vs. Offload for the Many-Core Era
    Matthew Dosanjh, Ryan Grant, Patrick Bridges and Ron Brightwell
  • Fast Calculation of Max-min Fair Rates for Multi-commodity Flows in Fat-tree Networks
    Md Atiqul Mollah, Xin Yuan, Scott Pakin and Michael Lang
  • Comparing Global Link Arrangements for Dragonfly Networks
    Emily Hastings, David Rincon-Cruz, Marc Spehlmann, Sofia Meyers, Anda Xu, David Bunde and Vitus Leung
  • Towards the InfiniBand SR-IOV vSwitch Architecture
    Evangelos Tasoulas, Ernst Gunnar Gran, Bjorn Dag Johnsen, Kyrre Begnum and Tor Skeie

Session 14: Parallel Algorithms-(Room: LaSalle Ballroom I)
Chair: Hajime Fujita (Intel)

  • Lossless Floating-Point Compression Algorithms for Massively Parallel Architectures
    Annie Yang, Hari Mukka, Farbod Hesaaraki and Martin Burtscher
  • Balancing Thread-level and Task-level Parallelism for Data-Intensive Workloads on Clusters and Clouds (short paper)
    Olivia Choudhury, Dinesh Rajan, Nicholas Hazekamp, Sandra Gesing, Douglas Thain and Scott Emrich
  • LU Factorization: Towards Hiding Communication Overheads With A Lookahead-free Algorithm (short paper)
    Tan Nguyen and Scott Baden
  • Distributed-Memory Algorithms for Maximal Cardinality Matching using Matrix Algebra
    Ariful Azad and Aydin Buluc

Session 15: Task and Process Scheduling-(Room: LaSalle Ballroom II)
Chair: Todd Gamblin (LLNL)

  • The Cost of Synchronizing Imbalanced Processes in Message Passing Systems
    Ivy Peng, Stefano Markidis and Erwin Laure
  • An Approach to Selecting Thread + Process Mixes for Hybrid MPI + OpenMP Applications
    Hormozd Gahvari, Martin Schulz and Ulrike Yang
  • On the Application Task Granularity and the Interplay with the Scheduling Overhead in Many-core Shared Memory Systems
    Dana Akhmetova, Gokcen Kestor, Roberto Gioiosa and Stefano Markidis

Session 16: PGAS and Shared Memory Programming-(Room: LaSalle Ballroom I)
Chair: Abdelhalim Amer (ANL)

  • OpenSHMEM as a Portable Communication Layer for PGAS Models – A Case Study with Coarray Fortran
    Naveen Namashivayam, Deepak Eachempati, Dounia Khaldi and Barbara Chapman
  • A Team-Based Methodology of Memory Hierarchy-Aware Runtime Support in Coarray Fortran (short paper)
    Dounia Khaldi, Deepak Eachempati, Shiyao Ge, Pierre Jouvelot and Barbara Chapman
  • Optimizing Caching DSM for Distributed Software Speculation (short paper)
    Sai Charan Koduru, Keval Vora and Rajiv Gupta
  • Empirical Comparison of Three Versioning Architectures (short paper)
    Hajime Fujita, Kamil Iskra, Pavan Balaji and Andrew A. Chien

Session 17: Cluster Tools-(Room: LaSalle Ballroom II)
Chair: Ioan Raicu (IIT)

  • Toward Rapid Understanding of Production HPC Applications and Systems
    Anthony Agelastos, Benjamin Allan, Jim Brandt, Ann Gentile, Sophia Lefantzi, Steve Monk, Jeff Ogden, Mahesh Rajan and Joel Stevenson
  • ObsCon: Integrated Monitoring and Control for Parallel, Real-time Applications (short paper)
    Shwetha Mathangi Chandra Choodamani, Alan Nussbaum and Karsten Schwan
  • VecMeter: An Easy-to-use Tool to Analyze Vectorization in HPC Codes (short paper)
    Joshua Peraza, Ananta Tiwari, William Ward, Roy Campbell and Laura Carrington
  • Toward Interlanguage Parallel Scripting for Distributed-Memory Scientific Computing (short paper)
    Justin Wozniak, Timothy Armstrong, Ketan Maheshwari, Daniel Katz, Michael Wilde and Ian Foster