The Tenth International Workshop on
Accelerators and Hybrid Exascale Systems

To be held in conjunction with
34th IEEE International Parallel and Distributed Processing Symposium in New Orleans, Louisiana, USA (May 18th, 2020)

AsHES 2020 Virtual Presentation

Keynote (5:30 pm CDT)

Multi-Hetero Accelerated Supercomputing: System, Programming and Applications

Taisuke Boku, Center for Computational Sciences, University of Tsukuba

Abstract: In the exascale era, one of the most important and difficult problems is how to improve sustained performance within a limited power budget. Traditional multi-core or many-core general-purpose CPUs remain popular because they make programming and porting general applications easy. However, they are approaching limits imposed by semiconductor technology, memory capacity per core, network bandwidth, and other factors. The GPU is the representative attached-accelerator solution in heterogeneous computing, thanks to its high ratio of peak performance to power consumption and, moreover, to recent progress in AI applications, as seen with TensorFlow-ready NVIDIA GPUs. However, the GPU's extremely high performance comes from very wide data-parallel computation at both the instruction level and the core/thread level, which requires thousands of SIMD operations to execute simultaneously. Many success stories of GPU acceleration depend on simple parallel execution and a very low rate of exception handling (if statements). Another problem is the interconnection network, which relies on a CPU-bundled high-performance network such as InfiniBand. Our research team has been focusing on FPGA computation, one of the hot topics among new types of accelerators for HPC; however, it is quite difficult to achieve performance comparable to that of a GPU, especially for SIMD-style applications. We therefore envision a new generation of accelerated computing supported by a multi-heterogeneous accelerator platform that combines several types of accelerators on a single computation node. The first target is a combination of GPU and FPGA to provide a 360-degree solution with SIMD and pipelined parallelism, applied according to the characteristics of each computational part of a large application. In this talk, I will introduce the current status of our Multi-Hetero Accelerated System running at the University of Tsukuba, its hardware and software development, and a real application with a preliminary performance evaluation.

Bio: Taisuke Boku received his Master's and PhD degrees from the Department of Electrical Engineering at Keio University. After serving as an assistant professor in the Department of Physics at Keio University, he joined the Center for Computational Sciences (formerly the Center for Computational Physics) at the University of Tsukuba, where he is currently the Director and the HPC division leader. He has worked there for more than 25 years on HPC system architecture, system software, and performance evaluation of various scientific applications. During these years, he has played a central role in the system development of CP-PACS (ranked number one in the TOP500 in 1996), FIRST, PACS-CS, and HA-PACS, representative supercomputers in Japan. He is also the Vice Director of JCAHPC (Joint Center for Advanced HPC), a joint organization of the University of Tsukuba and the University of Tokyo that operates the largest KNL-based cluster in Japan, Oakforest-PACS (25 PFLOPS peak performance). In addition, he is the Chair of the HPCI Resource Management and Service Committee for all supercomputer resource utilization under the MEXT HPCI program, and a member of the system architecture working group for Post-K Computer development. He received the ACM Gordon Bell Prize in 2011.

Opening Statement

1:25 pm - 1:35 pm CDT

Min Si, Argonne National Laboratory

Session One: GPU computing

1:35 pm - 3:15 pm CDT

Session Chair: Simon Garcia de Gonzalo, Barcelona Supercomputing Center

  • Towards automated kernel selection in machine learning systems: A SYCL case study
    John Lawson
  • Unified data movement for offloading Charm++ applications
    Matthias Diener, Laxmikant Kale
  • Population Count on Intel CPU, GPU, and FPGA
    Zheming Jin, Hal Finkel
  • SPHYNX: Spectral Partitioning for HYbrid aNd aXelerator-enabled systems
    Seher Acer, Erik G. Boman, Sivasankaran Rajamanickam

Break 3:15 pm - 3:50 pm CDT

Session Two: FPGAs

3:50 pm - 5:30 pm CDT

Session Chair: Lena Oden, FernUniversität in Hagen

  • Understanding the Performance of Elementary Numerical Linear Algebra Kernels in FPGAs
    Federico Favaro, Juan Oliver, Ernesto Dufrechou, Pablo Ezzatti
  • Scalability of Sparse Matrix Dense Vector Multiply (SpMV) on a Migrating Thread Architecture
    Brian A. Page, Peter M. Kogge
  • In-depth Optimization with the OpenACC-to-FPGA Framework on an Arria X FPGA
    Jacob Lambert, Seyong Lee, Jeffrey Vetter, Allen Malony
  • Performance Evaluation of Pipelined Communication Combined with Computation in OpenCL Programming on FPGA
    Norihisa Fujita, Ryohei Kobayashi, Yoshiki Yamaguchi, Tomohiro Ueno, Kentaro Sano, Taisuke Boku
