|Abstract||The end of Moore’s law marks a significant turning point for computer architecture. Today, performance is largely limited by energy, power, and cooling. Heterogeneity and radically new architecture designs are key to achieving higher energy proportionality. In mobile computing, heterogeneity is already well established in system-on-chip designs (e.g., to improve battery life). In high-performance computing (HPC), graphics processing units (GPUs) are now accepted as efficient heterogeneous accelerators for certain workloads. FPGAs are also attracting considerable attention because their reconfigurability allows the hardware to be customized for different workloads to attain both higher performance and higher energy efficiency. In addition, the advent of high-level synthesis technology such as OpenCL for FPGAs, competitive floating-point capability, and CPU-FPGA hybrid designs can lower major hurdles to FPGA adoption in HPC. Nevertheless, the characteristics of FPGAs, particularly when programmed through high-level synthesis, have been little studied. Because FPGAs run at lower clock frequencies (e.g., 200 MHz) than CPUs and GPUs, it is crucial to exploit pipeline parallelism and to avoid pipeline stalls caused by memory operations.
In this paper, we present a brief summary of our OpenCL microbenchmark, which primarily targets the data path between off-chip memory and the OpenCL system-side implementation.