See my Google Scholar page for statistics.
Conference Publications
- Advanced Thread Synchronization for Multithreaded MPI Implementations. In CCGrid '17: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Madrid, Spain, May 2017.
- A Review of Lightweight Thread Approaches for High Performance Computing. In Cluster '16: Proceedings of the 2016 IEEE International Conference on Cluster Computing, Taipei, Taiwan, September 2016.
- SWAP-Assembler 2: Optimization of De Novo Genome Assembler at Extreme Scale. In ICPP '16: Proceedings of the 45th International Conference on Parallel Processing, Philadelphia, PA, USA, August 2016.
- MPI+ULT: Overlapping Communication and Computation with User-Level Threads. In HPCC '15: Proceedings of the 2015 IEEE 17th International Conference on High Performance Computing and Communications, pp. 444-454, New York, NY, USA, August 2015.
- Automatic OpenCL Work-Group Size Selection for Multicore CPUs. In PACT '13: Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, pp. 387-397, Edinburgh, Scotland, September 2013. (36/208, 17.3%)
- SnuCL: an OpenCL Framework for Heterogeneous CPU/GPU Clusters. In ICS '12: Proceedings of the 26th International Conference on Supercomputing, pp. 341-352, San Servolo Island, Venice, Italy, June 2012. (36/161, 22.4%)
- An Automatic Code Overlaying Technique for Multicores with Explicitly-Managed Memory Hierarchies. In CGO '12: Proceedings of the 2012 International Symposium on Code Generation and Optimization, pp. 219-229, San Jose, California, USA, March 2012. (26/90, 28.9%)
- Performance Characterization of the NAS Parallel Benchmarks in OpenCL. In IISWC '11: Proceedings of the 2011 IEEE International Symposium on Workload Characterization, pp. 137-148, Austin, Texas, USA, November 2011. (20/50, 40.0%)
- SFMalloc: A Lock-Free and Mostly Synchronization-Free Dynamic Memory Allocator for Manycores. In PACT '11: Proceedings of the 20th International Conference on Parallel Architectures and Compilation Techniques, pp. 252-262, Galveston Island, Texas, USA, October 2011. (36/221, 16.3%)
- An OpenCL Framework for Homogeneous Manycores with no Hardware Cache Coherence. In PACT '11: Proceedings of the 20th International Conference on Parallel Architectures and Compilation Techniques, pp. 56-67, Galveston Island, Texas, USA, October 2011. (36/221, 16.3%)
- An OpenCL Framework for Heterogeneous Multicores with Local Memory. In PACT '10: Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, pp. 193-204, Vienna, Austria, September 2010. (46/266, 17.3%)
- COMIC++: A Software SVM System for Heterogeneous Multicore Accelerator Clusters. In HPCA '10: Proceedings of the 16th IEEE International Symposium on High Performance Computer Architecture, pp. 329-340, Bangalore, India, January 2010. (32/175, 18.3%)
- Design and Implementation of Software-Managed Caches for Multicores with Local Memory. In HPCA '09: Proceedings of the 15th IEEE International Symposium on High Performance Computer Architecture, pp. 55-66, Raleigh, North Carolina, USA, February 2009. (35/184, 19.0%)
- COMIC: A Coherent Shared Memory Interface for Cell BE. In PACT '08: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, pp. 303-314, Toronto, Canada, October 2008. (30/159, 18.9%)
Journal Publications
- A Performance Model for GPUs with Caches. IEEE Transactions on Parallel and Distributed Systems, Vol. 26, No. 7, pp. 1800-1813, July 2015.
Workshop Publications
- Enabling NVM for Data-Intensive Scientific Services. In INFLOW '16: Proceedings of the 4th Workshop on Interactions of NVM/Flash with Operating Systems and Workloads, Savannah, GA, USA, November 2016.
- Systemwide Power Management with Argo. In HPPAC '16: Proceedings of the Twelfth Workshop on High-Performance, Power-Aware Computing, Chicago, IL, USA, May 2016.
- Implementation and Evaluation of MPI Nonblocking Collective I/O. In PPMM '15: Proceedings of the 2nd Workshop on Parallel Programming Model for the Masses, Shenzhen, Guangdong, China, May 2015.
- OpenCL as a Programming Model for GPU Clusters. In LCPC '11: Proceedings of the 24th International Workshop on Languages and Compilers for Parallel Computing, pp. 76-90, Fort Collins, Colorado, USA, September 2011.
- An Efficient Software Shared Virtual Memory for the Single-chip Cloud Computer. In APSys '11: Proceedings of the 2nd ACM SIGOPS Asia-Pacific Workshop on Systems, pp. 17-21, Shanghai, China, July 2011. (19/57, 33.3%)
Poster Papers
- SWAP-Assembler 2: Scalable Genome Assembler towards Millions of Cores - Practice and Experience. Doctoral symposium in CCGrid '15: Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Shenzhen, Guangdong, China, May 2015.
- Lessons Learned Implementing User-Level Failure Mitigation in MPICH. Poster presentation in CCGrid '15: Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Shenzhen, Guangdong, China, May 2015.
- OpenCL as a Unified Programming Model for Heterogeneous CPU/GPU Clusters. Poster presentation in PPoPP '12: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 299-300, New Orleans, Louisiana, USA, February 2012.
- A Software-SVM-based Transactional Memory for Multicore Accelerator Architectures with Local Memory. Poster presentation in PACT '10: Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, pp. 567-568, Vienna, Austria, September 2010.
Korean Publications
- SnuCL: OpenCL Programming Environment for Heterogeneous Manycore Clusters. Communications of KIISE, Vol. 32, No. 5, pp. 66-76, May 2014.
- Trends on Heterogeneous Supercomputers and a Case Study on the Development of a Supercomputer Chundoong. Communications of KIISE, Vol. 31, No. 4, pp. 34-41, April 2013.
- The Design and Implementation of a Cache Simulator for Multicore Systems. Korea Computer Congress, June 2009.
- Automatic Detection of Memory Subsystem Parameters for Embedded Systems. Journal of KIISE: Computing Practices and Letters, Vol. 20, No. 5, pp. 350-354, May 2009.
- Automatic Detection of Memory Subsystem Parameters for Embedded Systems. Korea Computer Congress, June 2008.