Argonne National Laboratory

Towards Addressing the Challenges of Data-Intensive Computing on the Cloud

Title: Towards Addressing the Challenges of Data-Intensive Computing on the Cloud
Publication Type: Report
Year of Publication: 2014
Authors: Jung, E, Kettimuthu, R
Other Numbers: ANL/MCS-P5205-1014

The era of big data is here. Both the volume of data and the rate at which it is generated are growing dramatically in science, business, government, and IT management. For example, a tomography beamline at the Advanced Photon Source (APS), an experimental facility at Argonne National Laboratory, can produce 150 terabytes of data in one day. The Data-Driven Urban Design and Analysis project at the University of Chicago aims to gather city-level sensor data and apply predictive analytics to it to inform future urban design. So far, such data-intensive applications have been run on dedicated high-performance computing facilities to ensure timely computation. But this approach is available only to the small number of users who have access to these expensive facilities, and even then it can scale only to the capacity of the particular facility. Cloud computing, by contrast, can give any user on-demand access to large computing resources for data-intensive applications, with greater flexibility and at low cost.
To deploy data-intensive applications on clouds successfully, however, sophisticated big data management and data transfer frameworks must be seamlessly incorporated into the cloud computing infrastructure. In this article, we highlight the current limitations of cloud computing for data-intensive applications and discuss the challenges of deploying such applications on the cloud, along with measures to address them.