Scalable Data Management for Map-Reduce-based Data-Intensive Applications: A View for Cloud and Hybrid Infrastructures

Title: Scalable Data Management for Map-Reduce-based Data-Intensive Applications: A View for Cloud and Hybrid Infrastructures
Publication Type: Journal Article
Year of Publication: 2013
Authors: Antoniu, G, Bigot, J, Blanchet, C, Bouge, L, Briant, F, Cappello, F, Costan, A, Desprez, F, Fedak, G, Gault, S, Keahey, K, Nicolae, B, Perez, C, Simonet, A, Suter, F, Tang, B, Terreux, R
Journal: International Journal of Cloud Computing
Volume: 2
Issue: 2/3
Pagination: 150-170
Date Published: 02/2013
Other Numbers: ANL/MCS-P4032-0213
Abstract

As Map-Reduce emerges as a leading programming paradigm for data-intensive computing, today's frameworks that support it still have substantial shortcomings that limit its potential scalability. In this paper we discuss several directions where there is room for progress: storage efficiency under massive data access concurrency, scheduling, volatility, and fault tolerance. We place our discussion in the perspective of the current evolution towards an increasing integration of large-scale distributed platforms (clouds, cloud federations, enterprise desktop grids, etc.). We propose an approach that aims to overcome the limitations of existing Map-Reduce frameworks in order to achieve scalable, concurrency-optimized, fault-tolerant Map-Reduce data processing on hybrid infrastructures. This approach will be evaluated with real-life bio-informatics applications on existing Nimbus-powered cloud testbeds interconnected with desktop grids.

URL: http://www.inderscience.com/info/inarticle.php?artid=55265
PDF: http://www.mcs.anl.gov/papers/P4032-0213.pdf