Scalable Data Management for Map-Reduce-based Data-Intensive Applications: A View for Cloud and Hybrid Infrastructures
|Publication Type||Journal Article|
|Year of Publication||2013|
|Authors||Antoniu, G, Bigot, J, Blanchet, C, Bouge, L, Briant, F, Cappello, F, Costan, A, Desprez, F, Fedak, G, Gault, S, Keahey, K, Nicolae, B, Perez, C, Simonet, A, Suter, F, Tang, B, Terreux, R|
|Journal||International Journal of Cloud Computing|
While Map-Reduce is emerging as a leading programming paradigm for data-intensive computing, the frameworks that support it today still have substantial shortcomings that limit its scalability. In this paper we discuss several directions where there is room for progress: storage efficiency under massively concurrent data access, scheduling, volatility, and fault tolerance. We place our discussion in the perspective of the current evolution towards increasing integration of large-scale distributed platforms (clouds, cloud federations, enterprise desktop grids, etc.). We propose an approach that aims to overcome the limitations of existing Map-Reduce frameworks in order to achieve scalable, concurrency-optimized, fault-tolerant Map-Reduce data processing on hybrid infrastructures. This approach will be evaluated with real-life bio-informatics applications on existing Nimbus-powered cloud testbeds interconnected with desktop grids.