The Open Cybernetics & Systemics Journal
2015, 9: 131-137. Published online 2015 April 17. DOI: 10.2174/1874110X01509010131
Publisher ID: TOCSJ-9-131
The Study of a Hierarchical Hadoop Architecture in Multiple Data Centers Environment
ABSTRACT
Hadoop is a reasonable tool for cloud computing in the big data era, and the MapReduce paradigm is a highly successful programming model for large-scale, data-intensive computing applications. However, the conventional MapReduce model and the Hadoop framework are limited to running jobs within a single cluster. Traditional single-cluster Hadoop may not be suitable when data and compute resources are widely distributed. This paper focuses on the application of Hadoop across multiple data centers and clusters. A hierarchical distributed computing architecture for Hadoop is designed, and a virtual Hadoop file system is proposed to provide a global data view across multiple data centers. A job submitted by a user can be decomposed automatically into several sub-jobs, which are then allocated to and executed on the corresponding clusters in a location-aware manner. The prototype based on this architecture shows encouraging results.
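The location-aware decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `GLOBAL_VIEW`, `SubJob`, and `decompose_job`, the virtual paths, and the cluster labels are all hypothetical, and the global data view is assumed to be a simple mapping from virtual file path to the cluster that stores the data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SubJob:
    """One unit of work bound to a single cluster (hypothetical type)."""
    cluster: str
    input_paths: tuple


# Assumed stand-in for the global data view: virtual path -> owning cluster.
# Paths and cluster names are invented for illustration only.
GLOBAL_VIEW = {
    "/vfs/logs/part-0": "dc-east",
    "/vfs/logs/part-1": "dc-west",
    "/vfs/logs/part-2": "dc-east",
}


def decompose_job(input_paths, global_view):
    """Group a job's inputs by the cluster holding them, one sub-job per cluster."""
    by_cluster = defaultdict(list)
    for path in input_paths:
        by_cluster[global_view[path]].append(path)
    # Sort by cluster name so the result is deterministic.
    return [SubJob(cluster, tuple(paths))
            for cluster, paths in sorted(by_cluster.items())]


sub_jobs = decompose_job(sorted(GLOBAL_VIEW), GLOBAL_VIEW)
for sub_job in sub_jobs:
    print(sub_job.cluster, list(sub_job.input_paths))
```

Each sub-job would then be submitted to the cluster named in its `cluster` field, so that computation runs where the data already resides. How partial results are combined across clusters is not covered by this sketch.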