The Open Cybernetics & Systemics Journal

2015, 9 : 131-137
Published online 2015 April 17. DOI: 10.2174/1874110X01509010131
Publisher ID: TOCSJ-9-131

The Study of a Hierarchical Hadoop Architecture in Multiple Data Centers Environment

Sun Shengtao, Wu Aizhi and Liu Xiaoyang
School of Information Science and Engineering, Yanshan University, Hebei, 066004, P.R. China.

ABSTRACT

Hadoop is a suitable tool for cloud computing in the big data era, and the MapReduce paradigm has been a highly successful programming model for large-scale data-intensive computing applications. However, the conventional MapReduce model and the Hadoop framework are limited to running jobs within a single cluster. Traditional single-cluster Hadoop may not be suitable when data and compute resources are widely distributed. This paper focuses on the application of Hadoop across multiple data centers and clusters. A hierarchical distributed computing architecture for Hadoop is designed, and a virtual Hadoop file system is proposed to provide a global data view across multiple data centers. A job submitted by a user can be decomposed automatically into several sub-jobs, which are then allocated to and executed on the corresponding clusters in a location-aware manner. A prototype based on this architecture shows encouraging results.
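The location-aware decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `SubJob`, `decompose_job`, and `location_index` are hypothetical, and the sketch assumes the global data view can be reduced to a simple mapping from each input path to the cluster that stores it.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubJob:
    """Hypothetical unit of work bound to one cluster."""
    cluster: str
    inputs: List[str]


def decompose_job(input_paths: List[str],
                  location_index: Dict[str, str]) -> List[SubJob]:
    """Split a job into one sub-job per cluster that holds its inputs.

    `location_index` stands in for the global data view: it maps each
    input path to the name of the cluster where that data resides
    (an assumption made for this sketch).
    """
    by_cluster = defaultdict(list)
    for path in input_paths:
        # A path missing from the index raises KeyError; a real system
        # would need a policy for unknown or replicated data.
        by_cluster[location_index[path]].append(path)
    # Sort for a deterministic result.
    return [SubJob(cluster=name, inputs=sorted(paths))
            for name, paths in sorted(by_cluster.items())]
```

Each resulting sub-job would then be submitted to its own cluster so that computation runs where the data already resides. How the partial results are combined afterwards is outside the scope of this sketch.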

Keywords:

Across clusters, Apache Hadoop, hierarchical distributed computing architecture, multiple data centers, virtual Hadoop file system.