The Open Cybernetics & Systemics Journal

2015, 9 : 792-798
Published online 2015 July 31. DOI: 10.2174/1874110X01509010792
Publisher ID: TOCSJ-9-792

An Improved Data Placement Strategy in a Heterogeneous Hadoop Cluster

Wentao Zhao, Lingjun Meng, Jiangfeng Sun, Yang Ding, Haohao Zhao and Lina Wang
School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo 454000, China; and Opening Project of Key Laboratory of Mine Informatization, Henan Polytechnic University, Jiaozuo 454000, Henan, China.

ABSTRACT

The Hadoop Distributed File System (HDFS) is designed to store big data reliably and to stream those data at high bandwidth to user applications. However, the default HDFS block placement policy assumes that all nodes in the cluster are homogeneous and places blocks randomly without considering the resource characteristics of individual nodes, which reduces the self-adaptability of the system. In this paper, we take node heterogeneity into account, such as the utilization of each node’s disk space, and put forward an improved block placement strategy that addresses some drawbacks of the default HDFS policy. Simulation experiments indicate that the improved strategy not only distributes data better than the default block placement policy but also saves a significant amount of time.
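To make the contrast concrete, the following Python sketch compares a random chooser with one that prefers the nodes with the lowest disk space utilization. It is only an illustration of the general idea under simplifying assumptions, not the strategy proposed in this paper, whose details the abstract does not give. The names `Node`, `pick_random`, `pick_by_utilization`, `place_blocks` and `utilization_spread` are hypothetical, and rack awareness, network load and capacity limits are deliberately ignored.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical DataNode described only by its disk capacity and usage."""
    name: str
    capacity_gb: float
    used_gb: float = 0.0

    @property
    def utilization(self) -> float:
        # Fraction of the node's disk that is already occupied.
        return self.used_gb / self.capacity_gb


def pick_random(nodes, replicas, rng):
    """Baseline: choose distinct replica targets uniformly at random."""
    return rng.sample(nodes, replicas)


def pick_by_utilization(nodes, replicas, rng=None):
    """Illustrative alternative: choose the least-utilized distinct nodes."""
    return sorted(nodes, key=lambda n: n.utilization)[:replicas]


def place_blocks(nodes, num_blocks, block_gb, replicas, chooser, rng=None):
    """Place num_blocks blocks, each with `replicas` copies on distinct nodes.

    Assumption: disks never fill up, so no capacity check is performed.
    """
    for _ in range(num_blocks):
        for node in chooser(nodes, replicas, rng):
            node.used_gb += block_gb


def utilization_spread(nodes):
    """Gap between the most and least utilized node (0 means perfectly even)."""
    values = [n.utilization for n in nodes]
    return max(values) - min(values)


def make_cluster():
    # Assumed heterogeneous cluster: capacities differ by a factor of four.
    capacities = [100, 100, 200, 200, 400, 400]
    return [Node(f"dn{i}", cap) for i, cap in enumerate(capacities)]


if __name__ == "__main__":
    rng = random.Random(0)
    random_cluster = make_cluster()
    place_blocks(random_cluster, 100, 1.0, 3, pick_random, rng)
    aware_cluster = make_cluster()
    place_blocks(aware_cluster, 100, 1.0, 3, pick_by_utilization)
    print("random spread:", utilization_spread(random_cluster))
    print("utilization-aware spread:", utilization_spread(aware_cluster))
```

In this toy setting the random chooser sends roughly the same amount of data to every node, so small disks fill proportionally faster than large ones, whereas the utilization-aware chooser keeps the fill fractions close together. A real policy would also have to weigh factors such as network load, which the keywords of this paper mention.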

Keywords:

Data placement, disk space utilization, HDFS, network load, nodes heterogeneity.