The Open Cybernetics & Systemics Journal
2015, 9: 792-798. Published online 2015 July 31. DOI: 10.2174/1874110X01509010792
Publisher ID: TOCSJ-9-792
An Improved Data Placement Strategy in a Heterogeneous Hadoop Cluster
ABSTRACT
The Hadoop Distributed File System (HDFS) is designed to store big data reliably and to stream it at high bandwidth to user applications. However, the default HDFS block placement policy assumes that all nodes in the cluster are homogeneous and places blocks randomly, without considering any node's resource characteristics, which reduces the system's self-adaptability. In this paper, we take node heterogeneity into account, such as the utilization of each node's disk space, and propose an improved block placement strategy that addresses several drawbacks of the default HDFS policy. Simulation experiments indicate that our improved strategy achieves a much better data distribution and also takes significantly less time than the default block placement.
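To make the contrast between random and heterogeneity-aware placement concrete, the following is a minimal sketch under stated assumptions. It is not the algorithm proposed in this paper. The `Node` model, the functions `place_random`, `place_by_utilization` and `simulate`, the cluster capacities, and the selection rule ("choose the replica targets with the lowest disk-space utilization that still have room for the block") are all hypothetical illustrations. The random policy is also a simplification: the real HDFS default policy additionally applies rack awareness and writer locality, which are ignored here.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical DataNode model: capacity and used space in arbitrary units."""
    name: str
    capacity: int
    used: int = 0

    @property
    def utilization(self) -> float:
        return self.used / self.capacity


def place_random(nodes, replicas, block_size, rng):
    """Pick `replicas` distinct nodes uniformly at random, ignoring node resources
    (a simplification: no rack awareness, writer locality, or free-space check)."""
    targets = rng.sample(nodes, replicas)
    for n in targets:
        n.used += block_size
    return targets


def place_by_utilization(nodes, replicas, block_size):
    """Hypothetical heterogeneity-aware rule: pick the `replicas` distinct nodes
    with the lowest disk-space utilization that can still hold the block."""
    candidates = [n for n in nodes if n.capacity - n.used >= block_size]
    if len(candidates) < replicas:
        raise RuntimeError("not enough nodes with free space for all replicas")
    targets = sorted(candidates, key=lambda n: n.utilization)[:replicas]
    for n in targets:
        n.used += block_size
    return targets


def make_cluster():
    """Heterogeneous cluster; the capacities are assumed purely for illustration."""
    return [Node(f"dn{i}", c) for i, c in enumerate([100, 100, 200, 400, 800])]


def spread(nodes):
    """Gap between the most and least utilized node (lower = more even)."""
    utils = [n.utilization for n in nodes]
    return max(utils) - min(utils)


def simulate(blocks=100, replicas=3, seed=0):
    """Place `blocks` one-unit blocks with each policy on identical clusters and
    return (random spread, utilization-aware spread)."""
    rng = random.Random(seed)
    rand_nodes, util_nodes = make_cluster(), make_cluster()
    for _ in range(blocks):
        place_random(rand_nodes, replicas, 1, rng)
        place_by_utilization(util_nodes, replicas, 1)
    return spread(rand_nodes), spread(util_nodes)


if __name__ == "__main__":
    random_gap, aware_gap = simulate()
    print(f"random placement spread:            {random_gap:.3f}")
    print(f"utilization-aware placement spread: {aware_gap:.3f}")
```

In this toy setting, random placement fills the small nodes much faster than the large ones, whereas the utilization-aware rule keeps disk usage closer to proportional to capacity. The spread is not exactly zero because each block's replicas must go to distinct nodes, which caps how many replicas the largest nodes can receive.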