The Open Cybernetics & Systemics Journal

2015, 9 : 288-294
Published online 2015 May 29. DOI: 10.2174/1874110X01509010288
Publisher ID: TOCSJ-9-288

Research on K-Means Algorithm Based on Parallel Improving and Applying

Deng Zhenrong, Deng Xing, Zhang Chuan, Xu Liang and Huang Wenming

School of Computer Science and Engineering, Guilin University of Electronic Technology, Guilin, 541004, P.R. China.

ABSTRACT

A single server or CPU lacks the capacity to mine massive data sets. To address this bottleneck, an algorithm is proposed that combines a genetic algorithm with a MapReduce (M-R) based parallel clustering algorithm. To compensate for the weakness of clustering analysis in selecting cluster centers, a genetic algorithm is applied to the selection of those centers, and the M-R parallel computing model is used to accelerate the convergence of the clustering analysis. To verify that the algorithm is sound, it is applied to the analysis of real log files on a Hadoop platform. Experimental results show that combining distributed cluster computing with genetic clustering analysis yields better clustering results on log files and greatly improves the efficiency of mining massive logs.
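
As an illustration of the parallel clustering step described above, the following is a minimal sketch of one K-means iteration in map/reduce form. It is not the authors' implementation: it runs in a single Python process rather than on Hadoop, the function names (`nearest_center`, `map_phase`, `reduce_phase`, `kmeans_iteration`) are hypothetical, and the genetic selection of the initial centers is not shown (the centers are simply passed in).

```python
from collections import defaultdict


def nearest_center(point, centers):
    """Return the index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def map_phase(points, centers):
    """Map step: emit (center_index, (point, 1)) for every input point."""
    for point in points:
        yield nearest_center(point, centers), (point, 1)


def reduce_phase(pairs, centers):
    """Reduce step: average the points assigned to each center.

    A center that received no points keeps its previous position.
    """
    sums = {}
    counts = defaultdict(int)
    for idx, (point, n) in pairs:
        if idx in sums:
            sums[idx] = [s + p for s, p in zip(sums[idx], point)]
        else:
            sums[idx] = list(point)
        counts[idx] += n
    return [
        tuple(s / counts[i] for s in sums[i]) if i in sums else centers[i]
        for i in range(len(centers))
    ]


def kmeans_iteration(points, centers):
    """One full iteration: map every point, then reduce to updated centers."""
    return reduce_phase(map_phase(points, centers), centers)


# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]
new_centers = kmeans_iteration(points, centers)
```

In an actual M-R deployment the map step would run on separate data splits and the framework would group the emitted pairs by key before the reduce step; the iteration would be repeated until the centers stop moving.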

Keywords:

Cloud computing, Clustering analysis, Genetic algorithm, Map-reduce, Mass data processing.