The Open Cybernetics & Systemics Journal
2015, 9: 288-294. Published online 2015 May 29. DOI: 10.2174/1874110X01509010288
Publisher ID: TOCSJ-9-288
Research on K-Means Algorithm Based on Parallel Improving and Applying
ABSTRACT
A single server or CPU lacks the capacity to mine massive data. To address this bottleneck, the paper proposes an algorithm that combines a genetic algorithm with a MapReduce (MR)-based parallel clustering algorithm. Clustering analysis is weak at selecting cluster centers, so a genetic algorithm is used to choose them, and the MapReduce parallel computing model is used to accelerate the convergence of the clustering. To verify that the algorithm is sound, it is applied to the analysis of real log files on a purpose-built Hadoop platform. Experimental results show that combining distributed cluster computing with genetic clustering analysis to process log files gives better clustering results and greatly improves the efficiency of mining massive logs.
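The abstract combines two ideas: a genetic algorithm that chooses cluster centers, and a MapReduce formulation of the K-means iteration. The following is a minimal single-process sketch of both, under stated assumptions. The map, shuffle, and reduce phases are simulated in plain Python rather than through Hadoop's API. The chromosome encoding (k distinct point indices), the fitness (negative sum of squared errors), the crossover and mutation operators, and all function names are hypothetical choices, because the abstract does not specify the paper's genetic design.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def sse(points, centers):
    """Sum of squared errors: the objective K-means minimizes."""
    return sum(sq_dist(p, centers[nearest(p, centers)]) for p in points)


def ga_initial_centers(points, k, pop_size=20, generations=30,
                       mutation_rate=0.2, seed=0):
    """Hypothetical genetic search for k initial centers.

    A chromosome is a list of k distinct point indices; fitness is -SSE.
    """
    rng = random.Random(seed)
    n = len(points)

    def fitness(chrom):
        return -sse(points, [points[i] for i in chrom])

    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]  # truncation selection: keep fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, k) if k > 1 else 0
            # One-point crossover that keeps the genes distinct.
            head = p1[:cut]
            child = (head + [g for g in p2 if g not in head])[:k]
            if rng.random() < mutation_rate:
                # Mutation: replace one gene with an unused point index.
                unused = [i for i in range(n) if i not in child]
                if unused:
                    child[rng.randrange(k)] = rng.choice(unused)
            children.append(child)
        population = survivors + children
    best = max(population, key=fitness)
    return [list(points[i]) for i in best]


def kmeans_map(point, centers):
    """Map: emit (nearest center id, (point, 1))."""
    return nearest(point, centers), (point, 1)


def kmeans_reduce(values):
    """Reduce: sum the (vector, count) pairs for one center id and average."""
    dim = len(values[0][0])
    total, count = [0.0] * dim, 0
    for vec, c in values:
        for j in range(dim):
            total[j] += vec[j]
        count += c
    return [t / count for t in total]


def mapreduce_kmeans(points, centers, max_iter=20, tol=1e-9):
    """Repeat map -> shuffle -> reduce until the centers stop moving."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        groups = defaultdict(list)  # shuffle: group map output by key
        for p in points:
            key, value = kmeans_map(p, centers)
            groups[key].append(value)
        # An empty cluster keeps its previous center.
        new_centers = [kmeans_reduce(groups[i]) if groups[i] else centers[i]
                       for i in range(len(centers))]
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


# Toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centers = mapreduce_kmeans(points, ga_initial_centers(points, k=2))
```

The map step emits a count alongside each point so that the reduce step works on (vector, count) pairs. This would let a combiner pre-sum pairs on each mapper without changing the final averages, which is a common way to reduce shuffle traffic in K-means on Hadoop. In a real job, the driver would rerun the MapReduce pass each iteration until the centers converge.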