The Open Cybernetics & Systemics Journal
2014, 8: 530-534. Published online 2014 December 31. DOI: 10.2174/1874110X01408010530
Publisher ID: TOCSJ-8-530
Improved K-Means Algorithm in Text Semantic Clustering
ABSTRACT
Text clustering is an important technology in text data mining, and semantic calculation methods can greatly improve the computation. This paper aims to improve existing text clustering algorithms for Chinese text by using a semantic clustering method. First, the similarity calculation module of the clustering uses a staged, integrated semantic similarity algorithm in which text semantics and context factors are blended into each computation stage. Second, the traditional K-means algorithm is improved with a priority strategy that first partitions the relatively concentrated data, which reduces the randomness of the initial centers and ensures that the sample points within each cluster partition are highly similar. Finally, experiments show that the proposed algorithm improves clustering accuracy and is also highly stable.
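
As a rough illustration of the idea of seeding K-means from the most concentrated data first, the following minimal sketch may help. It is not the paper's algorithm: the abstract does not specify the priority strategy, so the density rule used here (count the neighbors above a similarity threshold, take the densest point as a center, set its neighborhood aside, repeat) is an assumption. Plain cosine similarity on numeric vectors stands in for the staged semantic similarity, and the names `cosine`, `pick_initial_centers`, `kmeans`, and `threshold` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; a stand-in for the paper's semantic similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pick_initial_centers(points, k, threshold):
    """Assumed priority rule: repeatedly take the densest remaining point."""
    remaining = list(range(len(points)))
    centers = []
    while remaining and len(centers) < k:
        # Density = number of remaining points at least `threshold` similar.
        density = {
            i: sum(
                1
                for j in remaining
                if j != i and cosine(points[i], points[j]) >= threshold
            )
            for i in remaining
        }
        # Ties are broken by the lowest index, so the choice is deterministic.
        best = max(remaining, key=lambda i: (density[i], -i))
        centers.append(list(points[best]))
        # Set aside the chosen point and its dense neighborhood.
        remaining = [
            j
            for j in remaining
            if j != best and cosine(points[best], points[j]) < threshold
        ]
    return centers  # may hold fewer than k centers if the data runs out


def kmeans(points, k, threshold=0.95, max_iter=100):
    """K-means over cosine similarity with density-based seeding."""
    centers = pick_initial_centers(points, k, threshold)
    labels = None
    for _ in range(max_iter):
        # Assign each point to its most similar center.
        new_labels = [
            max(range(len(centers)), key=lambda c: cosine(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Move each center to the mean of its members.
        for c in range(len(centers)):
            members = [points[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


points = [
    [1.0, 0.0], [0.9, 0.1], [0.95, 0.05],
    [0.0, 1.0], [0.1, 0.9], [0.05, 0.95],
]
labels, centers = kmeans(points, k=2)
```

Because the seeding involves no random draw, repeated runs on the same data give the same partition, which is the kind of stability the abstract refers to. The threshold value is arbitrary here and would need tuning for real text vectors.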