The Open Automation and Control Systems Journal

2015, 7: 1922-1929
Published online 2015 October 20. DOI: 10.2174/1874444301507011922
Publisher ID: TOAUTOCJ-7-1922

On the Application of a New Method of the Top-Down Decision Tree Incremental Pruning in Data Classification

Shao Hongbo, Zhou Jing and Wu Jianhui
College of Science, Agricultural University of Hebei, Baoding, China.

ABSTRACT

Decision tree learning, as an important branch of machine learning, has been applied successfully in many areas. A well-known limitation of decision tree learning is that it tends to over-fit the training set, which weakens the accuracy of the resulting trees. To overcome this defect, decision tree pruning is commonly adopted as a follow-up step of the learning algorithm to optimize the tree. Most pruning methods in current use are based on statistical analysis; when samples are scarce, a small training set carries little statistical significance and these methods fail. Building on previous research, this paper presents a top-down decision tree incremental pruning method (TDIP), which applies incremental learning to the comparison between certain and uncertain rules so that only the former are retained. In addition, to speed up pruning, a top-down search is defined that avoids repeatedly iterating over the same decision tree. TDIP is independent of the statistical characteristics of the training set and is therefore a robust pruning method. The experimental results show that the method maintains a good balance between the accuracy and the size of the pruned decision trees, and that it outperforms traditional methods on classification problems.

Keywords:

Decision trees, Incremental learning, Overfitting, Post-pruning, TDIP, Iteration.
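
The abstract describes TDIP only at a high level, so the following Python sketch is purely illustrative rather than the authors' algorithm: it shows a generic top-down post-pruning pass in which a subtree is collapsed into a majority-class leaf whenever the simpler rule is at least as accurate on the available examples, and a pruned subtree is never revisited. The `Node` class, the `prune_top_down` function, and the error-based certainty test are all assumptions introduced here for illustration.

```python
from collections import Counter

class Node:
    """Minimal decision-tree node: internal nodes split on a feature value,
    leaves predict a class label (hypothetical structure for illustration)."""
    def __init__(self, feature=None, children=None, label=None):
        self.feature = feature          # feature index used to split (internal nodes)
        self.children = children or {}  # feature value -> child Node
        self.label = label              # predicted class (leaves)

    def is_leaf(self):
        return not self.children

def predict(node, x):
    """Route an example down the tree; an unseen feature value stops the descent."""
    while not node.is_leaf():
        child = node.children.get(x[node.feature])
        if child is None:
            break
        node = child
    return node.label

def errors(node, examples):
    """Count misclassified (x, y) pairs under the given (sub)tree."""
    return sum(1 for x, y in examples if predict(node, x) != y)

def prune_top_down(node, examples):
    """Illustrative top-down post-pruning pass (not the paper's exact TDIP):
    at each internal node, if collapsing the subtree into a majority-class
    leaf does not increase the error on the supplied examples, prune it and
    never visit the subtree below; otherwise recurse into each child with
    the examples that reach it."""
    if node.is_leaf() or not examples:
        return node
    majority = Counter(y for _, y in examples).most_common(1)[0][0]
    leaf = Node(label=majority)
    if errors(leaf, examples) <= errors(node, examples):
        return leaf  # the simpler, at-least-as-certain rule replaces the subtree
    for value, child in node.children.items():
        subset = [(x, y) for x, y in examples if x[node.feature] == value]
        node.children[value] = prune_top_down(child, subset)
    return node
```

Because the check is performed before descending, a subtree that is pruned is skipped entirely, which is one plausible reading of how a top-down search can avoid re-traversing the same parts of the tree.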