The Open Automation and Control Systems Journal
2015, 7: 2039-2043
Published online 2015 October 27. DOI: 10.2174/1874444301507012039
Publisher ID: TOAUTOCJ-7-2039
Research and Realization of the Extensible Data Cleaning Framework EDCF
ABSTRACT
This paper proposes an extensible data cleaning framework, EDCF, built on the key technologies of data cleaning. The framework includes an open rules library and an algorithms library. The paper describes the model principle and working process of the framework and verifies its validity by experiment. During cleaning, errors in the data source are handled according to the specific business by applying predefined cleaning rules and choosing the appropriate algorithm. The implementation so far completes the basic functions of the data cleaning module in the framework, and the experiment shows that the framework is efficient and operates effectively.
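
The interplay between a rules library and an algorithms library can be illustrated with a minimal sketch. It is not the EDCF implementation: the registry `ALGORITHMS`, the decorator `register_algorithm`, the table `RULES`, and the two sample algorithms are hypothetical names chosen for illustration. The assumption is that a rule maps a field to an ordered list of algorithm names, and that new algorithms can be registered without changing the cleaning loop.

```python
from typing import Any, Callable, Dict, List

# Hypothetical algorithms library: algorithm name -> function cleaning one value.
ALGORITHMS: Dict[str, Callable[[Any], Any]] = {}


def register_algorithm(name: str):
    """Add a cleaning function to the library under the given name."""
    def decorator(func: Callable[[Any], Any]) -> Callable[[Any], Any]:
        ALGORITHMS[name] = func
        return func
    return decorator


@register_algorithm("strip_whitespace")
def strip_whitespace(value: Any) -> Any:
    # Trim leading and trailing whitespace from strings; leave other types alone.
    return value.strip() if isinstance(value, str) else value


@register_algorithm("empty_to_none")
def empty_to_none(value: Any) -> Any:
    # Treat an empty string as a missing value.
    return None if value == "" else value


# Hypothetical rules library: field name -> ordered algorithm names to apply.
RULES: Dict[str, List[str]] = {
    "name": ["strip_whitespace", "empty_to_none"],
    "city": ["strip_whitespace"],
}


def clean_record(record: Dict[str, Any], rules: Dict[str, List[str]]) -> Dict[str, Any]:
    """Return a cleaned copy of one record; the input record is not modified."""
    cleaned = dict(record)
    for field, algorithm_names in rules.items():
        if field not in cleaned:
            continue
        for algorithm_name in algorithm_names:
            cleaned[field] = ALGORITHMS[algorithm_name](cleaned[field])
    return cleaned


def clean(records: List[Dict[str, Any]], rules: Dict[str, List[str]]) -> List[Dict[str, Any]]:
    """Apply the rules to every record in the data source."""
    return [clean_record(record, rules) for record in records]
```

In this sketch, extending the framework means registering another function with `register_algorithm` or adding an entry to `RULES`; `clean` itself stays unchanged.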