The Open Automation and Control Systems Journal

2015, 7 : 1759-1767
Published online 2015 October 20. DOI: 10.2174/1874444301507011759
Publisher ID: TOAUTOCJ-7-1759

Research of Distributed Query and Optimization Method Based on Metadata

Huaiyuan Wang
College of Electronics and Information Engineering, QiongZhou University, Sanya, Hainan, 572022, China.

ABSTRACT

This paper studies a metadata-based method for distributed query, in which metadata is used to define and manage virtual tables that contain the key information of each data source. Two query and optimization schemes are then designed for different data scales: one for ordinary data volumes and one for very large data volumes. For ordinary data, queries are implemented with virtual tables, a syntax analysis tree, and a memory database, and are optimized by copying, moving, and dividing branches of the virtual SQL query syntax tree. For very large data volumes, queries are implemented with Pig, Hadoop, and Python, and are optimized by tuning the Pig code, using multiple processes, merging files and handling file uploads and downloads in HDFS, and building indexes for high-frequency business operations, among other measures.

Keywords:

Distributed, federated query, Hadoop, memory database, metadata.