
HEDC: A histogram estimator for data in the cloud

  • Renmin University of China

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

9 Scopus citations

Abstract

With the increasing popularity of cloud-based data management, improving query performance in the cloud has become an urgent problem. Summaries of data distribution and other statistics are commonly used in traditional databases to support query optimization, and histograms are of particular interest. Naturally, histograms can also support query optimization and efficient use of computing resources in the cloud: they provide reference information for generating optimal query plans and basic statistics that help balance the load of query processing. Because constructing an exact histogram over massive data is too expensive, building an approximate histogram is a more feasible solution. This problem is nevertheless challenging in the cloud environment because of the cloud's particular data organization and processing model. In this paper, we present HEDC, a Histogram Estimator for Data in the Cloud. We design a histogram estimation workflow based on an extended MapReduce framework and propose novel sampling mechanisms that balance sampling efficiency and estimation accuracy. We experimentally validate our techniques on Hadoop, and the results demonstrate that HEDC provides promising histogram estimates for massive data in the cloud.
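The abstract does not spell out HEDC's sampling mechanisms or its MapReduce extensions, so the following is only a minimal sketch of the general idea of estimating a histogram from a sample. It assumes uniform block-level sampling and an equi-width histogram over a known value range; the function names (`bucket_of`, `map_block`, `estimate_histogram`) and all parameters are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def bucket_of(value, lo, hi, num_buckets):
    """Return the equi-width bucket index for value, clamped to [0, num_buckets - 1]."""
    width = (hi - lo) / num_buckets
    idx = int((value - lo) / width)
    return min(max(idx, 0), num_buckets - 1)


def map_block(block, lo, hi, num_buckets):
    """Map-like step (assumption): count values per bucket within one data block."""
    return Counter(bucket_of(v, lo, hi, num_buckets) for v in block)


def estimate_histogram(blocks, sample_fraction, lo, hi, num_buckets, seed=0):
    """Estimate bucket counts by sampling whole blocks uniformly at random.

    Reduce-like step (assumption): sum the per-block counts of the sampled
    blocks, then scale by the inverse of the fraction of blocks sampled.
    """
    rng = random.Random(seed)
    k = max(1, round(sample_fraction * len(blocks)))
    sampled = rng.sample(blocks, k)
    total = Counter()
    for block in sampled:
        total.update(map_block(block, lo, hi, num_buckets))
    scale = len(blocks) / k
    return [total[b] * scale for b in range(num_buckets)]


# Toy data: three "blocks" standing in for file splits.
blocks = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 9, 9]]
exact = estimate_histogram(blocks, 1.0, 0, 10, 5)
approx = estimate_histogram(blocks, 0.34, 0, 10, 5)
```

Sampling every block reproduces the exact histogram, while sampling fewer blocks reads less data at the cost of accuracy. That efficiency versus accuracy trade-off is the one the abstract refers to; how HEDC actually manages it is not described here.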

Original language: English
Title of host publication: CloudDB'12 - Proceedings of the 3rd ACM International Workshop on Cloud Data Management, Co-located with CIKM 2012
Pages: 51-58
Number of pages: 8
DOIs
State: Published - 2012
Event: 3rd ACM International Workshop on Cloud Data Management, CloudDB 2012 - Co-located with CIKM 2012 - Maui, HI, United States
Duration: Oct 29 2012 to Oct 29 2012

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Conference

Conference: 3rd ACM International Workshop on Cloud Data Management, CloudDB 2012 - Co-located with CIKM 2012
Country/Territory: United States
City: Maui, HI
Period: 10/29/12 to 10/29/12

Keywords

  • Cloud computing
  • Histogram estimate
  • MapReduce
  • Sampling
