
Keyword search with real-time entity resolution in relational databases

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

3 Scopus citations

Abstract

Traditional methods for IR-style keyword search in relational databases assume clean data and perform no entity resolution (ER). As a result, on dirty datasets, where duplicate tuples have different identifiers but refer to the same real-world entity, their answers to a query may contain duplicates. In this paper, we propose a method for processing top-N keyword queries with real-time ER. The method builds an index to retrieve candidate tuples for a keyword query, defines a function that computes the similarity between the query and each candidate tuple, and uses a divide-and-conquer clustering algorithm to deduplicate the query results. Extensive experiments confirm the effectiveness and efficiency of the method on both dirty and (almost) clean datasets.
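The pipeline summarized in the abstract (candidate retrieval through an index, query-tuple similarity, divide-and-conquer deduplication, top-N ranking) can be illustrated with a minimal Python sketch. It is not the paper's method: the inverted index, the token-set Jaccard similarity, the single-link merge rule, the 0.5 threshold, the sample tuples, and all function names (`build_index`, `candidates`, `dc_cluster`, `top_n`) are hypothetical choices made only for illustration. Each tuple is assumed to be represented as the text of its attribute values.

```python
import re
from collections import defaultdict

# Hypothetical sample: each tuple id maps to its attribute values joined as text.
SAMPLE = {
    1: "Jiawei Han University of Illinois data mining",
    2: "J. Han Univ. of Illinois data mining",  # same entity as 1, different id
    3: "Jian Pei Simon Fraser University data mining",
    4: "Michael Stonebraker MIT databases",
}


def tokenize(text):
    """Lower-case alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(tuples):
    """Inverted index mapping each token to the ids of tuples containing it."""
    index = defaultdict(set)
    for tid, text in tuples.items():
        for tok in tokenize(text):
            index[tok].add(tid)
    return index


def candidates(index, query):
    """Candidate tuples: those containing at least one query keyword."""
    found = set()
    for tok in tokenize(query):
        found |= index.get(tok, set())
    return found


def jaccard(a, b):
    """Token-set Jaccard similarity (an assumed stand-in for the paper's function)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dc_cluster(ids, tokens, threshold):
    """Divide and conquer: cluster each half recursively, then merge across halves."""
    if not ids:
        return []
    if len(ids) == 1:
        return [[ids[0]]]
    mid = len(ids) // 2
    merged = dc_cluster(ids[:mid], tokens, threshold)
    for rc in dc_cluster(ids[mid:], tokens, threshold):
        for lc in merged:
            # Merge if any cross pair is similar enough (single-link rule).
            if any(jaccard(tokens[a], tokens[b]) >= threshold for a in lc for b in rc):
                lc.extend(rc)
                break
        else:
            merged.append(rc)
    return merged


def top_n(tuples, index, query, n, threshold=0.5):
    """Top-n deduplicated answers as (representative id, score, cluster ids)."""
    q = tokenize(query)
    cand = sorted(candidates(index, query))
    tokens = {tid: tokenize(tuples[tid]) for tid in cand}
    answers = []
    for cluster in dc_cluster(cand, tokens, threshold):
        # A cluster stands for one real-world entity; rank it by its best tuple.
        best = max(cluster, key=lambda tid: jaccard(q, tokens[tid]))
        answers.append((best, jaccard(q, tokens[best]), sorted(cluster)))
    answers.sort(key=lambda a: a[1], reverse=True)
    return answers[:n]


if __name__ == "__main__":
    idx = build_index(SAMPLE)
    for rep, score, members in top_n(SAMPLE, idx, "han data mining", n=2):
        print(rep, round(score, 3), members)
    # prints "1 0.429 [1, 2]" then "3 0.25 [3]"
```

Ranking each cluster by its best-matching member means a duplicated entity (tuples 1 and 2 above) contributes one answer instead of two. The cross-half merge attaches each right-hand cluster to the first matching left-hand cluster only, so it does not guarantee the transitive closure of all pairwise matches; a real ER step would likely need a stronger merge rule.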

Original language: English
Title of host publication: Proceedings of 2018 10th International Conference on Machine Learning and Computing, ICMLC 2018
Publisher: Association for Computing Machinery
Pages: 134-139
Number of pages: 6
ISBN (Electronic): 9781450363532
State: Published - Feb 26 2018
Event: 10th International Conference on Machine Learning and Computing, ICMLC 2018 - Macau, China
Duration: Feb 26 2018 to Feb 28 2018

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 10th International Conference on Machine Learning and Computing, ICMLC 2018
Country/Territory: China
City: Macau
Period: 02/26/18 to 02/28/18

Keywords

  • Entity resolution
  • Relational database
  • Similarity
  • Top-N keyword query
