TY - CHAP
T1 - Modeling and extracting deep-web query interfaces
AU - Wu, Wensheng
AU - Doan, An Hai
AU - Yu, Clement
AU - Meng, Weiyi
PY - 2009
Y1 - 2009
N2 - Interface modeling and extraction is a fundamental step in building a uniform query interface to a multitude of databases on the Web. Existing solutions are limited in that they assume interfaces are flat and thus ignore their inherent structure, which seriously hampers the effectiveness of interface integration. To address this limitation, in this chapter we model an interface with a hierarchical schema (e.g., an ordered tree of attributes). We describe ExQ, a novel schema extraction system with two distinct features. First, ExQ discovers the structure of an interface from its visual representation via spatial clustering. Second, ExQ annotates the discovered schema with labels from the interface by imitating the human annotation process. ExQ has been extensively evaluated on real-world query interfaces in five different domains, and the results show that it achieves an accuracy above 90% in both the structure discovery and schema annotation tasks.
AB - Interface modeling and extraction is a fundamental step in building a uniform query interface to a multitude of databases on the Web. Existing solutions are limited in that they assume interfaces are flat and thus ignore their inherent structure, which seriously hampers the effectiveness of interface integration. To address this limitation, in this chapter we model an interface with a hierarchical schema (e.g., an ordered tree of attributes). We describe ExQ, a novel schema extraction system with two distinct features. First, ExQ discovers the structure of an interface from its visual representation via spatial clustering. Second, ExQ annotates the discovered schema with labels from the interface by imitating the human annotation process. ExQ has been extensively evaluated on real-world query interfaces in five different domains, and the results show that it achieves an accuracy above 90% in both the structure discovery and schema annotation tasks.
UR - https://www.scopus.com/pages/publications/70350212437
U2 - 10.1007/978-3-642-04141-9_4
DO - 10.1007/978-3-642-04141-9_4
M3 - Chapter
SN - 9783642041402
T3 - Studies in Computational Intelligence
SP - 65
EP - 90
BT - Advances in Information and Intelligent Systems
A2 - Ras, Zbigniew
A2 - Ribarsky, William
ER -