TY - GEN
T1 - A general approach for partitioning web page content based on geometric and style information
AU - Guo, Hui
AU - Mahmud, Jalal
AU - Borodin, Yevgen
AU - Stent, Amanda
AU - Ramakrishnan, I. V.
PY - 2007
Y1 - 2007
AB - In this paper, we describe a general-purpose approach for partitioning Web page content. The novelty of our approach lies in the use of detailed layout information from a Web page renderer to determine spatial locality and identify visual separators, and the use of relaxed matching over presentation style information to determine presentation style similarity. We present several examples to illustrate the generality of our approach.
UR - https://www.scopus.com/pages/publications/51149116506
U2 - 10.1109/ICDAR.2007.4377051
DO - 10.1109/ICDAR.2007.4377051
M3 - Conference contribution
SN - 0769528228
SN - 9780769528229
T3 - Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
SP - 929
EP - 933
BT - Proceedings - 9th International Conference on Document Analysis and Recognition, ICDAR 2007
T2 - 9th International Conference on Document Analysis and Recognition, ICDAR 2007
Y2 - 23 September 2007 through 26 September 2007
ER -