
A general approach for partitioning web page content based on geometric and style information

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

15 Scopus citations

Abstract

In this paper, we describe a general-purpose approach for partitioning Web page content. The novelty of our approach lies in the use of detailed layout information from a Web page renderer to determine spatial locality and identify visual separators, and the use of relaxed matching over presentation style information to determine presentation style similarity. We present several examples to illustrate the generality of our approach.
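
The abstract names two signals, spatial locality from rendered layout and relaxed matching over presentation style. The following is only a minimal sketch of how those two signals could be combined, not the authors' algorithm: the `Box` fields, the `size_tolerance` and `max_gap` thresholds, and the vertical-gap rule standing in for a visual separator are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A rendered text block (hypothetical fields, not taken from the paper)."""
    text: str
    top: float        # y-coordinate of the top edge, in pixels
    bottom: float     # y-coordinate of the bottom edge, in pixels
    font_family: str
    font_size: float
    bold: bool


def styles_similar(a: Box, b: Box, size_tolerance: float = 1.0) -> bool:
    """Relaxed style match: same family and weight, sizes within a tolerance."""
    return (
        a.font_family == b.font_family
        and a.bold == b.bold
        and abs(a.font_size - b.font_size) <= size_tolerance
    )


def partition(boxes, max_gap: float = 12.0):
    """Group boxes top to bottom; a large vertical gap (assumed separator)
    or a style mismatch starts a new group."""
    groups = []
    for box in sorted(boxes, key=lambda b: b.top):
        if groups:
            prev = groups[-1][-1]
            is_close = (box.top - prev.bottom) <= max_gap
            if is_close and styles_similar(prev, box):
                groups[-1].append(box)
                continue
        groups.append([box])
    return groups


boxes = [
    Box("Headline", 0, 20, "Arial", 18.0, True),
    Box("Para 1", 24, 40, "Arial", 12.0, False),
    Box("Para 2", 44, 60, "Arial", 12.5, False),
    Box("Footer", 120, 130, "Arial", 12.0, False),
]
result = [[b.text for b in g] for g in partition(boxes)]
print(result)
```

In this toy input the headline is split off by its style, the two paragraphs merge because their font sizes differ by less than the tolerance, and the footer is split off by the large gap. A real system would take box geometry and computed styles from a page renderer rather than from hand-written values.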

Original language: English
Title of host publication: Proceedings - 9th International Conference on Document Analysis and Recognition, ICDAR 2007
Pages: 929-933
Number of pages: 5
State: Published - 2007
Event: 9th International Conference on Document Analysis and Recognition, ICDAR 2007 - Curitiba, Brazil
Duration: Sep 23 2007 to Sep 26 2007

Publication series

Name: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Volume: 2

Conference

Conference: 9th International Conference on Document Analysis and Recognition, ICDAR 2007
Country/Territory: Brazil
City: Curitiba
Period: 09/23/07 to 09/26/07
