This document summarizes a research paper on vision-based deep web data extraction from nested query result records. The paper proposes a technique that extracts data from web pages using visual features such as font styles, font sizes, and cascading style sheets. The extracted data is then aligned into a table using alignment algorithms, including pair-wise, holistic, and nested-structure alignment. The goal is to remove irrelevant information from query result pages so that the extracted data is easier to analyze.