The document discusses approaches to information extraction (IE) from web documents, including knowledge engineering, machine learning, and wrappers, and surveys several IE systems. It compares these systems by capability: whether they can handle complex objects, which document types they support, how resilient they are to changes, and how automated they are. It rates the BYU ontology approach as the best of these systems, citing its support for nested data, its resilience and adaptiveness, and its ability to work on both semi-structured and unstructured text.