The document describes a generic crawler that extracts data from websites lacking APIs by applying user-defined extraction rules. It covers the crawler's infrastructure, an introduction to rules written as XPath and CSS expressions, and the crawl procedure: generating links, crawling the pages behind those links, and saving the extracted data to a local database. It also notes limitations, such as the inability to handle AJAX-driven sites whose content is rendered client-side. The goal is a multipurpose crawler, powered by cloud computing, that can extract information from a wide range of websites.
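To make the rule-driven procedure concrete, below is a minimal Python sketch of the generate-links / crawl / save-to-DB loop, using requests, lxml, and sqlite3. It is an illustration under assumed names, not the document's actual implementation: the function `crawl`, the table schema, the example URL, and the XPath expressions are all hypothetical.

```python
# Minimal sketch of a rule-driven crawl loop: generate links from a seed
# page, fetch each linked page, apply XPath extraction rules, and save the
# results to a local SQLite database. All names and rules here are
# illustrative assumptions, not the document's actual code.
import sqlite3
from urllib.parse import urljoin

import requests
from lxml import html


def crawl(start_url, link_xpath, field_rules, db_path="crawl.db", max_pages=50):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS items (url TEXT, field TEXT, value TEXT)")

    # Step 1: generate the list of links to crawl from the seed page.
    seed_tree = html.fromstring(requests.get(start_url, timeout=10).text)
    links = [urljoin(start_url, href) for href in seed_tree.xpath(link_xpath)]

    # Step 2: crawl each link and extract fields with the configured rules.
    for url in links[:max_pages]:
        tree = html.fromstring(requests.get(url, timeout=10).text)
        for field, xpath_expr in field_rules.items():
            for value in tree.xpath(xpath_expr):
                conn.execute(
                    "INSERT INTO items VALUES (?, ?, ?)",
                    (url, field, value.strip()),
                )

    conn.commit()
    conn.close()


# Hypothetical usage: follow article links from a listing page and extract
# a title and author from each article with XPath rules.
crawl(
    start_url="https://example.com/articles",
    link_xpath="//a[@class='article-link']/@href",
    field_rules={
        "title": "//h1/text()",
        "author": "//span[@class='author']/text()",
    },
)
```

Because this sketch fetches raw HTML only, content injected by client-side JavaScript never reaches the XPath rules, which illustrates the AJAX limitation noted above.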