The document discusses composing reusable extract-transform-load (ETL) processes on Hadoop. It covers the data science lifecycle of acquiring data, analyzing it, and taking action on it. It states that 80% of the work in data science is spent acquiring and preparing data. The document then discusses using Cascading, a Java framework that abstracts the construction of MapReduce jobs, to build reusable ETL processes that scale linearly and follow a single-purpose, composable design.
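The idea of single-purpose, composable ETL steps can be illustrated with a small sketch. This is not Cascading's API (Cascading is a Java library built around pipe assemblies). It is a plain Python illustration, assuming records are dicts flowing through generator-based steps. The names `compose`, `parse_tsv`, `drop_missing`, `lowercase` and `count_by`, as well as the sample input, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from functools import reduce
from typing import Callable, Iterable, Iterator

# Hypothetical step type: consumes a stream of records and yields a new stream.
Step = Callable[[Iterable], Iterator]

def compose(*steps: Step) -> Step:
    """Chain single-purpose steps left to right into one reusable pipeline."""
    def pipeline(source: Iterable) -> Iterator:
        return reduce(lambda stream, step: step(stream), steps, iter(source))
    return pipeline

def parse_tsv(*fields: str) -> Step:
    """Extract: split tab-separated lines into dicts keyed by the given field names."""
    def step(lines):
        for line in lines:
            yield dict(zip(fields, line.rstrip("\n").split("\t")))
    return step

def drop_missing(field: str) -> Step:
    """Transform: discard records whose field is absent or empty."""
    def step(records):
        return (r for r in records if r.get(field))
    return step

def lowercase(field: str) -> Step:
    """Transform: normalize one field to lower case, leaving the others untouched."""
    def step(records):
        for r in records:
            yield {**r, field: r[field].lower()}
    return step

def count_by(field: str, records: Iterable[dict]) -> Counter:
    """Terminal aggregation: count records per value of a field."""
    return Counter(r[field] for r in records)

# A reusable cleaning pipeline, built once from single-purpose steps.
clean_users = compose(parse_tsv("user", "country"),
                      drop_missing("country"),
                      lowercase("country"))

# Hypothetical sample input; the second line has an empty country field.
raw = ["alice\tUS\n", "bob\t\n", "carol\tus\n", "dave\tDE\n"]
print(count_by("country", clean_users(raw)))
```

Because `clean_users` is itself a `Step`, it can be nested inside larger pipelines (for example `compose(clean_users, drop_missing("user"))`) without modification, which is the kind of reuse the single-purpose design aims for.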