This document discusses scaling extract, transform, load (ETL) processes with Apache Hadoop. As data volumes and varieties have grown, traditional ETL approaches have struggled to keep pace; Hadoop offers a flexible, scalable way to store and process both structured and unstructured data. The document outlines best practices for extracting data from databases and files, transforming it with tools such as MapReduce, Pig, and Hive, and either loading the results into a data warehouse or keeping them in Hadoop. It also covers workflow management with tools such as Oozie, and cautions against several common pitfalls in ETL design and implementation with Hadoop.
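
To make the transform step concrete, the following is a minimal sketch of a Hadoop MapReduce job in Java that parses comma-separated event records, drops malformed lines, and counts events per event type, a typical cleanse-and-aggregate transform. The class name EventCountEtl, the three-column record layout, and the field positions are illustrative assumptions, not details from the document.

// A minimal sketch of the transform step: parse CSV event records,
// drop malformed lines, and count events per event type. The record
// layout (three comma-separated fields, event type in column 2) is
// an assumption made for illustration.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EventCountEtl {

    // Map: extract the event-type field and emit (eventType, 1);
    // malformed records are silently dropped (data cleansing).
    public static class EventMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text eventType = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length < 3) {
                return; // skip rows that do not have the expected shape
            }
            eventType.set(fields[1].trim());
            context.write(eventType, ONE);
        }
    }

    // Reduce: sum the counts for each event type.
    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        private final LongWritable total = new LongWritable();

        @Override
        protected void reduce(Text key, Iterable<LongWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            total.set(sum);
            context.write(key, total);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "event-count-etl");
        job.setJarByClass(EventCountEtl.class);
        job.setMapperClass(EventMapper.class);
        // The reducer doubles as a combiner to cut shuffle volume,
        // which is safe because summation is associative and commutative.
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

In practice the same aggregation could be written in a few lines of Pig Latin or HiveQL; plain MapReduce is shown here only because it is the lowest-level common denominator of the transformation tools the document names.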