The document describes an in-memory data pipeline and warehouse built on Spark, Spark SQL, Tachyon, and Parquet. The pipeline ingests financial transaction data from S3, transforms it through cleaning and joining steps, and builds a data warehouse with Spark SQL and Parquet for querying. Key aspects covered include distributing metadata lookups, balancing data partitions, using broadcast joins to avoid skew, caching data in Tachyon, and exposing Spark SQL through Jaws as a RESTful interface.
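As a rough illustration of the pipeline shape described above, the sketch below reads transactions from S3, cleans them, broadcast-joins a small dimension table to avoid shuffling the skewed fact table, rebalances partitions, and writes Parquet for Spark SQL queries. It is a minimal sketch, not the document's actual code: the bucket names, column names, output paths, and partition count are hypothetical, and it uses the modern SparkSession API rather than the Tachyon-era SQLContext; Tachyon/Jaws integration is omitted.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object TransactionPipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TransactionWarehouseSketch")
      .getOrCreate()

    // Ingest raw transaction data from S3 (hypothetical bucket and schema).
    val transactions = spark.read
      .option("header", "true")
      .csv("s3a://example-bucket/transactions/")

    // Cleaning step: drop rows missing the key fields (illustrative columns).
    val cleaned = transactions.na.drop(Seq("transaction_id", "account_id"))

    // Broadcast the small dimension table so the large, potentially skewed
    // fact table is joined without a shuffle.
    val accounts = spark.read.parquet("s3a://example-bucket/accounts/")
    val enriched = cleaned.join(broadcast(accounts), Seq("account_id"))

    // Repartition to balance partition sizes, then persist as Parquet
    // (the warehouse path and partition count are placeholders).
    enriched.repartition(200, enriched("account_id"))
      .write.mode("overwrite")
      .parquet("/warehouse/transactions_parquet")

    // Query the Parquet warehouse through Spark SQL.
    spark.read.parquet("/warehouse/transactions_parquet")
      .createOrReplaceTempView("transactions")
    spark.sql(
      "SELECT account_id, count(*) AS txn_count FROM transactions GROUP BY account_id"
    ).show()

    spark.stop()
  }
}
```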