This document discusses Netflix's use of Spark on YARN for ETL workloads. Key points:

- Netflix runs Spark on YARN across 3,000 EC2 nodes to process large volumes of streaming data from over 100 million daily users.
- Technical challenges included optimizing performance on S3, dynamic resource allocation, and Parquet reads and writes. The resulting improvements made jobs complete up to 18x faster.
- Production Spark applications include recommender systems that analyze user behavior and personalize content across billions of profiles and titles.
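
For readers unfamiliar with the dynamic resource allocation mentioned in the key points, here is a minimal sketch of how it is commonly switched on for a Spark job submitted to YARN. The property names are standard Spark configuration keys. The executor bounds, the idle timeout, the helper `build_submit_args`, and the script name `etl_job.py` are hypothetical illustrations and are not taken from Netflix's setup.

```python
from typing import Dict, List

# Standard Spark properties for dynamic allocation. The values are
# illustrative assumptions, not Netflix's production settings.
DYNAMIC_ALLOCATION_CONF: Dict[str, str] = {
    "spark.dynamicAllocation.enabled": "true",
    # The external shuffle service keeps shuffle files available
    # after an idle executor has been released.
    "spark.shuffle.service.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "2",
    "spark.dynamicAllocation.maxExecutors": "200",
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
}


def build_submit_args(app: str, conf: Dict[str, str]) -> List[str]:
    """Build a spark-submit argument list for a YARN cluster-mode job.

    Hypothetical helper for illustration only.
    """
    args = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster"]
    # Emit one --conf flag per property, in a stable (sorted) order.
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    # Print the command line rather than running it, so the sketch
    # works without a Spark installation.
    print(" ".join(build_submit_args("etl_job.py", DYNAMIC_ALLOCATION_CONF)))
```

With settings along these lines, Spark requests executors from YARN as tasks queue up and releases them once they sit idle, which is the general behavior the summary refers to. Suitable bounds depend on the cluster and the workload.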