Eva Tse, Netflix 
November 12, 2014 | Las Vegas, NV
Next Generation Big Data Platform at Netflix 2014
[Figure: data pipeline into S3. Event data: cloud apps, Suro, Ursula (15 min). Dimension data: Cassandra, SS Tables, Aegisthus (daily).]
[Figure: platform stack with layers Storage, Compute, Service, Tools; labels: S3, v2.0]
• Works well on S3
YARN-1864 
YARN-2026 
YARN-2012 
YARN-2214 
YARN-2360 
YARN-2540
[Figure: Pig compilation flow. Logical Plan, Physical Plan, then either TezCompiler, Tez Plan, Tez Execution Engine or MRCompiler, MR Plan, MR Execution Engine.]
A Distributed SQL Query Engine for Big Data
techblog.netflix.com
21 committed PRs and 14 PRs in review
techblog.netflix.com
YARN-1864 
YARN-2026 
YARN-2012 
YARN-2214 
YARN-2360 
YARN-2540 
HIVE-6783 
HIVE-6785 
HIVE-6938 
HIVE-7800 
PARQUET-100 
PARQUET-106 
PARQUET-2 
PARQUET-22 
PARQUET-70 
PARQUET-75 
PARQUET-92 
PARQUET-99 
PIG-3986
Talk     Time               Title
PFC-305  Wednesday, 1:15pm  Embracing Failure: Fault Injection and Service Reliability
BDT-403  Wednesday, 2:15pm  Next Generation Big Data Platform at Netflix
PFC-306  Wednesday, 3:30pm  Performance Tuning EC2
DEV-309  Wednesday, 3:30pm  From Asgard to Zuul, How Netflix’s proven Open Source Tools can accelerate and scale your services
ARC-317  Wednesday, 4:30pm  Maintaining a Resilient Front-Door at Massive Scale
PFC-304  Wednesday, 4:30pm  Effective Inter-process Communications in the Cloud: The Pros and Cons of Micro Services Architectures
ENT-209  Wednesday, 4:30pm  Cloud Migration, Dev-Ops and Distributed Systems
APP-310  Friday, 9:00am     Scheduling using Apache Mesos in the Cloud