Matt VanLandeghem, Nielsen
How Nielsen Utilized Databricks for Large-scale Research and Development
#EUent4
About Nielsen
• Founded 1923
• Buy & Watch
– Buy: Market Research
– Watch: Audience Measurement
• Not just TV!
• Also radio and digital, including PC, mobile, connected devices, digital audio, and digital TV
• Digital ad spending now meets or exceeds TV ad spending
What is Nielsen Digital Ad Ratings (DAR)?
• Measurement of computer, mobile, and over-the-top device audiences
– Comparable to TV ratings
– Who is behind the screen?
• Advertising campaigns
– Primary focus is age/gender demographic breaks
– On-Target Delivery (%) is a key metric
• Global product
– 25 countries
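As a rough illustration of the On-Target Delivery metric named above: it is commonly read as the share of a campaign's impressions that reached the campaign's target age/gender demographic. The sketch below assumes impressions are already attributed to demographic breaks; the function name, demo labels and numbers are hypothetical and not Nielsen's implementation.

```python
from __future__ import annotations


def on_target_pct(impressions_by_demo: dict[str, float], target_demos: set[str]) -> float:
    """Percentage of a campaign's impressions delivered to its target demographic.

    impressions_by_demo maps a hypothetical age/gender break label
    (for example "F25-34") to the impressions attributed to that break.
    """
    total = sum(impressions_by_demo.values())
    if total == 0:
        raise ValueError("campaign has no impressions")
    on_target = sum(n for demo, n in impressions_by_demo.items() if demo in target_demos)
    return 100.0 * on_target / total


# Illustrative numbers only, not Nielsen data: the target is persons 25-34.
campaign = {"F18-24": 1200, "F25-34": 3000, "M25-34": 2500, "M35-49": 3300}
print(on_target_pct(campaign, {"F25-34", "M25-34"}))  # 55.0
```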
How does DAR work?
[Figure: DAR data flow. Ad impression (mobile, computer, over-the-top) → “Big Data” with third-party demographics → Nielsen bias-correction adjustment (focus of today’s presentation) → report to client, with overnight daily reporting of unique audience, ad impressions, and On-Target %]
Nielsen Adjustments
• “Big Data” is not perfect
– Needs bias correction
– Where the value of Nielsen’s high-quality panels really shines
– Nielsen’s panels provide a “truth set” that can be used to develop models that adjust big data
• Three sources of bias:
– Misrepresentation
– Misattribution
– Non-coverage
• Nielsen adjustments are an active area of research and development
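To make “misattribution” concrete, here is one generic way a panel truth set could correct it, sketched under stated assumptions and not presented as Nielsen's actual model: for panel impressions where both the third-party demo and the panel's true demo are known, estimate how often each assigned demo is really each true demo, then redistribute production impressions using those rates. The function names and toy data are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def estimate_redistribution(panel_pairs):
    """Estimate P(actual demo | assigned demo) from panel impressions.

    panel_pairs: iterable of (assigned_demo, actual_demo) tuples, where the
    assigned demo comes from third-party data and the actual demo from the
    panel "truth set" (hypothetical input format).
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for assigned, actual in panel_pairs:
        counts[assigned][actual] += 1
    return {
        assigned: {actual: c / sum(row.values()) for actual, c in row.items()}
        for assigned, row in counts.items()
    }


def redistribute(observed: dict[str, float], matrix) -> dict[str, float]:
    """Move observed impressions to their estimated actual demos; totals are preserved."""
    adjusted: dict[str, float] = defaultdict(float)
    for assigned, n in observed.items():
        # Assumption: demos never seen in the panel are left unadjusted.
        row = matrix.get(assigned, {assigned: 1.0})
        for actual, p in row.items():
            adjusted[actual] += n * p
    return dict(adjusted)


# Toy panel: 80% of "F"-assigned impressions are truly F, 90% of "M" are truly M.
panel = [("F", "F")] * 8 + [("F", "M")] * 2 + [("M", "M")] * 9 + [("M", "F")] * 1
matrix = estimate_redistribution(panel)
adjusted = redistribute({"F": 3000, "M": 1000}, matrix)
print({demo: round(n, 6) for demo, n in adjusted.items()})  # {'F': 2500.0, 'M': 1500.0}
```

Because each row of the estimated matrix sums to 1, this kind of correction moves impressions between demos without changing the campaign total; a bias such as non-coverage would need a different adjustment.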
Nielsen Adjustments
• Metered home PC behavior
– Representative sample of U.S. homes
– “Medium” data
• Production impression data
– Big data
• What is the best way to create Nielsen adjustments AND test them in a production environment?
• Foundation for Nielsen’s Databricks Use Case
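One hypothetical way the panel ("medium" data) and production impressions (big data) could work together, aimed at the non-coverage bias: for panel homes, compare the impressions the meter recorded with the subset the big-data source also captured, and use the ratio as a per-demo scale-up factor. This is an illustrative assumption, not the methodology presented in the talk; the names and numbers are made up.

```python
from __future__ import annotations


def coverage_factors(metered: dict[str, int], matched: dict[str, int]) -> dict[str, float]:
    """Per-demo non-coverage factor = metered impressions / impressions also seen in big data.

    metered: impressions recorded by the panel meter for panel homes, by demo.
    matched: the subset of those impressions the big-data source also captured.
    """
    factors = {}
    for demo, truth in metered.items():
        seen = matched.get(demo, 0)
        if seen == 0:
            raise ValueError(f"no matched impressions for {demo!r}; factor undefined")
        factors[demo] = truth / seen
    return factors


# Toy panel counts only: big data saw 80% of F18-34 and 50% of M18-34 metered impressions.
factors = coverage_factors({"F18-34": 500, "M18-34": 400}, {"F18-34": 400, "M18-34": 200})
print(factors)  # {'F18-34': 1.25, 'M18-34': 2.0}
```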
Nielsen Business Case
• Recently created new DAR adjustment methodologies
– Small-scale testing showed the new methodologies are an improvement over current methodologies
• Business requirement: test new methodologies on a large number of campaigns
– Need to understand client impact
– Large-scale testing could identify corner or edge cases where new methodologies could break down and cause a data quality issue
– Small-scale testing: ~20 campaigns
– Large-scale testing: ~4000 campaigns
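A minimal sketch of how large-scale testing might surface edge cases, assuming each campaign has already been processed under both the current and the new methodology. The 10-point threshold, the validity check and the campaign ids are hypothetical choices for illustration.

```python
from __future__ import annotations

import math


def flag_edge_cases(
    results: dict[str, tuple[float, float]], max_shift: float = 10.0
) -> dict[str, str]:
    """Flag campaigns where a new methodology looks suspicious.

    results maps a campaign id to (current On-Target %, new On-Target %).
    A campaign is flagged when the new value is invalid (NaN or outside
    0-100) or moves by more than max_shift percentage points.
    """
    flagged = {}
    for campaign_id, (current, new) in results.items():
        if math.isnan(new) or not 0.0 <= new <= 100.0:
            flagged[campaign_id] = "invalid value"
        elif abs(new - current) > max_shift:
            flagged[campaign_id] = f"shift of {new - current:+.1f} pts"
    return flagged


# Hypothetical results for four campaigns.
results = {
    "c1": (55.0, 57.5),
    "c2": (62.0, 41.0),
    "c3": (48.0, float("nan")),
    "c4": (70.0, 103.2),
}
print(flag_edge_cases(results))
# {'c2': 'shift of -21.0 pts', 'c3': 'invalid value', 'c4': 'invalid value'}
```

Each campaign is checked independently in this sketch, so the same check parallelizes naturally across thousands of campaigns.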
Databricks
• Cluster management
• Provide a friendly interface to Spark for our data scientists
– Multiple programming languages
– Create adjustment factors
• Uses an algorithm not available in SQL
– Link to production databases
– Apply adjustment factors to production-level data
– Analyze data with new adjustment factors applied
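The “apply adjustment factors” and “analyze” steps can be sketched in plain Python, assuming the factors are a small per-demo lookup table and the production data are aggregated rows; the column names and factor values are hypothetical. In PySpark the equivalent would typically be a join of the small factor DataFrame onto the large production DataFrame (optionally hinted with `pyspark.sql.functions.broadcast`) followed by a `groupBy` aggregation; plain Python is used here only to keep the example self-contained.

```python
from __future__ import annotations


def apply_factors(rows: list[dict], factors: dict[str, float]) -> list[dict]:
    """Attach an adjusted impression count to each production row.

    rows use hypothetical keys "campaign", "demo" and "impressions";
    demos without a factor are left unadjusted (factor 1.0).
    """
    return [
        {**row, "adj_impressions": row["impressions"] * factors.get(row["demo"], 1.0)}
        for row in rows
    ]


def compare_by_campaign(rows: list[dict]) -> dict[str, tuple[float, float]]:
    """Sum unadjusted and adjusted impressions per campaign for side-by-side review."""
    totals: dict[str, tuple[float, float]] = {}
    for row in rows:
        raw, adj = totals.get(row["campaign"], (0.0, 0.0))
        totals[row["campaign"]] = (raw + row["impressions"], adj + row["adj_impressions"])
    return totals


production = [
    {"campaign": "c1", "demo": "F18-34", "impressions": 80_000},
    {"campaign": "c1", "demo": "M18-34", "impressions": 50_000},
    {"campaign": "c2", "demo": "F35-49", "impressions": 20_000},
]
factors = {"F18-34": 1.25, "M18-34": 2.0}  # toy values
print(compare_by_campaign(apply_factors(production, factors)))
# {'c1': (130000.0, 200000.0), 'c2': (20000.0, 20000.0)}
```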
Nielsen Business Case
[Figure: aggregated panel data (Netezza) and aggregated production data (Oracle) feed a cloud data lake, used to:
– Combine small and large data
– Run all analyses in one place using PySpark/Spark SQL]
Nielsen Business Case
• Performance gains:
– What would have taken 36 hours with standalone Python took only 1.5 hours in Spark/Databricks
– Edge cases identified
• Advantages of one methodology over another also identified
– Short turnaround if any revisions to the methodology are needed
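Expressed as a ratio, the timings reported above correspond to roughly a 24-fold reduction in run time:

```latex
\text{speedup} = \frac{36\ \text{h}}{1.5\ \text{h}} = 24
```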
Nielsen Business Case
• Other benefits
– Reduced time from idea to deployment
– Enhanced support/investigation once deployed
• Client data inquiries and issues addressed quicker
– Collaboration
• Application Development teams
• International data science teams
• These new methodologies are being tested in other products
– Enhanced skillsets of data scientists
Summary
• At the end of the day, the Databricks/Spark technology allowed us to solve this business use case
• The reduced R&D timeline plus extensive testing will make enhanced methodologies available to our clients sooner
Copyright © 2017 The Nielsen Company. Confidential and proprietary.
Special thanks: Mala Sivarajan, Anil Singh
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
JessaMaeEvangelista2
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
Medical Dataset including visualizations
Medical Dataset including visualizationsMedical Dataset including visualizations
Medical Dataset including visualizations
vishrut8750588758
 
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdfIAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
mcgardenlevi9
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
chapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptxchapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptx
justinebandajbn
 
computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbbEDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
JessaMaeEvangelista2
 

How Nielsen Utilized Databricks for Large-Scale Research and Development with Matt VanLandeghem

  • 1. Matt VanLandeghem, Nielsen: How Nielsen Utilized Databricks for Large-scale Research and Development #EUent4
  • 2. About Nielsen • Founded 1923 • Buy & Watch – Buy: Market Research – Watch: Audience Measurement • Not just TV! • Also radio and digital, including PC, mobile, connected devices, digital audio, and digital TV • Digital ad spending now meets or exceeds TV ad spending
  • 3. What is Nielsen Digital Ad Ratings (DAR)? • Measures audiences on computer, mobile, and over-the-top (OTT) devices – Comparable to TV ratings – Who is behind the screen? • Advertising campaigns – Primary focus is on age/gender demographic breaks – On-Target Delivery (%) is a key metric • Global product – Available in 25 countries
  • 4. How does DAR work? [Diagram: ad impression “big data” from mobile, computer, and over-the-top devices is combined with third-party demographics and passes through the Nielsen bias-correction adjustment (the focus of today’s presentation). The result is reported to the client as overnight daily reporting of unique audience, ad impressions, and On-Target %.]
  • 5. Nielsen Adjustments • “Big data” is not perfect – It needs bias correction – This is where the value of Nielsen’s high-quality panels really shines – Nielsen’s panels provide a “truth set” that can be used to develop models that adjust big data • Three sources of bias – Misrepresentation – Misattribution – Non-coverage • Nielsen adjustments are an active area of research and development
  • 6. Nielsen Adjustments • Metered home PC behavior – Representative sample of U.S. homes – “Medium” data • Production impression data – Big data • Key question: what is the best way to create Nielsen adjustments AND test them in a production environment? • This question is the foundation for Nielsen’s Databricks use case
  • 7. Nielsen Business Case • Nielsen recently created new DAR adjustment methodologies – Small-scale testing showed the new methodologies improve on the current ones • Business requirement: test the new methodologies on a large number of campaigns – Need to understand client impact – Large-scale testing could identify corner or edge cases where the new methodologies break down and cause data quality issues – Small-scale testing: ~20 campaigns – Large-scale testing: ~4,000 campaigns
  • 8. Databricks • Cluster management • Provides a friendly interface to Spark for our data scientists – Multiple programming languages – Create adjustment factors (using an algorithm not available in SQL) – Link to production databases – Apply adjustment factors to production-level data – Analyze data with the new adjustment factors applied
  • 9. Nielsen Business Case [Diagram: aggregated panel data and aggregated production data, drawn from Netezza, Oracle, and a data lake, are brought together in the cloud to combine small and large data and to run all analyses in one place using PySpark/Spark SQL.]
  • 12. Nielsen Business Case • Performance gains: – A workload that would have taken 36 hours in standalone Python took only 1.5 hours in Spark/Databricks (a 24-fold speedup) – Edge cases identified • Advantages of one methodology over another also identified – Short turnaround for any revisions to the methodology
  • 13. Nielsen Business Case • Other benefits – Reduced time from idea to deployment – Better support and investigation once deployed • Client data inquiries and issues addressed more quickly – Collaboration • Application development teams • International data science teams • These new methodologies are being tested in other products – Enhanced skill sets of data scientists
  • 14. Summary • Ultimately, Databricks/Spark allowed us to solve this business use case • The shorter R&D timeline, combined with extensive testing, will make the enhanced methodologies available to our clients sooner
  • 15. Copyright © 2017 The Nielsen Company. Confidential and proprietary. Special thanks: Mala Sivarajan, Anil Singh
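The slides describe the adjustment workflow only at a high level. Factors are built from the panel “truth set” (slide 5), applied to production-level impression data (slides 8 and 9), and then compared across thousands of campaigns to surface edge cases (slides 7 and 12). The pure-Python sketch below walks through that shape under loudly stated assumptions. The deck does not disclose the actual algorithm and notes only that it is not expressible in SQL, so the adjustment here is a plain ratio reweighting used as a stand-in. The On-Target % definition is assumed to be the share of impressions landing in the target demos. All function names, demo labels, and the 5-point threshold are hypothetical. In PySpark, the factor lookup would typically become a join against a broadcast copy of the small factor table (`pyspark.sql.functions.broadcast`).

```python
from collections import defaultdict

# NOTE: every name, bucket label, and formula below is a hypothetical
# illustration; the deck does not disclose Nielsen's adjustment algorithm.


def adjustment_factors(panel_share, bigdata_share):
    """Per-demo ratio factor: panel ("truth set") share / big-data share.

    A simple ratio reweighting standing in for the unpublished method.
    Buckets the big-data source never observes are skipped (non-coverage
    cannot be fixed by reweighting alone).
    """
    return {
        demo: panel_share[demo] / bigdata_share[demo]
        for demo in panel_share
        if bigdata_share.get(demo, 0) > 0
    }


def apply_factors(impressions, factors):
    """Reweight (campaign_id, demo, count) rows using the small factor table.

    The dict lookup plays the role a broadcast join would play in Spark;
    demos without a factor keep weight 1.0.
    """
    adjusted = defaultdict(lambda: defaultdict(float))
    for campaign, demo, count in impressions:
        adjusted[campaign][demo] += count * factors.get(demo, 1.0)
    return adjusted


def on_target_pct(demo_counts, target_demos):
    """Assumed definition: share (%) of impressions in the target demos."""
    total = sum(demo_counts.values())
    if total == 0:
        return 0.0
    hits = sum(c for d, c in demo_counts.items() if d in target_demos)
    return 100.0 * hits / total


def flag_edge_cases(current, new, threshold_pts=5.0):
    """Campaigns whose On-Target % moves by more than threshold_pts points."""
    return sorted(
        c for c in current
        if c in new and abs(current[c] - new[c]) > threshold_pts
    )


if __name__ == "__main__":
    panel = {"F18-34": 0.25, "F35+": 0.25, "M18-34": 0.25, "M35+": 0.25}
    bigdata = {"F18-34": 0.5, "F35+": 0.125, "M18-34": 0.25, "M35+": 0.125}
    rows = [
        ("A", "F18-34", 400), ("A", "F35+", 100),
        ("A", "M18-34", 300), ("A", "M35+", 200),
        ("B", "F18-34", 100), ("B", "F35+", 100),
        ("B", "M18-34", 400), ("B", "M35+", 400),
    ]
    targets = {"A": {"F18-34", "F35+"}, "B": {"M18-34", "M35+"}}

    factors = adjustment_factors(panel, bigdata)
    # "Current" is simply the unadjusted data here, purely for illustration.
    current = {c: on_target_pct(d, targets[c])
               for c, d in apply_factors(rows, {}).items()}
    new = {c: on_target_pct(d, targets[c])
           for c, d in apply_factors(rows, factors).items()}
    print(flag_edge_cases(current, new))  # prints ['A']
```

In this toy data, campaign A’s On-Target % drops from 50% to about 36.4% after reweighting and gets flagged. Campaign B moves by under 3 points and does not. At the scale the slides mention (~4,000 campaigns), the same comparison would run as a distributed job rather than a Python loop.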