Testing Spark: Best Practices
Anupama Shetty
Senior SDET, Analytics, Ooyala Inc
Neil Marshall
SDET, Analytics, Ooyala Inc
Spark Summit 2014
Agenda - Anu
1. Application Overview
● Batch mode
● Streaming mode with kafka
2. Test Overview
● Test environment setup
● Unit testing spark applications
● Integration testing spark applications
3. Best Practices
● Code coverage support with scoverage and scct
● Auto build trigger using jenkins hook via github
Agenda - Neil
4. Performance testing of Spark
● Architecture & technology overview
● Performance testing setup & run
● Result analysis
● Best practices
Company Overview
● Founded in 2007
● 300+ employees worldwide
● Global footprint of 200M unique users in 130 countries
● Ooyala works with the most successful broadcasters and media
companies in the world
● Reach, measure, monetize video business
● Cross-device video analytics and monetization products and
services
Application Overview
● Analytics ETL pipeline service
● Receives 5B+ player-generated events (such as plays and displays)
every day.
● Computed metrics include player conversion rate, video
conversion rate and engagement metrics.
● Third party services used are
○ Spark 1.0 used to process player generated big data.
○ Kafka 0.9.0 with Zookeeper as our message queue
○ CDH5 HDFS as our intermediate storage file system
Spark based Log Processor details
● Supports two input data formats
○ Json
○ Thrift
● Batch Mode Support
○ Uses Spark Context
○ Consumes input data via a text file
● Streaming Mode Support
○ Uses Spark streaming context
○ Consumes data via kafka stream
Test pipeline setup
● Player simulation done using Watir (ruby gem based on
Selenium).
● Kafka (with Zookeeper) set up as a local virtual machine using
Vagrant. VMs can be monitored using VirtualBox.
● Spark cluster run in local mode.
Unit test setup - Spark in Batch mode
● Spark cluster setup for testing
○ Build your spark application jar using `sbt assembly`
○ Create config with spark.jar set to application jar and spark.master to "local"
■ var config = ConfigFactory parseString """spark.jar = "target/scala-2.10/SparkLogProcessor.jar", spark.master = "local" """
○ Store local spark directory path for spark context creation
■ val sparkDir = (<path to local spark directory> + "spark-0.9.0-incubating-bin-hadoop2/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar").mkString
● Creating spark context
○ var sc: SparkContext = new SparkContext("local", getClass.getSimpleName, sparkDir, List(config.getString("spark.jar")))
Test Setup for batch mode using Spark Context
Before block
After block
Scala test framework “FunSpec” is
used with “ShouldMatchers” (for
assertions) and “BeforeAndAfter”
(for setup/teardown).
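The before/after listings from this slide are not reproduced in the transcript. As a minimal sketch of the same setup/teardown pattern, the spec below creates a local-mode Spark context before each test and stops it afterwards. It assumes ScalaTest 3.2+ (where the slide's FunSpec and ShouldMatchers are named AnyFunSpec and Matchers) and Spark 1.3 or later; `EventCounter` and its `countByType` method are hypothetical stand-ins for the log processor's batch logic, not the authors' code.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.BeforeAndAfter
import org.scalatest.funspec.AnyFunSpec
import org.scalatest.matchers.should.Matchers

// Hypothetical stand-in for the batch logic under test.
object EventCounter {
  // Counts how many times each event type occurs.
  def countByType(sc: SparkContext, events: Seq[String]): Map[String, Int] =
    sc.parallelize(events).map(e => (e, 1)).reduceByKey(_ + _).collect().toMap
}

class EventCounterSpec extends AnyFunSpec with Matchers with BeforeAndAfter {
  private var sc: SparkContext = _

  before {
    // Fresh local-mode context for every test.
    val conf = new SparkConf().setMaster("local").setAppName(getClass.getSimpleName)
    sc = new SparkContext(conf)
  }

  after {
    // Stop the context so the next test can create its own.
    if (sc != null) sc.stop()
    sc = null
  }

  describe("EventCounter in batch mode") {
    it("counts events by type") {
      val counts = EventCounter.countByType(sc, Seq("play", "display", "play"))
      counts should equal (Map("play" -> 2, "display" -> 1))
    }
  }
}
```

Stopping the context in the after block is what keeps consecutive tests from colliding on a shared Spark context.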
Kafka setup for spark streaming
● Bring up Kafka virtual
machine using Vagrantfile
with following command
`vagrant up kafkavm`
● Configure Kafka
○ Create topic
■ `bin/kafka-create-topic.sh --zookeeper "localhost:2181" --topic "thrift_pings"`
○ Consume messages using
■ `bin/kafka-console-consumer.sh --zookeeper "localhost:2181" --topic "thrift_pings" --group "testThrift" &>/tmp/thrift-consumer-msgs.log &`
Testing streaming mode with
Spark Streaming Context
Test ‘After’ block and assertion block for spark streaming
mode
After Block Test Assertion
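The streaming test listings from these slides are not reproduced in the transcript. The sketch below shows one way to exercise a streaming computation and tear the context down afterwards. Note the swap: instead of the Kafka stream used in the talk, it feeds the job from an in-memory queue via `queueStream`, so it runs without a Kafka VM. `StreamingSketch`, its `run` method and the timeout values are hypothetical, and Spark Streaming 1.3 or later is assumed.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

object StreamingSketch {
  // Feeds each inner Seq as one micro-batch and returns the summed counts.
  def run(batches: Seq[Seq[String]]): Map[String, Int] = {
    // local[2] leaves a thread free for processing while batches are scheduled.
    val conf = new SparkConf().setMaster("local[2]").setAppName("streaming-sketch")
    val ssc = new StreamingContext(conf, Seconds(1))
    val results = mutable.Map.empty[String, Int]
    try {
      val queue = mutable.Queue[RDD[String]]()
      batches.foreach(b => queue += ssc.sparkContext.parallelize(b))

      ssc.queueStream(queue)
        .map(event => (event, 1))
        .reduceByKey(_ + _)
        .foreachRDD { rdd =>
          // foreachRDD runs on the driver, so a driver-side map can collect results.
          val batch = rdd.collect()
          results.synchronized {
            batch.foreach { case (k, n) => results(k) = results.getOrElse(k, 0) + n }
          }
        }

      ssc.start()
      // One batch is consumed per interval; allow a few extra seconds of slack.
      ssc.awaitTerminationOrTimeout(batches.size * 1000L + 3000L)
    } finally {
      // Equivalent of the 'After' block: always release the contexts.
      ssc.stop(stopSparkContext = true, stopGracefully = false)
    }
    results.synchronized(results.toMap)
  }
}
```

Because the result depends on batches completing within the timeout, a real test would likely poll for the expected output rather than rely on a fixed wait.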
Testing best practices - Code Coverage
● Tracking code coverage with Scoverage and/or Scct
● Enable fork = true to avoid spark exceptions caused by spark context conflicts.
● SCCT configurations
○ ScctPlugin.instrumentSettings
○ parallelExecution in ScctTest := false
○ fork in ScctTest := true
○ Command to run it - `sbt "scct:test"`
● Scoverage configurations
○ ScoverageSbtPlugin.instrumentSettings
○ ScoverageSbtPlugin.ScoverageKeys.excludedPackages in ScoverageSbtPlugin.scoverage := ".*benchmark.*;.*util.*"
○ parallelExecution in ScoverageSbtPlugin.scoverageTest := false
○ fork in ScoverageSbtPlugin.scoverageTest := true
○ Command to run it - `sbt "scoverage:test"`
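The excludedPackages value above is a semicolon-separated list of regular expressions. The sketch below illustrates how such a string selects classes, assuming each pattern must match the whole class name (an assumption about scoverage's matching, not taken from the slides). The `CoverageExclusions` helper and the class names are hypothetical and exist only for illustration.

```scala
object CoverageExclusions {
  // Split the setting into its individual regexes.
  def patterns(setting: String): Seq[String] =
    setting.split(";").map(_.trim).filter(_.nonEmpty).toSeq

  // A class counts as excluded when any pattern matches its full name.
  def isExcluded(setting: String, className: String): Boolean =
    patterns(setting).exists(p => className.matches(p))
}

object CoverageExclusionsDemo extends App {
  val setting = ".*benchmark.*;.*util.*"
  val names = Seq(
    "com.example.benchmark.LoadRunner",   // hypothetical class names
    "com.example.util.JsonHelper",
    "com.example.processor.LogProcessor"
  )
  names.foreach { n =>
    println(s"$n excluded=${CoverageExclusions.isExcluded(setting, n)}")
  }
}
```

Under these assumptions the first two names are excluded and the third is still measured.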
Testing best practices - Jenkins auto test build
trigger
● Requires enabling 'github-webhook' on github repo settings
page. Requires admin access for the repo.
● Jenkins job should be configured with corresponding github
repo via “GitHub Project” field.
● Test jenkins hook by triggering a test run from github repo.
● "Github pull request builder" can be used while configuring
jenkins job to auto publish test results on github pull requests
after every test run. This also lets you rerun failed tests via
github pull request.
What is performance testing?
● A practice striving to build performance into the
implementation, design and architecture of a
system.
● Determine how a system performs in terms of
responsiveness and stability under a particular
workload.
● Can serve to investigate, measure, validate or verify
other quality attributes of a system, such as
scalability, reliability and resource usage.
What is Gatling?
● Stress test tool
Why was Gatling selected over other
perf test tools such as JMeter?
● Powerful scripting using Scala
● Akka + Netty
● Run multiple scenarios in one simulation
● Scenarios = code + DSL
● Graphical reports with clear & concise
graphs
How does Gatling work with Spark?
● Access Web applications / services
Develop & setup a simple perf test example
A perf test will run against spark-jobserver for
word counts.
What is spark-jobserver?
● Provides a RESTful interface for submitting and
managing Apache Spark jobs, jars and job
contexts
● Scala 2.10 + CDH5/Hadoop 2.2 + Spark 0.9.0
● For more depth on jobserver, see Evan Chan
& Kelvin Chu’s Spark Query Service
presentation.
Steps to set up & run Spark-jobserver
● Clone spark-jobserver from GitHub
● Install SBT and type “sbt” in the spark-jobserver repo
● From the SBT shell, simply type “re-start”
$ git clone https://github.com/ooyala/spark-jobserver
$ sbt
> re-start
Steps to package & upload a jar to the
jobserver
● Package the test jar of the word count
example
● Upload the jar to the jobserver
$ sbt job-server-tests/package
$ curl --data-binary @job-server-tests/target/job-server-tests-0.3.1.jar localhost:8090/jars/test
Run a request against the jobserver
$ curl -d "input.string = a b c a b see" 'http://localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&sync=true'
{
"status": "OK",
"result": {
"a": 2,
"b": 2,
"c": 1,
"see": 1
}
}
Source code of Word Count Example
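The source listing from this slide is not reproduced in the transcript. As a rough sketch of the same computation, the snippet below counts words with plain Scala collections rather than a Spark RDD, so it is a stand-in for the job's logic and not the jobserver's actual `WordCountExample`. `WordCountSketch` is a hypothetical name.

```scala
object WordCountSketch {
  // Split on spaces, group identical words, and count each group.
  def wordCount(input: String): Map[String, Int] =
    input.split(" ")
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.length) }
}

object WordCountSketchDemo extends App {
  // Same input string as the curl request against the jobserver.
  println(WordCountSketch.wordCount("a b c a b see"))
}
```

For the input used in the request above, the counts agree with the "result" object in the jobserver response.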
Script Gatling for the Word Count Example
A scenario defines the steps that Gatling performs
at runtime.
Script Gatling for the Word Count Example
Setup combines users, scenarios (as workflows) and
assertions into a performance test simulation
● Inject 10 users in 10 seconds into scenarios in 2
cycles
● Assert that more than 80% of requests succeed
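The Gatling script shown on these slides is not reproduced in the transcript. The sketch below is one way such a simulation could look, assuming the Gatling 3.x Scala DSL (the talk predates it, and older releases spell injection and assertions differently). It posts the word count request to a jobserver on localhost:8090, ramps 10 users over 10 seconds, and asserts that more than 80% of requests succeed. Reading "2 cycles" as each user repeating the request twice is an assumption, and `WordCountSimulation` is a hypothetical name.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class WordCountSimulation extends Simulation {

  // Jobserver address taken from the curl examples.
  val httpProtocol = http.baseUrl("http://localhost:8090")

  // Scenario: each virtual user submits the word count job twice (assumed meaning of "2 cycles").
  val wordCount = scenario("Word count")
    .repeat(2) {
      exec(
        http("submit word count job")
          .post("/jobs")
          .queryParam("appName", "test")
          .queryParam("classPath", "spark.jobserver.WordCountExample")
          .queryParam("sync", "true")
          .body(StringBody("input.string = a b c a b see"))
          .check(status.is(200))
      )
    }

  // Setup: 10 users ramped over 10 seconds, with a success-rate assertion.
  setUp(
    wordCount.inject(rampUsers(10).during(10.seconds))
  ).protocols(httpProtocol)
    .assertions(global.successfulRequests.percent.gt(80))
}
```

Keeping the scenario and the setUp call separate mirrors the two slides: the scenario describes what one user does, and the setup decides how many users run it and what counts as a pass.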
Test Results in Terminal Window
Gatling Graph - Indicator
Gatling Graph - Active Sessions
Best Practices on Performance Tests
● Run performance tests on Jenkins
● Set up baselines for each performance test
across different scenarios and user counts
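One way to use a baseline is to compare each new run's response times against it and flag a regression beyond some tolerance. The sketch below does this with the 95th percentile; the `BaselineCheck` object, the choice of p95, and the tolerance parameter are all assumptions for illustration and do not come from the talk.

```scala
object BaselineCheck {
  final case class RunStats(meanMs: Double, p95Ms: Double)

  // Nearest-rank percentile over the sorted samples.
  def percentile(samplesMs: Seq[Double], p: Double): Double = {
    require(samplesMs.nonEmpty, "need at least one sample")
    require(p > 0 && p <= 100, "percentile must be in (0, 100]")
    val sorted = samplesMs.sorted
    val rank = math.ceil(p / 100.0 * sorted.size).toInt
    sorted(math.max(rank, 1) - 1)
  }

  // Summarize one run's response times.
  def summarize(samplesMs: Seq[Double]): RunStats =
    RunStats(samplesMs.sum / samplesMs.size, percentile(samplesMs, 95))

  // A run regresses when its p95 exceeds the baseline p95 by more than the tolerance fraction.
  def regressed(baseline: RunStats, current: RunStats, tolerance: Double): Boolean =
    current.p95Ms > baseline.p95Ms * (1 + tolerance)
}

object BaselineCheckDemo extends App {
  import BaselineCheck._
  val baseline = summarize(Seq(10, 20, 30, 40, 50, 60, 70, 80, 90, 100).map(_.toDouble))
  val current = RunStats(meanMs = 60, p95Ms = 120)
  println(s"baseline=$baseline regressed=${regressed(baseline, current, tolerance = 0.1)}")
}
```

Storing one such baseline per scenario and user count lets a Jenkins job fail the build when a run drifts past the agreed tolerance.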
Any Questions?
References
Contact Info:
Anupama Shetty: anupama@ooyala.com
Neil Marshall: nmarshall@ooyala.com
References:
http://www.slideshare.net/AnuShetty/spark-summit2014-techtalk-testing-spark
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
MohamedAbdelkader115
 
Reagent dosing (Bredel) presentation.pptx
Reagent dosing (Bredel) presentation.pptxReagent dosing (Bredel) presentation.pptx
Reagent dosing (Bredel) presentation.pptx
AlejandroOdio
 
DT REPORT by Tech titan GROUP to introduce the subject design Thinking
DT REPORT by Tech titan GROUP to introduce the subject design ThinkingDT REPORT by Tech titan GROUP to introduce the subject design Thinking
DT REPORT by Tech titan GROUP to introduce the subject design Thinking
DhruvChotaliya2
 
Data Structures_Introduction to algorithms.pptx
Data Structures_Introduction to algorithms.pptxData Structures_Introduction to algorithms.pptx
Data Structures_Introduction to algorithms.pptx
RushaliDeshmukh2
 
Machine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptxMachine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptx
rajeswari89780
 
π0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalizationπ0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalization
NABLAS株式会社
 
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E..."Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
Infopitaara
 
Oil-gas_Unconventional oil and gass_reseviours.pdf
Oil-gas_Unconventional oil and gass_reseviours.pdfOil-gas_Unconventional oil and gass_reseviours.pdf
Oil-gas_Unconventional oil and gass_reseviours.pdf
M7md3li2
 
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdffive-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
AdityaSharma944496
 
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
inmishra17121973
 
IntroSlides-April-BuildWithAI-VertexAI.pdf
IntroSlides-April-BuildWithAI-VertexAI.pdfIntroSlides-April-BuildWithAI-VertexAI.pdf
IntroSlides-April-BuildWithAI-VertexAI.pdf
Luiz Carneiro
 
Mathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdfMathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdf
TalhaShahid49
 
15th International Conference on Computer Science, Engineering and Applicatio...
15th International Conference on Computer Science, Engineering and Applicatio...15th International Conference on Computer Science, Engineering and Applicatio...
15th International Conference on Computer Science, Engineering and Applicatio...
IJCSES Journal
 
new ppt artificial intelligence historyyy
new ppt artificial intelligence historyyynew ppt artificial intelligence historyyy
new ppt artificial intelligence historyyy
PianoPianist
 
DSP and MV the Color image processing.ppt
DSP and MV the  Color image processing.pptDSP and MV the  Color image processing.ppt
DSP and MV the Color image processing.ppt
HafizAhamed8
 
Introduction to FLUID MECHANICS & KINEMATICS
Introduction to FLUID MECHANICS &  KINEMATICSIntroduction to FLUID MECHANICS &  KINEMATICS
Introduction to FLUID MECHANICS & KINEMATICS
narayanaswamygdas
 

Spark summit2014 techtalk - testing spark

  • 1. Testing Spark: Best Practices Anupama Shetty Senior SDET, Analytics, Ooyala Inc Neil Marshall SDET, Analytics, Ooyala Inc Spark Summit 2014
  • 2. Agenda - Anu 1. Application Overview ● Batch mode ● Streaming mode with kafka 2. Test Overview ● Test environment setup ● Unit testing spark applications ● Integration testing spark applications 3. Best Practices ● Code coverage support with scoverage and scct ● Auto build trigger using jenkins hook via github
  • 3. Agenda - Neil 4. Performance testing of Spark ● Architecture & technology overview ● Performance testing setup & run ● Result analysis ● Best practices
  • 4. Company Overview ● Founded in 2007 ● 300+ employees worldwide ● Global footprint of 200M unique users in 130 countries ● Ooyala works with the most successful broadcast and media companies in the world ● Reach, measure, monetize video business ● Cross-device video analytics and monetization products and services
  • 5. Application Overview ● Analytics ETL pipeline service ● Receives 5B+ player generated events such as plays, displays on a daily basis. ● Computed metrics include player conversion rate, video conversion rate and engagement metrics. ● Third party services used are ○ Spark 1.0 used to process player generated big data. ○ Kafka 0.9.0 with Zookeeper as our message queue ○ CDH5 HDFS as our intermediate storage file system
  • 6. Spark based Log Processor details ● Supports two input data formats ○ Json ○ Thrift ● Batch Mode Support ○ Uses Spark Context ○ Consumes input data via a text file ● Streaming Mode Support ○ Uses Spark streaming context ○ Consumes data via kafka stream
  • 7. Test pipeline setup ● Player simulation done using Watir (ruby gem based on Selenium). ● Kafka(with zookeeper) setup as local virtual machine using vagrant. VMs can be monitored using VirtualBox. ● Spark cluster run in local mode.
  • 8. Unit test setup - Spark in Batch mode ● Spark cluster setup for testing ○ Build your spark application jar using `sbt "assembly"` ○ Create config with spark.jar set to application jar and spark.master to "local" ■ var config = ConfigFactory parseString """spark.jar = "target/scala-2.10/SparkLogProcessor.jar",spark.master = "local" """ ○ Store local spark directory path for spark context creation ■ val sparkDir = (<path to local spark directory> + "spark-0.9.0-incubating-bin-hadoop2/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar").mkString ● Creating spark context ○ var sc: SparkContext = new SparkContext("local", getClass.getSimpleName, sparkDir, List(config.getString("spark.jar")))
  • 9. Test Setup for batch mode using Spark Context [Code screenshots: Before block, After block] The ScalaTest framework's "FunSpec" is used with "ShouldMatchers" (for assertions) and "BeforeAndAfter" (for setup/teardown).
  • 10. Kafka setup for spark streaming ● Bring up the Kafka virtual machine from the Vagrantfile with the following command: `vagrant up kafkavm` ● Configure Kafka ○ Create topic ■ `bin/kafka-create-topic.sh --zookeeper "localhost:2181" --topic "thrift_pings"` ○ Consume messages using ■ `bin/kafka-console-consumer.sh --zookeeper "localhost:2181" --topic "thrift_pings" --group "testThrift" &>/tmp/thrift-consumer-msgs.log &`
  • 11. Testing streaming mode with Spark Streaming Context
  • 12. Test 'After' block and assertion block for spark streaming mode [Code screenshots: After block, Test assertion]
  • 13. Testing best practices - Code Coverage ● Tracking code coverage with Scoverage and/or Scct ● Enable fork = true to avoid spark exceptions caused by spark context conflicts. ● SCCT configurations ○ ScctPlugin.instrumentSettings ○ parallelExecution in ScctTest := false ○ fork in ScctTest := true ○ Command to run it - `sbt "scct:test"` ● Scoverage configurations ○ ScoverageSbtPlugin.instrumentSettings ○ ScoverageSbtPlugin.ScoverageKeys.excludedPackages in ScoverageSbtPlugin.scoverage := ".*benchmark.*;.*util.*" ○ parallelExecution in ScoverageSbtPlugin.scoverageTest := false ○ fork in ScoverageSbtPlugin.scoverageTest := true ○ Command to run it - `sbt "scoverage:test"`
  • 14. Testing best practices - Jenkins auto test build trigger ● Requires enabling 'github-webhook' on github repo settings page. Requires admin access for the repo. ● Jenkins job should be configured with corresponding github repo via “GitHub Project” field. ● Test jenkins hook by triggering a test run from github repo. ● "Github pull request builder" can be used while configuring jenkins job to auto publish test results on github pull requests after every test run. This also lets you rerun failed tests via github pull request.
  • 15. What is performance testing? ● A practice that strives to build performance into the implementation, design and architecture of a system. ● Determines how a system performs in terms of responsiveness and stability under a particular workload. ● Can serve to investigate, measure, validate or verify other quality attributes of a system, such as scalability, reliability and resource usage.
  • 16. What is Gatling? ● Stress test tool
  • 17. Why was Gatling selected over other perf test tools such as JMeter? ● Powerful scripting using Scala ● Akka + Netty ● Run multiple scenarios in one simulation ● Scenarios = code + DSL ● Graphical reports with clear & concise graphs
  • 18. How does Gatling work with Spark? ● Accesses web applications / services
  • 19. Develop & setup a simple perf test example A perf test will run against spark-jobserver for word counts.
  • 20. What is spark-jobserver? ● Provides a RESTful interface for submitting and managing Apache Spark jobs, jars and job contexts ● Scala 2.10 + CDH5/Hadoop 2.2 + Spark 0.9.0 ● For more depth on the jobserver, see Evan Chan & Kelvin Chu's Spark Query Service presentation.
  • 21. Steps to set up & run Spark-jobserver ● Clone spark-jobserver from GitHub ● Install SBT and type "sbt" in the spark-jobserver repo ● From the SBT shell, simply type "re-start" $ git clone https://github.com/ooyala/spark-jobserver $ sbt > re-start
  • 22. Steps to package & upload a jar to the jobserver ● Package the test jar of the word count example ● Upload the jar to the jobserver $ sbt job-server-tests/package $ curl --data-binary @job-server-tests/target/job-server-tests-0.3.1.jar localhost:8090/jars/test
  • 23. Run a request against the jobserver $ curl -d "input.string = a b c a b see" 'http://localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&sync=true' { "status": "OK", "result": { "a": 2, "b": 2, "c": 1, "see": 1 } }
  • 24. Source code of Word Count Example
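The source listing from this slide is not reproduced in the transcript. As a rough stand-in, the sketch below computes the same kind of result with plain Scala collections instead of Spark. It is not the spark-jobserver `WordCountExample` source: the object name `WordCountSketch`, the function `countWords`, and the choice to tokenize on whitespace are assumptions made for illustration.

```scala
// Plain-Scala stand-in for a word count; no Spark or jobserver dependency.
// All names here are hypothetical and chosen for illustration only.
object WordCountSketch {
  // Split on whitespace, drop empty tokens, then count occurrences of each word.
  def countWords(input: String): Map[String, Int] =
    input
      .split("\\s+")
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.length }

  def main(args: Array[String]): Unit = {
    // Same input string as the curl request on slide 23.
    val counts = countWords("a b c a b see")
    counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word: $n") }
  }
}
```

For the input used in the curl request on slide 23, `countWords` yields a: 2, b: 2, c: 1, see: 1, which matches the response shown there.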
  • 25. Script Gatling for the Word Count Example A scenario defines the steps that Gatling performs at runtime (code shown on the slide).
  • 26. Script Gatling for the Word Count Example The setup combines users, scenarios (as workflows) and assertions into one performance test simulation ● Inject 10 users in 10 seconds into scenarios in 2 cycles ● Ensure that more than 80% of requests succeed
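To make the two numbers on this slide concrete, here is a plain-Scala model of the arithmetic behind them. It does not use Gatling's DSL and is not the simulation from the talk. `LoadProfileSketch`, `rampOffsetsMs`, `successPercent` and `passes` are hypothetical names, and spreading the users evenly across the ramp window is an assumption about how the injection behaves. The "2 cycles" detail is not modeled.

```scala
// Plain-Scala model of a linear user ramp and a success-rate assertion.
// Hypothetical names; this is not Gatling's API.
object LoadProfileSketch {
  // Start offsets (ms) for `users` virtual users spread evenly over `durationMs`.
  def rampOffsetsMs(users: Int, durationMs: Long): Seq[Long] =
    (0 until users).map(i => i * durationMs / users)

  // Percentage of requests that succeeded; 0.0 when there were no requests.
  def successPercent(outcomes: Seq[Boolean]): Double =
    if (outcomes.isEmpty) 0.0
    else outcomes.count(identity) * 100.0 / outcomes.size

  // True only when the success rate is strictly greater than the threshold.
  def passes(outcomes: Seq[Boolean], thresholdPercent: Double = 80.0): Boolean =
    successPercent(outcomes) > thresholdPercent

  def main(args: Array[String]): Unit = {
    // 10 users over 10 seconds: one user starts each second under this model.
    println(rampOffsetsMs(10, 10000L).mkString(", "))
    // 9 successes out of 10 requests clears the 80% threshold.
    println(passes(Seq.fill(9)(true) :+ false))
  }
}
```

Under this model a run with exactly 80% successful requests does not pass, since the slide asks for a rate greater than 80%.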
  • 27. Test Results in Terminal Window
  • 28. Gatling Graph - Indicator
  • 29. Gatling Graph - Active Sessions
  • 30. Best Practices on Performance Tests ● Run performance tests on Jenkins ● Set up baselines for each performance test, covering different scenarios and user counts
  • 32. References Contact Info: Anupama Shetty: [email protected] Neil Marshall: [email protected] References: http://www.slideshare.net/AnuShetty/spark-summit2014-techtalk-testing-spark