Testing Spark and Scala
https://github.com/ganeshayadiyala/Scalatest-library-to-unit-test-spark/
● Ganesha Yadiyala
● Big data consultant at
datamantra.io
● Consults on Spark and Scala
● ganeshayadiyala@gmail.com
Agenda
● What is testing
● Different types of testing process
● Unit tests using scalatest
● Different styles in scalatest
● Using assertions
● Sharing fixtures
● Matchers
● Async Testing
● Testing of spark batch operation
● Unit testing streaming operation
What is testing
Software testing is the process of executing a program or application with the
intent of finding software bugs.
It can also be described as the process of validating and verifying that a software
application:
● Meets the business and technical requirements that guided its design and
development
● Works as expected
A few types of tests
● Unit tests
● Integration tests
● Functional tests
Unit tests
● Unit testing verifies that individual units of code (mostly functions) work
as expected
● Assumes everything else works
● Tests one specific condition or flow
Advantages:
● Code is more reusable. To make unit testing possible, code needs to be
modular, and modular code is easier to reuse.
● Debugging is easier. When a test fails, only the latest changes need to be
debugged.
Integration tests
● Tests the interoperability of multiple subsystems
● Includes real components, databases, etc.
● Tests the connectivity of the components
● Hard to cover all cases (the number of combinations grows quickly)
● Hard to localize errors (a test may break for many different reasons)
● Much slower than unit tests
Functional tests
● Functional testing is testing done against the business requirements of the
application
● Uses real components and real data
Unit Test in scala
Scalatest
● We use ScalaTest for unit tests in Scala
● For every class in src/main/scala, write a test class in src/test/scala
● The central concept is the suite (a collection of test cases)
● You define test classes by composing a Suite style trait with mixin traits
● You can test both Scala and Java code
● Offers deep integration with tools such as JUnit, TestNG, Ant, Maven, sbt,
ScalaCheck, JMock, EasyMock, Mockito, ScalaMock, Selenium, Eclipse,
NetBeans, and IntelliJ
Using the scalatest maven plugin
To run ScalaTest suites from Maven, disable the Maven Surefire plugin and enable
the scalatest-maven-plugin:
● Specify <skipTests>true</skipTests> in the Maven Surefire plugin configuration
● Add the scalatest-maven-plugin and set its goal to test
Different styles in scalatest
● FunSuite
● FlatSpec
● FunSpec
● WordSpec
● FreeSpec
● PropSpec
● FeatureSpec
FunSuite
● In a FunSuite, tests are function values.
● You register a test with test, passing the test name as a string in
parentheses, followed by the test code in curly braces
Ex : com.ganesh.scalatest.specs.FunSuitTest.scala
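A minimal FunSuite sketch of the points above. It assumes ScalaTest 3.0.x, where
the trait is org.scalatest.FunSuite as named in the slides (from 3.1 onward the
equivalent is org.scalatest.funsuite.AnyFunSuite). Calculator is a hypothetical
unit under test, not taken from the repository.

```scala
import org.scalatest.FunSuite

// Hypothetical unit under test, defined here only for illustration.
object Calculator {
  def add(a: Int, b: Int): Int = a + b
}

class CalculatorFunSuite extends FunSuite {

  // test(<name>) { <body> } registers one test case.
  test("add returns the sum of two integers") {
    assert(Calculator.add(2, 3) == 5)
  }

  test("add gives the same result when the arguments are swapped") {
    assert(Calculator.add(4, 7) == Calculator.add(7, 4))
  }
}
```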
FlatSpec
● Flat, un-nested structure, in contrast to the nested FunSpec and WordSpec traits
● Uses the behavior of clause to name the subject under test
Ex : com.ganesh.scalatest.specs.FlatSpecTest.scala
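A small FlatSpec sketch, again assuming ScalaTest 3.0.x names (org.scalatest.FlatSpec;
AnyFlatSpec in later versions). The subject, an empty List, is chosen only for
illustration.

```scala
import org.scalatest.FlatSpec

class EmptyListFlatSpec extends FlatSpec {

  // "behavior of" names the subject; each "it should ... in" is one flat test.
  behavior of "An empty List"

  it should "have size 0" in {
    assert(List.empty[Int].size == 0)
  }

  it should "throw NoSuchElementException when head is invoked" in {
    assertThrows[NoSuchElementException] {
      List.empty[Int].head
    }
  }
}
```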
FunSpec
● Tests are combined with text that specifies the behavior of the test.
● Uses the describe clause
Ex : com.ganesh.scalatest.specs.FunSpecTest.scala
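A FunSpec sketch showing nested describe blocks with it clauses, assuming
ScalaTest 3.0.x names (org.scalatest.FunSpec; AnyFunSpec in later versions). The
example subject is illustrative only.

```scala
import org.scalatest.FunSpec

class ListFunSpec extends FunSpec {

  // describe blocks can nest; the nested text forms the full test name.
  describe("A List") {
    describe("when empty") {
      it("should have size 0") {
        assert(List.empty[Int].size == 0)
      }
    }

    describe("when non-empty") {
      it("should return its first element from head") {
        assert(List(1, 2, 3).head == 1)
      }
    }
  }
}
```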
WordSpec
● Your specification text is structured by placing words after strings
● Uses the should and in clauses
Ex : com.ganesh.scalatest.specs.WordSpecTest.scala
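A WordSpec sketch, assuming ScalaTest 3.0.x names (org.scalatest.WordSpec;
AnyWordSpec in later versions). It also uses the when clause that WordSpec
offers alongside should and in; the subject is illustrative only.

```scala
import org.scalatest.WordSpec

class MapWordSpec extends WordSpec {

  // Words such as "when", "should" and "in" follow the strings they structure.
  "A Map" when {
    "empty" should {
      "have size 0" in {
        assert(Map.empty[String, Int].size == 0)
      }

      "return None for any key" in {
        assert(Map.empty[String, Int].get("a").isEmpty)
      }
    }
  }
}
```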
Using Assertions
ScalaTest makes three assertions available by default in any style trait:
● assert - for general assertions.
● assertResult - to differentiate expected from actual values.
● assertThrows - to ensure a bit of code throws an expected exception.
ScalaTest assertions are defined in the trait Assertions, which also provides some
other APIs (for example intercept, fail and cancel).
Ex : com.ganesh.scalatest.features.AssertionsTest.scala
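The three default assertions can be sketched as follows, assuming ScalaTest 3.0.x
(org.scalatest.FunSuite). The last test uses intercept, one of the additional
methods of the Assertions trait; the values being checked are made up for
illustration.

```scala
import org.scalatest.FunSuite

class AssertionExamples extends FunSuite {

  test("assert checks a boolean condition") {
    val left = 2
    val right = 1
    assert(left > right)
  }

  test("assertResult takes the expected value first, then the actual one") {
    assertResult(6) {
      List(1, 2, 3).sum
    }
  }

  test("assertThrows passes only if the block throws the given exception") {
    val divisor = 0
    assertThrows[ArithmeticException] {
      1 / divisor
    }
  }

  test("intercept returns the caught exception for further inspection") {
    val caught = intercept[IndexOutOfBoundsException] {
      "hi".charAt(5)
    }
    // Subclasses of the expected type are accepted as well.
    assert(caught.isInstanceOf[StringIndexOutOfBoundsException])
  }
}
```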
Ignoring the test
● ScalaTest allows a test to be ignored.
● We can ignore a test when we want to change its implementation and run it later,
or when the test case is slow.
● We use the ignore clause to ignore a single test.
● We use the @Ignore annotation to ignore all the tests in a suite.
Ex : com.ganesh.scalatest.features.IgnoreTest.scala
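Both ways of ignoring tests can be sketched like this, assuming ScalaTest 3.0.x
(org.scalatest.FunSuite and org.scalatest.Ignore). The suite and test names are
hypothetical.

```scala
import org.scalatest.{FunSuite, Ignore}

class IgnoreExamples extends FunSuite {

  test("a fast test that still runs") {
    assert(1 + 1 == 2)
  }

  // Replacing "test" with "ignore" skips the test but still reports it.
  ignore("a slow test that is skipped for now") {
    Thread.sleep(60000)
    assert(true)
  }
}

// @Ignore on the class marks every test in the suite as ignored
// when the suite is discovered by a test runner.
@Ignore
class WholeSuiteIgnored extends FunSuite {
  test("not run while the suite is annotated") {
    assert(false)
  }
}
```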
Sharing fixture
A test fixture is composed of the objects and other artifacts that tests use to do
their work.
When multiple tests need to work with the same fixture, we can share the fixture
between them.
This reduces code duplication.
By calling get-fixture methods
If you need to create the same mutable fixture objects in multiple tests, you can
use a get-fixture method.
● A get-fixture method returns a new instance of a needed fixture object each
time it is called
● Not appropriate if those objects need to be cleaned up after the test
Ex : com.ganesh.scalatest.fixtures.GetFixtureTest.scala
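A get-fixture method might look like the sketch below, assuming ScalaTest 3.0.x
(org.scalatest.FunSuite). The Fixture case class and its fields are hypothetical
names introduced for the example.

```scala
import org.scalatest.FunSuite
import scala.collection.mutable.ListBuffer

class GetFixtureExample extends FunSuite {

  // Hypothetical holder for the mutable objects the tests share by shape.
  case class Fixture(builder: StringBuilder, buffer: ListBuffer[String])

  // The get-fixture method: every call builds brand-new mutable objects.
  def fixture: Fixture =
    Fixture(new StringBuilder("ScalaTest is "), new ListBuffer[String])

  test("the first test mutates its own copy") {
    val f = fixture
    f.builder.append("easy!")
    f.buffer += "sweet"
    assert(f.builder.toString == "ScalaTest is easy!")
    assert(f.buffer == ListBuffer("sweet"))
  }

  test("the second test sees a fresh, unmodified fixture") {
    val f = fixture
    assert(f.builder.toString == "ScalaTest is ")
    assert(f.buffer.isEmpty)
  }
}
```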
By Instantiating fixture-context objects
When different tests need different combinations of fixture objects, define the
fixture objects as instance variables of fixture-context objects.
● In this approach we initialize the fixture objects inside a trait or class.
● We create a new instance of the fixture trait in each test that needs it.
● We can also mix several of these fixture traits together.
Ex : com.ganesh.scalatest.fixtures.FixtureContextTest.scala
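A sketch of fixture-context objects, assuming ScalaTest 3.0.x
(org.scalatest.FunSuite). The Builder and Buffer traits are hypothetical; each
test instantiates only the combination it needs.

```scala
import org.scalatest.FunSuite
import scala.collection.mutable.ListBuffer

class FixtureContextExample extends FunSuite {

  // Each trait holds one fixture object as an instance variable.
  trait Builder {
    val builder = new StringBuilder("ScalaTest is ")
  }

  trait Buffer {
    val buffer = ListBuffer("ScalaTest", "is")
  }

  test("a test that needs only the builder") {
    new Builder {
      builder.append("easy!")
      assert(builder.toString == "ScalaTest is easy!")
    }
  }

  test("a test that mixes both fixture traits together") {
    new Builder with Buffer {
      builder.append("clear!")
      buffer += "concise!"
      assert(builder.toString == "ScalaTest is clear!")
      assert(buffer == ListBuffer("ScalaTest", "is", "concise!"))
    }
  }
}
```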
By using withFixture
● Allows fixtures to be cleaned up at the end of each test
● If there is no object to pass into the test case, override
withFixture(NoArgTest).
● If one or more objects must be passed into the test case, use
withFixture(OneArgTest).
Ex : com.ganesh.scalatest.fixtures.WithFicture*.scala
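Both variants can be sketched as below, assuming ScalaTest 3.0.x, where the
one-argument form lives in org.scalatest.fixture.FunSuite (FixtureAnyFunSuite in
later versions). The temporary file is a made-up fixture chosen because it needs
cleanup.

```scala
import java.nio.file.{Files, Path}
import org.scalatest.{fixture, FunSuite, Outcome}

// No-argument form: wrap every test with setup and guaranteed cleanup.
class WithFixtureNoArgExample extends FunSuite {
  private var tempFile: Path = _

  override def withFixture(test: NoArgTest): Outcome = {
    tempFile = Files.createTempFile("scalatest-fixture", ".txt") // setup
    try super.withFixture(test)                                  // run the test
    finally Files.deleteIfExists(tempFile)                       // cleanup
  }

  test("the temp file exists while the test runs") {
    assert(Files.exists(tempFile))
  }
}

// One-argument form: the fixture object is passed into each test.
class WithFixtureOneArgExample extends fixture.FunSuite {
  type FixtureParam = Path

  def withFixture(test: OneArgTest): Outcome = {
    val file = Files.createTempFile("scalatest-fixture", ".txt")
    try withFixture(test.toNoArgTest(file)) // hand the file to the test
    finally Files.deleteIfExists(file)
  }

  test("bytes written to the file can be measured") { file =>
    Files.write(file, "hello".getBytes("UTF-8"))
    assert(Files.size(file) == 5L)
  }
}
```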
By using BeforeAndAfter
● The fixture-sharing methods shown so far run as part of the test itself.
● If an exception occurs while creating the fixture, it is reported as a test
failure.
● With BeforeAndAfter, setup happens before the test execution starts, and
cleanup happens once the test has completed.
● So if an exception occurs during setup, it aborts the entire suite and no more
tests are attempted.
Ex : com.ganesh.scalatest.fixtures.BeforeAndAfterTest.scala
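A BeforeAndAfter sketch, assuming ScalaTest 3.0.x (org.scalatest.FunSuite and
org.scalatest.BeforeAndAfter). The shared buffer is a hypothetical fixture.

```scala
import org.scalatest.{BeforeAndAfter, FunSuite}
import scala.collection.mutable.ListBuffer

class BeforeAndAfterExample extends FunSuite with BeforeAndAfter {

  // Shared mutable fixture, reset around every test.
  val buffer = new ListBuffer[String]

  before {
    buffer += "ScalaTest" // setup runs before each test starts
  }

  after {
    buffer.clear() // cleanup runs after each test completes
  }

  test("a test can modify the fixture") {
    buffer += "rocks"
    assert(buffer == ListBuffer("ScalaTest", "rocks"))
  }

  test("the next test starts from the state created by before") {
    assert(buffer == ListBuffer("ScalaTest"))
  }
}
```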
Matchers
ScalaTest provides a domain specific language (DSL) for expressing assertions in
tests using the word should.
Ex : com.ganesh.scalatest.features.MatchersTest.scala
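A few common matchers, sketched under the assumption of ScalaTest 3.0.x, where the
trait is org.scalatest.Matchers (org.scalatest.matchers.should.Matchers in later
versions). The values checked are illustrative.

```scala
import org.scalatest.{FlatSpec, Matchers}

class MatcherExamples extends FlatSpec with Matchers {

  "Matchers" should "compare values for equality" in {
    (1 + 2) should be (3)
    "spark" should equal ("spark")
  }

  it should "check sizes, contents and string prefixes" in {
    List(1, 2, 3) should have size 3
    List(1, 2, 3) should contain (2)
    "ScalaTest" should startWith ("Scala")
  }

  it should "check that a block throws an expected exception" in {
    an [IndexOutOfBoundsException] should be thrownBy {
      List(1).apply(5)
    }
  }
}
```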
Asynchronous testing
● Given a Future returned by the code you are testing, you need not block until
the Future completes before performing assertions against its value.
● We can instead map those assertions onto the Future and return the resulting
Future[Assertion] to ScalaTest.
● ScalaTest completes the test asynchronously, once that Future completes.
Ex : com.ganesh.scalatest.features.AsyncTest.scala
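An asynchronous test might be sketched as follows, assuming ScalaTest 3.0.x
(org.scalatest.AsyncFunSuite, which supplies an implicit execution context).
addSoon is a hypothetical function standing in for the code under test.

```scala
import org.scalatest.AsyncFunSuite
import scala.concurrent.Future

class AsyncExample extends AsyncFunSuite {

  // Hypothetical asynchronous code under test.
  def addSoon(a: Int, b: Int): Future[Int] = Future { a + b }

  test("addSoon eventually yields the sum") {
    // Map the assertion onto the Future instead of blocking on it;
    // the test body returns a Future[Assertion].
    addSoon(1, 2).map { sum =>
      assert(sum == 3)
    }
  }
}
```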
Testing private methods
● A private method of a class can still be tested with ScalaTest.
● Mix in the PrivateMethodTester trait to do this.
● Use the invokePrivate operator to call the private method
Ex : com.ganesh.scalatest.features.PrivateMethodTest.scala
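A PrivateMethodTester sketch, assuming ScalaTest 3.0.x (org.scalatest.FunSuite and
org.scalatest.PrivateMethodTester). Greeter and its private greet method are
hypothetical.

```scala
import org.scalatest.{FunSuite, PrivateMethodTester}

// Hypothetical class whose private method we want to exercise.
class Greeter {
  private def greet(name: String): String = s"Hello, $name"
}

class PrivateMethodExample extends FunSuite with PrivateMethodTester {

  test("a private method can be invoked by name") {
    // The type parameter is the method's result type; the symbol is its name.
    val greet = PrivateMethod[String](Symbol("greet"))
    val greeter = new Greeter

    val result = greeter invokePrivate greet("Spark")

    assert(result == "Hello, Spark")
  }
}
```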
Mocking
ScalaTest supports the following mock libraries:
● ScalaMock
● EasyMock
● JMock
● Mockito
Ex : com.ganesh.scalatest.mock.MockTest.scala
Testing Spark
Complexities
● Every test needs a Spark context
● Testing operations such as map, flatMap and reduce
● Testing streaming applications (DStream operations)
● Making sure that there is only one context for each test case
Setup
● Instead of creating the required contexts in every test suite, we create a
trait that extends BeforeAndAfter, and all our suites extend this trait.
● In that trait, all the contexts are initialized in the before block
● All the contexts are destroyed in the after block
Ex : com.ganesh.scalatest.sparkbatch.EnvironmentInitializerSC.scala
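The setup described above could look roughly like this. It is a sketch, not the
repository's EnvironmentInitializerSC: the trait name LocalSparkContext, the
local[2] master and the word-count test are assumptions, and it presumes
ScalaTest 3.0.x plus spark-core on the test classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfter, FunSuite, Suite}

// Hypothetical shared trait: creates one SparkContext per test and stops it.
trait LocalSparkContext extends BeforeAndAfter { this: Suite =>

  protected var sc: SparkContext = _

  before {
    val conf = new SparkConf()
      .setMaster("local[2]") // assumed: run Spark in-process with two threads
      .setAppName(getClass.getSimpleName)
    sc = new SparkContext(conf)
  }

  after {
    // Stop the context so the next test can create its own.
    if (sc != null) {
      sc.stop()
      sc = null
    }
  }
}

class WordCountSuite extends FunSuite with LocalSparkContext {

  test("flatMap, map and reduceByKey count words") {
    val counts = sc.parallelize(Seq("a b", "b c b"))
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collectAsMap()

    assert(counts == Map("a" -> 1, "b" -> 3, "c" -> 1))
  }
}
```

Stopping the context in the after block is what keeps a single active context per
test case, which is the constraint listed under Complexities.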
Spark Streaming test
● Full control over the clock is needed to manually manage batches, slides and
windows.
● Spark Streaming provides the necessary abstraction over the system clock: the
ManualClock class.
● But it is a private class, so we cannot access it directly in our test cases
● So we use a wrapper class to reach the ManualClock instance from our test cases.
Ex : com.ganesh.scalatest.sparkstreaming
Summary
● We can select any of the styles provided by ScalaTest; they differ only in how
tests are written, and all of them offer the same features.
● Make use of the assertions and matchers provided by ScalaTest for better test
cases.
● When testing Spark we need to test the logic, so keep the code modular so that
each piece of logic can be tested individually.
● There is an external library called spark-testing-base, which provides many
functions for asserting at the DataFrame level and traits that supply the
contexts needed for the tests.
References
● http://www.scalatest.org/
● http://mkuthan.github.io/blog/2015/03/01/spark-unit-testing/
● https://www.slideshare.net/remeniuk/testing-in-scala-adform-research
Process mining Evangelist
 

Testing Spark and Scala

  • 1. Testing Spark and Scala https://github.com/ganeshayadiyala/Scalatest-library-to-unit-test-spark/
  • 2. ● Ganesha Yadiyala ● Big data consultant at datamantra.io ● Consults on Spark and Scala ● ganeshayadiyala@gmail.com
  • 3. Agenda ● What is testing ● Different types of testing process ● Unit tests using scalatest ● Different styles in scalatest ● Using assertions ● Sharing fixtures ● Matchers ● Async Testing ● Testing of spark batch operation ● Unit testing streaming operation
  • 4. What is testing Software testing is the process of executing a program or application with the intent of finding software bugs. It can also be described as the process of validating and verifying that a software application ● Meets the business and technical requirements that guided its design and development ● Works as expected
  • 5. A few types of tests ● Unit tests ● Integration tests ● Functional tests
  • 6. Unit tests ● Unit testing simply verifies that individual units of code (mostly functions) work as expected ● Assumes everything else works ● Tests one specific condition or flow. Advantages: ● Code is more reusable. To make unit testing possible, code needs to be modular, and modular code is easier to reuse. ● Debugging is easy. When a test fails, only the latest changes need to be debugged.
  • 7. Integration tests ● Test the interoperability of multiple subsystems ● Include real components, databases, etc. ● Test the connectivity of the components ● Hard to cover all the cases (there are many more combinations to test) ● Hard to localize errors (a test may break for many different reasons) ● Much slower than unit tests
  • 8. Functional tests ● Functional testing is testing done against the business requirements of the application ● Uses real components and real data
  • 9. Unit Test in scala
  • 10. ScalaTest ● We use ScalaTest for unit tests in Scala ● For every class in src/main/scala, write a test class in src/test/scala ● Tests are organized into suites (a suite is a collection of test cases) ● You define test classes by composing a Suite style trait with mixin traits. ● You can test both Scala and Java code ● Offers deep integration with tools such as JUnit, TestNG, Ant, Maven, sbt, ScalaCheck, JMock, EasyMock, Mockito, ScalaMock, Selenium, Eclipse, NetBeans, and IntelliJ.
  • 11. Using the scalatest maven plugin We have to disable the Maven Surefire plugin and enable the ScalaTest plugin ● Specify <skipTests>true</skipTests> in the Maven Surefire plugin configuration ● Add the scalatest-maven plugin and set its goal to test
  • 12. Different styles in scalatest ● FunSuite ● FlatSpec ● FunSpec ● WordSpec ● FreeSpec ● PropSpec ● FeatureSpec
  • 13. FunSuite ● In a FunSuite, tests are function values. ● You denote a test with test, provide the name of the test as a string enclosed in parentheses, and follow it with the code of the test in curly braces. Ex : com.ganesh.scalatest.specs.FunSuitTest.scala
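A minimal sketch of the FunSuite style described on this slide, assuming ScalaTest 3.0.x, where the trait is `org.scalatest.FunSuite` (later releases rename it `AnyFunSuite`). The class and test names are illustrative and are not taken from the linked repository.

```scala
import org.scalatest.FunSuite

// Hypothetical suite: each test is a name in parentheses plus a body in braces.
class ListFunSuite extends FunSuite {

  test("an empty list has size 0") {
    assert(List.empty[Int].size == 0)
  }

  test("head of an empty list throws NoSuchElementException") {
    assertThrows[NoSuchElementException] {
      List.empty[Int].head
    }
  }
}
```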
  • 14. FlatSpec ● Its flat, no-nesting approach contrasts with the nested structure of the traits FunSpec and WordSpec. ● Uses the behavior of clause Ex : com.ganesh.scalatest.specs.FlatSpecTest.scala
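The flat structure and the `behavior of` clause could look roughly like this. It is a sketch assuming ScalaTest 3.0.x (`org.scalatest.FlatSpec`); the subject and test names are made up for illustration.

```scala
import org.scalatest.FlatSpec

// Hypothetical spec: "behavior of" names the subject once,
// and each "it should ... in" clause is one un-nested test.
class ListFlatSpec extends FlatSpec {

  behavior of "An empty List"

  it should "have size 0" in {
    assert(List.empty[Int].size == 0)
  }

  it should "throw NoSuchElementException when head is invoked" in {
    assertThrows[NoSuchElementException] {
      List.empty[Int].head
    }
  }
}
```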
  • 15. FunSpec ● Tests are combined with text that specifies the behavior being tested. ● Uses the describe clause Ex : com.ganesh.scalatest.specs.FunSpecTest.scala
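For comparison, a small FunSpec sketch with nested `describe` clauses, again assuming ScalaTest 3.0.x (`org.scalatest.FunSpec`) and using illustrative names only.

```scala
import org.scalatest.FunSpec

// Hypothetical spec: describe clauses nest, and each "it" clause is a test.
class SetFunSpec extends FunSpec {

  describe("A Set") {
    describe("when empty") {
      it("should have size 0") {
        assert(Set.empty[Int].size == 0)
      }

      it("should throw NoSuchElementException when head is invoked") {
        assertThrows[NoSuchElementException] {
          Set.empty[Int].head
        }
      }
    }
  }
}
```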
  • 16. WordSpec ● Your specification text is structured by placing words after strings ● Uses the should and in clauses Ex : com.ganesh.scalatest.specs.WordSpecTest.scala
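A short WordSpec sketch showing words such as `should` and `in` placed after strings. It assumes ScalaTest 3.0.x (`org.scalatest.WordSpec`); the names are illustrative.

```scala
import org.scalatest.WordSpec

// Hypothetical spec: a subject string followed by "should" opens a block,
// and each string followed by "in" registers one test.
class SetWordSpec extends WordSpec {

  "An empty Set" should {
    "have size 0" in {
      assert(Set.empty[Int].size == 0)
    }

    "not contain any element" in {
      assert(!Set.empty[Int].contains(1))
    }
  }
}
```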
  • 17. Using Assertions ScalaTest makes three assertions available by default in any style trait ● assert - for general assertions. ● assertResult - to differentiate expected from actual values. ● assertThrows - to ensure a bit of code throws an expected exception. ScalaTest assertions are defined in the trait Assertions, which also provides some other APIs. Ex : com.ganesh.scalatest.features.AssertionsTest.scala
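The three default assertions might be used as follows. This is a sketch assuming ScalaTest 3.0.x; `intercept` is shown as one of the other APIs in `Assertions`, and all test names and values are illustrative.

```scala
import org.scalatest.FunSuite

class AssertionsSketch extends FunSuite {

  test("assert checks a general boolean condition") {
    val left = 2
    val right = 2
    assert(left == right)
  }

  test("assertResult puts the expected value first, then the actual expression") {
    assertResult(3) {
      1 + 2
    }
  }

  test("assertThrows succeeds only if the expected exception is thrown") {
    val zero = 0
    assertThrows[ArithmeticException] {
      1 / zero
    }
  }

  test("intercept also returns the exception for further checks") {
    val e = intercept[IllegalArgumentException] {
      require(false, "bad input")
    }
    assert(e.getMessage.contains("bad input"))
  }
}
```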
  • 18. Ignoring the test ● ScalaTest allows us to ignore a test. ● We can ignore a test if we want to change its implementation and run it later, or if the test case is slow. ● We use the ignore clause to ignore a single test ● We use the @Ignore annotation to ignore all the tests in a suite. Ex : com.ganesh.scalatest.features.IgnoreTest.scala
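Both ways of ignoring tests are sketched below, assuming ScalaTest 3.0.x. In a FunSuite, replacing `test` with `ignore` skips a single test, while the `org.scalatest.Ignore` annotation marks a whole suite as ignored when it is discovered by a runner. The suite names are hypothetical.

```scala
import org.scalatest.{FunSuite, Ignore}

class IgnoreSketch extends FunSuite {

  // Registered but not executed; it is reported as ignored.
  ignore("a slow test we do not want to run right now") {
    Thread.sleep(60000)
    assert(true)
  }

  test("a fast test that still runs") {
    assert(1 + 1 == 2)
  }
}

// When discovered by a runner, every test in this suite is reported as ignored.
@Ignore
class WholeSuiteIgnoredSketch extends FunSuite {
  test("not executed during discovery-based runs") {
    assert(false)
  }
}
```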
  • 19. Sharing fixture A test fixture is composed of the objects and other artifacts that tests use to do their work. When multiple tests need to work with the same fixture, we can share the fixture between them. This reduces code duplication.
  • 20. By calling get-fixture methods If we need to create the same mutable fixture objects in multiple tests, we can use a get-fixture method ● A get-fixture method returns a new instance of the needed fixture object each time it is called ● Not appropriate if those objects need to be cleaned up Ex : com.ganesh.scalatest.fixtures.GetFixtureTest.scala
  • 21. By Instantiating fixture-context objects When different tests need different combinations of fixture objects, define the fixture objects as instance variables of fixture-context objects. ● In this approach we initialize a fixture object inside a trait or class. ● We create a new instance of the fixture trait in each test that needs it. ● We can even mix several of these fixture traits together. Ex : com.ganesh.scalatest.fixtures.FixtureContextTest.scala
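A get-fixture method could be sketched like this, assuming ScalaTest 3.0.x. The `Fixture` case class and its fields are hypothetical; the point is that each call to `fixture` hands the test fresh mutable objects.

```scala
import org.scalatest.FlatSpec
import scala.collection.mutable.ListBuffer

class GetFixtureSketch extends FlatSpec {

  // Hypothetical holder for the mutable objects the tests share by recipe, not by instance.
  case class Fixture(builder: StringBuilder, buffer: ListBuffer[String])

  // Get-fixture method: builds new objects on every call.
  def fixture: Fixture =
    Fixture(new StringBuilder("ScalaTest is "), new ListBuffer[String])

  "Testing" should "be easy" in {
    val f = fixture
    f.builder.append("easy!")
    assert(f.builder.toString == "ScalaTest is easy!")
    f.buffer += "sweet"
    assert(f.buffer.size == 1)
  }

  it should "be fun" in {
    val f = fixture
    f.builder.append("fun!")
    assert(f.builder.toString == "ScalaTest is fun!")
    // The buffer is empty again: mutations from the other test are not visible.
    assert(f.buffer.isEmpty)
  }
}
```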
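Fixture-context objects can be sketched as small traits that a test instantiates, alone or mixed together. This assumes ScalaTest 3.0.x; the `Builder` and `Buffer` traits are illustrative names.

```scala
import org.scalatest.FlatSpec
import scala.collection.mutable.ListBuffer

class FixtureContextSketch extends FlatSpec {

  // Each trait initializes one fixture object.
  trait Builder {
    val builder = new StringBuilder("ScalaTest is ")
  }

  trait Buffer {
    val buffer = ListBuffer("ScalaTest", "is")
  }

  // This test needs only the StringBuilder.
  "Testing" should "be productive" in new Builder {
    builder.append("productive!")
    assert(builder.toString == "ScalaTest is productive!")
  }

  // This test mixes both fixture traits into one context object.
  it should "be clear and concise" in new Builder with Buffer {
    builder.append("clear!")
    buffer += "concise"
    assert(builder.toString == "ScalaTest is clear!")
    assert(buffer == ListBuffer("ScalaTest", "is", "concise"))
  }
}
```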
  • 22. By using withFixture ● Allows fixtures to be cleaned up at the end of each test ● If we have no object to pass to the test case, we can use withFixture(NoArgTest). ● If we have one or more objects to pass to the test case, we need to use withFixture(OneArgTest). Ex : com.ganesh.scalatest.fixtures.WithFicture*.scala
  • 23. By using BeforeAndAfter ● The methods we have used so far for sharing fixtures run as part of the test. ● If an exception occurs while creating such a fixture, it is reported as a test failure. ● With BeforeAndAfter, setup happens before the test execution starts and cleanup happens once the test has completed ● So if an exception occurs in the setup, it aborts the entire suite and no more tests are attempted. Ex : com.ganesh.scalatest.fixtures.BeforeAndAfterTest.scala
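Both variants of `withFixture` might look like the following sketch, assuming ScalaTest 3.0.x (where the one-argument style lives in the `org.scalatest.fixture` package). The temporary-file fixture and the class names are hypothetical.

```scala
import java.nio.file.{Files, Path}
import org.scalatest.{fixture, FlatSpec, Outcome}

// withFixture(NoArgTest): set up before the test, clean up in finally.
class NoArgFixtureSketch extends FlatSpec {
  private var tempFile: Path = _

  override def withFixture(test: NoArgTest): Outcome = {
    tempFile = Files.createTempFile("scalatest-sketch", ".txt")
    try super.withFixture(test) // runs the test body
    finally Files.deleteIfExists(tempFile) // cleanup runs even if the test fails
  }

  "The temp file" should "exist while the test runs" in {
    assert(Files.exists(tempFile))
  }
}

// withFixture(OneArgTest): the fixture object is passed into each test.
class OneArgFixtureSketch extends fixture.FlatSpec {
  type FixtureParam = Path

  def withFixture(test: OneArgTest): Outcome = {
    val file = Files.createTempFile("scalatest-sketch", ".txt")
    try withFixture(test.toNoArgTest(file)) // hand the file to the test
    finally Files.deleteIfExists(file)
  }

  "The temp file" should "be passed to the test" in { file =>
    assert(Files.exists(file))
  }
}
```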
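A small sketch of `BeforeAndAfter`, assuming ScalaTest 3.0.x. The shared buffer is an illustrative fixture; `before` runs ahead of every test and `after` runs once each test has finished.

```scala
import org.scalatest.{BeforeAndAfter, FunSuite}
import scala.collection.mutable.ListBuffer

class BeforeAndAfterSketch extends FunSuite with BeforeAndAfter {

  // Shared mutable fixture, reset around every test.
  val buffer = new ListBuffer[String]

  before {
    buffer += "ScalaTest"
  }

  after {
    buffer.clear()
  }

  test("the buffer is populated by the before block") {
    assert(buffer == ListBuffer("ScalaTest"))
    buffer += "mutated"
  }

  test("each test starts from the same state") {
    // The after block cleared the previous test's mutation.
    assert(buffer == ListBuffer("ScalaTest"))
  }
}
```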
  • 24. Matchers ScalaTest provides a domain specific language (DSL) for expressing assertions in tests using the word should. Ex : com.ganesh.scalatest.features.MatchersTest.scala
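A few matcher expressions built around `should`, as a sketch assuming ScalaTest 3.0.x (`org.scalatest.Matchers` mixed into a style trait). The values being checked are illustrative.

```scala
import org.scalatest.{FlatSpec, Matchers}

class MatchersSketch extends FlatSpec with Matchers {

  "Matchers" should "express common checks with the word should" in {
    val xs = List(1, 2, 3)

    xs should have length 3            // size check
    xs should contain (2)              // membership check
    xs.head shouldBe 1                 // equality check
    "ScalaTest" should startWith ("Scala")
    3.14 should be (3.1 +- 0.1)        // tolerance check on doubles

    val zero = 0
    an [ArithmeticException] should be thrownBy {
      1 / zero
    }
  }
}
```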
  • 25. Asynchronous testing ● Given a Future returned by the code you are testing, you need not block until the Future completes before performing assertions against its value. ● We can instead map those assertions onto the Future and return the resulting Future[Assertion] to ScalaTest. ● This result is executed asynchronously. Ex : com.ganesh.scalatest.features.AsyncTest.scala
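Mapping assertions onto a Future could be sketched as below, assuming ScalaTest 3.0.x and its `AsyncFlatSpec` style, which supplies the implicit execution context. The `addSoon` method is a hypothetical stand-in for the code under test.

```scala
import org.scalatest.AsyncFlatSpec
import scala.concurrent.Future

class AsyncSketch extends AsyncFlatSpec {

  // Hypothetical asynchronous code under test.
  def addSoon(a: Int, b: Int): Future[Int] = Future { a + b }

  "addSoon" should "eventually compute the sum" in {
    // No blocking: the test returns a Future[Assertion] to ScalaTest.
    addSoon(1, 2).map { sum =>
      assert(sum == 3)
    }
  }

  it should "let us check for a failed Future" in {
    val zero = 0
    recoverToSucceededIf[ArithmeticException] {
      Future { 1 / zero }
    }
  }
}
```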
  • 26. Testing private methods ● Even if a method is private in a class, we can test it using ScalaTest. ● We can use the PrivateMethodTester trait to achieve this. ● We can use the invokePrivate operator to call the private method Ex : com.ganesh.scalatest.features.PrivateMethodTest.scala
  • 27. Mocking ScalaTest supports the following mocking libraries: ● ScalaMock ● EasyMock ● JMock ● Mockito Ex : com.ganesh.scalatest.mock.MockTest.scala
  • 29. Complexities ● A Spark context is needed for all the tests ● Testing operations such as map, flatMap and reduce. ● Testing streaming applications (DStream operations). ● Making sure that there is only one context for each test case.
  • 30. Setup ● Instead of creating the required contexts in each test suite, we create a trait that extends BeforeAndAfter, and all our suites extend this trait. ● In that trait we initialize all the contexts in the before method ● All the contexts are destroyed in the after method ● Extend this trait in all the test suites Ex : com.ganesh.scalatest.sparkbatch.EnvironmentInitializerSC.scala
  • 31. Spark Streaming test ● Full control over the clock is needed to manually manage batches, slides and windows. ● Spark Streaming provides the necessary abstraction over the system clock, the ManualClock class. ● But it is a private class, so we cannot access it directly in our test cases ● So we use a wrapper class to use the ManualClock instance in our test cases. Ex : com.ganesh.scalatest.sparkstreaming
  • 32. Summary ● We can select any of the styles provided by ScalaTest; they differ only in how the tests are written, and all of them offer the same features. ● Make use of the assertions and matchers provided by ScalaTest for better test cases. ● When testing Spark we need to test the logic, so keep the code modular so that each piece of logic can be tested individually. ● There is an external library called spark-testing-base that provides many functions for asserting at the DataFrame level, and it has traits that provide the contexts needed for the tests.
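A sketch of `PrivateMethodTester`, assuming ScalaTest 3.0.x. The `Greeter` class and its private `greet` method are hypothetical; the private method is looked up by name and invoked reflectively.

```scala
import org.scalatest.{FunSuite, PrivateMethodTester}

// Hypothetical class under test with a private method.
class Greeter {
  private def greet(name: String): String = s"Hello, $name"
}

class PrivateMethodSketch extends FunSuite with PrivateMethodTester {

  test("a private method can be invoked through invokePrivate") {
    // The type parameter is the method's result type; the symbol is its name.
    val greet = PrivateMethod[String](Symbol("greet"))
    val greeter = new Greeter

    val result = greeter invokePrivate greet("Ganesh")

    assert(result == "Hello, Ganesh")
  }
}
```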
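The setup described on this slide could be sketched as follows. This is not the repository's `EnvironmentInitializerSC`; it is an assumed minimal version using ScalaTest 3.0.x and a local Spark master, with hypothetical trait, suite and application names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfter, FunSuite, Suite}

// Hypothetical shared trait: creates a SparkContext before each test and stops it afterwards.
trait SparkContextSketch extends BeforeAndAfter { this: Suite =>

  protected var sc: SparkContext = _

  before {
    val conf = new SparkConf()
      .setMaster("local[2]") // assumption: tests run against a local master
      .setAppName("scalatest-spark-sketch")
    sc = new SparkContext(conf)
  }

  after {
    // Stopping the context ensures only one is alive per test case.
    if (sc != null) {
      sc.stop()
      sc = null
    }
  }
}

class WordCountSketch extends FunSuite with SparkContextSketch {

  test("flatMap, map and reduceByKey count words") {
    val counts = sc.parallelize(Seq("a b", "b"))
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collectAsMap()

    assert(counts == Map("a" -> 1, "b" -> 2))
  }
}
```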
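One commonly used form of such a wrapper is sketched below. It is an assumption about how the wrapper can be written, not the repository's code: it relies on being declared in the `org.apache.spark.streaming` package to reach package-private members, and on `ManualClock` living in `org.apache.spark.util` (as in Spark 1.4 and later). These are Spark internals and may differ between versions.

```scala
// Declared in Spark's own package so that package-private members are visible.
package org.apache.spark.streaming

import org.apache.spark.util.ManualClock

// Hypothetical wrapper exposing the streaming context's ManualClock to tests.
class ClockWrapper(ssc: StreamingContext) {

  // Only valid when the context was configured to use ManualClock.
  private def manualClock: ManualClock =
    ssc.scheduler.clock.asInstanceOf[ManualClock]

  def getTimeMillis(): Long = manualClock.getTimeMillis()

  // Moves the clock forward, which lets a test trigger batches deterministically.
  def advance(timeToAdd: Long): Unit = manualClock.advance(timeToAdd)
}
```

For the cast to succeed, the test would set `spark.streaming.clock` to `org.apache.spark.util.ManualClock` in the `SparkConf` before creating the `StreamingContext`, then call `advance` with the batch duration to drive each batch.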