Tonight’s Meetup
Michael Sexton
"Testing Big Data in AWS"
23rd September 2021
Travel Plan
What is Big Data in AWS
What is Testing Big Data in AWS Cloud
Types of Testing
Deployment Testing and Deployments
Tools & Automation
Challenges
Conclusion
Q & A
Testing Big Data in AWS - Sept 2021
What is Big Data?
- IoT, Netflix, Cybersecurity, Social Media, Eurovision, traffic on Google Maps, Machine Learning, Video, Communications, etc.
https://bigdatapath.wordpress.com/
What Work is There?
- Senior QA Engineer - Data Analytics Software - Dublin - Selenium/Python - 65k - 75k
- Senior SDET - Big Data/Analytics - Dublin/Remote - Python, System Testing, Performance Testing - 80k - 90k
- Lead SDET - Big Data/Analytics - Dublin/Remote - Python, System Testing, Performance Testing - 90k - 100k
- Senior QA Automation Engineer - Data Management Software - Galway/Remote - Python/Selenium - 60k - 70k
Reperio Human Capital
Who Are the Cloud Providers?
- Various cloud providers: AWS, Azure, Google
Cloud, Alibaba, IBM, Dell, Tencent
What is Big Data in AWS?
Services Provided By AWS
What is Testing Big Data in AWS?
- Testing that the application carrying the data works well (no anomalies)
- Functional and performance testing
- Testing the migration from on-prem to the cloud
- Verifying that the resulting big data analysis is correct
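The migration and correctness checks above amount to comparing the cloud output against a trusted baseline such as the on-prem results. A minimal sketch of such a keyed reconciliation follows. It is an illustration only, not the speaker's method: the function name `reconcile`, the record shape, and the sample values are all hypothetical.

```python
from typing import Dict, List


def reconcile(on_prem: Dict[str, float], cloud: Dict[str, float],
              tolerance: float = 1e-9) -> Dict[str, List]:
    """Compare two keyed result sets and report every difference.

    Hypothetical helper: a real pipeline would load these dictionaries
    from query results or exported files.
    """
    # Keys present on-prem but absent from the cloud output.
    missing_in_cloud = sorted(k for k in on_prem if k not in cloud)
    # Keys the cloud produced that the baseline never had.
    unexpected_in_cloud = sorted(k for k in cloud if k not in on_prem)
    # Shared keys whose values differ by more than the tolerance.
    mismatched = [
        (k, on_prem[k], cloud[k])
        for k in sorted(on_prem.keys() & cloud.keys())
        if abs(on_prem[k] - cloud[k]) > tolerance
    ]
    return {
        "missing_in_cloud": missing_in_cloud,
        "unexpected_in_cloud": unexpected_in_cloud,
        "mismatched": mismatched,
    }


# Made-up sample data for illustration.
on_prem_results = {"device-1": 10.0, "device-2": 20.5, "device-3": 7.0}
cloud_results = {"device-1": 10.0, "device-2": 20.6, "device-4": 1.0}
report = reconcile(on_prem_results, cloud_results)
```

Reporting all three categories at once, rather than stopping at the first difference, tends to make a migration defect easier to triage.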
Functional Testing
- Testing with varied valid and invalid input
- Boundary cases, Calculations
- Text in different scripts (Latin, Sanskrit, Arabic) and encoded text
- Testing against existing on-prem results
- Failure cases
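The functional checks listed above (valid and invalid input, boundary cases, non-Latin text) can be sketched as plain test functions of the kind pytest would collect. The validation rule, the `MAX_NAME_LENGTH` boundary, and the sample values below are assumptions made for illustration, not taken from the talk.

```python
import unicodedata

MAX_NAME_LENGTH = 64  # hypothetical boundary for this sketch


def validate_name(value: object) -> bool:
    """Return True if value is an acceptable name field (hypothetical rule)."""
    if not isinstance(value, str):
        return False
    # Normalize so composed and decomposed forms are treated alike.
    normalized = unicodedata.normalize("NFC", value).strip()
    return 0 < len(normalized) <= MAX_NAME_LENGTH


# Valid cases cover several writing systems and both length boundaries.
VALID_INPUTS = ["Michael", "संस्कृतम्", "العربية", "a", "x" * MAX_NAME_LENGTH]
# Invalid cases cover empty, whitespace-only, wrong type, and over-length.
INVALID_INPUTS = ["", "   ", None, 42, "x" * (MAX_NAME_LENGTH + 1)]


def test_valid_inputs_are_accepted():
    for value in VALID_INPUTS:
        assert validate_name(value), value


def test_invalid_inputs_are_rejected():
    for value in INVALID_INPUTS:
        assert not validate_name(value), value


def test_encoded_text_round_trips():
    # Encoded input: UTF-8 bytes should decode back to the same text.
    for value in VALID_INPUTS:
        assert value.encode("utf-8").decode("utf-8") == value
```

Saved in a file named `test_*.py`, these functions would be picked up by pytest without further setup.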
Performance and Security Testing
- Data ingestion (many different sources of data, e.g. v1 and v3)
- Data processing and throughput, including soak testing
- Sub-component performance
- Security of the pipeline and stored data
- Robustness (no data arrives: what then?)
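Two of the checks above, throughput and the "no data arrives" case, can be expressed as small helpers. This is a sketch under stated assumptions: the function names `throughput` and `feed_is_stalled`, the five-minute gap, and the timestamps are hypothetical and not from the talk.

```python
from datetime import datetime, timedelta
from typing import Optional


def throughput(records: int, elapsed_seconds: float) -> float:
    """Records processed per second over a measured interval."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return records / elapsed_seconds


def feed_is_stalled(last_arrival: Optional[datetime], now: datetime,
                    max_gap: timedelta) -> bool:
    """True when no data has arrived within max_gap, or has never arrived."""
    if last_arrival is None:
        return True
    return now - last_arrival > max_gap


# Made-up values for illustration.
now = datetime(2021, 9, 23, 19, 0)
max_gap = timedelta(minutes=5)
rate = throughput(120_000, 60.0)
recent = feed_is_stalled(now - timedelta(minutes=3), now, max_gap)
stale = feed_is_stalled(now - timedelta(minutes=10), now, max_gap)
never = feed_is_stalled(None, now, max_gap)
```

In a soak test, the same two measurements would be sampled repeatedly over hours to see whether the rate degrades or the feed goes quiet.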
Pythagorean Cup
Deployment Testing and Deployments
- Deployment testing (upgrades, timings)
- Code merged to the master branch; deploy scripts and documentation written and tested
- The tester is the go-to person for DevOps during deployments and upgrades
- Staging environment, production environment
- Integration testing with other teams
Tools & Automation
- AWS EC2 machine with Linux/Python scripts
- pytest/Robot Framework for automation and regression testing
- AWS ecosystem (Athena, CloudWatch, X-Ray, QuickSight)
- Excel (max 1,048,576 rows) and VLOOKUP
- PySpark
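The Excel row limit quoted above is a practical cut-off: an export with more rows than that cannot be inspected in a single worksheet and needs a tool such as Athena or PySpark instead. A minimal sketch of that check, using only the standard library, is shown here; the helper names and the sample CSV are hypothetical.

```python
import csv
import io

EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit quoted on the slide


def count_csv_rows(text_stream) -> int:
    """Count the rows (including any header row) in a CSV text stream."""
    return sum(1 for _ in csv.reader(text_stream))


def fits_in_excel(row_count: int) -> bool:
    """True if a dataset with row_count rows fits in one worksheet."""
    return row_count <= EXCEL_MAX_ROWS


# Tiny made-up export: one header row plus two data rows.
sample = io.StringIO("id,value\n1,10\n2,20\n")
sample_rows = count_csv_rows(sample)
sample_fits = fits_in_excel(sample_rows)
```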
Challenges
- Large varied dataset
- Automation and scripting skills
- Costs & AWS Knowledge
- Knowing the big picture
- Deployments to staging and production
- Communicating with other teams
Conclusion
- What is Big Data … in AWS?
- What is Big Data Testing in AWS
- Types of Testing (Functional & Performance)
- Deployment Testing & Deployments
- Tools & Automation
- Challenges
Any Questions?
Thanks
Ministry of Testing
Poppulo
Twitter: @MinistryCork

Editor's Notes

  • #19: The Ministry of Testing aims to change and lead within the software testing world. We are doing this through a strong focus on learning, collaboration and resources. You are part of the story too, and we hope you can join us along the way. We run several conferences around the world under the name of TestBash. We also offer an e-learning platform called the Dojo. The Dojo offers numerous online courses for you to improve your testing skills.