www.unicomlearning.com
Manoj Kolhe
Project Lead, Testing Service Line, L&T Infotech
India Testing Week 2013
Big Data Testing
December 10, 2013
Mumbai
www.nextgentesting.org
Introduction to Big Data
Web Logs, Social Network Data, Transactional Data, Statistical Data
Database Cluster, Database of Databases
Big Data
BI Reporting, Trend Analysis, Decision Making
Defining Big Data
Wikipedia:
The term Big Data is applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time.
Gartner:
Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization.
Gartner:
...the real issue is making sense of big data and finding patterns in it that help organizations make better business decisions. “Look for patterns that support business decisions in what we call Pattern-Based Strategy.”
Forrester:
Opportunities to improve the bottom line exist in a flood of information; however, gaining insight from data becomes challenging as it grows extremely large. Emerging technology applies the power of distributed, virtual computing to the problem of large data.
SAS:
With Big Data Analytics, organizations can make better and faster decisions based not only on what has happened, but on what will happen next. They can also predict the best possible outcome while remaining agile in swiftly changing times.
How Big is Big Data?
175 million tweets a day
465 million user accounts
30+ PB user data
100 TB daily uploads
50 billion user photos
20+ PB daily data
300 billion videos
Total runtime 47 million years
48 hrs of video each minute of the day
5 billion users are calling, texting, browsing data
2.9 million emails exchanged every second
1.3 exabytes (on the order of 10^18 bytes)
Popular 5Vs of Big Data:
• Volume
• Variety
• Velocity
• Viability
• Value
90% of the data in the world today has been created in the last two years alone
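To put the headline figures on a common scale, the arithmetic can be sketched as below. This is only a back-of-the-envelope conversion of the numbers quoted on the slide; the variable names are illustrative and the SI definitions (1 PB = 10^15 bytes, 1 EB = 10^18 bytes) are assumed.

```python
# Back-of-the-envelope conversions for the figures quoted on the slide.
# Variable names are illustrative; SI units are assumed throughout.
SECONDS_PER_DAY = 24 * 60 * 60

# 175 million tweets a day, expressed per second.
tweets_per_day = 175_000_000
tweets_per_second = tweets_per_day / SECONDS_PER_DAY

# 2.9 million emails per second, expressed per day.
emails_per_second = 2_900_000
emails_per_day = emails_per_second * SECONDS_PER_DAY

# "48 hrs of video each minute" expressed as minutes of video per minute.
video_minutes_per_minute = 48 * 60

# 1.3 exabytes expressed in petabytes (1 EB = 10**18 bytes, 1 PB = 10**15 bytes).
exabytes = 1.3
petabytes = exabytes * 10**18 / 10**15

print(f"{tweets_per_second:,.0f} tweets per second")
print(f"{emails_per_day:,} emails per day")
print(f"{video_minutes_per_minute:,} minutes of video uploaded per minute")
print(f"{petabytes:,.0f} PB in {exabytes} EB")
```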
Market Statistics & Applications of Big Data
Big Data – Use Case
Case Study – Telematics Domain
Objectives
• Create a state-of-the-art analytic environment to support and fuel the fast-growing telematics industry
• Increase operational efficiency
Solution
• Built a comprehensive, scalable, and reliable service platform to support an end-to-end analytic environment, covering everything from operational data to robust predictive analytic applications
Technology Implemented
• MongoDB
• Enterprise Service Bus
• Big Data predictive analytics software
Outcome
• Reduced the end-to-end predictive analytics process from months to days
• Improved marketing campaign effectiveness with 65% model accuracy and efficacy
Traditional RDBMS vs NoSQL
Feature
• RDBMS: Row-column, structured
• NoSQL: Semi-structured and unstructured, processed in parallel (batch processing)
Scalability
• RDBMS: Vertical, by adding capacity to existing systems
• NoSQL: Horizontal, by adding and replicating nodes
Data Handling / Extraction
• RDBMS: Slower for large data volumes in analytical workloads (partitioning)
• NoSQL: Fast access in both operational and analytical workloads (sharding)
Price
• RDBMS: Mostly proprietary, e.g. Oracle, DB2, SQL Server
• NoSQL: Open source, e.g. Hadoop, Cloudera, Amazon, Hortonworks
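The comparison mentions sharding as the horizontal counterpart to partitioning. A minimal sketch of hash-based sharding follows, assuming a fixed shard count and string keys; the function names and the `vehicle-N` keys are hypothetical and do not reflect how any particular NoSQL product routes data.

```python
import hashlib

def shard_for(key: str, shard_count: int) -> int:
    """Map a record key to a shard index using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

def distribute(keys, shard_count):
    """Group keys by the shard they would be stored on."""
    shards = {index: [] for index in range(shard_count)}
    for key in keys:
        shards[shard_for(key, shard_count)].append(key)
    return shards

# Hypothetical keys, loosely modelled on the telematics use case.
keys = [f"vehicle-{n}" for n in range(1000)]
shards = distribute(keys, 4)
for index, members in shards.items():
    print(f"shard {index}: {len(members)} keys")
```

A tester can use the same routing function to predict which node should hold a given record and then check that node directly.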
Big Data – Test Environment
Test Environment Setup
• Infrastructure
  • Cluster Setup – Evaluate Data nodes
• Software / Platform
  • File System, NoSQL DB
• Test Data
  • Off-peak hours traffic testing
  • Staging environments / Scaled-down models
  • Historical Data / Test data generation using utilities
• Test Infrastructure
  • Test strategy, testing release cycle
  • Volume of data consideration
• 3rd party tools
  • Load Simulators
    • JMeter for multi-threaded users
    • Load simulation using Cloud
  • Monitoring
    • Hadoop Performance Monitoring
    • ECL Watch (HPCC)
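The "test data generation using utilities" item can be illustrated with a small generator. This is only a sketch of one possible utility: the log line layout, the paths, and the status codes are assumptions, and a fixed seed is used so that repeated runs produce the same data set.

```python
import random
from datetime import datetime, timedelta

def generate_web_log(line_count, seed=42):
    """Produce reproducible synthetic web log lines for a scaled-down test run."""
    rng = random.Random(seed)  # seeded so every run yields identical data
    start = datetime(2013, 12, 10, 0, 0, 0)
    paths = ["/", "/login", "/search", "/cart", "/checkout"]  # hypothetical URLs
    statuses = [200, 200, 200, 404, 500]  # mostly successful requests
    lines = []
    for offset in range(line_count):
        timestamp = start + timedelta(seconds=offset)
        ip = ".".join(str(rng.randint(1, 254)) for _ in range(4))
        path = rng.choice(paths)
        status = rng.choice(statuses)
        lines.append(f"{ip} [{timestamp.isoformat()}] GET {path} {status}")
    return lines

sample = generate_web_log(5)
for line in sample:
    print(line)
```

Raising `line_count` scales the data volume, which helps when sizing a staging environment against production.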
Testing Requirements in Big Data
Logging Data, Data Streams, Social Data, Traditional RDBMS
Store, Process, Analyze, Reporting
Hadoop HDFS, Enterprise Data Warehouse, Big Data Analytics, BI Reporting
Processed Data
1. Pre-Hadoop Validation
2. HDFS Data Validation
3. ETL Data Validation
4. Reports Data Testing
Ref: Infosys Big Data Solutions
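One way to connect the pre-Hadoop and HDFS validation points is to compare a record count and an order-independent checksum taken before and after loading. The sketch below assumes the records are available as text lines on both sides (for example, the source extract and a copy read back from HDFS); the `fingerprint` helper and the sample records are hypothetical.

```python
import hashlib
import random

def fingerprint(records):
    """Return (record count, order-independent checksum) for an iterable of lines."""
    count = 0
    combined = 0
    for record in records:
        digest = hashlib.sha256(record.encode("utf-8")).digest()
        # Summing per-record hashes makes the result independent of record order,
        # which matters because a distributed load may reorder records.
        combined = (combined + int.from_bytes(digest, "big")) % (2**256)
        count += 1
    return count, combined

source_records = [f"{n},vehicle-{n},{n * 1.5}" for n in range(1000)]

# Simulate a load that reorders records but loses none.
loaded_records = list(source_records)
random.Random(7).shuffle(loaded_records)

# Simulate a load that silently drops one record.
truncated_records = loaded_records[:-1]

print("complete load matches:", fingerprint(source_records) == fingerprint(loaded_records))
print("truncated load matches:", fingerprint(source_records) == fingerprint(truncated_records))
```

A matching count with a different checksum points to altered records rather than missing ones.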
Testing Approach for Data Validation
Data Loading in HDFS
Unstructured Data, Structured Data
HDFS
Managing the data
Processed Data
Data Validation
Expected Results
Testing Landscape
ACQUIRE
• Acquire all available data
ORGANIZE
• Organize and clean data with parallel processing
ANALYZE
• Analyze all data at once
BUSINESS USE
• Take business decisions based on active data
Tapping unused, new datasets
Build new relationships, understanding
Data-driven business decisions
Testing Landscape – Acquire
Data Validation
• Data Comparison
• Extraction of right data
• HDFS loading
• Data replication
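The data comparison step can be sketched as a keyed diff between the source extract and what was loaded. This assumes both sides can be read into dictionaries that share a unique key field; the `compare_records` helper and the sample rows are hypothetical.

```python
def compare_records(source, target, key):
    """Report keys that are missing, unexpected, or different between two data sets."""
    source_by_key = {row[key]: row for row in source}
    target_by_key = {row[key]: row for row in target}
    shared = source_by_key.keys() & target_by_key.keys()
    return {
        # Present in the source but never loaded.
        "missing": sorted(source_by_key.keys() - target_by_key.keys()),
        # Loaded but absent from the source extract.
        "unexpected": sorted(target_by_key.keys() - source_by_key.keys()),
        # Present on both sides with differing field values.
        "mismatched": sorted(k for k in shared if source_by_key[k] != target_by_key[k]),
    }

source_rows = [
    {"id": 1, "speed": 60},
    {"id": 2, "speed": 45},
    {"id": 3, "speed": 80},
]
loaded_rows = [
    {"id": 1, "speed": 60},
    {"id": 3, "speed": 85},  # value changed during load
    {"id": 4, "speed": 30},  # record that was never in the source
]

result = compare_records(source_rows, loaded_rows, key="id")
print(result)
```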
Testing Landscape – Organize
MR Job Validation
• MR Job Logic
• Aggregation and consolidated data
• Job Output against source files
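Checking job output against source files can be approached by re-running the same map and reduce logic in-process on a small sample and comparing the result with what the cluster produced. The sketch below is a plain Python reference implementation, not Hadoop code; the input layout (`vehicle_id,distance`) and the `job_output` values are hypothetical stand-ins for a real job's results.

```python
from collections import defaultdict

def map_phase(line):
    """Emit (vehicle_id, distance) pairs from one input line."""
    vehicle_id, distance = line.split(",")
    yield vehicle_id, float(distance)

def reduce_phase(key, values):
    """Aggregate all distances for one vehicle."""
    return key, sum(values)

def run_reference_job(lines):
    """Run map, group by key, then reduce, all in a single process."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_phase(line):
            grouped[key].append(value)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

source_lines = ["v1,10.5", "v2,3.0", "v1,4.5"]

# Hypothetical output collected from the real MapReduce job for the same input.
job_output = {"v1": 15.0, "v2": 3.0}

expected = run_reference_job(source_lines)
differences = {k for k in expected.keys() | job_output.keys() if expected.get(k) != job_output.get(k)}
print("keys that differ:", sorted(differences))
```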
Testing Landscape – Analyze
Data Transformation
• Transformation Logic
• Data cleansing
• Data transfer
• Data integrity
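Transformation logic, cleansing, and integrity checks can be exercised together on a small sample. The rules below (reject rows without an id, trim and upper-case the country code) are assumptions chosen for illustration, as are the helper names; a real pipeline would substitute its own rules.

```python
def cleanse(rows):
    """Apply the assumed cleansing rules and separate rejected rows."""
    cleaned, rejected = [], []
    for row in rows:
        if not row.get("id"):
            rejected.append(row)  # rows without an id cannot be loaded
            continue
        cleaned.append({"id": row["id"], "country": row["country"].strip().upper()})
    return cleaned, rejected

def check_integrity(raw, cleaned, rejected):
    """Return a list of integrity problems; an empty list means the checks passed."""
    problems = []
    if len(cleaned) + len(rejected) != len(raw):
        problems.append("row count mismatch")
    ids = [row["id"] for row in cleaned]
    if len(ids) != len(set(ids)):
        problems.append("duplicate ids")
    if any(row["country"] != row["country"].strip().upper() for row in cleaned):
        problems.append("country not normalised")
    return problems

raw_rows = [
    {"id": "a1", "country": " in "},
    {"id": "", "country": "us"},
    {"id": "a2", "country": "De"},
]
cleaned_rows, rejected_rows = cleanse(raw_rows)
problems = check_integrity(raw_rows, cleaned_rows, rejected_rows)
print(cleaned_rows, rejected_rows, problems)
```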
Testing Landscape – Business Use
Reports Definition
• Report data validation
• OLAP cube testing
• Dashboard testing
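Report data validation often comes down to recomputing a report figure from the processed data and comparing the two. A minimal sketch follows, assuming the report exposes totals per dimension value; the field names, the tolerance, and the sample figures are hypothetical.

```python
def aggregate(rows, dimension, measure):
    """Recompute totals per dimension value from the processed data."""
    totals = {}
    for row in rows:
        totals[row[dimension]] = totals.get(row[dimension], 0) + row[measure]
    return totals

def report_discrepancies(report, recomputed, tolerance=0.01):
    """Return entries where the report and the recomputed totals disagree."""
    discrepancies = {}
    for key in set(report) | set(recomputed):
        shown, actual = report.get(key), recomputed.get(key)
        if shown is None or actual is None or abs(shown - actual) > tolerance:
            discrepancies[key] = (shown, actual)
    return discrepancies

processed_rows = [
    {"region": "West", "trips": 120},
    {"region": "West", "trips": 80},
    {"region": "South", "trips": 50},
]

# Hypothetical figures read from the BI report or dashboard under test.
report_totals = {"West": 200, "South": 55}

recomputed_totals = aggregate(processed_rows, "region", "trips")
issues = report_discrepancies(report_totals, recomputed_totals)
print(issues)
```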
Big Data Testing Readiness
People
• Training
• Certifications
• Knowledge Base
Process
• Data Centric Testing Process
• Risk Based Testing Approach
• Good Practices Knowledge
TDM (Test Data Management)
• Test Data Generation
• Data Masking Requirements
• Data Profiling & Extraction Utilities
Technology
• Non-traditional automation
• End to End testing
• Adaptability to existing solutions
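The data masking requirement can be illustrated with deterministic masking, where the same input always maps to the same masked value so that joins across data sets still work in the test environment. This is a sketch under stated assumptions: the secret key, the `mask_email` helper, and the sample address are hypothetical, and a real project would follow its own masking policy.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be managed outside the test data itself.
MASKING_KEY = b"test-environment-only"

def mask_email(email, secret=MASKING_KEY):
    """Replace the local part of an email with a keyed, repeatable token."""
    local, _, domain = email.partition("@")
    token = hmac.new(secret, local.lower().encode("utf-8"), hashlib.sha256).hexdigest()[:12]
    return f"user_{token}@{domain}"

original = "Jane.Doe@example.com"
masked = mask_email(original)
print(masked)
```

Because the token is keyed, the mapping cannot be reproduced without the secret, yet it stays stable across every table that contains the same address.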
Organized by
UNICOM Trainings & Seminars Pvt. Ltd.
contact@unicomlearning.com
Speaker name: Manoj Kolhe
Email ID: Manoj.Kolhe@Lntinfotech.com
Thank You