Bill Hayduk
CEO, RTTS
Business Leader, QuerySurge
(the software division of RTTS)
Testing Big Data:
Automated ETL Testing of Hadoop and NoSQL
Jeff Bocarsly, Ph.D.
Chief Architect
QuerySurge Division, RTTS
built by
QuerySurge™
AGENDA
• About Big Data and Hadoop
• About NoSQL
• Hadoop and DWH Use Case
• How to test Big Data
• Demo of QuerySurge w/ Hadoop and NoSQL
Testing Big Data:
Automated ETL Testing of
Hadoop and NoSQL
Host: RTTS/QuerySurge
Date: July 30, 2022
Time: 1:00 pm, Eastern Standard Time (New York, GMT-05:00)
Session number: 630 771 732
About RTTS
RTTS is the leading provider of software & data quality for critical business systems
FACTS
Founded: 1996
Headquarters: New York
Customers: 700+
Strategic Partners: See logos
Enterprise Software: QuerySurge (launched 2012; 170+ customers in 30 countries)
Technology Partners
Global System Integrators
Regional Consulting firms
Argentina, Australia, Belgium, Brazil, Canada, Chile, India,
Malaysia, Netherlands, New Zealand, Norway, Sweden,
Singapore, South Africa, Ukraine, US
C-level executives are using BI & Analytics to make critical business decisions with the assumption that the underlying data is fine.
We know it is not.
(Diagram: Mainframe, ETL, Data Warehouse, and Business Intelligence & Analytics, with the typical data issue areas marked)
Big data – defined as too much
volume, velocity and variety to
work on normal database
architectures.
Size
Defined as 5 petabytes or more
1 petabyte = 1,000 terabytes
1,000 terabytes = 1,000,000 gigabytes
1,000,000 gigabytes = 1,000,000,000 megabytes
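The conversions above can be checked with a few lines of Python. This is a minimal sketch that assumes decimal (SI) units, where each step is a factor of 1,000 as on the slide; the `UNITS` list and the `convert` helper are illustrative names, not part of the webinar.

```python
# Decimal (SI) storage units, smallest to largest; each step is a factor of 1,000.
UNITS = ["megabytes", "gigabytes", "terabytes", "petabytes"]


def convert(value, from_unit, to_unit):
    """Convert a quantity between decimal storage units."""
    # A positive step count means converting to a smaller unit (multiply).
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1000 ** steps


one_pb_in_tb = convert(1, "petabytes", "terabytes")
one_pb_in_gb = convert(1, "petabytes", "gigabytes")
one_pb_in_mb = convert(1, "petabytes", "megabytes")

# The slide's "big data" threshold of 5 petabytes, expressed in terabytes.
threshold_in_tb = convert(5, "petabytes", "terabytes")

print(one_pb_in_tb, one_pb_in_gb, one_pb_in_mb, threshold_in_tb)
```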
Walmart handles more than 1 million customer transactions every hour.
• data is imported into databases that contain > 2.5 petabytes of data
• the equivalent of 167 times the information contained in all the books in the US Library of Congress.
Facebook handles 40 billion photos from its user base.
Google processes 1 Terabyte per hour
Twitter processes 85 million tweets per day
eBay processes 80 Terabytes per day
others
Requires exceptional technologies to efficiently process large quantities of
data within tolerable elapsed times.
Technologies include:
• massively parallel processing (MPP) databases
• data warehouses
• data mining grids
• distributed file systems
• distributed databases
• cloud computing platforms
• the Internet, and
• scalable storage systems
Hadoop is an open-source project that develops software for scalable, distributed computing.
• easily deals with complexities of high volume, velocity and variety of data
• is a framework for distributed processing of large data sets across clusters of computers using simple programming models
• scales up from single servers to 1,000’s of machines, each offering local computation and storage
• detects and handles failures at the application layer
• Redundant and reliable
• Extremely powerful
• Easy to program distributed apps
• Runs on commodity hardware
“Spending on Hadoop software and subscriptions will increase to
approximately $677 million, with overall big data market
anticipated to reach the $50 billion mark.”
- Wikibon
Each machine runs two components:
MapReduce – the processing part that manages the programming jobs (a.k.a. Task Tracker).
HDFS (Hadoop Distributed File System) – stores data on the machines (a.k.a. Data Node).
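To make the MapReduce processing model more concrete, here is a minimal single-process sketch in Python. It is not Hadoop code: a real cluster distributes the map and reduce tasks across many Task Trackers and reads its input from HDFS. The word-count task and all function names are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big clusters", "data nodes store data"])
print(counts)
```

Because each map call looks at one line and each reduce call looks at one key, the work can be split across machines, which is what lets the approach scale by adding nodes.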
Cluster
Add more machines for scaling – from 1 to 100 to 1,000
Job Tracker accepts jobs, assigns tasks, identifies failed machines
Name Node
Coordination for HDFS. Inserts and extractions are communicated through the Name Node.
(Diagram: a cluster of 12 machines, each running a Task Tracker and a Data Node, coordinated by the Name Node)
(Diagram: HiveQL queries running on top of MapReduce (Task Tracker) and HDFS (Data Node) on each machine)
Apache Hive - a data warehouse infrastructure built on top
of Hadoop for providing data summarization, query, and analysis.
Hive provides a mechanism to query the data using a SQL-like language
called HiveQL that interacts with the HDFS files
• create
• insert
• update
• delete
• select
What is NoSQL?
A term used to describe high-performance, non-relational databases that provide a mechanism for
storage and retrieval of data that is modeled in means other than the tabular relations used in
relational databases
NoSQL Database Types
Document databases pair each key with a complex data structure known as a document.
Documents can contain many different key-value pairs, or key-array pairs, or even nested documents.
Graph stores are used to store information about networks of data, such as social connections.
Graph stores include Neo4j and Giraph.
Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as
an attribute name (or 'key'), together with its value. Examples of key-value stores are Riak and
Berkeley DB. Some key-value stores, such as Redis, allow each value to have a type, such as 'integer',
which adds functionality.
Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets,
and store columns of data together, instead of rows.
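The four database types can be pictured with plain Python data structures. This is only an illustrative sketch of the data models as described above, not how any of the named products store data internally; every key, value, and helper name here is made up for the example.

```python
# Key-value store: every item is an attribute name (key) with a single value.
key_value_store = {"user:42:name": "Ada", "user:42:visits": 17}

# Document database: each key is paired with a document that can hold
# key-value pairs, key-array pairs, and nested documents.
document_store = {
    "user:42": {
        "name": "Ada",
        "tags": ["admin", "tester"],
        "address": {"city": "New York", "country": "US"},
    }
}

# Graph store: information about a network, here as an adjacency mapping.
graph_store = {"Ada": {"Grace"}, "Grace": {"Ada", "Linus"}, "Linus": {"Grace"}}

# Wide-column store: values of the same column are kept together.
rows = [
    {"id": 1, "city": "New York", "amount": 10},
    {"id": 2, "city": "Boston", "amount": 25},
    {"id": 3, "city": "New York", "amount": 7},
]


def to_columns(records):
    """Regroup row-oriented records so each column's values sit together."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


column_store = to_columns(rows)

# A query over one column only needs to touch that column's values.
total_amount = sum(column_store["amount"])
print(total_amount)
```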
Source: MongoDB, Inc.
Data Warehouse Batch Aggregation
ETL from MongoDB
ETL to MongoDB
When to use NoSQL?
• Online real-time processing
• Data set is smaller
• Measured in milliseconds
When to use Hadoop?
• Offline big data processing
• Offline analytics
• Measured in minutes & hours
Source: classpattern.com
(Diagram: data flows among Data Warehouse, Hadoop, and NoSQL)
USE CASE 1***
Use Hadoop as a landing zone for big data & raw data
1) Bring all raw, big data into Hadoop
2) Perform some pre-processing of this data
3) Determine which data goes to the Data Warehouse
4) Extract, transform and load (ETL) pertinent data into the Data Warehouse
***Source: Vijay Ramaiah, IBM product manager, datanami magazine, June 10, 2013
Recommended functional test strategy: Test every entry point in the system
(feeds, databases, internal messaging, front-end transactions).
The goal: provide rapid localization of data issues between points
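The idea of localizing a data issue between entry points can be sketched as follows. This is a simplified illustration, not QuerySurge's method: it assumes each checkpoint should carry the same set of record IDs, and it reports the first pair of adjacent checkpoints that disagree. The checkpoint names and the `localize_issue` helper are hypothetical.

```python
def localize_issue(checkpoints):
    """Find the first adjacent pair of checkpoints whose record IDs differ.

    Returns (upstream, downstream, missing_ids, unexpected_ids),
    or None when every checkpoint agrees with the one before it.
    """
    for (up_name, up_ids), (down_name, down_ids) in zip(checkpoints, checkpoints[1:]):
        if up_ids != down_ids:
            return up_name, down_name, up_ids - down_ids, down_ids - up_ids
    return None


# Record IDs observed at each test entry point, in pipeline order.
checkpoints = [
    ("source feed", {1, 2, 3, 4}),
    ("Hadoop landing zone", {1, 2, 3, 4}),
    ("data warehouse", {1, 2, 4}),
    ("BI report", {1, 2, 4}),
]

result = localize_issue(checkpoints)
print(result)
```

With a test at every entry point, the missing record is traced to the step between the Hadoop landing zone and the data warehouse rather than being discovered only in the final report.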
(Diagram: test entry points marked across Source Data, Source Hadoop, ETL Process, Target DWH, and Business Intelligence software)
(Diagram: test entry points marked at Source Data, Ingestion, Relational DB & Data Warehousing, and BI, Analytics & Reporting)
- We need to verify more data and to do it faster
- We need to automate the testing effort
- We need to be able to test across different platforms
We need a testing tool!
QuerySurge is the smart Data Testing solution that automates the data validation and ETL testing of Big Data with full DevOps functionality for continuous testing
Data Quality at Speed
→ Automate the launch, execution, comparison & auto-email results
Test across different platforms
→ Data Warehouse, Hadoop, NoSQL, DB, flat files, XML, JSON, BI Reports
Smart Query Wizards - no coding needed
→ Query Wizards create tests visually, without writing SQL
Data Analytics & Data Intelligence
→ Data Analytics Dashboard, Data Intelligence Reports, emailed results,
Ready-for-Analytics back-end data access
Create Custom Tests
→ Modularize functions with snippets, set thresholds, stage data, check data types
DevOps for Data & Continuous Testing
→ API Integration with Build/Release, Continuous Integration/ETL,
Operations/DevOps Monitoring, Test Management/Issue Tracking, more
Projects
→ Multi-project support, global admin user, activity log reports
Web-based…
Supported OS...
Connects through…
…to any JDBC compliant data source
QuerySurge
Controller
QuerySurge Server
DB Server (MySQL)
App Server (Tomcat)
QuerySurge Agents
(Ships with 10 Agents)
Installs...
…in the Cloud
…on a VM
…on a Bare Metal Server
Design Library
Scheduler
Query Wizards
Data Intelligence Reports
Run-Time Dashboard
DevOps for Data
Data Analytics Dashboard
Projects
Multi-Project Support
Multiple projects can now be created in a single QuerySurge instance. This allows for multiple groups to
work on the same QuerySurge server without seeing each other’s assets (project-level security).
Features supported in Multi-Projects are:
• Global Admin User: This new user type administers the QuerySurge instance
across multiple projects.
• Assign Users to Projects: Users can be assigned to one or more projects. In
each assignment, a user can have a different project role (administrator,
standard user or participant user).
• Assign Agents to Projects: Agents can be shared across projects or dedicated
to specific projects.
• Project Import: Import project data into another project on the same instance
or into a different environment (Dev/QA/Prod).
• Project Export: Export entire projects and store for backup purposes.
• Activity Log Reports: Two reports that track specific changes for auditing
purposes, including manipulations to users or connections.
Fast and Easy.
No programming needed.
• Perform 80% of all data tests with no SQL coding
• Opens up testing to novices & non-technical members
• Speeds up testing for skilled coders
• Provides a huge return on investment
Design Library
• Create custom Query Pairs (source & target
SQLs for tests that have transformations)
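A Query Pair can be pictured as two queries whose result sets are expected to match once the transformation is applied on the source side. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the real source and target systems; the table names, columns, sample rows, and the `run_query_pair` helper are all hypothetical and do not represent QuerySurge's actual comparison engine.

```python
import sqlite3
from itertools import zip_longest

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (id INTEGER, first_name TEXT, last_name TEXT, amount_cents INTEGER);
    CREATE TABLE tgt_orders (id INTEGER, full_name TEXT, amount_dollars REAL);
    INSERT INTO src_orders VALUES (1, 'Ada', 'Lovelace', 1250);
    INSERT INTO src_orders VALUES (2, 'Grace', 'Hopper', 990);
    INSERT INTO tgt_orders VALUES (1, 'Ada Lovelace', 12.5);
    INSERT INTO tgt_orders VALUES (2, 'Grace Hopper', 9.0);
""")

# The source query re-applies the expected transformation (name concatenation,
# cents to dollars); the target query reads what the ETL actually loaded.
source_sql = (
    "SELECT id, first_name || ' ' || last_name, amount_cents / 100.0 "
    "FROM src_orders ORDER BY id"
)
target_sql = "SELECT id, full_name, amount_dollars FROM tgt_orders ORDER BY id"


def run_query_pair(connection, source_query, target_query):
    """Run both queries and return the (source_row, target_row) pairs that differ."""
    source_rows = connection.execute(source_query).fetchall()
    target_rows = connection.execute(target_query).fetchall()
    # zip_longest keeps rows that exist on only one side (paired with None).
    return [(s, t) for s, t in zip_longest(source_rows, target_rows) if s != t]


mismatches = run_query_pair(conn, source_sql, target_sql)
for source_row, target_row in mismatches:
    print("source:", source_row, "target:", target_row)
```

In this sample data the second order was loaded with the wrong amount, so the pair reports one mismatch.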
Scheduling
• Build groups of Query Pairs
• Schedule Test Runs
  - Run immediately
  - Run at a set date/time
  - Have an event kick it off
Deep-Dive Reporting
• Examine and automatically email test results
Run Dashboard
• View real-time execution
• Analyze real-time results
QuerySurge DevOps for Data
• First full DevOps for Data testing solution
• Both RESTful and command line APIs
• Improves Data Quality at Speed
QuerySurge DevOps for Data integrates with:
• Continuous integration/ETL solutions
• Automated build/release/deployment solutions
• Operations and DevOps monitoring solutions
• Test management/issue tracking solutions
• Scheduling and workload automation solutions
60+ API calls with almost 100 different properties
that users can utilize to retrieve, edit, update, or
delete information.
• view data reliability & pass rate
• add, move, filter, zoom-in on any
data widget & underlying data
• verify build success or failure
(1) Trial in the Cloud of QuerySurge™, including self-learning
tutorial that works with sample data for 3 days
(2) Downloaded Trial of QuerySurge™, including self-learning
tutorial with sample data or your data for 15 days
for more information on our Trials, please visit:
www.querysurge.com/compare-trial-options
TRIAL
IN THE CLOUD
Big Data Testing Courses
Filled with examples and labs, this hands-on training teaches concepts
and HQL techniques used in Big Data testing.
For more information on our Big Data Testing classes, please visit:
http://www.rttsweb.com/training/courses/big-data-testing-courses
To see the video of our Big Data testing webinar, please visit:
http://www.querysurge.com/solutions/testing-big-data/big-data-testing-for-hadoop
Big Data is on the verge of revolutionizing enterprise data
management architectures.
- DeZyre
Case study: Open Source Automation Framework using Selenium WebDriver
RTTS
 
Enterprise Business Intelligence & Data Warehousing: The Data Quality Conundrum
Enterprise Business Intelligence & Data Warehousing: The Data Quality ConundrumEnterprise Business Intelligence & Data Warehousing: The Data Quality Conundrum
Enterprise Business Intelligence & Data Warehousing: The Data Quality Conundrum
RTTS
 
Improve the Health of Your Data
Improve the Health of Your DataImprove the Health of Your Data
Improve the Health of Your Data
RTTS
 

QuerySurge Slide Deck for Big Data Testing Webinar

  • 1. Testing Big Data: Automated ETL Testing of Hadoop and NoSQL. Presenters: Bill Hayduk, CEO, RTTS, and Business Leader, QuerySurge (the software division of RTTS); Jeff Bocarsly, Ph.D., Chief Architect, QuerySurge Division, RTTS.
  • 2. Testing Big Data: Automated ETL Testing of Hadoop and NoSQL (built by QuerySurge™). Host: RTTS/QuerySurge. Date: July 30, 2022. Time: 1:00 pm, Eastern Standard Time (New York, GMT-05:00). Session number: 630 771 732. AGENDA: • About Big Data and Hadoop • About NoSQL • Hadoop and DWH Use Case • How to test Big Data • Demo of QuerySurge w/ Hadoop and NoSQL
  • 3. About: RTTS is the leading provider of software & data quality for critical business systems. FACTS: Founded: 1996. Headquarters: New York. Customers: 700+. Strategic Partners: see logos. Enterprise Software: QuerySurge. Launched: 2012. Customers: 170+ in 30 countries. Technology Partners: [logos]
  • 4. Partner categories: Regional Consulting Firms; Technology Partners; Global System Integrators. Countries: Argentina, Australia, Belgium, Brazil, Canada, Chile, India, Malaysia, Netherlands, New Zealand, Norway, Sweden, Singapore, South Africa, Ukraine, US
  • 5. C-level executives are using BI & Analytics to make critical business decisions with the assumption that the underlying data is fine. We know it is not. [Diagram labels: Mainframe; ETL; Data Warehouse; Business Intelligence & Analytics. Callout: typical data issue areas.]
  • 6. Big data: defined as too much volume, velocity and variety to work on normal database architectures. Size: defined as 5 petabytes or more. 1 petabyte = 1,000 terabytes; 1,000 terabytes = 1,000,000 gigabytes; 1,000,000 gigabytes = 1,000,000,000 megabytes.
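The unit chain on slide 6 can be checked with a few lines of Python. This is only an illustrative sketch: the function names `convert` and `is_big_data` are hypothetical, and the 5-petabyte threshold is simply the figure quoted on the slide.

```python
# Decimal storage units as used on the slide: each step is a factor of 1,000.
UNITS = ["megabytes", "gigabytes", "terabytes", "petabytes"]

BIG_DATA_THRESHOLD_PB = 5  # the slide's "5 petabytes or more"


def convert(value, from_unit, to_unit):
    """Convert a size between decimal storage units by powers of 1,000."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    if steps >= 0:
        return value * 1000 ** steps   # larger unit -> smaller unit
    return value / 1000 ** (-steps)    # smaller unit -> larger unit


def is_big_data(size_tb):
    """Hypothetical helper: apply the slide's size threshold to a size in terabytes."""
    return convert(size_tb, "terabytes", "petabytes") >= BIG_DATA_THRESHOLD_PB


print(convert(1, "petabytes", "megabytes"))  # size of one petabyte in megabytes
print(is_big_data(2500))                     # a 2.5-petabyte store against the 5 PB threshold
```

By this size-only definition, the 2.5-petabyte example on the next slide would fall below the threshold, which suggests the volume, velocity and variety criteria matter at least as much as raw size.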
  • 7. Handles more than 1 million customer transactions every hour: • data imported into databases that contain > 2.5 petabytes of data • the equivalent of 167 times the information contained in all the books in the US Library of Congress. Facebook handles 40 billion photos from its user base. Google processes 1 terabyte per hour. Twitter processes 85 million tweets per day. eBay processes 80 terabytes per day. Others.
  • 8. Big data requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times. Technologies include: • massively parallel processing (MPP) databases • data warehouses • data mining grids • distributed file systems • distributed databases • cloud computing platforms • the Internet, and • scalable storage systems
  • 9. Hadoop is an open-source project that develops software for scalable, distributed computing. • Easily deals with the complexities of high volume, velocity and variety of data. • Is a framework for the distributed processing of large data sets across clusters of computers using simple programming models. • Scales up from single servers to thousands of machines, each offering local computation and storage. • Detects and handles failures at the application layer.
  • 10. • Redundant and reliable • Extremely powerful • Easy to program distributed apps • Runs on commodity hardware
  • 11. “Spending on Hadoop software and subscriptions will increase to approximately $677 million, with overall big data market anticipated to reach the $50 billion mark.” - Wikibon
  • 12. MapReduce (a.k.a. Task Tracker): the processing part that manages the programming jobs. HDFS (Hadoop Distributed File System, a.k.a. Data Node): stores data on the machines. [Diagram: one machine containing MapReduce (Task Tracker) and HDFS (Data Node).]
  • 13. Cluster: add more machines for scaling, from 1 to 100 to 1,000. Job Tracker: accepts jobs, assigns tasks, identifies failed machines. Name Node: coordination for HDFS; inserts and extractions are communicated through the Name Node. [Diagram: a Name Node and a grid of machines, each with a Task Tracker and a Data Node.]
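The MapReduce model named on slides 12 and 13 can be sketched in plain Python. This is a single-process illustration of the map, shuffle and reduce steps using a word count, not Hadoop's own API; the function names and the `blocks` input (standing in for data held on separate Data Nodes) are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(blocks):
    """Map over each block (as each machine would over its local data), then shuffle and reduce."""
    emitted = []
    for block in blocks:
        for line in block:
            emitted.extend(map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(emitted).items())


# Two hypothetical blocks, as if stored on two different machines.
blocks = [["big data big"], ["data testing"]]
counts = word_count(blocks)
print(counts)
```

In a real cluster the map calls would run in parallel next to the data and the Job Tracker would reassign tasks from failed machines; the sketch only shows the data flow.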
  • 14. Apache Hive: a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. Hive provides a mechanism to query the data using a SQL-like language called HiveQL that interacts with the HDFS files: • create • insert • update • delete • select. [Diagram: HiveQL on top of MapReduce (Task Tracker) and HDFS (Data Node).]
  • 15. What is NoSQL? A term used to describe high-performance, non-relational databases that provide a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. NoSQL Database Types: • Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents. • Graph stores are used to store information about networks of data, such as social connections. Graph stores include Neo4j and Giraph. • Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or 'key'), together with its value. Examples of key-value stores are Riak and Berkeley DB. Some key-value stores, such as Redis, allow each value to have a type, such as 'integer', which adds functionality. • Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.
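The four NoSQL data models on slide 15 can be pictured with ordinary Python structures. This is a rough sketch of the shapes only; the sample records and the helper names `total_visits` and `friends_of_friends` are invented for illustration and do not reflect how any of the named products store data internally.

```python
# Key-value store: every item is an attribute name (key) with its value.
key_value = {"user:42:name": "Ada", "user:42:visits": 7}

# Document database: each key is paired with a nested document.
documents = {
    "user:42": {
        "name": "Ada",
        "tags": ["admin", "qa"],            # key-array pair
        "address": {"city": "New York"},    # nested document
    }
}

# Wide-column store: values are grouped by column rather than by row,
# so an aggregate over one column reads only that column.
wide_column = {
    "name": {"row1": "Ada", "row2": "Bob"},
    "visits": {"row1": 7, "row2": 3},
}

# Graph store: nodes and their connections, here as adjacency sets.
graph = {"Ada": {"Bob"}, "Bob": {"Ada", "Cy"}, "Cy": {"Bob"}}


def total_visits(store):
    """Aggregate a single column without touching the others."""
    return sum(store["visits"].values())


def friends_of_friends(g, person):
    """Walk two hops in the graph, excluding the person and their direct connections."""
    direct = g[person]
    return {fof for friend in direct for fof in g[friend]} - direct - {person}


print(total_visits(wide_column))
print(friends_of_friends(graph, "Ada"))
```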
  • 16. [Diagram labels: Data Warehouse; Batch Aggregation; ETL from MongoDB; ETL to MongoDB.] Source: MongoDB, Inc.
  • 18. When to use NoSQL? • Online real-time processing • Data set is smaller • Measured in milliseconds. When to use Hadoop? • Offline big data processing • Offline analytics • Measured in minutes & hours. Source: classpattern.com
  • 21. USE CASE 1***: Use Hadoop as a landing zone for big data & raw data. 1) Bring all raw, big data into Hadoop. 2) Perform some pre-processing of this data. 3) Determine which data goes to the Data Warehouse. 4) Extract, transform and load (ETL) pertinent data into the Data Warehouse. ***Source: Vijay Ramaiah, IBM product manager, datanami magazine, June 10, 2013
  • 22. Recommended functional test strategy: test every entry point in the system (feeds, databases, internal messaging, front-end transactions). The goal: provide rapid localization of data issues between points. [Diagram labels: Source Data; Source Hadoop; ETL Process; Target DWH; Business Intelligence software; test entry points marked along the flow.]
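The idea of localizing a data issue between entry points can be sketched as follows. This is a simplified illustration, not the QuerySurge implementation: it assumes each stage has already been extracted and normalized into comparable `key -> record` dictionaries, and the stage names and the function `first_divergence` are hypothetical.

```python
def first_divergence(stages):
    """Return (upstream, downstream, keys) for the first hop where the data differs.

    `stages` is an ordered list of (name, rows) pairs, where rows maps a record key
    to its normalized value. Returns None when every hop matches.
    """
    for (up_name, up_rows), (down_name, down_rows) in zip(stages, stages[1:]):
        mismatched = sorted(
            key
            for key in up_rows.keys() | down_rows.keys()
            if up_rows.get(key) != down_rows.get(key)
        )
        if mismatched:
            return up_name, down_name, mismatched
    return None


# Hypothetical snapshots taken at three test entry points.
stages = [
    ("source data", {1: 100, 2: 200, 3: 300}),
    ("hadoop", {1: 100, 2: 200, 3: 300}),
    ("target dwh", {1: 100, 2: 250}),  # record 2 altered, record 3 dropped
]
print(first_divergence(stages))
```

Testing only the final target would show that records 2 and 3 are wrong; testing at every entry point shows the problem was introduced between Hadoop and the data warehouse.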
  • 23. [Diagram labels: Source Data; Ingestion; Relational DB & Data Warehousing; BI, Analytics & Reporting; test entry points marked throughout.]
  • 24. - We need to verify more data and to do it faster - We need to automate the testing effort - We need to be able to test across different platforms. We need a testing tool!
  • 26. QuerySurge is the smart Data Testing solution that automates the data validation and ETL testing of Big Data with full DevOps functionality for continuous testing
  • 27. • Data Quality at Speed → Automate the launch, execution, comparison & auto-email results • Test across different platforms → Data Warehouse, Hadoop, NoSQL, DB, flat files, XML, JSON, BI Reports • Smart Query Wizards - no coding needed → Query Wizards create tests visually, without writing SQL • Data Analytics & Data Intelligence → Data Analytics Dashboard, Data Intelligence Reports, emailed results, Ready-for-Analytics back-end data access • Create Custom Tests → Modularize functions with snippets, set thresholds, stage data, check data types • DevOps for Data & Continuous Testing → API Integration with Build/Release, Continuous Integration/ETL, Operations/DevOps Monitoring, Test Management/Issue Tracking, more • Projects → Multi-project support, global admin user, activity log reports
  • 28. Web-based… Supported OS… Installs… in the Cloud, on a VM, on a Bare Metal Server. Connects through… to any JDBC compliant data source. Components: QuerySurge Controller; QuerySurge Server; DB Server (MySQL); App Server (Tomcat); QuerySurge Agents (ships with 10 Agents).
  • 29. Design Library; Scheduler; Query Wizards; Run-Time Dashboard; Data Intelligence Reports; Data Analytics Dashboard; DevOps for Data; Projects
  • 30. Multi-Project Support: Multiple projects can now be created in a single QuerySurge instance. This allows for multiple groups to work on the same QuerySurge server without seeing each other’s assets (project-level security). Features supported in Multi-Projects are: • Global Admin User: This new user type administers the QuerySurge instance across multiple projects. • Assign Users to Projects: Users can be assigned to one or more projects. In each assignment, a user can have a different project role (administrator, standard user or participant user). • Assign Agents to Projects: Agents can be shared across projects or dedicated to specific projects. • Project Import: Import project data into another project on the same instance or into a different environment (Dev/QA/Prod). • Project Export: Export entire projects and store for backup purposes. • Activity Log Reports: Two reports that track specific changes for auditing purposes, including manipulations to users or connections.
  • 31. Fast and Easy. No programming needed. • Perform 80% of all data tests with no SQL coding • Opens up testing to novices & non-technical members • Speeds up testing for skilled coders • Provides a huge Return-On-Investment
  • 33. Design Library: • Create custom Query Pairs (source & target SQLs for tests that have transformations). Scheduling: • Build groups of Query Pairs • Schedule Test Runs: run immediately, run at a set date/time, or have an event kick it off
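The Query Pair idea from slide 33 (one SQL statement against the source, one against the target, with the transformation expressed in the source query) can be sketched with Python's standard `sqlite3` module. This is a hedged illustration of the concept only: in-memory SQLite databases stand in for real source and target systems, the tables and the function `run_query_pair` are invented for the example, and nothing here represents QuerySurge's own API or comparison logic.

```python
import sqlite3


def run_query_pair(source_conn, source_sql, target_conn, target_sql):
    """Run both queries and report rows that appear on only one side.

    Rows are compared as sets, so duplicate rows are not detected in this sketch.
    """
    source_rows = set(source_conn.execute(source_sql).fetchall())
    target_rows = set(target_conn.execute(target_sql).fetchall())
    return {
        "missing_in_target": sorted(source_rows - target_rows),
        "unexpected_in_target": sorted(target_rows - source_rows),
    }


# Hypothetical source system: country codes stored in lower case.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, country TEXT)")
source.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "us"), (2, "ca"), (3, "br")])

# Hypothetical target warehouse: the ETL should have upper-cased the codes.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE dim_customer (customer_id INTEGER, country_code TEXT)")
target.executemany("INSERT INTO dim_customer VALUES (?, ?)", [(1, "US"), (2, "CA"), (3, "AR")])

result = run_query_pair(
    source, "SELECT id, UPPER(country) FROM customers",              # expected transformation
    target, "SELECT customer_id, country_code FROM dim_customer",    # what was actually loaded
)
print(result)
```

Grouping several such pairs and running them on a schedule or on an event corresponds to the scheduling options listed on the same slide.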
  • 34. Deep-Dive Reporting: • Examine and automatically email test results. Run Dashboard: • View real-time execution • Analyze real-time results
  • 35. QuerySurge DevOps for Data: • First full DevOps for Data testing solution • Both RESTful and command line APIs • Improves Data Quality at Speed. QuerySurge DevOps for Data integrates with: • Continuous integration/ETL solutions • Automated build/release/deployment solutions • Operations and DevOps monitoring solutions • Test management/issue tracking solutions • Scheduling and workload automation solutions. 60+ API calls with almost 100 different properties that users can utilize to retrieve, edit, update, or delete information.
  • 36. • View data reliability & pass rate • Add, move, filter, zoom-in on any data widget & underlying data • Verify build success or failure
  • 37. [Screenshot labels: Large Suite; Start Time: March 5, 2021 16:20:44; March 5, 2021 4:24 PM; 6 minutes.]
  • 38. (1) Trial in the Cloud of QuerySurge™, including a self-learning tutorial that works with sample data, for 3 days. (2) Downloaded Trial of QuerySurge™, including a self-learning tutorial with sample data or your data, for 15 days. For more information on our Trials, please visit: www.querysurge.com/compare-trial-options Big Data Testing Courses: Filled with examples and labs, this hands-on training teaches concepts and HQL techniques used in Big Data testing. For more information on our Big Data Testing classes, please visit: http://www.rttsweb.com/training/courses/big-data-testing-courses
  • 39. To see the video of our Big Data testing webinar, please visit: http://www.querysurge.com/solutions/testing-big-data/big-data-testing-for-hadoop “Big Data is on the verge of revolutionizing enterprise data management architectures.” - DeZyre