Deliver Trusted Data by Leveraging
ETL Testing
Data-rich organizations seeking to assure data quality can
systemize the validation process by leveraging automated testing
to increase coverage, accuracy and competitive advantage, thus
boosting credibility with end users.
Executive Summary
Quality assurance teams typically perform extract, transform and load (ETL) testing with SQL scripting in conjunction with eyeballing the data in Excel spreadsheets. This process can consume a huge amount of time and is error-prone due to human intervention. It is also tedious: to validate data, the same test SQL scripts must be executed repeatedly, and this can lead to defect leakage given the assorted, voluminous nature of the data. To test the data effectively, the tester needs advanced database skills that include writing complex join queries and creating stored procedures, triggers and SQL packages.
Manual methods of data validation can also
impact the project schedules and undermine
end-user confidence regarding data delivery (i.e.,
delivering data to users via flat files or on Web
sites). Moreover, data quality issues can under-
cut competitive advantage and have an indirect
impact on the long-term viability of a company
and its products.
Organizations can overcome these challenges by
mechanizing the data validation process. But that
raises an important question: How can this be
done without spending extra money? The answer
led us to consider Informatica's ETL testing tool.
This white paper demonstrates how Informatica
can be used to automate the data testing pro-
cess. It also illustrates how this tool can help
QE&A teams reduce the number of hours spent
on their activities, increase coverage and achieve
100% accuracy in validating the data. This means
that organizations can deliver complete, repeat-
able, auditable and trustable test coverage in less
time without extending basic SQL skill sets.
Data Validation Challenges
Consistency in the data received for ETL is a
perennial challenge. Typically, data received from
various sources lacks commonality in how it is
formatted and provided. And big data only makes the issue more pressing. Just a few years ago, 10 million records was considered a big deal. Today, the volume of data stored by enterprises can run into the billions and trillions of records.
Cognizant 20-20 Insights | December 2014
Quick Take
Addressing Organizational Data Quality Issues
Figure 1: Data Release Cycle (PMO and functional managers, QA team, ETL/DB team). Day 1: receive data, prepare the data update and apply ETL. Day 2: functional data validation and sign-off in the QA environment. Day 3: functional data validation (UAT) and sign-off in the production environment, then release to production.
Our experimentation with automated data vali-
dation with a U.S.-based client revealed that by
mechanizing the data validation process, data
quality issues can be completely eradicated. The
automation of the data validation process brings
the following value additions:
•	Provides a data validation platform that is workable and sustainable for the long term.
•	Delivers a tailored, project-specific framework for data quality testing.
•	Reduces turnaround time of each test execution
cycle.
•	Simplifies the test management process by
simplifying the test approach.
•	Increases test coverage along with greater
accuracy of validation.
The Data Release Cycle and Internal Challenges
This client releases product data sets on a peri-
odic basis, typically monthly. As a result, the data
volume in each release is huge. One product suite
has seven different products under its umbrella
and data is released in three phases per month.
Each phase has more than 50 million records to
be processed from each product within the suite.
Due to manual testing, the turnaround time for
each phase used to be three to five days, depend-
ing on the number of tasks involved in each phase.
Production release of quality data is a huge undertaking for the QE&A team, and it was a big challenge to satisfy business owners by reducing time-to-market (i.e., the time from processing the data once it is received to releasing it to the market). By using various automation methods, we were able to reduce time-to-market from between three and five days to between one and three days (see Figure 1).
Reasons for the accumulation of voluminous data include:
•	 Executive management’s need to focus on
data-driven decision-making by using business
intelligence tools.
•	 Company-wide infrastructural changes such as
data center migrations.
•	 Mergers and acquisitions among data-produc-
ing companies.
•	 Business owners’ need to gain greater insight
into streamlining production, reducing time-to-
market and increasing product quality.
When data is abundant and drawn from multiple sources, junk data is likely to be present, along with excessive duplication, null values and redundant records in the assortment. Mishandling can also lead to outright loss of the data.
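The duplicate and null checks implied above are easy to automate with plain SQL. A minimal sketch using SQLite and a hypothetical products table (table and column names are assumptions for illustration):

```python
import sqlite3

# Build a toy data set containing the typical problems: duplicates and nulls.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "A", 9.99), (1, "A", 9.99),   # exact duplicate row
     (2, "B", None),                   # null price
     (3, None, 4.50)],                 # null name
)

# Duplicate rows: group on all columns and keep groups seen more than once.
dupes = conn.execute(
    "SELECT id, name, price, COUNT(*) AS n FROM products "
    "GROUP BY id, name, price HAVING n > 1"
).fetchall()

# Null audit per column (IS NULL evaluates to 0/1, so SUM counts the nulls).
nulls = conn.execute(
    "SELECT SUM(name IS NULL), SUM(price IS NULL) FROM products"
).fetchone()

print("duplicate groups:", dupes)        # one group: (1, 'A', 9.99, 2)
print("null name/price counts:", nulls)  # (1, 1)
```

Run against each incoming source, queries like these flag junk before the ETL ever loads it.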
However, organizations must overcome these
challenges by having appropriate solutions in
place to avoid credibility issues. Thus, for data
warehousing and migration initiatives, data valida-
tion plays a vital role ensuring overall operational
effectiveness. But operational improvements are
never without their challenges, including:
•	 Data validation is significantly different from conventional ways of testing. It requires more advanced scripting skills across multiple database platforms such as Microsoft SQL Server 2008, Sybase IQ, Vertica, Netezza, etc.
•	 Heterogeneity in the data sources leads to
mishandling of the interrelationships between
multiple data source formats.
•	 During application upgrades, ensuring that older application repository data matches the data in the new repository.
•	 SQL query execution is tedious and
cumbersome, because of repetitious execution
of the queries.
•	 Missing test scenarios, due to manual execution
of queries.
•	 Total accuracy may not always be possible.
•	 Time taken for execution varies from one
person to another.
•	 Strict supervision is required with each test.
•	 The ETL process entails numerous stages; it can be difficult to keep to a testing schedule given the manual effort required.
•	 The quality assurance team needs progres-
sive elaboration (i.e., continuous improvement
of key processes) to standardize the process
due to complex architectures and multilayered
designs.
A Business-Driven Approach
to Data Validation
To meet the business demand for data validation,
we have developed a surefire and comprehensive
solution that can be utilized in various areas such
as data warehousing, data extraction, transfor-
mations, loading, database testing and flat-file
validation.
The Informatica tool that is used for the ETL pro-
cess can also be used as a validation tool to verify
the business rules associated with the data. This
tool has the capability to significantly reduce
manual effort and increase ETL productivity by
lowering costs, thereby improving the bottom line.
Our Data Validation Procedures as a Framework
There are four methods required to implement
a one-stop solution for addressing data quality
issues (see Figure 2).
Data Validation Methods
Figure 2: Informatica data validation, DB stored procedures, Selenium and macros.
Each method has its own adoption procedures.
High-level details include the following:
Informatica Data Validation
The following activities are required to create
an Informatica data validation framework (see
Figure 3):
•	 Gather business rules from product/business owners based on their expectations.
•	 Convert business rules into test scenarios and
test cases.
•	 Derive expected results of each test case
associated with each scenario.
•	 Write a SQL query for each of the test cases.
•	 Update the SQL test cases in input files (test
case basic info, SQL query).
•	 Create Informatica workflows to execute the
queries and update the results in the respective
SQL tables.
•	 Trigger Informatica workflows for executing
jobs and send e-mail notifications with
validation results.
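The workflow above can be sketched in miniature: read test cases (an ID, a SQL query and an expected result) from an input list, execute each query, and update a results table. This is a stand-in using Python and SQLite; in practice Informatica workflows execute the queries and add the e-mail notification, and all table and test-case names here are hypothetical:

```python
import sqlite3

# Toy target table standing in for the loaded warehouse data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100.0), ("west", 250.0)])
conn.execute("CREATE TABLE test_results (test_id TEXT, status TEXT)")

# Input-file contents: test case basic info plus its SQL query and expectation.
test_cases = [
    ("TC01_row_count", "SELECT COUNT(*) FROM sales", 2),
    ("TC02_no_null_amounts",
     "SELECT COUNT(*) FROM sales WHERE amount IS NULL", 0),
    ("TC03_total_amount", "SELECT SUM(amount) FROM sales", 350.0),
]

# The "workflow": execute each query and update the results table.
for test_id, query, expected in test_cases:
    actual = conn.execute(query).fetchone()[0]
    status = "PASS" if actual == expected else "FAIL"
    conn.execute("INSERT INTO test_results VALUES (?, ?)", (test_id, status))

for row in conn.execute("SELECT test_id, status FROM test_results"):
    print(row)
```

Because the verdicts land in a table, re-running an execution cycle becomes a single trigger rather than a manual SQL session.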
Validate Comprehensive Data with Stored
Procedures
The following steps are required for data valida-
tion using stored procedures (see Figure 4, next
page):
•	 Prepare validation test scenarios.
•	 Convert test scenarios into test cases.
•	 Derive the expected results for all test cases.
•	 Write stored procedure-compatible SQL
queries that represent each test case.
•	 Compile all SQL queries as a package or
test build.
•	 Store all validation Transact-SQL statements in a single execution plan (i.e., the stored procedure).
•	 Execute the stored procedure whenever any
data validation is carried out.
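The steps above amount to bundling every validation statement into one re-executable unit. SQLite has no stored procedures, so this sketch approximates one by running the whole validation script as a single batch that writes its verdicts into a results table; on a real server the script body would live inside a CREATE PROCEDURE (table and test names are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (period INTEGER, value REAL)")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                 [(201401, 10.0), (201402, 12.5), (201403, 11.0)])

# The whole validation "build" as one script, analogous to a stored procedure
# body: every statement inserts a PASS/FAIL verdict into validation_results.
VALIDATION_PROC = """
CREATE TABLE IF NOT EXISTS validation_results (test TEXT, status TEXT);
DELETE FROM validation_results;
INSERT INTO validation_results
  SELECT 'min_period',
         CASE WHEN MIN(period) >= 201401 THEN 'PASS' ELSE 'FAIL' END
  FROM fact_sales;
INSERT INTO validation_results
  SELECT 'no_null_values',
         CASE WHEN COUNT(*) = 0 THEN 'PASS' ELSE 'FAIL' END
  FROM fact_sales WHERE value IS NULL;
"""

def run_validation(conn):
    # "EXEC the stored procedure" whenever a data validation is carried out.
    conn.executescript(VALIDATION_PROC)
    return conn.execute(
        "SELECT test, status FROM validation_results ORDER BY test"
    ).fetchall()

print(run_validation(conn))
```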
A Data Validation Framework: Pictorial View
Figure 3: Source files pass through transformation rules into a staging area (SQL Server) and on to the Sybase IQ warehouse, with flat files exported along the way. Quality assurance runs test cases with expected results against QA DB tables, updates the test results, and marks each case pass or fail; a test case validation report is e-mailed, and passing data is released via the Web to production end users (external and internal).
Validating with Stored Procedures
Figure 4
One-to-One Data Comparison Using Macros
The following activities are required to handle
data validation (see Figure 5):
•	 Prepare validation test scenarios.
•	 Convert test scenarios into test cases.
•	 Derive a list of expected results for each test
scenario.
•	 Specify input parameters for a given test
scenario, if any.
•	 Write a macro to carry out validation work for one-to-one data comparisons.
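At its core, the macro's job is a keyed, one-to-one diff of two extracts. A hypothetical Python equivalent (the file layouts are assumed for illustration):

```python
import csv, io

# Two extracts to reconcile: "Data 1" (publication file) vs. "Data 2" (warehouse).
data1 = "id,price\n1,9.99\n2,19.50\n3,5.00\n"
data2 = "id,price\n1,9.99\n2,19.95\n4,7.25\n"

def load(text):
    # Key each row by its id so the comparison is one-to-one, not positional.
    return {row["id"]: row for row in csv.DictReader(io.StringIO(text))}

left, right = load(data1), load(data2)
mismatches = [(k, left[k], right[k]) for k in left.keys() & right.keys()
              if left[k] != right[k]]
only_left = sorted(left.keys() - right.keys())
only_right = sorted(right.keys() - left.keys())

print("value mismatches:", mismatches)          # id 2: 19.50 vs 19.95
print("missing from warehouse:", only_left)     # ['3']
print("missing from publication:", only_right)  # ['4']
```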
Selenium Functional Testing Automation
The following are required to perform data valida-
tion (see Figure 6, next page):
•	 Prepare validation test scenarios.
•	 Convert test scenarios into test cases.
•	 Derive an expected result for each test case.
•	 Specify input parameters for a given test
scenario.
•	 Derive test configuration data for setting up
the QA environment.
•	 Build a test suite that contains multiple test
builds according to test scenarios.
•	 Have a framework containing multiple test
suites.
•	 Execute the automation test suite per the
validation requirement.
•	 Analyze the test results and share those results
with project stakeholders.
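The suite-of-builds structure described above can be sketched with Python's unittest as a stand-in for a Selenium-driven framework; the test classes and checks here are hypothetical, and in practice each test method would drive a browser through Selenium:

```python
import unittest

class ProductPageTests(unittest.TestCase):
    """One test build; each method would normally assert against the GUI."""
    def test_record_count_displayed(self):
        # Hypothetical check: the page should report the expected record count.
        expected, displayed = 50_000_000, 50_000_000
        self.assertEqual(displayed, expected)

class SearchTests(unittest.TestCase):
    """A second test build within the same suite."""
    def test_null_fields_not_rendered(self):
        rendered_fields = ["price", "period"]  # stand-in for scraped GUI values
        self.assertNotIn(None, rendered_fields)

def build_suite():
    # Assemble multiple test builds into one automation suite, per scenario.
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for cls in (ProductPageTests, SearchTests):
        suite.addTests(loader.loadTestsFromTestCase(cls))
    return suite

if __name__ == "__main__":
    unittest.TextTestRunner(verbosity=2).run(build_suite())
```

A framework then holds several such suites, and the runner's report is what gets analyzed and shared with stakeholders.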
Salient Solution Features,
Benefits Secured
The following features and benefits of our
framework were reinforced by a recent client
engagement (see sidebar, page 7).
Core Features
•	 Compatible with all database servers.
•	 Zero manual intervention for the execution of
validation queries.
•	 100% efficiency in validating large-scale data.
•	 Reconciliation of production activities with the
help of automation.
•	 Reduces level of effort and resources required
to perform ETL testing.
Figure 4 (detail): A stored procedure is executed against the production fact table in Sybase IQ; failed cases loop back for correction. Sample results:
Test_01  Accuracy           Pass
Test_02  Min Period         Fail
Test_03  Max Period         Pass
Test_04  Time Check         Pass
Test_05  Data Period Check  Pass
Test_06  Change Values      Pass
Applying Macro Magic to Data Validation
Figure 5: A macro is added to the marketing files and executed by quality assurance; it performs a one-to-one comparison between the publication data (Data 1) and the Sybase IQ data warehouse (Data 2). Passing data proceeds to publication; failures are sent back for correction.
•	 Comprehensive test coverage ensures lower
business risk and greater confidence in data
quality.
•	 Remote scheduling of test activities.
Benefits Reaped
•	 Test case validation results are in user-friendly
formats like .csv, .xlsx and HTML.
•	 Validation results are stored for future
purposes.
•	 Reuse of test cases and SQL scripts for
regression testing.
•	 No scope for human errors.
•	 Supervision isn’t required while executing test
cases.
•	 100% accuracy in test case execution at all
times.
•	 Easy maintenance of SQL scripts and related test cases.
•	 No variation in time taken for execution of test
cases.
•	 100% reliability in testing and its coverage.
The Bottom Line
As the digital age proceeds, it is very important for organizations to progressively elaborate their processes with suitable information and awareness to drive business success. Hence, business data collected from various operational sources is cleansed and consolidated per the business requirement to separate signals from noise. This data is then stored in a protected environment for an extended time period.
Fine-tuning this data will help facilitate per-
formance management, tactical and strategic
decisions and the execution thereof for busi-
ness advantage. Well-organized business data
enables and empowers business owners to make
well-informed decisions. These decisions have
the capacity to drive competitive advantage for
an organization. On average, organizations lose $8.2 million annually due to poor data quality, according to industry research on the subject.
A study by B2B research firm SiriusDecisions shows that by following best practices in data quality, a company can boost its revenue by 66%.1
And market research by InformationWeek found that 46% of those surveyed believe data quality is a barrier that undermines business intelligence mandates.2
Hence, it is safe to assume poor data quality is
undercutting many enterprises. Few have taken
the necessary steps to avoid jeopardizing their
businesses. By implementing the types of data
testing frameworks discussed above, compa-
nies can improve their processes by reducing the
time taken for ETL. This, in turn, will dramatically
reduce their time-to-market turnaround and sup-
port the management mantra of ”under-promise
and over-deliver.” Moreover, few organizations
need to spend extra money on these frame-
works given that existing infrastructure is being
used. This has a direct positive impact on a com-
pany’s bottom line since no additional overhead
is required to hire new human resources or add
additional infrastructure.
Facilitating Functional Test Automation
Figure 6: Selenium automation performs functional validation of the Web-based GUI application; passing builds are promoted to the production GUI for users, while failures are returned for correction.
Looking Ahead
In the Internet age, where data is considered a business currency, organizations must maximize their return on investment in the most efficient way to maintain their competitive edge. Hence, data quality plays a pivotal role when making strategic decisions.
The impact of poor information quality on business can be measured along four dimensions: increased costs, decreased revenues, decreased confidence and increased risk. Thus it
is crucial for any organization to implement a
foolproof solution where a company can use its
own product to validate the quality and capabili-
ties of the product. In other words, adopting an
“eating your own dog food” ideology.
Having said that, it is necessary for any data-
driven business to focus on data quality, as poor
quality has a high probability of becoming the
major bottleneck.
Quick Take
Fixing an Information Services Data Quality Issue
This client is a leading U.S.-based financial services provider for real estate professionals. The
services it provides include comprehensive data,
analytical and other related services. Powerful
insight gained from this knowledge provides the
perspective necessary to identify, understand and
take decisive action to effectively solve key busi-
ness challenges.
Challenges Faced
This client faced major challenges in the end-to-
end quality assurance of its data, as the majority
of the company’s test procedures were manual.
Because of these manual methods, the turnaround time, or time-to-market, of the data was longer than its competitors'. As such, the client wanted a long-term solution to overcome this.
The Solution
Our QE&A team offered various solutions. The
focus areas were database and flat-file validations. As we explained above, database testing was automated using Informatica, while conventional methods, such as the creation of stored procedures and macros, were used for validating the flat files.
Benefits
•	50% reduction in time-to-market.
•	100% test coverage.
•	82% automation capability.
•	Highly improved data quality.
Figure 7 illustrates the breakout of each method
used and their contributions to the entire QE&A
process.
Figure 7: Contribution of each method: DB – Stored Procedure, 38%; DB – Informatica, 30%; Manual, 18%; Selenium, 11%; Macro, 3%.
Footnotes
1. "Data Quality Best Practices Boost Revenue by 66 Percent," http://www.destinationcrm.com/Articles/CRM-News/Daily-News/Data-Quality-Best-Practices-Boost-Revenue-by-66-Percent-52324.aspx.
2. Douglas Henschen, "Research: 2012 BI and Information Management," http://reports.informationweek.com/abstract/81/8574/Business-Intelligence-and-Information-Management/research-2012-bi-and-information-management.html.
About Cognizant
Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process out-
sourcing services, dedicated to helping the world’s leading companies build stronger businesses. Headquartered in
Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry
and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 75
delivery centers worldwide and approximately 199,700 employees as of September 30, 2014, Cognizant is a member of
the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing
and fastest growing companies in the world. Visit us online at www.cognizant.com or follow us on Twitter: Cognizant.
World Headquarters
500 Frank W. Burr Blvd.
Teaneck, NJ 07666 USA
Phone: +1 201 801 0233
Fax: +1 201 801 0243
Toll Free: +1 888 937 3277
Email: inquiry@cognizant.com
European Headquarters
1 Kingdom Street
Paddington Central
London W2 6BD
Phone: +44 (0) 20 7297 7600
Fax: +44 (0) 20 7121 0102
Email: infouk@cognizant.com
India Operations Headquarters
#5/535, Old Mahabalipuram Road
Okkiyam Pettai, Thoraipakkam
Chennai, 600 096 India
Phone: +91 (0) 44 4209 6000
Fax: +91 (0) 44 4209 6060
Email: inquiryindia@cognizant.com
© Copyright 2014, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is
subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.
About the Author
Vijay Kumar T V is a Senior Business Analyst on Cognizant’s QE&A team within the company’s Banking
and Financial Services Business Unit. He has 11-plus years of experience in business analysis, consult-
ing and quality engineering/assurance. Vijay has worked in various industry segments such as retail,
corporate, core banking, rental and mortgage, and has an analytic background, predominantly in the
areas of data warehousing and business intelligence. His expertise involves automating the data ware-
house and business intelligence test practices to align with the client’s strategic business goals. Vijay
has also worked with a U.S.-based client on product development, business process optimization and
business requirement management. He holds a bachelor’s degree in mechanical engineering from
Bangalore University and a post-graduate certificate in business management from XLRI, Xavier School
of Management. Vijay can be reached at Vijay-20.Kumar-20@cognizant.com.
How to Automate your  Enterprise Application / ERP TestingHow to Automate your  Enterprise Application / ERP Testing
How to Automate your Enterprise Application / ERP Testing
RTTS
 
Etl testing
Etl testingEtl testing
Etl testing
Krishna Prasad
 
Deliver Trusted Data by Leveraging ETL Testing
Deliver Trusted Data by Leveraging ETL TestingDeliver Trusted Data by Leveraging ETL Testing
Deliver Trusted Data by Leveraging ETL Testing
Cognizant
 
Leveraging Automated Data Validation to Reduce Software Development Timeline...
Leveraging Automated Data Validation  to Reduce Software Development Timeline...Leveraging Automated Data Validation  to Reduce Software Development Timeline...
Leveraging Automated Data Validation to Reduce Software Development Timeline...
Cognizant
 
Automate ETL Testing, Data Warehouse & Migration Testing The Agile Way - iceDQ
Automate ETL Testing, Data Warehouse & Migration Testing The Agile Way - iceDQAutomate ETL Testing, Data Warehouse & Migration Testing The Agile Way - iceDQ
Automate ETL Testing, Data Warehouse & Migration Testing The Agile Way - iceDQ
iceDQ
 
Building a Robust Big Data QA Ecosystem to Mitigate Data Integrity Challenges
Building a Robust Big Data QA Ecosystem to Mitigate Data Integrity ChallengesBuilding a Robust Big Data QA Ecosystem to Mitigate Data Integrity Challenges
Building a Robust Big Data QA Ecosystem to Mitigate Data Integrity Challenges
Cognizant
 
Completing the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = SuccessCompleting the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = Success
RTTS
 
Data Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical IndustryData Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical Industry
RTTS
 
Query Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programmingQuery Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programming
RTTS
 
rizwan cse exp resume
rizwan cse exp resumerizwan cse exp resume
rizwan cse exp resume
shaik rizwan
 
Copy of Alok_Singh_CV
Copy of Alok_Singh_CVCopy of Alok_Singh_CV
Copy of Alok_Singh_CV
Alok Singh
 
593 Managing Enterprise Data Quality Using SAP Information Steward
593 Managing Enterprise Data Quality Using SAP Information Steward593 Managing Enterprise Data Quality Using SAP Information Steward
593 Managing Enterprise Data Quality Using SAP Information Steward
Vinny (Gurvinder) Ahuja
 
Aniruddha roy resume
Aniruddha roy resumeAniruddha roy resume
Aniruddha roy resume
ANIRUDDHA ROY
 
Varsha_CV_ETLTester5.8Years
Varsha_CV_ETLTester5.8YearsVarsha_CV_ETLTester5.8Years
Varsha_CV_ETLTester5.8Years
Varsha Hiremath
 
Anu_Sharma2016_DWH
Anu_Sharma2016_DWHAnu_Sharma2016_DWH
Anu_Sharma2016_DWH
Anu Sharma
 
Test data management
Test data managementTest data management
Test data management
Rohit Gupta
 
How to Automate your Enterprise Application / ERP Testing
How to Automate your  Enterprise Application / ERP TestingHow to Automate your  Enterprise Application / ERP Testing
How to Automate your Enterprise Application / ERP Testing
RTTS
 
Ad

Recently uploaded (20)

Data Analytics Overview and its applications
Data Analytics Overview and its applicationsData Analytics Overview and its applications
Data Analytics Overview and its applications
JanmejayaMishra7
 
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbbEDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
JessaMaeEvangelista2
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
DPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdfDPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdf
inmishra17121973
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
chapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptxchapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptx
justinebandajbn
 
03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia
Alexander Romero Arosquipa
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 
183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
Classification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptxClassification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptx
wencyjorda88
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
Data Analytics Overview and its applications
Data Analytics Overview and its applicationsData Analytics Overview and its applications
Data Analytics Overview and its applications
JanmejayaMishra7
 
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbbEDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
JessaMaeEvangelista2
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
DPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdfDPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdf
inmishra17121973
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
chapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptxchapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptx
justinebandajbn
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 
183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
Classification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptxClassification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptx
wencyjorda88
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 

ETL Testing Strategies

Deliver Trusted Data by Leveraging ETL Testing

Data-rich organizations seeking to assure data quality can systemize the validation process by leveraging automated testing to increase coverage, accuracy and competitive advantage, thus boosting credibility with end users.

Executive Summary

Quality assurance teams typically perform extract, transform and load (ETL) testing with SQL scripting in conjunction with eyeballing the data in Excel spreadsheets. This process is time-consuming and, because of the human intervention involved, error-prone. It is also tedious, since the same test SQL scripts must be executed repeatedly to validate the data, and assorted, voluminous data can lead to defect leakage. To test the data effectively, the tester needs advanced database skills, including writing complex join queries and creating stored procedures, triggers and SQL packages.

Manual methods of data validation can also affect project schedules and undermine end-user confidence in data delivery (i.e., delivering data to users via flat files or on Web sites). Moreover, data quality issues can undercut competitive advantage and indirectly affect the long-term viability of a company and its products.

Organizations can overcome these challenges by mechanizing the data validation process. But that raises an important question: How can this be done without spending extra money? The answer led us to consider Informatica as an ETL testing tool.

This white paper demonstrates how Informatica can be used to automate the data testing process. It also illustrates how the tool can help QE&A teams reduce the hours spent on their activities, increase coverage and achieve 100% accuracy in validating the data. This means organizations can deliver complete, repeatable, auditable and trustworthy test coverage in less time without extending basic SQL skill sets.
Data Validation Challenges

Consistency in the data received for ETL is a perennial challenge. Typically, data received from various sources lacks commonality in how it is formatted and provided, and big data only makes the issue more pressing. Just a few years ago, 10 million records was considered a big deal; today, the volume of data stored by enterprises can run into the billions and trillions.

Cognizant 20-20 Insights | December 2014
Quick Take: Addressing Organizational Data Quality Issues

Figure 1 depicts the data release cycle: on Day 1 the QA and ETL/DB teams receive the data and apply ETL; on Day 2 the data is tested and signed off in the QA environment, with functional data validation; on Day 3 it is tested and signed off in the production environment (UAT) and released to production, with updates to the PMO and functional managers throughout.

Our experimentation with automated data validation for a U.S.-based client revealed that by mechanizing the data validation process, data quality issues can be completely eradicated. Automating the data validation process brings the following value additions:

• Provides a data validation platform that is workable and sustainable for the long term.
• Offers a tailored, project-specific framework for data quality testing.
• Reduces the turnaround time of each test execution cycle.
• Simplifies the test management process by simplifying the test approach.
• Increases test coverage along with greater accuracy of validation.

The Data Release Cycle and Internal Challenges

This client releases product data sets on a periodic basis, typically monthly. As a result, the data volume in each release is huge. One product suite has seven different products under its umbrella, and data is released in three phases per month. Each phase has more than 50 million records to be processed from each product within the suite. Due to manual testing, the turnaround time for each phase used to be three to five days, depending on the number of tasks involved in each phase.

Production release of quality data is a huge undertaking for the QE&A team, and it was a big challenge to satisfy business owners by reducing time-to-market (i.e., the time from processing the data once it is received to releasing it to the market).
By using various automation methods, we were able to reduce time-to-market from three to five days to one to three days (see Figure 1).
Reasons for the accretion of voluminous data include:

• Executive management's need to focus on data-driven decision-making by using business intelligence tools.
• Company-wide infrastructure changes such as data center migrations.
• Mergers and acquisitions among data-producing companies.
• Business owners' need to gain greater insight into streamlining production, reducing time-to-market and increasing product quality.

If the data is abundant and drawn from multiple sources, junk data may be present; the odds are that excessive duplication, null sets and redundant data exist in the assortment, and mishandling can result in data loss. Organizations must overcome these challenges with appropriate solutions in place to avoid credibility issues. Thus, for data warehousing and migration initiatives, data validation plays a vital role in ensuring overall operational effectiveness. But operational improvements are never without their challenges, including:

• Data validation is significantly different from conventional testing. It requires more advanced scripting skills across multiple database servers such as Microsoft SQL Server 2008, Sybase IQ, Vertica and Netezza.
• Heterogeneity in data sources leads to mishandling of the interrelationships between multiple data source formats.
• During application upgrades, ensuring that older application repository data matches the data in the new repository.
• SQL query execution is tedious and cumbersome because of repetitious execution of the queries.
• Test scenarios can be missed due to manual execution of queries.
• Total accuracy may not always be possible.
• The time taken for execution varies from one person to another.
• Strict supervision is required with each test.
• The ETL process entails numerous stages; it can be difficult to adhere to a testing schedule given the manual effort required.
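Several of the issues above, such as duplicate keys and null sets in incoming data, can be caught by automated profiling before ETL begins. The sketch below is a minimal illustration (not from the paper) using Python's sqlite3; the table and column names are invented for demonstration:

```python
import sqlite3

# Hypothetical staging table; in practice this would be the ETL staging database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging (record_id INTEGER, product TEXT, amount REAL);
    INSERT INTO staging VALUES
        (1, 'A', 10.0), (2, 'B', NULL), (2, 'B', NULL), (3, 'C', 7.5);
""")

def profile(conn, table, key, columns):
    """Return basic data quality metrics: row count, duplicate keys, nulls per column."""
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    dupes = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT {key} FROM {table} "
        f"GROUP BY {key} HAVING COUNT(*) > 1)").fetchone()[0]
    nulls = {c: conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {c} IS NULL").fetchone()[0]
        for c in columns}
    return {"rows": total, "duplicate_keys": dupes, "nulls": nulls}

report = profile(conn, "staging", "record_id", ["product", "amount"])
print(report)  # {'rows': 4, 'duplicate_keys': 1, 'nulls': {'product': 0, 'amount': 2}}
```

A report like this, produced automatically on every data drop, flags duplication and null sets before they leak into the warehouse.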
• The quality assurance team needs progressive elaboration (i.e., continuous improvement of key processes) to standardize the process, given complex architectures and multilayered designs.

A Business-Driven Approach to Data Validation

To meet the business demand for data validation, we developed a surefire and comprehensive solution that can be utilized in areas such as data warehousing, data extraction, transformation, loading, database testing and flat-file validation.

The Informatica tool used for the ETL process can also be used as a validation tool to verify the business rules associated with the data. It can significantly reduce manual effort and increase ETL productivity by lowering costs, thereby improving the bottom line.

Our Data Validation Procedures as a Framework

Four methods are required to implement a one-stop solution for addressing data quality issues (see Figure 2).

Figure 2 depicts the four data validation methods: Informatica data validation, DB stored procedures, Selenium and macros.
Each method has its own adoption procedures. High-level details include the following:

Informatica Data Validation

The following activities are required to create an Informatica data validation framework (see Figure 3):

• Accrue business rules from product/business owners based on their expectations.
• Convert business rules into test scenarios and test cases.
• Derive the expected results of each test case associated with each scenario.
• Write a SQL query for each test case.
• Update the SQL test cases in input files (test case basic info, SQL query).
• Create Informatica workflows to execute the queries and update the results in the respective SQL tables.
• Trigger Informatica workflows to execute jobs and send e-mail notifications with validation results.

Validate Comprehensive Data with Stored Procedures

The following steps are required for data validation using stored procedures (see Figure 4):

• Prepare validation test scenarios.
• Convert test scenarios into test cases.
• Derive the expected results for all test cases.
• Write stored procedure-compatible SQL queries that represent each test case.
• Compile all SQL queries as a package or test build.
• Store all validation Transact-SQL statements in a single execution plan, called a stored procedure.
• Execute the stored procedure whenever any data validation is carried out.

Figure 3 depicts a pictorial view of the data validation framework: source files flow through ETL transformation rules into a staging area (SQL Server) and the Sybase IQ warehouse; QA test cases with expected results are validated against QA database tables, pass/fail results are updated, and a test case validation report is e-mailed; validated data is exported as flat files and published to the Web for production end users (external and internal).
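The workflow above (SQL test cases kept in an input file, executed in a batch, with pass/fail results written back to a results table) can also be sketched outside Informatica. This is a minimal, hypothetical illustration in Python; the table, test IDs and queries are invented, not taken from the paper:

```python
import sqlite3

# Hypothetical warehouse with a fact table to validate.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (id INTEGER PRIMARY KEY, period TEXT, amount REAL);
    INSERT INTO fact_sales VALUES (1, '2014-11', 100.0), (2, '2014-12', 250.0);
    CREATE TABLE test_results (test_id TEXT, description TEXT, status TEXT);
""")

# Input-file equivalent: (test id, description, validation SQL, expected scalar).
test_cases = [
    ("Test_01", "Row count",       "SELECT COUNT(*) FROM fact_sales", 2),
    ("Test_02", "No null amounts",
     "SELECT COUNT(*) FROM fact_sales WHERE amount IS NULL", 0),
    ("Test_03", "Max period",      "SELECT MAX(period) FROM fact_sales", "2014-12"),
]

# Execute every test case and record pass/fail in the results table.
for test_id, desc, sql, expected in test_cases:
    actual = conn.execute(sql).fetchone()[0]
    status = "Pass" if actual == expected else "Fail"
    conn.execute("INSERT INTO test_results VALUES (?, ?, ?)", (test_id, desc, status))

for row in conn.execute("SELECT * FROM test_results"):
    print(row)
```

Because the test cases live in data rather than in a person's head, the same batch can be rerun on every release with no manual supervision, which is the point of the framework.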
One-to-One Data Comparison Using Macros

The following activities are required to handle data validation (see Figure 5):

• Prepare validation test scenarios.
• Convert test scenarios into test cases.
• Derive a list of expected results for each test scenario.
• Specify input parameters for a given test scenario, if any.
• Write a macro to carry out validation work for one-to-one data comparisons.

Selenium Functional Testing Automation

The following are required to perform data validation (see Figure 6):

• Prepare validation test scenarios.
• Convert test scenarios into test cases.
• Derive an expected result for each test case.
• Specify input parameters for a given test scenario.
• Derive test configuration data for setting up the QA environment.
• Build a test suite that contains multiple test builds according to test scenarios.
• Maintain a framework containing multiple test suites.
• Execute the automation test suite per the validation requirement.
• Analyze the test results and share them with project stakeholders.

Salient Solution Features and Benefits Secured

The following features and benefits of our framework were reinforced by a recent client engagement (see the Quick Take sidebar).

Core Features

• Compatible with all database servers.
• Zero manual intervention for the execution of validation queries.
• 100% efficiency in validating larger-scale data.
• Reconciliation of production activities with the help of automation.
• Reduces the level of effort and resources required to perform ETL testing.
Figure 4 depicts validation with stored procedures: a stored procedure executes validation test cases (e.g., Test_01 accuracy, Test_02 minimum period, Test_03 maximum period, Test_04 time check, Test_05 data period check, Test_06 changed values) against the Sybase IQ fact table; only data that passes is released to production, while failures are returned to quality assurance.

Figure 5 depicts applying macro magic to data validation: a macro is added and executed against marketing files and the Sybase IQ data warehouse to compare Data 1 with Data 2 one-to-one; passing data moves on to publications, while failures return to quality assurance.
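The macro-style one-to-one comparison amounts to a row-by-row reconciliation of two extracts. The example below is an illustrative stand-in for the spreadsheet macro; the file layout and field names are assumed, not taken from the paper:

```python
import csv
import io

# Two hypothetical extracts: "Data 1" (source) and "Data 2" (warehouse export).
data1 = "id,amount\n1,10.0\n2,20.0\n3,30.0\n"
data2 = "id,amount\n1,10.0\n2,25.0\n3,30.0\n"

def compare_one_to_one(src_text, tgt_text, key="id"):
    """Compare two CSV extracts row by row; return the keys that mismatch."""
    src = {r[key]: r for r in csv.DictReader(io.StringIO(src_text))}
    tgt = {r[key]: r for r in csv.DictReader(io.StringIO(tgt_text))}
    mismatches = []
    for k in sorted(src.keys() | tgt.keys()):
        # A key missing on either side, or differing field values, is a mismatch.
        if src.get(k) != tgt.get(k):
            mismatches.append(k)
    return mismatches

print(compare_one_to_one(data1, data2))  # ['2']
```

In practice the extracts would be read from flat files rather than strings, but the pass/fail logic is the same as the macro's: any mismatched key fails the comparison and is routed back to quality assurance.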
• Comprehensive test coverage ensures lower business risk and greater confidence in data quality.
• Remote scheduling of test activities.

Benefits Reaped

• Test case validation results are delivered in user-friendly formats such as .csv, .xlsx and HTML.
• Validation results are stored for future reference.
• Test cases and SQL scripts are reused for regression testing.
• No scope for human errors.
• Supervision is not required while executing test cases.
• 100% accuracy in test case execution at all times.
• Easy maintenance of SQL scripts and related test cases.
• No variation in the time taken to execute test cases.
• 100% reliability in testing and its coverage.

The Bottom Line

As the digital age proceeds, it is very important for organizations to progressively elaborate their processes with suitable information and awareness to drive business success. Hence, business data collected from various operational sources is cleansed and consolidated per business requirements to separate signal from noise. This data is then stored in a protected environment for an extended period.

Fine-tuning this data helps facilitate performance management, as well as tactical and strategic decisions and their execution, for business advantage. Well-organized business data enables and empowers business owners to make well-informed decisions, and those decisions can drive competitive advantage for an organization. On average, organizations lose $8.2 million annually due to poor data quality, according to industry research on the subject. A study by B2B research firm SiriusDecisions shows that by following best practices in data quality, a company can boost its revenue by 66%.1 And market research by InformationWeek found that 46% of those surveyed believe data quality is a barrier that undermines business intelligence mandates.2 Hence, it is safe to assume poor data quality is undercutting many enterprises.
Few organizations have taken the necessary steps to avoid jeopardizing their businesses. By implementing the types of data testing frameworks discussed above, companies can improve their processes by reducing the time taken for ETL. This, in turn, will dramatically reduce their time-to-market and support the management mantra of "under-promise and over-deliver." Moreover, few organizations need to spend extra money on these frameworks, given that existing infrastructure is used. This has a direct positive impact on a company's bottom line, since no additional overhead is required to hire new staff or add infrastructure.

Figure 6 depicts facilitating functional test automation: Selenium automation performs functional validation of the Web-based GUI application; passing validations proceed to production for GUI users, while failures are returned for correction.
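The Selenium approach described earlier layers test cases into builds, builds into suites, and suites into a framework. That layering can be sketched with Python's unittest; the browser-driving steps are stubbed out here, and the test and page names are illustrative only:

```python
import unittest

# Hypothetical functional checks; in the real framework each test would drive
# the Web GUI through Selenium WebDriver instead of asserting on stubbed data.
class LoginBuild(unittest.TestCase):
    def test_login_page_loads(self):
        page_title = "Product Data Portal"  # stand-in for driver.title
        self.assertIn("Portal", page_title)

class ReportBuild(unittest.TestCase):
    def test_report_row_count(self):
        displayed_rows = 50  # stand-in for a row count read from the GUI table
        self.assertEqual(displayed_rows, 50)

def build_suite():
    """Assemble multiple test builds into one suite, mirroring the framework layering."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(LoginBuild))
    suite.addTests(loader.loadTestsFromTestCase(ReportBuild))
    return suite

result = unittest.TextTestRunner(verbosity=2).run(build_suite())
```

Grouping builds into suites this way is what lets the whole framework be executed per validation requirement and have its results analyzed and shared in one pass.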
Looking Ahead

In the Internet age, where data is considered a business currency, organizations must capitalize on their return on investment in the most efficient way to maintain their competitive edge. Hence, data quality plays a pivotal role in strategic decisions.

The impact of poor information quality on business can be measured along four dimensions: increased costs, decreased revenues, decreased confidence and increased risk. Thus it is crucial for any organization to implement a foolproof solution in which a company can use its own product to validate the quality and capabilities of that product. In other words, adopt an "eating your own dog food" ideology.

That said, it is necessary for any data-driven business to focus on data quality, as poor quality has a high probability of becoming a major bottleneck.

Quick Take: Fixing an Information Services Data Quality Issue

This client is a leading U.S.-based financial services provider for real estate professionals. Its services include comprehensive data, analytical and other related services. The powerful insight gained from this knowledge provides the perspective necessary to identify, understand and take decisive action to effectively solve key business challenges.

Challenges Faced

The client faced major challenges in the end-to-end quality assurance of its data, as the majority of the company's test procedures were manual. Because of these manual methods, the turnaround time, or time-to-market, of the data was greater than its competitors'. As such, the client wanted a long-term solution.

The Solution

Our QE&A team offered various solutions, with a focus on database and flat-file validations. As explained above, database testing was automated using Informatica and other conventional methods, such as stored procedures, while macros were used for validating the flat files.
Benefits

• 50% reduction in time-to-market.
• 100% test coverage.
• 82% automation capability.
• Greatly improved data quality.

Figure 7 illustrates the breakout of each method used and its contribution to the entire QE&A process: DB stored procedures 38%, DB Informatica 30%, manual 18%, Selenium 11% and macros 3%.
Footnotes

1 "Data Quality Best Practices Boost Revenue by 66 Percent," http://www.destinationcrm.com/Articles/CRM-News/Daily-News/Data-Quality-Best-Practices-Boost-Revenue-by-66-Percent-52324.aspx.

2 Douglas Henschen, "Research: 2012 BI and Information Management," http://reports.informationweek.com/abstract/81/8574/Business-Intelligence-and-Information-Management/research-2012-bi-and-information-management.html.

References

• CoreLogic U.S., Technical & Product Management, providing IT infrastructural support and business knowledge on the data.
• Cognizant Quality Engineering & Assurance (QE&A) and Enterprise Information Management (EIM), ETL QE&A architectural setup and QE&A best practices.
• Ravi Kalakota and Marcia Robinson, E-Business 2.0: Roadmap for Success, "Chapter Four: Thinking E-Business Design — More Than a Technology" and "Chapter Five: Constructing the E-Business Architecture: Enterprise Apps."
• Jonathan G. Geiger, "Data Quality Management: The Most Critical Initiative You Can Implement," Intelligent Solutions, Inc., Boulder, CO, www2.sas.com/proceedings/sugi29/098-29.pdf.
• www.informatica.com/in/etl-testing/ (an article on Informatica's proprietary Data Validation Option, available in its data integration tool).
• www.expressanalytics.net/index.php?option=com_content&view=article&id=10&Itemid=8 (literature on the importance of the data warehouse and business intelligence).
• http://spotfire.tibco.com/blog/?p=7597 (understanding the benefits of data warehousing).
• www.ijsce.org/attachments/File/v3i1/A1391033113.pdf (the significance of data warehousing and data mining in business applications).
• www.corelogic.com/about-us/our-company.aspx#container-Overview (about the CoreLogic client).
• www.cognizant.com/InsightsWhitepapers/Leveraging-Automated-Data-Validation-to-Reduce-Soft- ware-Development-Timelines-and-Enhance-Test-Coverage.pdf. (A white paper on dataTestPro, a pro- prietary tool by Cognizant used for automating the data validation process.)
About Cognizant

Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world’s leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 75 delivery centers worldwide and approximately 199,700 employees as of September 30, 2014, Cognizant is a member of the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and fastest growing companies in the world. Visit us online at www.cognizant.com or follow us on Twitter: Cognizant.

World Headquarters
500 Frank W. Burr Blvd.
Teaneck, NJ 07666 USA
Phone: +1 201 801 0233
Fax: +1 201 801 0243
Toll Free: +1 888 937 3277
Email: [email protected]

European Headquarters
1 Kingdom Street
Paddington Central
London W2 6BD
Phone: +44 (0) 20 7297 7600
Fax: +44 (0) 20 7121 0102
Email: [email protected]

India Operations Headquarters
#5/535, Old Mahabalipuram Road
Okkiyam Pettai, Thoraipakkam
Chennai, 600 096 India
Phone: +91 (0) 44 4209 6000
Fax: +91 (0) 44 4209 6060
Email: [email protected]

© Copyright 2014, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.

About the Author

Vijay Kumar T V is a Senior Business Analyst on Cognizant’s QE&A team within the company’s Banking and Financial Services Business Unit. He has 11-plus years of experience in business analysis, consulting and quality engineering/assurance. Vijay has worked in various industry segments such as retail, corporate, core banking, rental and mortgage, and has an analytic background, predominantly in the areas of data warehousing and business intelligence. His expertise involves automating data warehouse and business intelligence test practices to align with the client’s strategic business goals. Vijay has also worked with a U.S.-based client on product development, business process optimization and business requirement management. He holds a bachelor’s degree in mechanical engineering from Bangalore University and a post-graduate certificate in business management from XLRI, Xavier School of Management. Vijay can be reached at [email protected].