ETL Interview Questions

What is ETL?

Extract – Extracting the data from the source system
Transform – Transforming or modifying the data into the format the business requires
Load – Loading the data into the target database

What are the transformation types?

Active Transformation
The output record count of the transformation may or may not be equal to the input record count.
For example, when we apply a filter transformation on the age column with the condition age
between 25 and 40, only the rows that satisfy this condition come out, so the output count cannot
be predicted.
Passive Transformation
The output record count of the transformation is always equal to the input record count.
For example, when we apply an expression transformation to concatenate the first name and last
name columns, every row comes out, even rows where those columns have no values. (A SQL
sketch of both cases appears after this list.)
Connected Transformation
A transformation which is linked to another transformation or to the target component is called
connected.
Unconnected Transformation
A transformation which is not linked to any other transformation or to the target component is
called unconnected.
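As a rough illustration (using a hypothetical EMPLOYEE table, not one from the document), the same two transformations can be expressed in SQL: the filter may drop rows, so it is active, while the expression returns exactly one output row per input row, so it is passive.

-- Active: filter transformation; the output row count may be lower than the input count
SELECT *
FROM   employee
WHERE  age BETWEEN 25 AND 40;

-- Passive: expression transformation; one output row per input row,
-- even when first_name or last_name is NULL
SELECT employee_id,
       first_name || ' ' || last_name AS full_name
FROM   employee;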
What are the types of load?
Full Load (Initial Load, Bulk Load, or Fresh Load) –
The data loading process performed the very first time. It can also be referred to as a bulk load or
fresh load.
The job extracts the entire volume of data from the source table or file and loads it into the
truncated target table after applying the transformation logic.
Incremental Load (Refresh Load, Daily Load, or Change Data Capture) –
Only the modified data is updated in the target, following the full load. The changes are captured
by comparing the created or modified date against the last run date of the job.
Only the modified data is extracted from the source: the job looks for changes in the source table
against the job run table; if changes exist, that data alone is extracted and updated in the target
without impacting the existing data. (See the sketch after this answer.)
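A minimal SQL sketch of the change-capture idea, assuming hypothetical SOURCE_CUSTOMER and JOB_RUN_CONTROL tables with a modified_date column and a last_run_date column:

-- Pick up only the rows created or modified since the last successful run
SELECT s.*
FROM   source_customer s
WHERE  s.modified_date > (SELECT MAX(last_run_date)
                          FROM   job_run_control
                          WHERE  job_name = 'CUSTOMER_INCR_LOAD');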
Name some ETL tools
Informatica PowerCenter
Talend Open Studio
IBM DataStage
SQL Server Integration Services (SSIS)
Ab Initio
Oracle Data Integrator
SAS Data Integration Studio
SAP BusinessObjects Data Integrator
CloverETL
Pentaho Data Integration

Explain the scenarios for testing source to a staging table.

- Verify the table structure of the staging table (columns, data type, length, constraints, index)
- Verify the successful workflow (ETL job) run
- Verify the data count between the source and staging table
- Verify the data comparison between the source table and staging table
- Verify the duplicate data check; duplicate data should not be loaded into the staging table
- Verify that excess trailing spaces are trimmed for all VARCHAR columns
- Verify the job consistency by performing a subsequent run
- Verify the job behavior on failure runs
- Verify the job re-run success scenario after failure correction
- Verify the job run with bad data (NULL values, exceeded precision, missing lookup or
reference data)
- Verify the job performance timing
(Some of these checks are sketched in SQL below.)
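A few of these checks can be scripted directly in SQL. The sketch below assumes hypothetical SOURCE_CUSTOMER and STG_CUSTOMER tables on an Oracle database:

-- Count comparison between source and staging
SELECT (SELECT COUNT(*) FROM source_customer) AS src_count,
       (SELECT COUNT(*) FROM stg_customer)    AS stg_count
FROM   dual;

-- Duplicate check: no key should appear more than once in staging
SELECT customer_id, COUNT(*)
FROM   stg_customer
GROUP  BY customer_id
HAVING COUNT(*) > 1;

-- Trailing-space check on a VARCHAR column
SELECT customer_id
FROM   stg_customer
WHERE  customer_name <> RTRIM(customer_name);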
How do you ensure that all source table data is loaded into the target table?

- Using the SET operator MINUS – if both source and target tables are on the same database server
(see the sketch below)
- Using an Excel macro – both source table and target table data are copied into Excel and compared
with a macro
- Using automation tools – the tool fetches the data and compares it internally with its own algorithm
- Using utility tools – develop an automation utility tool using Java or any scripting language
along with database drivers
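For example, a MINUS comparison, assuming both tables are on the same Oracle server and share the same column list (table and column names here are placeholders):

-- Rows present in source but missing in target
SELECT customer_id, customer_name, state FROM source_customer
MINUS
SELECT customer_id, customer_name, state FROM target_customer;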
Give an example of a low severity and high priority defect.
- There is a requirement that an email notification be triggered in case of job failure
- A deviation is found during testing: the email notification is received, but the record count in
the content does not match
- Low severity – since it does not affect any functionality
- High priority – since the wrong data count gives the wrong picture to the management team
What are the components of Informatica?
Informatica PowerCenter is one of the most widely used tools worldwide, mainly used for ETL, data
masking, and data quality.
It has four major components:
1. Repository Manager – to add repositories and manage folders
2. Designer – creating mappings
3. Workflow Manager – creating workflows with tasks and mappings
4. Workflow Monitor – workflow run status tracker

What are the tasks available in Informatica?

The major tasks available in the Informatica PowerCenter tool are:
1. Session
2. Email
3. Command
4. Control
5. Decision
6. Timer
Database testing vs ETL testing
ETL Testing – Verifying whether the data is loaded properly from source to target, along with the
business transformation rules.
Database Testing – Testing whether the data is stored properly in the database when operations are
performed from the front end or back end, along with testing of procedures, functions, and
triggers; also testing whether the data is retrieved properly in the UI.
What are the responsibilities of an ETL tester?
- Understanding requirements
- Estimating
- Planning
- Test case preparation
- Test execution
- Giving sign off

What does a mapping document contain?

A mapping document contains:
1. Column mapping between source and target
2. Data type and length for all columns of source and target
3. Transformation logic for each column
4. ETL job or workflow information
5. Input parameter file information

What kind of defects can you expect?

- Table structure issue
- Index unable to be dropped
- Index not created after job run
- Data issue in source table
- Data count mismatch between source and target
- Data not matching between source and target
- Duplicate data loaded
- Trim and NULL issues
- Data precision issue
- Date format issue
- Business transformation rules issue
- Subsequent job run not working properly
- Running the job with bad data does not reject the bad data properly
- Rollback not happening in case of job failure
- Performance issue
- Log file and content issue
- Mail notification and content issue

1000 records are in the source table, but only 900 records are loaded into the target table.
How do you find the missing 100 records?

- Using the SET operator MINUS – if both source and target tables are on the same database server
- Using an Excel macro – both source table and target table data are copied into Excel and
compared with a macro
- Using an automation tool – the tool fetches the data and compares it internally with its own algorithm
- Using a utility tool – develop an automation utility tool using Java or any scripting language
along with database drivers
Can you give a few test cases to test an incremental load table?
- Insert a few records and validate the data after the job run
- Update non-primary column values and validate the data after the job run
- Update primary column values and validate the data after the job run
- Delete a few records and validate the data after the job run
- Insert/update a few records to create duplicate entries and validate the data after the job run
- Update with bad data – NULL values, blank spaces, missing lookup data
How do you compare a flat file and a database table?

- Manual sampling method – manually compared on a sampling basis
- Using an Excel macro – flat file data and target table data are copied into Excel and
compared with a macro
- Using an automation tool – the tool fetches the data and compares it internally with its own algorithm
- Using a utility tool – develop an automation utility tool using Java or any scripting language
along with database drivers
(An Oracle external-table approach is also sketched below.)
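One SQL-only approach, sketched here for Oracle, is to expose the flat file as an external table and then reuse the MINUS comparison. The directory, file name, and columns below are assumptions, not values from the document:

-- Expose the flat file as a read-only external table
CREATE TABLE customer_file_ext (
  customer_id   NUMBER,
  customer_name VARCHAR2(100),
  state         VARCHAR2(50)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY etl_data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('customer.csv')
);

-- Rows in the file that are missing from the table (swap the operands for the reverse check)
SELECT customer_id, customer_name, state FROM customer_file_ext
MINUS
SELECT customer_id, customer_name, state FROM target_customer;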

Slowly Changing Dimensions (SCD)


The "Slowly Changing Dimension" problem is a common one particular to data warehousing. In a
nutshell, this applies to cases where the attribute for a record varies over time. We give an example
below:

Christina is a customer with ABC Inc. She first lived in Chicago, Illinois. So, the original entry in the
customer lookup table has the following record:

Customer Key | Name      | State
1001         | Christina | Illinois

At a later date, in January 2003, she moved to Los Angeles, California. How should ABC Inc. now
modify its customer table to reflect this change? This is the "Slowly Changing Dimension" problem.
There are in general three ways to solve this type of problem, and they are categorized as follows:

Slowly Changing Dimension Type 1: The new record replaces the original record. No trace
of the old record exists.

          In Type 1 Slowly Changing Dimension, the new information simply overwrites the original
information. In other words, no history is kept.

In our example, recall we originally have the following table:

Customer Key | Name      | State
1001         | Christina | Illinois

After Christina moved from Illinois to California, the new information replaces the original record,
and we have the following table:

Customer Key | Name      | State
1001         | Christina | California
Advantages:
- This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to
keep track of the old information.

Disadvantages:
- All history is lost. By applying this methodology, it is not possible to trace back in history. For
example, in this case, the company would not be able to know that Christina lived in Illinois before.

Usage:
About 50% of the time.

When to use Type 1:


Type 1 slowly changing dimension should be used when it is not necessary for the data warehouse to
keep track of historical changes.
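In SQL, a Type 1 change is a plain overwrite of the existing row. The sketch below assumes a CUSTOMER_DIM table shaped like the example above:

-- Type 1: overwrite in place, no history is kept
UPDATE customer_dim
SET    state = 'California'
WHERE  customer_key = 1001;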

Slowly Changing Dimension Type 2: A new record is added into the
customer dimension table. Therefore, the customer is treated essentially as two people.

In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new
information. Therefore, both the original and the new record will be present. The new record gets its
own primary key.

In our example, recall we originally have the following table:

Customer Key | Name      | State
1001         | Christina | Illinois
After Christina moved from Illinois to California, we add the new information as a new row into the
table:

Customer Key | Name      | State
1001         | Christina | Illinois
1005         | Christina | California
Advantages:
- This allows us to accurately keep all historical information.

Disadvantages:
- This will cause the size of the table to grow fast. In cases where the number of rows for the table is
very high to start with, storage and performance can become a concern.
- This necessarily complicates the ETL process.

Usage:
About 50% of the time.
When to use Type 2:
Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to
track historical changes.
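A Type 2 change can be sketched as two statements: expire the current row and insert a new one with its own surrogate key. The effective/expiry/active columns are assumptions, since the example table above does not show them:

-- Expire the existing version of the row
UPDATE customer_dim
SET    expiry_date = DATE '2003-01-15',
       is_active   = 'N'
WHERE  customer_key = 1001
AND    is_active    = 'Y';

-- Insert the new version with a new surrogate key
INSERT INTO customer_dim (customer_key, name, state, effective_date, expiry_date, is_active)
VALUES (1005, 'Christina', 'California', DATE '2003-01-15', NULL, 'Y');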

Slowly Changing Dimension Type 3: The original record is modified to reflect the change.

In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular attribute
of interest, one indicating the original value, and one indicating the current value. There will also be
a column that indicates when the current value becomes active.

In our example, recall we originally have the following table:

Customer Key | Name      | State
1001         | Christina | Illinois
To accommodate Type 3 Slowly Changing Dimension, we will now have the following columns:
o Customer Key
o Name
o Original State
o Current State
o Effective Date
After Christina moved from Illinois to California, the original information gets updated, and we have
the following table (assuming the effective date of change is January 15, 2003):

Customer Key | Name      | Original State | Current State | Effective Date
1001         | Christina | Illinois       | California    | 15-JAN-2003
Advantages:
- This does not increase the size of the table, since new information is updated.
- This allows us to keep some part of history.

Disadvantages:
- Type 3 will not be able to keep all history where an attribute is changed more than once. For
example, if Christina later moves to Texas on December 15, 2003, the California information will be
lost.

Usage:
Type 3 is rarely used in actual practice.

When to use Type 3:

Type 3 slowly changing dimension should only be used when it is necessary for the data warehouse
to track historical changes, and when such changes will only occur a finite number of times.
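A Type 3 change updates the same row: the Original State column keeps the first value, and only the current-value columns are overwritten, which is why intermediate values are lost. A sketch against the example table:

-- Type 3: Original_State keeps 'Illinois'; only the current value and its date change
UPDATE customer_dim
SET    current_state  = 'California',
       effective_date = DATE '2003-01-15'
WHERE  customer_key   = 1001;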

1.      What is a Data warehouse?

Ans: A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data
in support of management's decision making process.
Subject oriented: means that the data addresses a specific subject such as sales, inventory, etc.
Integrated: means that the data is obtained from a variety of sources.
Time variant: implies that the data is stored with a time element, so that changes to the data over
time can be tracked.
Non volatile: implies that data is never removed, i.e., historical data is also kept.

2.      What is the difference between a database and a data warehouse?

Ans: A database is a collection of related data, whereas a data warehouse stores historical data;
business users take their decisions based on that historical data.

3.      What is the difference between a dimension table and a fact table?

Ans: A dimension table consists of tuples of attributes of the dimension. A fact table can be thought
of as having tuples, one per recorded fact. Each fact contains some measured or observed variables
and identifies them with pointers to dimension tables.

4.      What is the difference between Data Mining and Data Warehousing?

Ans: Data mining – analyzing data from different perspectives and summarizing it into useful
decision-making information. It can be used to increase revenue, cut costs, increase productivity, or
improve any business process. There are a lot of tools available in the market for various industries
to do data mining. Basically, it is all about finding correlations or patterns in large relational
databases.

Data warehousing comes before data mining. It is the process of compiling and organizing data into
one database from various source systems, whereas data mining is the process of extracting
meaningful data from that database (data warehouse).
5.      What is Data Mart
Ans : A data mart is a simple form of a data warehouse that is focused on a single subject (or
functional area), such as Sales, Finance, or Marketing. Data marts are often built and controlled by a
single department within an organization. Given their single-subject focus, data marts usually draw
data from only a few sources. The sources could be internal operational systems, a central data
warehouse, or external data.

6.      Difference between OLTP and OLAP


Ans: Online transactional processing (OLTP) is designed to efficiently process high volumes of
transactions, instantly recording business events (such as a sales invoice payment) and reflecting
changes as they occur.
Online analytical processing (OLAP) is designed for analysis and decision support, allowing
exploration of often hidden relationships in large amounts of data by providing unlimited views of
multiple relationships at any cross-section of defined business dimensions.

7.      What is ETL?
Ans: ETL – extract, transform, and load.
- Extracting data from outside source systems.
- Transforming raw data to make it fit for use by different departments.
- Loading transformed data into target systems like a data mart or data warehouse.

8.      Why is ETL testing required?

Ans: To verify the correctness of data transformation against the signed-off business requirements
and rules.

To verify that the expected data is loaded into the data mart or data warehouse without loss of any data.

To validate the accuracy of reconciliation reports (if any, e.g. comparison of transactions made via a
bank ATM – ATM report vs. bank account report).

To make sure the complete process meets performance and scalability requirements.

Data security is also sometimes part of ETL testing.

To evaluate the reporting efficiency.

9.      What are ETL tester responsibilities


Ans: An ETL tester is responsible for writing SQL queries for various scenarios. They run a
number of tests including primary key, duplicate, default, and attribute tests of the process. In
addition, they are in charge of running record count checks as well as reconciling records with
source data. They also confirm the quality of the data and the loading process overall.
10.   What are the key benefits of ETL testing?
Ans: Minimise the risk of data loss
Data security
Data accuracy
Reporting efficiency

11.   To get the list of tables and views in a database

Ans: SELECT * FROM information_schema.tables (displays both tables and views)
SELECT * FROM information_schema.views (displays only views)

12.   List the details about “SMITH”


Ans: Select * from employee where last_name='SMITH';

13.   List out the employees who are working in department 20


Ans: Select * from employee where department_id=20

14.   List out the employees who are earning salary between 3000 and 4500
Ans: Select * from employee where salary between 3000 and 4500

15.   List out the employees who are working in department 10 or 20


Ans: Select * from employee where department_id in (10,20)

16.   Find out the employees who are not working in department 10 or 30


Ans :Select last_name, salary, commission, department_id from employee where department_id not
in (10,30)

17.   List out the employees whose name starts with “S”


Ans:  Select * from employee where last_name like 'S%'

18.   List out the employees whose name start with “S” and end with “H”
Ans:  Select * from employee where last_name like 'S%H'

19.   List out the employees whose name length is 4 and start with “S”
Ans:  Select * from employee where last_name like 'S___'
20.   List out the employees who are working in department 10 and draw salaries of more than
3500
Ans:  Select * from employee where department_id=10 and salary>3500

21.   List out the employees who are not receiving commission.


Ans:  Select * from employee where commission is Null

22.   List out the employee id, last name in ascending order based on the employee id.
Ans:  Select employee_id, last_name from employee order by employee_id
23.   List out the employee id, name in descending order based on salary column
Ans : Select employee_id, last_name, salary from employee order by salary desc
24.   list out the employee details according to their last_name in ascending order and salaries in
descending order
Ans:  Select employee_id, last_name, salary from employee order by last_name, salary desc

25.   list out the employee details according to their last_name in ascending order and then on
department_id in descending order.
Ans:  Select employee_id, last_name, salary from employee order by last_name, department_id desc

26.   How many employees are working in each department of the organization?
Ans:  Select department_id, count(*) from employee group by department_id

27.   List out the department wise maximum salary, minimum salary, average salary of the
employees
Ans:  Select department_id, count(*), max(salary), min(salary), avg(salary) from employee group by
department_id

28.   List out the job wise maximum salary, minimum salary, average salaries of the employees.
Ans:  Select job_id, count(*), max(salary), min(salary), avg(salary) from employee group by
job_id

29.   List out the no. of employees who joined in each month, in ascending order.
Ans:  Select to_char(hire_date,'month') month, count(*) from employee group by
to_char(hire_date,'month') order by month

30.   List out the no. of employees for each month and year, in ascending order based on the
year and month.
Ans:  Select to_char(hire_date,'yyyy') Year, to_char(hire_date,'mon') Month, count(*) "No. of
employees" from employee group by to_char(hire_date,'yyyy'), to_char(hire_date,'mon')
What is a data warehouse?
A data warehouse is a database which:
1. Maintains history of data
2. Contains integrated data (data from multiple business lines)
3. Contains heterogeneous data (data from different source formats)
4. Contains aggregated data
5. Allows only SELECT, to restrict data manipulation
6. Stores data in de-normalized format

Definition of a data warehouse:


1. Subject-oriented
2. Integrated
3. Non-volatile
4. Time-Variant
Main usage of a data warehouse:
1. Data analysis
2. Decision making
3. Planning or forecasting
What is a dimension?
A dimension table is a table that contains only non-quantifying data and categories of
information which are key for analysis. A dimension table contains a primary key and non-
quantifying columns. If the primary key does not exist in the source table, then a surrogate key is
used.
What are the types of dimension?
Based on the type of data it stores, there are two major types of dimension table:
1. Conformed dimension
2. Junk dimension
Based on where it is derived from, there is one more dimension category:
3. Degenerate dimension
Based on how frequently the data changes, dimensions can be divided into 2 types:
4. Rapidly Changing Dimension (RCD)
5. Slowly Changing Dimension (SCD)
What is a fact and what are the types of fact?
A fact is a column or attribute which is quantifiable or measurable and is used as a key
analysis factor. We can also call it a measure.
Types of fact:
1. Additive
2. Semi-additive
3. Non-additive
What does a fact table contain?
A table which contains facts is called a fact table. Typically a fact table has facts and the foreign
keys of dimension tables.
Fact table structure (a DDL sketch follows below):
Foreign_key1
Foreign_keyN
Fact1
FactN
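A minimal DDL sketch of that structure, with hypothetical dimension references and measures (a sales fact used purely for illustration):

CREATE TABLE sales_fact (
  date_key      NUMBER REFERENCES date_dim (date_key),         -- foreign keys to dimensions
  product_key   NUMBER REFERENCES product_dim (product_key),
  customer_key  NUMBER REFERENCES customer_dim (customer_key),
  quantity_sold NUMBER,                                         -- facts / measures
  sales_amount  NUMBER(12,2)
);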
What are the types of a fact table?
Transactional
The fact table contains data at a very detailed level without any rollup/aggregation, the way a
transactional database stores it.
Accumulating
An accumulating snapshot stores multiple entries for a single record to track the changes throughout
the workflow.
Periodic snapshot
The data is extracted and loaded for a particular period of time. It describes the state of the
record in that specific period.
Factless fact table
A fact table that does not have any facts is called a factless fact table. It has only the foreign keys of
dimension tables.
Why is a staging table required?
1. To reduce the complexity of the job (it would be more complex to move data directly from source
to target)
2. To avoid updating the source database.
3. To perform any calculations.
4. To perform the data cleansing process as per business need.
5. When data has been corrupted in the target after a load, we can delete the corrupted data in the
target database and then reload only the missing/deleted data into the target from the
staging database.
What is a surrogate key?
In most tables, the primary key is loaded from the source schema, but some source tables
might not have a primary key; in such cases the primary key is created using a sequence
generator, and such keys are called surrogate keys.
In terms of usage, there is no difference between these two types of keys. They differ only in the way
they are loaded: a primary key is loaded from the source table, whereas a surrogate key is generated
by the sequence generator. (See the sketch below.)
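A sketch of how a surrogate key is typically generated on the database side using a sequence (all names here are hypothetical):

CREATE SEQUENCE customer_dim_seq START WITH 1 INCREMENT BY 1;

-- The surrogate key comes from the sequence, not from the source system
INSERT INTO customer_dim (customer_key, name, state)
VALUES (customer_dim_seq.NEXTVAL, 'Christina', 'Illinois');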
OLTP vs DW database

OLTP: Dedicated database available for a specific subject area or business application
DW:   Integrated from different business applications

OLTP: Does not keep history
DW:   Keeps history of data for analyzing past performance

OLTP: Allows end users to perform DML operations (Select, Insert, Update, Delete)
DW:   Allows only Select for end users

OLTP: The main purpose is for day-to-day transactions
DW:   The purpose is for analysis and reporting

OLTP: Data volume will be less
DW:   Data volume is huge

OLTP: Data stored in normalized format
DW:   Data stored in de-normalized format

Operational Data Store (ODS) vs Staging database

ODS:     It will have a limited period of data (30 to 90 days)
Staging: Based on the type of load, it stores incremental data or the full volume of data

ODS:     Used for operational processing
Staging: Temporary data storage, used for data cleansing and other calculations

ODS:     Integrated from different business lines
Staging: Based on business need, normally each business line would have a dedicated staging area

Explain about star schema
This type of schema contains the fact table in the center position. As we know, the fact table contains
references to dimension tables, so the fact table is surrounded by dimension tables with
foreign key references. The dimension tables do not have references to any other dimension.
(A sample star-join query is sketched below.)
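A typical star-join query, sketched against the hypothetical sales fact and dimension tables used earlier (the dimension columns below are also assumed), joins the central fact table to each dimension through its foreign key:

SELECT d.calendar_year,
       p.product_name,
       SUM(f.sales_amount) AS total_sales
FROM   sales_fact  f
JOIN   date_dim    d ON d.date_key    = f.date_key
JOIN   product_dim p ON p.product_key = f.product_key
GROUP  BY d.calendar_year, p.product_name;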
Explain about snowflake schema
This type also contains a fact table in the center position. The fact table has references to dimension
tables. A dimension table may have references to other dimension tables. The data is stored
in a more normalized form.
What is the difference between star and snowflake?

Star:      As there are no relationships between dimensions, the performance will be high.
Snowflake: Due to multiple links between dimensions, the performance will be low.

Star:      The number of joins will be less, which keeps query complexity low.
Snowflake: The number of joins will be more, which makes query complexity high.

Star:      Consider a Project dimension with a Role column; the role name is stored against
           each project, so the size of the table will be high.
Snowflake: The role information is stored separately in its own table and referenced from the
           Project dimension, which reduces the table size.

Star:      Data is stored in de-normalized format in the dimension table.
Snowflake: Data is stored in a more normalized format in the dimension tables.
What is data cleansing?
Data cleansing is the process of removing irrelevant and redundant data, and correcting
incorrect and incomplete data. It is also called data cleaning or data scrubbing. Organizations
are growing rapidly amid heavy competition, and they take business decisions based on their past
performance data and future projections.
What is data masking?
Organizations never want to disclose highly confidential information to all users. Access to
sensitive data is restricted in all environments other than production. The process of
masking/hiding/encrypting sensitive data is called data masking.
Why Data mart?
1. The data warehouse database contains integrated data for all business lines; for example, a
banking data warehouse contains data for all savings, credit, and loan account databases.
2. Reporting access is given to a person who has the authority or the need to see the
comparison of data across all three types of accounts.
3. Meanwhile, a loan account branch manager does not need to see the savings and credit card
details; he wants to see only the past performance of the loan accounts.
4. In that case, for his analysis we need to apply data-level security to protect the savings and credit
information in the data warehouse.
5. At the same time, end users across all three accounts would access the same data
warehouse, which would end up in poor performance.
6. To avoid these issues, a separate database is built on top of the data warehouse, named the
data mart. Access is given to the respective business line resources, not to everyone.
What is data purging and archiving?
Data purging means deleting data from a database once it crosses the defined retention time.
Archiving means moving data that crosses the defined retention time to another database
(an archival database), as sketched below.
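As a sketch (retention period, table names, and the load_date column are assumptions), archiving moves the old rows to the archival table before the same rows are purged from the main table:

-- Archive rows older than the retention period, then purge them
INSERT INTO sales_fact_archive
SELECT * FROM sales_fact
WHERE  load_date < ADD_MONTHS(SYSDATE, -84);   -- e.g. a 7-year retention period

DELETE FROM sales_fact
WHERE  load_date < ADD_MONTHS(SYSDATE, -84);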
What are the types of SCD?
SCD Type 1
- Modifications are done on the same record
- No history of changes is maintained
SCD Type 2
- The existing record is marked as expired with an is_active flag or an Expired_date column, and a
new record is inserted
- This type allows tracking the full history of changes
SCD Type 3
- The new value is tracked in an additional column
- Only a limited history of changes (the original/previous value) is maintained
What type of schema and SCD type used in your project?
In my current project, we are using type2 to keep the history of changes.
There are two major types of data load available based on the load process.

1. Full Load (Bulk Load)

The data loading process performed the very first time. It can also be referred to as a bulk load or
fresh load.

The job extracts the entire volume of data from a source table or file and loads it into the truncated
target table after applying the transformation logic.
In most cases, it is a one-time job run; after that, only the changes are captured as part of the
incremental load. But again, based on business need, it may be scheduled to run.

2. Incremental load (Refresh load)

Only the modified data is updated in the target, following the full load. The changes are captured
by comparing the created or modified date against the last run date of the job.

Only the modified data is extracted from the source: the job looks for changes in the source table
against the job run table; if changes exist, that data alone is extracted and updated
in the target without impacting the existing data.

If no changes are available, the ETL job sends a notification with a "no changes available
between source and stage/target" message.

There are multiple tools available in the market for the ETL process. Tools are developed with
different technologies and offer rich features for smooth end-to-end data integration. Here are a
few ETL tools.

ETL tools in data warehouse

1. Informatica PowerCenter

One of the most widely used tools worldwide, mainly used for ETL, data masking, and data
quality. It has four major components:

1. Repository Manager – to add repositories and manage folders
2. Designer – creating mappings
3. Workflow Manager – creating workflows with tasks and mappings
4. Workflow Monitor – workflow run status tracker

2. Talend Open Studio

An open source tool for data integration (ETL) which has been developed in Java.
This tool is widely used for ETL, data migration, and big data.

3. IBM DataStage
It has four major components:

- Manager – to manage the repository
- Designer – developing jobs
- Director – job scheduling, running, and monitoring
- Administrator – creating users and managing projects/folders

4. SQL Server Integration Services (SSIS)

SQL Server offers this tool for data integration, with a wide range of extract, transform, and
load features.

5. Ab Initio
6. Oracle Data Integrator
7. SAS Data Integration Studio
8. SAP BusinessObjects Data Integrator
9. CloverETL
10. Pentaho Data Integration
