0% found this document useful (0 votes)

134 views21 pages

ETL Testing - InterviewQuestion PDF

This document provides an overview of ETL (Extract, Transform, Load) testing and data warehousing concepts. It includes 11 questions and answers that cover topics like: 1. The definition of a data warehouse as a central repository for integrated data from multiple sources used for reporting and analysis. 2. The differences between a data mart which is limited to a single department, and a data warehouse which encompasses an entire organization. 3. What ETL testing involves, such as validating correct data transformation, load times, and handling of invalid or missing data. 4. Common ETL tools and the purpose of staging areas in the ETL process.

Uploaded by

chiranjeev mishra

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

134 views21 pages

ETL Testing - InterviewQuestion PDF

Uploaded by

chiranjeev mishra

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 21

ETL Interview Question Bank

Author: - Sheetal Shirke

Version: - Version 0.1
ETL Architecture

Diagram 1

ETL Testing Questions

1. What is Data WareHouse?
Ans:
A data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a
system used for reporting and data. DWs are central repositories of integrated data from
one or more disparate sources. They store current and historical data and are used for
creating analytical reports for knowledge workers throughout the enterprise (Refer Diagram
1)

2. Difference between Dataware House and Data Mart?

Ans:
Refer Diagram 1
 Data mart and data warehousing are tools to assist management to come up with
relevant information about the organization at any point of time
 While data marts are limited for use of a department only, data warehousing applies
to an entire organization
 Data marts are easy to design and use while data warehousing is complex and
difficult to manage
 Data warehousing is more useful as it can come up with information from any
department

3. What is ETL/Data Warehouse Testing?

Ans:
ETL stands for Extract Transformation and Load, It collect the different source data from
Heterogeneous System (DB), Transform the data into Data warehouse (Target)
At the Time of Transformation, Data are first transform to Staging Table (temporary table)
Based on Business rules the data are mapped into target table, this process are manually
mapped / we configure using ETL Tool.
ETL not transformed the Duplicate data
Data Transformation process speed based on Source and Target Data ware House
We need to consider the OLAP(Online Analytic Processing) Structure .Data warehouse Model
Source data consist of (XML, Flat file ,Database.Excel Report.Dataware House
We need to set the validation at time of data transformation like ‘Avoid the ‘NULL’ values in
the table, validate the data type as using Tiny int instead of integer .etc
Based on the user requirement, ETL process starts.

4. Explain what are the ETL testing operations include?

Ans:
 Verify whether the data is transforming correctly according to business
requirements
 Verify that the projected data is loaded into the data warehouse without any
truncation and data loss
 Make sure that ETL application reports invalid data and replaces with default values
 Make sure that data loads at expected time frame to improve scalability and
performance

5. What are the various tools used in ETL.

Ans:
 Cognos Decision Stream
 Oracle Warehouse Builder/Oracle Data Integrator
 Business Objects XI
 SAS business warehouse
 SAS Enterprise ETL server

6. What are staging area in ETL testing and its purpose?

Ans:
Staging area is place where you hold temporary tables on data warehouse server. Staging
tables are connected to work area or fact tables. We basically need staging area to hold the
data, and perform data cleansing and merging, before loading the data into warehouse.

7. What is Primary, Foreign keys and difference among them?

Ans:
Primary Key: A primary key is a field or combination of fields that uniquely identify a record
in a table, so that an individual record can be located without confusion.

Foreign Key: A foreign key (sometimes called a referencing key) is a key used to link two
tables together. Typically you take the primary key field from one table and insert it into the
other table where it becomes a foreign key (it remains a primary key in the original table).
8. What is Surrogate Key?
Ans:
Surrogate key is a substitution for the natural primary key.
It is just a unique identifier or number for each row that can be used for the primary key to
the table. The only requirement for a surrogate primary key is that it is unique for each row
in the table.
Data warehouses typically use a surrogate, (also known as artificial or identity key), key for
the dimension tables primary keys.

9. What is Fact and Dimensions?

Ans:
Fact:
Facts are the metrics that business users would use for making business decisions. Generally,
facts are mere numbers. The facts cannot be used without their dimensions

Dimension:
Dimensions are those attributes that qualify facts. They give structure to the facts.
Dimensions give different views of the facts.

The facts & Dimension tables are linked by means of key called surrogate keys. Each fact
table would have a column surrogate key that would have a corresponding key in the
dimension tables.

10.Types of Dimensions.
Ans:
Slowly Changing Dimensions:
Attributes of a dimension that would undergo changes over time. It depends on the business
requirement whether particular attribute history of changes should be preserved in the data
warehouse. This is called a Slowly Changing Attribute and a dimension containing such an
attribute is called a Slowly Changing Dimension.

Rapidly Changing Dimensions:

A dimension attribute that changes frequently is a Rapidly Changing Attribute. If you don’t
need to track the changes, the Rapidly Changing Attribute is no problem, but if you do need
to track the changes, using a standard Slowly Changing Dimension technique can result in a
huge inflation of the size of the dimension. One solution is to move the attribute to its own
dimension, with a separate foreign key in the fact table. This new dimension is called a
Rapidly Changing Dimension.

Junk Dimensions:
A junk dimension is a single table with a combination of different and unrelated attributes to
avoid having a large number of foreign keys in the fact table. Junk dimensions are often
created to manage the foreign keys created by Rapidly Changing Dimensions.
Inferred Dimensions:
While loading fact records, a dimension record may not yet be ready. One solution is to
generate a surrogate key with Null for all the other attributes. This should technically be
called an inferred member, but is often called an inferred dimension.

Conformed Dimensions:
A Dimension that is used in multiple locations is called a conformed dimension. A conformed
dimension may be used with multiple fact tables in a single database, or across multiple data
marts or data warehouses.

Degenerate Dimensions:
A degenerate dimension is when the dimension attribute is stored as part of fact table, and
not in a separate dimension table. These are essentially dimension keys for which there are
no other attributes. In a data warehouse, these are often used as the result of a drill through
query to analyze the source of an aggregated number in a report. You can use these values
to trace back to transactions in the OLTP system.

Role Playing Dimensions:

A role-playing dimension is one where the same dimension key — along with its associated
attributes — can be joined to more than one foreign key in the fact table. For example, a
fact table may include foreign keys for both Ship Date and Delivery Date. But the same date
dimension attributes apply to each foreign key, so you can join the same dimension table to
both foreign keys. Here the date dimension is taking multiple roles to map ship date as well
as delivery date, and hence the name of Role Playing dimension.

Shrunken Dimensions:
A shrunken dimension is a subset of another dimension. For example, the Orders fact table
may include a foreign key for Product, but the Target fact table may include a foreign key
only for Product Category, which is in the Product table, but much less granular. Creating a
smaller dimension table, with Product Category as its primary key, is one way of dealing with
this situation of heterogeneous grain. If the Product dimension is snowflake, there is
probably already a separate table for Product Category, which can serve as the Shrunken
Dimension.

Static Dimensions:
Static dimensions are not extracted from the original data source, but are created within the
context of the data warehouse. A static dimension can be loaded manually — for example
with Status codes — or it can be generated by a procedure, such as a Date or Time
dimension.

11.Types of Facts.
Ans:
Additive:
Additive facts are facts that can be summed up through all of the dimensions in the fact
table. A sales fact is a good example for additive fact.
Semi-Additive:
Semi-additive facts are facts that can be summed up for some of the dimensions in the fact
table, but not the others.
Eg: Daily balances fact can be summed up through the customers dimension but not through
the time dimension.

Non-Additive:
Non-additive facts are facts that cannot be summed up for any of the dimensions present in
the fact table.
Eg: Facts which have percentages, ratios calculated.

Factless Fact Table:

In the real world, it is possible to have a fact table that contains no measures or facts. These
tables are called “Factless Fact tables”.
Eg: A fact table which has only product key and date key is a factless fact. There are no
measures in this table. But still you can get the number products sold over a period of time.

Based on the above classifications, fact tables are categorized into two:

Cumulative:
This type of fact table describes what has happened over a period of time. For example, this
fact table may describe the total sales by product by store by day. The facts for this type of
fact tables are mostly additive facts. The first example presented here is a cumulative fact
table.

Snapshot:
This type of fact table describes the state of things in a particular instance of time, and
usually includes more semi-additive and non-additive facts. The second example presented
here is a snapshot fact table.

12.What SCD (Slowly Changing Dimensions) and its types?

Ans:

Slowly Changing Dimensions (SCD) - dimensions that change slowly over time, rather than changing
on regular schedule, time-base. In Data Warehouse there is a need to track changes in dimension
attributes in order to report historical data. In other words, implementing one of the SCD types
should enable users assigning proper dimensions attribute value for given date? Example of such
dimensions could be: customer, geography, and employee.

There are many approaches how to deal with SCD. The most popular are:

Type 0 - The passive method

Type 1 - Overwriting the old value
Type 2 - Creating a new additional record
Type 3 - Adding a new column
Type 4 - Using historical table
Type 6 - Combine approaches of types 1, 2, 3 (1+2+3=6)

Type 0 - The passive method. In this method no special action is performed upon dimensional
changes. Some dimension data can remain the same as it was first time inserted, others may be
overwritten.

Type 1 - Overwriting the old value. In this method no history of dimension changes is kept in the
database. The old dimension value is simply overwritten be the new one. This type is easy to
maintain and is often use for data which changes are caused by processing corrections (e.g. removal
special characters, correcting spelling errors).

Before the change:

Customer_ID Customer_Name Customer_Type
1 Cust_1 Corporate

After the change:

Customer_ID Customer_Name Customer_Type
1 Cust_1 Retail

Type 2 - Creating a new additional record. In this methodology all history of dimension changes is
kept in the database. You capture attribute change by adding a new row with a new surrogate key
to the dimension table. Both the prior and new rows contain as attributes the natural key (or other
durable identifier). Also 'effective date' and 'current indicator' columns are used in this method.
There could be only one record with current indicator set to 'Y'. For 'effective date' columns, i.e.
start_date and end_date, the end_date for current record usually is set to value 9999-12-31.
Introducing changes to the dimensional model in type 2 could be very expensive database
operation so it is not recommended to use it in dimensions where a new attribute could be added
in the future.

Before the change:

Customer_ID Customer_Name Customer_Type Start_Date End_Date Current_Flag
1 Cust_1 Corporate 22-07-2010 31-12-9999 Y

After the change:

Customer_ID Customer_Name Customer_Type Start_Date End_Date Current_Flag
1 Cust_1 Corporate 22-07-2010 17-05-2012 N
2 Cust_1 Retail 18-05-2012 31-12-9999 Y

Type 3 - Adding a new column. In this type usually only the current and previous value of dimension
is kept in the database. The new value is loaded into 'current/new' column and the old one into
'old/previous' column. Generally speaking the history is limited to the number of column created
for storing historical data. This is the least commonly needed technique.
Before the change:
Customer_ID Customer_Name Current_Type Previous_Type
1 Cust_1 Corporate Corporate

After the change:

Customer_ID Customer_Name Current_Type Previous_Type
1 Cust_1 Retail Corporate

Type 4 - Using historical table. In this method a separate historical table is used to track all
dimensions attribute historical changes for each of the dimension. The 'main' dimension table keeps
only the current data e.g. customer and customer_history tables.

Current table:
Customer_ID Customer_Name Customer_Type
1 Cust_1 Corporate

Historical table:
Customer_ID Customer_Name Customer_Type Start_Date End_Date
1 Cust_1 Retail 01-01-2010 21-07-2010
1 Cust_1 Other 22-07-2010 17-05-2012
1 Cust_1 Corporate 18-05-2012 31-12-9999

Type 6 - Combine approaches of types 1, 2, 3 (1+2+3=6). In this type we have in dimension table
such additional columns as:
 current_type - for keeping current value of the attribute. All history records for given item
of attribute have the same current value.
 historical_type - for keeping historical value of the attribute. All history records for given
item of attribute could have different values.
 start_date - for keeping start date of 'effective date' of attribute's history.
 end_date - for keeping end date of 'effective date' of attribute's history.
 current_flag - for keeping information about the most recent record.
In this method to capture attribute change we add a new record as in type 2. The
current_type information is overwritten with the new one as in type 1. We store the history
in a historical_column as in type 3.

Customer_I Customer_Nam
Current_Type Historical_Type Start_Date End_Date Current_Flag
D e
01-01- 21-07-
1 Cust_1 Corporate Retail N
2010 2010
22-07- 17-05-
2 Cust_1 Corporate Other N
2010 2012
18-05- 31-12-
3 Cust_1 Corporate Corporate Y
2012 9999

13.What are OLTP and OLAP?

Ans:

OLTP (On-line Transaction Processing) is characterized by a large number of short on-line

transactions (INSERT, UPDATE, and DELETE). The main emphasis for OLTP systems is put on very fast
query processing, maintaining data integrity in multi-access environments and an effectiveness
measured by number of transactions per second. In OLTP database there is detailed and current
data, and schema used to store transactional databases is the entity model (usually 3NF).

OLAP (On-line Analytical Processing) is characterized by relatively low volume of transactions.

Queries are often very complex and involve aggregations. For OLAP systems a response time is an
effectiveness measure. OLAP applications are widely used by Data Mining techniques. In OLAP
database there is aggregated, historical data, stored in multi-dimensional schemas (usually star
schema).

The following table summarizes the major differences between OLTP and OLAP system design.

OLAP System
OLTP System
Online Analytical
Online Transaction Processing
Processing
(Operational System)
(Data Warehouse)
Consolidation data; OLAP data
Operational data; OLTPs are the original source of
Source of data comes from the various OLTP
the data.
Databases
Purpose of To help with planning, problem
To control and run fundamental business tasks
data solving, and decision support
Multi-dimensional views of
What the data Reveals a snapshot of ongoing business processes various kinds of business
activities
Inserts and Short and fast inserts and updates initiated by end Periodic long-running batch jobs
Updates users refresh the data
Relatively standardized and simple queries Often complex queries involving
Queries
Returning relatively few records aggregations
Depends on the amount of data
involved; batch data
Processing refreshes and complex queries
Typically very fast
Speed may take many hours; query
speed can be improved by
creating indexes
Larger due to the existence of
Space aggregation structures and
Can be relatively small if historical data is archived
Requirements history data; requires more
indexes than OLTP
Typically de-normalized with
Database
Highly normalized with many tables fewer tables; use of star and/or
Design
snowflake schemas
Instead of regular backups, some
Backup religiously; operational data is critical to
Backup and environments may consider
run the business, data loss is likely to entail
Recovery simply reloading the OLTP data
significant monetary loss and legal liability
as a recovery method

14.What is Grain of Fact OR Fact Granularity?

Ans:
Level of granularity means level of detail that you put into the fact table in a data
warehouse. For example: Based on design you can decide to put the sales data in each
transaction. Now, level of granularity would mean what detail you are willing to put for each
transactional fact. Product sales with respect to each minute or you want to aggregate it up
to minute and put that data.

It also means that we can have (for example) data aggregated for a year for a given product
as well as the data can be drilled down to Monthly, weekly and daily basis...the lowest level
is known as the grain. Going down to details is Granularity

15.STAR Schema and Snowflake Schema?

Ans:
STAR Schema:
Star Schema has single fact table connected to dimension tables and it visualize as a star. In
star schema only one link establishes the relationship between the fact table and any of the
dimension tables.

SNOW-FLAKE Schema:
Snowflake Schema is an extension of the star schema. In this model, dimension tables are
not necessarily fully flattened. Here, very large dimension tables are normalized into
multiple sub dimensional tables. It is used when a dimensional table becomes very big. Also,
every dimension table is associated with sub dimension table and has multiple links.
16.What are Transformation and its types?
Ans:
 Aggregator Transformation
 Application Source Qualifier Transformation
 Custom Transformation
 Data Masking Transformation
 Expression Transformation
 External Procedure Transformation
 Filter Transformation
 HTTP Transformation
 Input Transformation
 Java Transformation
 Joiner Transformation
 Lookup Transformation
 Normaliser Transformation
 Output Transformation
 Rank Transformation
 Reusable Transformation
 Router Transformation
 Sequence Generator Transformation
 Sorter Transformation
 Source Qualifier Transformation
 SQL Transformation
 Stored Procedure Transformation
 Transaction Control Transaction
 Union Transformation
 Unstructured Data Transformation
 Update Strategy Transformation
 XML Generator Transformation
 XML Parser Transformation
 XML Source Qualifier Transformation
 Advanced External Procedure Transformation
 External Transformation

17.Explain what is the difference between OLAP tools and ETL tools?
Ans:
ETL tool is meant for extraction data from the legacy systems and load into specified data
base with some process of cleansing data.
Eg: Informatica, data stage ....etc
OLAP is meant for Reporting purpose. In OLAP data available in Multidimensional model. So
that u can write simple query to extract data from the data base.
Eg: Business objects, Cognos....etc

18.Explain these terms Session, Worklet, Mapplet and Workflow?

Ans:
Mapplet : It arranges or creates sets of transformation
Worklet: It represents a specific set of tasks given
Workflow: It’s a set of instructions that tell the server how to execute tasks
Session: It is a set of parameters that tells the server how to move data from sources to
target

19.What are Data ware House and Data Mining?

Ans:
Data mining is the process of finding patterns in a given data set. These patterns can often
provide meaningful and insightful data to whoever is interested in that data. Data mining is
used today in a wide variety of contexts – in fraud detection, as an aid in marketing
campaigns, and even supermarkets use it to study their consumers.

Data warehousing can be said to be the process of centralizing or aggregating data from
multiple sources into one common repository.

20.What are the basic checks done during ETL Testing?

Ans:
Reconciliation testing: Sometimes, it is also referred as ‘Source to Target count testing’. In
this check, matching of count of records is checked. Although this is not the best way, but in
case of time crunch, it helps.
Eg:

Constraint testing: Here test engineer, maps data from source to target and identify
whether the data is mapped or not. Following are the key checks: UNIQUE, NULL, NOT NULL,
Primary Key, Foreign key, DEFAULT, CHECK

Validation testing (source to target data): It is generally executed in mission critical or

financial projects. Here, test engineer, validates each data point and match source to target
data.

Testing for duplicate check: It is done to ensure that there are no duplicate values for
unique columns. Duplicate data can arise due to any reason like missing primary key etc.
Below is one
Eg:

Testing for attribute check: To check if all attributes of source system are present in target
table.

Logical or transformation testing: To test any logical gaps in the. Here, depending upon the
scenario, following methods can be used: boundary value analysis, equivalence partitioning,
comparison testing, error guessing or sometimes, graph based testing methods. It also
covers testing for look-up conditions.

Incremental and historical data testing: Test to check the data integrity of old & new data
with the addition of new data. It also covers the validation of purging policy related
scenarios.

GUI / navigation testing: To check the navigation or GUI aspects of the front end reports.

Note: In case of ETL or data warehouse testing, re-testing or regression testing is also part of
this effort. Their concept / definition remain the same.
21.What is Active and Passive Transformation? List various types of them.
Ans:
If something is changing for the row in a transformation then it’s an active Transformation.
But what is changing?
 A transformation that changes the number of rows passing through it.
 Changing the order of the rows passing through it also consider in active
transformation.

When a row enters a transformation, Informatica assigns a row number. If this number
change for a row, that's an Active transformation. In other words the nth row coming in will
go as n'th row, and then the transformation is Passive

Filter Transformation:
The number of rows getting in the transformation and coming out is different. And as
specified above it satisfies the criteria for being an active transformation.

But this is not the case, if all the rows in filter transformation will satisfy the True filter
condition then it’s behave as a Passive Transformation.

Aggregator Transformation:
Aggregator transformation is used to get the aggregate value based on the group by ports.
Thus if we have duplicates on the group by columns then it will pass only the distinct
records.

So here also the records coming into the transformation and going out are different and acts
as an active transformation.

But similar to filter transformation here also there can be an exception. That is if there are
no duplicates on the group by ports then all the rows will be passed.

Sorter Transformation:
This is one transformation which can satisfy both the criteria of being an active
transformation.

 The sorter transformation is also provided to output only the distinct rows, where it can
filter the duplicate rows and send the unique set.
 As it sorts the data so the order of the rows changes which satisfy our second criteria.

Union Transformation:
This is a transformation which becomes active only due to second criteria. In the union
transformation the order of rows always not same as it came from source. Unlike Joiner
transformation which restricts the data flow from one source until it gets all the data from
the other source and so the order of the rows doesn’t change. While on other hand in Union
transformation it does not restrict the flow of any data and keeps on passing the data as it
receives. So the order of the rows keeps on changing satisfying the second criteria for being
an active transformation.
Router Transformation:
Router Transformation becomes active

 Due to filter condition we are changing the input rows and output rows.
 For multiple groups if condition satisfies for more than one group the we will send the
data in multiple output transformation. So for example we get 50 rows and we have 4
groups with a condition TRUE then all the groups will pass 50 rows that is total of 200
rows will come out of the Router making it an active transformation.

22.What is erroneous data in ETL testing?

Ans:
Data which fails to satisfy the business rules is called as Erroneous Data. This data is
captured in Error Table during transformation for further analysis.

Database related questions:

1. What are joins and its types?
Ans:
SQL JOIN is a method to retrieve data from two or more database tables.
a. JOIN or INNER JOIN
b. OUTER JOIN
i. LEFT OUTER JOIN or LEFT JOIN
ii. RIGHT OUTER JOIN or RIGHT JOIN
iii. FULL OUTER JOIN or FULL JOIN
c. NATURAL JOIN
d. CROSS JOIN
e. SELF JOIN

JOIN or INNER JOIN:

In this kind of a JOIN, we get all records that match the condition in both the tables, and
records in both the tables that do not match are not reported.

In other words, INNER JOIN is based on the single fact that: ONLY the matching entries in
BOTH the tables SHOULD be listed.

Note that a JOIN without any other JOIN keywords (like INNER, OUTER, LEFT, etc) is an
INNER JOIN. In other words, INNER JOIN is a Syntactic sugar for JOIN

OUTER JOIN:
OUTER JOIN retrieves either, the matched rows from one table and all rows in the other
table OR, all rows in all tables (it doesn't matter whether or not there is a match).

There are three kinds of Outer Join:

LEFT OUTER JOIN or LEFT JOIN:
This join returns all the rows from the left table in conjunction with the matching rows from
the right table. If there are no columns matching in the right table, it returns NULL values.

RIGHT OUTER JOIN or RIGHT JOIN:

This JOIN returns all the rows from the right table in conjunction with the matching rows
from the left table. If there are no columns matching in the left table, it returns NULL values.

FULL OUTER JOIN or FULL JOIN:

This JOIN combines LEFT OUTER JOIN and RIGHT OUTER JOIN. It returns row from either
table when the conditions are met and returns NULL value when there is no match.
In other words, OUTER JOIN is based on the fact that : ONLY the matching entries in ONE OF
the tables (RIGHT or LEFT) or BOTH of the tables(FULL) SHOULD be listed.

Note that `OUTER JOIN` is a loosened form of `INNER JOIN`.

NATURAL JOIN:
It is based on the two conditions:
The JOIN is made on all the columns with the same name for equality.
Removes duplicate columns from the result.
This seems to be more of theoretical in nature and as a result (probably) most DBMS don't
even bother supporting this.

CROSS JOIN:
It is the Cartesian product of the two tables involved. The result of a CROSS JOIN will not
make sense in most of the situations. Moreover, we won’t need this at all (or needs the
least, to be precise).

SELF JOIN:
It is not a different form of JOIN; rather it is a JOIN (INNER, OUTER, etc) of a table to itself.
JOINs based on Operators
Depending on the operator used for a JOIN clause, there can be two types of JOINs. They are
Equi JOIN and Theta JOIN

Equi JOIN:
For whatever JOIN type (INNER, OUTER, etc), if we use ONLY the equality operator (=), then
we say that the JOIN is an EQUI JOIN.

Theta JOIN:
This is same as EQUI JOIN but it allows all other operators like >, <, >= etc.

2. What is difference between delete, truncate and drop command?

Ans:
DELETE:
The DELETE command is used to remove rows from a table. A WHERE clause can be used to
only remove some rows. If no WHERE condition is specified, all rows will be removed. After
performing a DELETE operation you need to COMMIT or ROLLBACK the transaction to make
the change permanent or to undo it. Note that this operation will cause all DELETE triggers
on the table to fire.

TRUNCATE:
TRUNCATE removes all rows from a table. The operation cannot be rolled back and no
triggers will be fired. As such, TRUCATE is faster and doesn't use as much undo space as a
DELETE.
DROP:
The DROP command removes a table from the database. All the tables' rows, indexes and
privileges will also be removed. No DML triggers will be fired. The operation cannot be rolled
back.

3. What are constraints and different types of it?

Ans:
Constraints enable the RDBMS enforce the integrity of the database automatically, without
needing you to create triggers, rule or defaults.

Types of constraints:
 PRIMARY KEY
 UNIQUE
 FOREIGN KEY
 CHECK
 NOT NULL

A PRIMARY KEY constraint is a unique identifier for a row within a database table. Every
table should have a primary key constraint to uniquely identify each row and only one
primary key constraint can be created for each table. The primary key constraints are used
to enforce entity integrity.

A UNIQUE constraint enforces the uniqueness of the values in a set of columns, so no

duplicate values are entered. The unique key constraints are used to enforce entity integrity
as the primary key constraints.

A FOREIGN KEY constraint prevents any actions that would destroy link between tables with
the corresponding data values. A foreign key in one table points to a primary key in another
table. Foreign keys prevent actions that would leave rows with foreign key values when
there are no primary keys with that value. The foreign key constraints are used to enforce
referential integrity.

A CHECK constraint is used to limit the values that can be placed in a column. The check
constraints are used to enforce domain integrity.

A NOT NULL constraint enforces that the column will not accept null values. The not null
constraints are used to enforce domain integrity, as the check constraints.

You can create constraints when the table is created, as part of the table definition by using
the CREATE TABLE statement.
4. What does UNION do? What is the difference between UNION and
UNION ALL?
Ans:
UNION merges the contents of two structurally-compatible tables into a single combined
table. The difference between UNION and UNION ALL is that UNION will omit duplicate
records whereas UNION ALL will include duplicate records.

It is important to note that the performance of UNION ALL will typically be better than
UNION, since UNION requires the server to do the additional work of removing any
duplicates. So, in cases where is is certain that there will not be any duplicates, or where
having duplicates is not a problem, use of UNION ALL would be recommended for
performance reasons.

5. Write a query to find nth highest salary.

Ans:
Select max(SALARY) from
(Select emp.SALARY, ROW_NUMBER() over (order by emp.SALARY desc) rno
From Employee emp) a
Where rno = n;
OR
Select SALARY from
(Select emp.SALARY, ROW_NUMBER() over (order by emp.SALARY desc) rno
From Employee emp) a
Where rno = n order by SALARY desc;

Note: where rno =n , you can replace n by highest number you want to find.

6. Write a query to find duplicate records in table.

Ans:
Select <column_name>, count(*)
From <Table_Name>
Group by <column_name>
Having count(*)>1;

7. Write a query to display unique records from table.

Ans:
Select distinct Table_Name.*
From Table_Name;

8. What is difference between ROW_NUMBER(), RANK(), DENSE_RANK()?

Ans:
ROW_NUMBER():
This function will assign a unique id to each row returned from the query.

RANK():
This function will assign a unique number to each distinct row, but it leaves a gap between
the groups.

DENSE_RANK():
This function is similar to Rank with only difference, this will not leave gaps between groups.

DEPTNO RN R DR
10 1 1 1
10 2 1 1
10 3 1 1
10 4 1 1
10 5 1 1
10 6 1 1
11 7 7 2
11 8 7 2
12 9 9 3

Where
RN = ROW_NUMBER()
R= RANK()
DR= DENSE_RANK()

9. How to retrieve dropped table from database?

Ans:
Using Flashback command.
Eg: FLASHBACK <TABLE_NAME>;

NOTE: https://www.toptal.com/sql/interview-questions, helpful for SQL Queries.

Iti Pdfs
No ratings yet
Iti Pdfs
10 pages
53 SQL Questions-Answers
No ratings yet
53 SQL Questions-Answers
89 pages
JSP-Servlet Interview Questions You'll Most Likely Be Asked
From Everand
JSP-Servlet Interview Questions You'll Most Likely Be Asked
Vibrant Publishers
No ratings yet
Database Testing Checklist
No ratings yet
Database Testing Checklist
4 pages
General Energy Equation: 2005 Pearson Education South Asia Pte LTD
No ratings yet
General Energy Equation: 2005 Pearson Education South Asia Pte LTD
56 pages
DW Basic Questions
No ratings yet
DW Basic Questions
9 pages
Chapter 5 Hive
No ratings yet
Chapter 5 Hive
69 pages
John Pual
No ratings yet
John Pual
10 pages
Introduction To Big Data and Hadoop
100% (1)
Introduction To Big Data and Hadoop
29 pages
Accenture Interview Etl Testing
No ratings yet
Accenture Interview Etl Testing
1 page
Datatypes in Hive
No ratings yet
Datatypes in Hive
31 pages
Commonly Asked Power Bi Interview Question
No ratings yet
Commonly Asked Power Bi Interview Question
7 pages
BI Mini Project-Wardrobe Analysis
No ratings yet
BI Mini Project-Wardrobe Analysis
11 pages
Bihar STET Computer Science 2019 Official Paper 2 (English)
No ratings yet
Bihar STET Computer Science 2019 Official Paper 2 (English)
47 pages
Srikanth
No ratings yet
Srikanth
7 pages
Python Placement Ques
No ratings yet
Python Placement Ques
45 pages
11.SAP BO InfoView PDF
No ratings yet
11.SAP BO InfoView PDF
46 pages
Javascript Complete Notes
No ratings yet
Javascript Complete Notes
40 pages
Top 50 Data Warehousing Interview Questions & Answers
No ratings yet
Top 50 Data Warehousing Interview Questions & Answers
8 pages
Azure Data Engineer Mock Interview - Project Special
No ratings yet
Azure Data Engineer Mock Interview - Project Special
11 pages
Modeling With UML: Solutions
No ratings yet
Modeling With UML: Solutions
6 pages
Cognos Interview Questions Scenario Based
No ratings yet
Cognos Interview Questions Scenario Based
1 page
HDFS Exercises - Basic
No ratings yet
HDFS Exercises - Basic
5 pages
Tableau Interview Questions 1
No ratings yet
Tableau Interview Questions 1
22 pages
W Purch Cost F
100% (1)
W Purch Cost F
25 pages
Intro to ETL
No ratings yet
Intro to ETL
43 pages
Bhargavi Resume
No ratings yet
Bhargavi Resume
3 pages
Hive Mock Test
100% (1)
Hive Mock Test
6 pages
Tableau Interview Questions
No ratings yet
Tableau Interview Questions
31 pages
Learn PySpark: Build Python-Based Machine Learning and Deep Learning Models 1st Edition Pramod Singh All Chapter Instant Download
100% (4)
Learn PySpark: Build Python-Based Machine Learning and Deep Learning Models 1st Edition Pramod Singh All Chapter Instant Download
52 pages
OLTP
No ratings yet
OLTP
12 pages
SQL Assignment
50% (2)
SQL Assignment
3 pages
Lecture Notes: Introduction To Data Science and Big Data
No ratings yet
Lecture Notes: Introduction To Data Science and Big Data
5 pages
Etl VS Elt
No ratings yet
Etl VS Elt
8 pages
Python Oops Function
No ratings yet
Python Oops Function
14 pages
Big Data Hadoop Interview Questions and Answers
100% (1)
Big Data Hadoop Interview Questions and Answers
25 pages
Migrating SSIS Projects and Parameters
No ratings yet
Migrating SSIS Projects and Parameters
14 pages
Hadoop Hive Cheat Sheet - Developer Guide For SQL To HiveQL - Qubole
No ratings yet
Hadoop Hive Cheat Sheet - Developer Guide For SQL To HiveQL - Qubole
19 pages
Untitled
No ratings yet
Untitled
13 pages
Python PDF
No ratings yet
Python PDF
6 pages
Bihar STET PGT (Computer Science) Official Paper-II (Held On_ 12 Sept, 2023 Shift 2)
No ratings yet
Bihar STET PGT (Computer Science) Official Paper-II (Held On_ 12 Sept, 2023 Shift 2)
36 pages
Python Full Stack
0% (1)
Python Full Stack
6 pages
Informatica 9.x Course Curriculum
No ratings yet
Informatica 9.x Course Curriculum
8 pages
Please Can Someone List What Are All The Testing Types Performed On ETL/DW Testing?
No ratings yet
Please Can Someone List What Are All The Testing Types Performed On ETL/DW Testing?
3 pages
Top SQL Interview Questions
No ratings yet
Top SQL Interview Questions
35 pages
Num Py
No ratings yet
Num Py
46 pages
SQL - Ineuron - Final
No ratings yet
SQL - Ineuron - Final
72 pages
Detailed Curriculum PDF
No ratings yet
Detailed Curriculum PDF
6 pages
Prathap Reddy.C: Rofessional Ummary
No ratings yet
Prathap Reddy.C: Rofessional Ummary
4 pages
Mongodb Cheat Sheet
No ratings yet
Mongodb Cheat Sheet
10 pages
4 Data-Testing PDF
No ratings yet
4 Data-Testing PDF
79 pages
Tech C, C#,java
No ratings yet
Tech C, C#,java
471 pages
DWH Basics by Suresh
No ratings yet
DWH Basics by Suresh
17 pages
Node-Js-React-Js-Django - LAB
No ratings yet
Node-Js-React-Js-Django - LAB
38 pages
SQL Questions
No ratings yet
SQL Questions
4 pages
Ds Material PDF
No ratings yet
Ds Material PDF
243 pages
Python Syllbus by Lokesh
No ratings yet
Python Syllbus by Lokesh
5 pages
Etl Resume
No ratings yet
Etl Resume
2 pages
Data Stage
No ratings yet
Data Stage
10 pages
dwh
No ratings yet
dwh
25 pages
Datawarehouse Concepts
No ratings yet
Datawarehouse Concepts
7 pages
Types of Dimensions - Data Warehouse
No ratings yet
Types of Dimensions - Data Warehouse
8 pages
Solved Interview Questions-1
No ratings yet
Solved Interview Questions-1
41 pages
Data Analysis & Business Intelligence: Ombir Rathee
100% (1)
Data Analysis & Business Intelligence: Ombir Rathee
24 pages
What Is The Difference Between Inner and Outer Join? Explain With Example
No ratings yet
What Is The Difference Between Inner and Outer Join? Explain With Example
14 pages
QuerySurge Models Mappings Document
100% (1)
QuerySurge Models Mappings Document
28 pages
Asphalt RRL
No ratings yet
Asphalt RRL
11 pages
Lab Manual (Fdy-2)
No ratings yet
Lab Manual (Fdy-2)
57 pages
Lab 1 Introduction of Elecrical Machines
100% (1)
Lab 1 Introduction of Elecrical Machines
10 pages
Standard Height of Bathroom Fittings
No ratings yet
Standard Height of Bathroom Fittings
2 pages
IPTV Presentation
100% (1)
IPTV Presentation
27 pages
Brick Foundation & Brick Wall Defects & Failures
No ratings yet
Brick Foundation & Brick Wall Defects & Failures
9 pages
Hardness Testing
No ratings yet
Hardness Testing
3 pages
Network Analysis Assignments
No ratings yet
Network Analysis Assignments
5 pages
Gremlin Perform02
No ratings yet
Gremlin Perform02
7 pages
Differential Amplifier
No ratings yet
Differential Amplifier
4 pages
mini-controlador-de-ph-BL931700-manual
No ratings yet
mini-controlador-de-ph-BL931700-manual
19 pages
The Tinkerers: The Amateurs, DIYers and Inventors Who Make America Great
No ratings yet
The Tinkerers: The Amateurs, DIYers and Inventors Who Make America Great
4 pages
2 Way Poppet Type Solenoid Valve
No ratings yet
2 Way Poppet Type Solenoid Valve
16 pages
Transmission Method: What Is Pulse Oximeter?
No ratings yet
Transmission Method: What Is Pulse Oximeter?
3 pages
MDSReport 149489949 PDF
No ratings yet
MDSReport 149489949 PDF
0 pages
Is.15369.2003 Vault Room
100% (1)
Is.15369.2003 Vault Room
14 pages
GT Report Open Cutting at Ch. 34 KM Mizoram
No ratings yet
GT Report Open Cutting at Ch. 34 KM Mizoram
15 pages
Bioc 211
No ratings yet
Bioc 211
6 pages
KAT - 80010303v02 Spec
No ratings yet
KAT - 80010303v02 Spec
2 pages
Lazydax: Learn Dax From Examples
No ratings yet
Lazydax: Learn Dax From Examples
33 pages
5090B FrontShovelEngels
No ratings yet
5090B FrontShovelEngels
3 pages
Case Study To Find Tank Nozzle Stiffness, Flexibility and Allowable Loads by API650 Using PASS START PROF
100% (1)
Case Study To Find Tank Nozzle Stiffness, Flexibility and Allowable Loads by API650 Using PASS START PROF
15 pages
Electrostatic Generator
100% (1)
Electrostatic Generator
56 pages
Time Series and Stochastic Processes
No ratings yet
Time Series and Stochastic Processes
46 pages
Interesting Facts About Computer
No ratings yet
Interesting Facts About Computer
2 pages
Cec 400 2015 037 CMF
100% (1)
Cec 400 2015 037 CMF
289 pages
Setting I T Protection For Direct PWM Commutated Motors
No ratings yet
Setting I T Protection For Direct PWM Commutated Motors
2 pages
22 MM Push Button Specifications: Technical Data
No ratings yet
22 MM Push Button Specifications: Technical Data
124 pages
High-Rise Buildings Raffles City in Hangzhou China
No ratings yet
High-Rise Buildings Raffles City in Hangzhou China
16 pages

ETL Testing - InterviewQuestion PDF

Uploaded by

ETL Testing - InterviewQuestion PDF

Uploaded by

ETL Interview Question Bank

Author: - Sheetal Shirke

ETL Testing Questions

2. Difference between Dataware House and Data Mart?

3. What is ETL/Data Warehouse Testing?

4. Explain what are the ETL testing operations include?

5. What are the various tools used in ETL.

6. What are staging area in ETL testing and its purpose?

7. What is Primary, Foreign keys and difference among them?

9. What is Fact and Dimensions?

Rapidly Changing Dimensions:

Role Playing Dimensions:

Factless Fact Table:

12.What SCD (Slowly Changing Dimensions) and its types?

Type 0 - The passive method

Before the change:

After the change:

Before the change:

After the change:

After the change:

13.What are OLTP and OLAP?

OLTP (On-line Transaction Processing) is characterized by a large number of short on-line

OLAP (On-line Analytical Processing) is characterized by relatively low volume of transactions.

14.What is Grain of Fact OR Fact Granularity?

15.STAR Schema and Snowflake Schema?

18.Explain these terms Session, Worklet, Mapplet and Workflow?

19.What are Data ware House and Data Mining?

20.What are the basic checks done during ETL Testing?

Validation testing (source to target data): It is generally executed in mission critical or

22.What is erroneous data in ETL testing?

Database related questions:

JOIN or INNER JOIN:

There are three kinds of Outer Join:

RIGHT OUTER JOIN or RIGHT JOIN:

FULL OUTER JOIN or FULL JOIN:

Note that `OUTER JOIN` is a loosened form of `INNER JOIN`.

2. What is difference between delete, truncate and drop command?

3. What are constraints and different types of it?

A UNIQUE constraint enforces the uniqueness of the values in a set of columns, so no

5. Write a query to find nth highest salary.

6. Write a query to find duplicate records in table.

7. Write a query to display unique records from table.

8. What is difference between ROW_NUMBER(), RANK(), DENSE_RANK()?

9. How to retrieve dropped table from database?

NOTE: https://www.toptal.com/sql/interview-questions, helpful for SQL Queries.

You might also like