ETL Testing Interview Questions

ETL Testing Interview Questions and Answers

1) What is ETL?
In data warehousing architecture, ETL is an important component, which manages the data for any business process. ETL stands for Extract, Transform and Load.  Extract does the process of reading data from a database.  Transform does the converting of data into a format that could be appropriate for reporting and analysis. While, load does the process of writing the data into the target database.

2) Why ETL testing is required?
•    To keep a check on the Data which are being transferred from one system to the other.
•    To keep a track on the efficiency and speed of the process.
•    To be well acquainted with the ETL process before it gets implemented into your business and production.

3) What is a three-tier data warehouse?
Most data warehouses are considered to be a three-tier system. This is essential to their structure. The first layer is where the data lands. This is the collection point where data from outside sources is compiled. The second layer is known as the ‘integration layer.’ This is where the data that has been stored is transformed to meet company needs. The third layer is called the ‘dimension layer,’ and is where the transformed information is stored for internal use.

4) Mention what are the types of data warehouse applications?
The types of data warehouse applications are
•    Info Processing
•    Analytical Processing
•    Data Mining

5) What are ETL tester responsibilities?
•    Requires in depth knowledge on the ETL tools and processes.
•    Needs to write the SQL queries for the various given scenarios during the testing phase.
•    Should be able to carry our different types of tests such as Primary Key, defaults and keep a check on the other functionality of the ETL process.
•    Quality Check

6) What are the types of data warehouse?
There are mainly three types of Data Warehouse they are,
•    Enterprise Data Warehouse
•    Operational data store
•    Data Mart

7) What is a Data mart?
A Data Mart is a subset of a data warehouse that can provide data for reporting and analysis on a section, unit or a department like Sales Dept., HR Dept., etc. The Data Mart are sometimes also called as HPQS (Higher Performance Query Structure).

8) What is the difference between data mining and data warehousing?
Data warehousing comes before the mining process. This is the act of gathering data from various exterior sources and organizing it into one specific location that is the warehouse. Data mining is when that data is analyzed and used as information for making decisions.

9) Explain what is data purging?
Data purging is a process of deleting data from data warehouse. It deletes junk data’s like rows with null values or extra spaces.

10) What is partitioning?
Partitioning is when an area of data storage is sub-divided to improve performance. Think of it as an organizational tool. If all your collected data is in one large space without organization the digital tools used for analyzing it will have a more difficult time finding the information in order to analyze it. Partitioning your warehouse will create an organizational structure that will make locating and analyzing easier and faster.

11) What are some types of partitioning?
Two types of partitioning are round-robin partitioning and Hash Partitioning.
•    Round-robin partitioning is when the data is evenly distributed among all partitions. This means that the number of rows in each partition is relatively the same.
•    Hash partitioning is when the server applies a hash function in order to create partition keys to group data.

12) What are the various tools used in ETL?
•    Cognos Decision Stream
•    Oracle Warehouse Builder
•    Business Objects XI
•    SAS business warehouse
•    SAS Enterprise ETL server.

13) What is fact?
It is a central component of a multi-dimensional model which contains the measures to be analyzed.  Facts are related to dimensions.

14) What are the types of Facts?
The types of Facts are as follows.
•    Additive Facts: A Fact which can be summed up for any of the dimension available in the fact table.
•    Semi-Additive Facts: A Fact which can be summed up to a few dimensions and not for all dimensions available in the fact table.
•    Non-Additive Fact: A Fact which cannot be summed up for any of the dimensions available in the fact table.

15) What are Fact Tables?
A Fact Table is a table that contains summarized numerical (facts) and historical data. This Fact Table has a foreign key-primary key relation with a dimension table. The Fact Table maintains the information in 3rd normal form.
A star schema is defined is defined as a logical database design in which there will be a centrally located fact table which is surrounded by at least one or more dimension tables. This design is best suited for Data Warehouse or Data Mart.

16) What are the types of Fact Tables?
The types of Fact Tables are,
•    Cumulative Fact Table: This type of fact tables generally describes what was happened over the period of time. They contain additive facts.
•    Snapshot Fact Table: This type of fact table deals with the particular period of time. They contain non-additive and semi-additive facts.

17) What is Grain of Fact?
The Grain of Fact is defined as the level at which the fact information is stored in a fact table. This is also called as Fact Granularity or Fact Event Level.

18) What is Fact less Fact table?
The Fact Table which does not contains facts is called as Fact Table. Generally when we need to combine two data marts, then one data mart will have a fact less fact table and other one with common fact table.

19) What are Dimensions?
Dimensions are categories by which summarized data can be viewed. For example a profit Fact table can be viewed by a time dimension.

20) What are Confirmed Dimensions?
The Dimensions which are reusable and fixed in nature Example customer, time, geography dimensions.

21) Explain what is transformation?
A transformation is a repository object which generates, modifies or passes data.  Transformation are of two types Active and Passive

22) What are active and passive transformations?
In an active transformation, the number of rows that is created as output can be changed once a transformation has occurred. This does not happen during a passive transformation; the information passes through the same number given to it as input.

23) Explain the use of Lookup Transformation?
The Lookup Transformation is useful for
•    Getting a related value from a table using a column value
•    Update slowly changing dimension table
•    Verify whether records already exist in the table

24) What is the difference between dimensional table and fact table?

A dimension table consists of tuples of attributes of the dimension. A fact table can be thought of as having tuples, one per a recorded fact. This fact contains some measured or observed variables and identifies them with pointers to dimension tables.

25) What is OLAP?

OLAP stands for Online Analytical Processing. It uses database tables (Fact and Dimension tables) to enable multidimensional viewing, analysis and querying of large amount of data.

26) What is OLTP?

OLTP stands for Online Transaction Processing Except data warehouse databases the other databases are OLTPs. These OLTP uses normalized schema structure. These OLTP databases are designed for recording the daily operations and transactions of a business.

27) What is Operational Data Store [ODS]?

It is a collection of integrated databases designed to support operational monitoring. Unlike the OLTP databases, the data in the ODS are integrated, subject oriented and enterprise wide data.

28) What are Measures?

Measures are numeric data based on columns in a fact table.

29) Explain what are Cubes and OLAP Cubes?

Cubes are data processing units comprised of fact tables and dimensions from the data warehouse.  It provides multi-dimensional analysis.
OLAP stands for Online Analytics Processing, and OLAP cube stores large data in multi-dimensional form for reporting purposes.  It consists of facts called as measures categorized by dimensions.

30) What are Virtual Cubes?

These are combination of one or more real cubes and require no disk space to store them. They store only definition and not the data.

31) What is Bus Schema?

For the various business process to identify the common dimensions, BUS schema is used.  It comes with a conformed dimensions along with a standardized definition of information

32) What is a Star schema design?

A Star schema is defined as a logical database design in which there will be a centrally located fact table which is surrounded by at least one or more dimension tables. This design is best suited for Data Warehouse or Data Mart.

33) What is Snow Flake schema Design?

In a Snow Flake design the dimension table (de-normalized table) will be further divided into one or more dimensions (normalized tables) to organize the information in a better structural format. To design snow flake we should first design star schema design.

34) Explain what are Schema Objects?

Schema objects are the logical structure that directly refer to the databases data.  Schema objects includes tables, views, sequence synonyms, indexes, clusters, functions packages and database links

35) Explain what staging area is and what is the purpose of a staging area?

Data staging is an area where you hold the data temporary on data warehouse server.  Data staging includes following steps
•    Source data extraction and data transformation ( restructuring )
•    Data transformation (data cleansing, value transformation )
•    Surrogate key assignments

36) Explain ETL Mapping Sheets?

ETL mapping sheets contains all the required information from the source file including all the rows and columns. This sheet helps the experts in writing the SQL queries for the ETL tools testing.

37) What is Denormalization?

Denormalization means a table with multi duplicate key. The dimension table follows Denormalization method with the technique of surrogate key.

38) What is Surrogate Key?

A Surrogate Key is a sequence generated key which is assigned to be a primary key in the system (table).

39) Explain these terms Mapping, Session, Work let, Mapplet and Workflow?

•    Mapping is the movement of data from the source to the destination.
•    Session is the parameters set to instruct the data on during the above movement.
•    Work let represents a specific set of tasks given.
•    A workflow is a set of instructions that tell the server how to execute tasks.
•    A mapplet creates or arranges sets of transformation.

40) List few ETL bugs?

Calculation Bug, User Interface Bug, Source Bugs, Load condition bug, ECP related bug are some of the ETL bugs.


Post a Comment