DAM UNIT - IV

This document discusses data warehousing, including its purpose, components, and differences from data lakes and databases. It highlights the importance of data warehouses for business intelligence, reporting, and decision-making, while also explaining the roles of data marts and data lakes. Key benefits of data warehousing include data integration, historical analysis, improved data quality, and support for complex queries.

BA III SEM UNIT - IV DAM

UNIT – IV
DATA WAREHOUSING
Data Warehousing: Identify purpose of data warehousing - Identify between key components
of a data warehouse - Distinguish between data warehouses and data lakes - Determine the
role of different warehousing techniques - Data Warehousing Tools: Differentiate between
utility of relational DW, cubes, and in-memory scenarios - Compare techniques for data
integration with regards to warehousing - Use warehousing tools - Use integration tools for
warehousing.

A data lake is a centralized repository that allows you to store all your structured and unstructured
data at any scale. You can store your data as-is, without having to first structure the data, and run
different types of analytics—from dashboards and visualizations to big data processing, real-time
analytics, and machine learning to guide better decisions.

Q) What are data warehouses, data marts, data lakes and databases?
How are they different?
A) There are a lot of data sorting, storage, and accessing options available. Which will benefit your
business most depends on what you use your data for.
Data warehouse: A data warehouse is a centralized and integrated repository that stores large
volumes of structured and sometimes unstructured data from various sources within an organization.
The data stored in a data warehouse is used for analytical and reporting purposes rather than
operational transactions. It is designed to support complex querying, data analysis, and reporting,
providing a comprehensive view of an organization's historical and current data.
Data mart. As already indicated, a data mart is part of a data warehouse, generally geared towards
giving a group, team, or line of business and the specific information they require. Also called mini-
data warehouses, they both improve response time within the already low-latency data warehouse
and ensure queries are sufficiently focused to be useful to end users.

NEELIMA 1
Data lake. Data lakes are simply repositories filled with unorganized, unclassified data; they’re
generally helpful for collecting data the value of which isn’t yet known. Data lake data may not be
cleansed, corrected, or deduplicated. That makes it useful for applications like machine learning, but
data lake analytics queries can produce poor results for users looking for usable, trustworthy
business insights.
Database. Databases log frequent transactions and provide quick access to specific, repetitive
business transactions. While designed to be good at receiving data, databases simply aren’t built to be
sources from which to pull insights.
Data cube: A data cube in a data warehouse is a multidimensional structure used to store
data. The data cube was initially planned for the OLAP tools that could easily access the
multidimensional data. But the data cube can also be used for data mining.
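
The idea of a data cube can be illustrated with a minimal Python sketch. The sales rows, the three dimensions (region, product, quarter) and the function name `build_cube` are assumptions made up for illustration; real OLAP tools store and index cubes far more efficiently.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sales facts: (region, product, quarter, amount).
SALES = [
    ("North", "Laptop", "Q1", 100),
    ("North", "Phone", "Q1", 50),
    ("South", "Laptop", "Q1", 70),
    ("South", "Laptop", "Q2", 30),
]


def build_cube(rows):
    """Sum the measure for every combination of dimensions (every cuboid)."""
    cube = defaultdict(int)
    for *dims, amount in rows:
        for size in range(len(dims) + 1):
            for kept in combinations(range(len(dims)), size):
                # "*" marks a dimension that has been rolled up (aggregated away).
                key = tuple(d if i in kept else "*" for i, d in enumerate(dims))
                cube[key] += amount
    return dict(cube)


cube = build_cube(SALES)
print(cube[("*", "*", "*")])        # grand total over all dimensions
print(cube[("North", "*", "*")])    # all sales in the North region
print(cube[("*", "Laptop", "Q1")])  # Laptop sales in Q1 across regions
```

Each key with a "*" is a roll-up, so the same structure answers questions at any level of detail without rescanning the raw rows.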
Q) What is Data warehouse?
A) A data warehouse is a data management system that stores current and historical data from
multiple sources in a business friendly manner for easier insights and reporting.
Data warehouses are typically used for business intelligence (BI), reporting and data analysis.
Data warehouses make it possible to quickly and easily analyze business data uploaded from
operational systems such as point-of-sale systems, inventory management systems, or marketing or
sales databases. Data may pass through an operational data store and require data cleansing to ensure
data quality before it can be used in the data warehouse for reporting.
Data warehouses are used in BI, reporting, and data analysis to extract and summarize data from
operational databases. Information that is difficult to obtain directly from transactional databases can
be obtained via data warehouses. For example, management wants to know the total revenues
generated by each salesperson on a monthly basis for each product category. Transactional databases
may not capture this data, but the data warehouse does.
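
The salesperson example above can be sketched as a query. This is only an illustration using Python's built-in sqlite3 module as a stand-in for a warehouse; the table name `sales`, its columns and the sample rows are assumptions, not part of the original notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (salesperson TEXT, category TEXT, sale_date TEXT, revenue REAL)"
)
# Hypothetical sample rows standing in for data loaded from operational systems.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Asha", "Laptops", "2023-01-05", 1200.0),
        ("Asha", "Laptops", "2023-01-20", 800.0),
        ("Asha", "Phones", "2023-02-02", 500.0),
        ("Ravi", "Phones", "2023-01-11", 300.0),
    ],
)

# Total revenue per salesperson, per month, per product category.
monthly_revenue = conn.execute(
    """
    SELECT salesperson, strftime('%Y-%m', sale_date) AS month, category,
           SUM(revenue) AS total_revenue
    FROM sales
    GROUP BY salesperson, month, category
    ORDER BY salesperson, month, category
    """
).fetchall()

for row in monthly_revenue:
    print(row)
```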

Benefits of data warehouses


• Consolidate data obtained from many sources; acting as a single point of access for all
data, rather than requiring users to connect to dozens or even hundreds of individual data stores.
• Historical intelligence. A data warehouse integrates data from many sources to show historic
trends.
• Separate analytics processing from transactional databases, improving the
performance of both systems.
• Data quality, consistency, and accuracy. Data warehouses use a standard set of semantics
around data, including consistency in naming conventions, codes for various product types,
languages, currencies, and so on.

Need for Data Warehouse

Data Warehouse is needed for the following reasons:


1) Business users: Business users require a data warehouse to view summarized data from the past.
Since these users are often non-technical, the data may be presented to them in a simple form.
2) Store historical data: A data warehouse is required to store time-variant data from the past.
This data can then be used for various purposes.
3) Make strategic decisions: Some strategies may depend on the data in the data warehouse.
So, the data warehouse contributes to making strategic decisions.
4) For data consistency and quality: By bringing data from different sources into one common place,
the organization can enforce uniformity and consistency in the data.
5) Fast response time: A data warehouse has to be ready for somewhat unexpected loads and types
of queries, which demands a significant degree of flexibility and quick response time.

Q) Explain Purpose of Data warehouse in organizations.


A) The primary purpose of a data warehouse is to enable companies to access and analyse all of their
data to derive the most accurate business insights and forecasting models.

The purpose of data warehousing is to provide a centralized and integrated repository for storing,
managing, and analyzing large volumes of structured and sometimes unstructured data from
various sources within an organization. Data warehousing serves several important purposes:

• Data Integration: Organizations often have data stored in different systems, databases,
and formats. Data warehousing allows for the integration of data from multiple sources into a
single, unified structure. This integration facilitates cross-functional analysis and reporting
by providing a consistent view of the data.
• Historical Analysis: Data warehouses store historical data over time, allowing
organizations to analyze trends, patterns, and changes in business operations. This historical
context is crucial for making informed decisions and understanding the evolution of the
organization.
• Business Intelligence and Reporting: Data warehouses provide a platform for
generating reports, dashboards, and visualizations that offer insights into business
performance, customer behavior, and market trends. These insights support data-driven
decision-making at various levels of the organization.
• Complex Queries: Data warehouses are optimized for complex queries and analytical
processing. Users can perform advanced analytics, such as data mining, statistical analysis,
and predictive modeling, to extract valuable insights from the data.
• Data Cleansing and Transformation: Before data is loaded into a data warehouse, it
often undergoes cleansing, transformation, and enrichment processes to ensure data
accuracy and consistency. This improves the quality of the data available for analysis.
• Support for Decision-Making: Data warehouses provide decision-makers with a
comprehensive view of the organization's data, enabling them to make informed choices that
align with business goals and strategies.
• Scalability: Data warehouses are designed to handle large volumes of data efficiently. As an
organization's data needs grow, a well-designed data warehouse can scale to accommodate
the increased data load.
• Data Security and Governance: Centralized data storage in a data warehouse can
improve data security and governance by providing a controlled environment for data access
and ensuring compliance with regulations and policies.
• Operational Performance: By separating analytical workloads from operational
databases, data warehousing reduces the impact on transactional systems, allowing them to
focus on core operations without being burdened by resource-intensive analytical queries.
• Support for Different User Roles: Data warehouses support different user roles, such as
executives, analysts, and business users, by providing them with tailored access to the data
and tools they need for their specific tasks.

Overall, the primary purpose of data warehousing is to enable organizations to harness the power of
their data for strategic decision-making, business insights, and improved operational efficiency.

Q) Explain briefly about components of Data warehouse.


Or
Explain the Architecture of Data Warehouse with diagram.

A) The major components of a data warehouse are as follows


Data Warehouse Components


The most important data warehouse components and their roles in the system are described below.

ETL Tools

ETL stands for Extract, Transform, and Load. The staging layer uses ETL tools to extract the
needed data from various formats and checks the quality before loading it into the data warehouse.

The data coming from the data source layer can come in a variety of formats. Before merging all the
data collected from multiple sources into a single database, the system must clean and organize the
information.
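
The extract, quality-check and load steps described above can be sketched roughly as follows. This is a toy illustration only: the CSV sample, the `orders` table and the function names are assumptions, and sqlite3 stands in for the warehouse database.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: stray spaces, inconsistent casing, a missing amount.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,Carol,7.25
"""


def extract(text):
    """Extract: read rows from the source format (CSV here)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows that fail a quality check and standardize the rest."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:  # quality check: reject rows with no amount
            continue
        clean.append((int(row["order_id"]), row["customer"].strip().title(), float(amount)))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


warehouse = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), warehouse)
loaded = warehouse.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(loaded)
```

Only rows that pass the quality check reach the warehouse table, which is the point of cleaning before loading.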

The Database

The most crucial component and the heart of each architecture is the database. The warehouse is
where the data is stored and accessed.

When creating the data warehouse system, you first need to decide what kind of database you want
to use.

There are four types of databases you can choose from:

1. Relational databases (row-centered databases).
2. Analytics databases (developed to sustain and manage analytics).
3. Data warehouse applications (software for data management and hardware for storing
data offered by third-party dealers).
4. Cloud-based databases (hosted on the cloud).

Data

Once the system cleans and organizes the data, it stores it in the data warehouse. The data
warehouse represents the central repository that stores metadata, summary data, and raw data
coming from each source.

• Metadata is the information that defines the data. Its primary role is to simplify working
with data instances. It allows data analysts to classify, locate, and direct queries to the
required data.
• Summary data is generated by the warehouse manager. It updates as new data loads into
the warehouse. This component can include lightly or highly summarized data. Its main role
is to speed up query performance.
• Raw data is the actual data loading into the repository, which has not been processed.
Having the data in its raw form makes it accessible for further processing and analysis.

Access Tools

Users interact with the gathered information through different tools and technologies. They can
analyze the data, gather insight, and create reports.

Some of the tools used include:

• Reporting tools. They play a crucial role in understanding how your business is doing and
what should be done next. Reporting tools include visualizations such as graphs and charts
showing how data changes over time.
• OLAP tools. Online analytical processing tools which allow users to analyze
multidimensional data from multiple perspectives. These tools provide fast processing and
valuable analysis. They extract data from numerous relational data sets and reorganize it into
a multidimensional format.
• Data mining tools. Examine data sets to find patterns within the warehouse and the
correlation between them. Data mining also helps establish relationships when analyzing
multidimensional data.

Data Marts

A data mart is a subset of an organizational data store, generally oriented to a specific purpose or
primary data subject, that may be distributed to support business needs. Data marts are analytical
data stores designed to focus on particular business functions for a specific community within an
organization. Data marts are derived from subsets of data in a data warehouse, though in the
bottom-up data warehouse design methodology, the data warehouse is created from the union of
organizational data marts.

The fundamental use of a data mart is in Business Intelligence (BI) applications. BI is used to
gather, store, access, and analyze data. A data mart can be used by smaller businesses to make use of
the data they have accumulated, since it is less expensive than implementing a data warehouse.

Data marts allow you to serve multiple groups within the system by segmenting the data in the
warehouse into categories. A data mart partitions the data and presents it for a particular user group.

For instance, you can use data marts to categorize information by departments within the company.
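
One simple way to picture department-level data marts is as filtered views over a shared warehouse table. The sketch below is an assumption-laden illustration: the table `warehouse_facts`, the view names and the sample values are invented, and real data marts are often separate physical stores rather than views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_facts (department TEXT, metric TEXT, value REAL)")
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("sales", "revenue", 5000.0),
        ("sales", "orders", 42.0),
        ("hr", "headcount", 18.0),
    ],
)

# Each data mart exposes only the slice of the warehouse one department needs.
# The department names are hard-coded here, so building the SQL with an f-string is safe.
for dept in ("sales", "hr"):
    conn.execute(
        f"CREATE VIEW {dept}_mart AS "
        f"SELECT metric, value FROM warehouse_facts WHERE department = '{dept}'"
    )

sales_mart = conn.execute("SELECT * FROM sales_mart ORDER BY metric").fetchall()
hr_mart = conn.execute("SELECT * FROM hr_mart").fetchall()
print(sales_mart)
print(hr_mart)
```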


Q) What is Data lake?


A) A data lake is a centralized repository designed to store, process, and secure large
amounts of structured, semistructured, and unstructured data. It can store data in its
native format and process any variety of it, ignoring size limits.

Purpose :

A data lake is a comprehensive way to explore, refine, and analyze petabytes of
information constantly arriving from multiple data sources. One petabyte of data is
equivalent to 1 million gigabytes: about 500 billion pages of standard, printed text or
58,333 high-definition, two-hour movies. Data lakes are for business users to explore and
analyze petabytes of data.
FEATURES
The characteristics of data lakes that distinguish them from other types of big data
storage are:

• Open to all data, regardless of type or source
• Data is stored in its original raw, untransformed state
• Data is transformed only when provided for analysis based on matching query criteria
DATA LAKE BENEFITS
The source- and format-agnostic nature of data stored in a data lake offers several
benefits for businesses, including:

• Flexibility, as data scientists can quickly and easily configure queries
• Accessibility, as all users can access all data
• Affordability, as many data lake technologies are open source
• Compatibility with most data analytics methods
• Comprehensiveness, combining data from all of an enterprise’s data sources including IoT

Q) Distinguish between Data Warehouses and Data lakes


A) Difference between Data Lake and Data Warehouse

Here are key differences between data lakes vs data warehouse:

Parameter: Storage
Data Lake: In the data lake, all data is kept irrespective of the source and its structure. Data is
kept in its raw form. It is only transformed when it is ready to be used.
Data Warehouse: A data warehouse consists of data that is extracted from transactional systems,
or data which consists of quantitative metrics with their attributes. The data is cleaned and
transformed.

Parameter: History
Data Lake: The big data technologies used in data lakes are relatively new.
Data Warehouse: The data warehouse concept, unlike big data, has been used for decades.

Parameter: Data Capturing
Data Lake: Captures all kinds of data (structured, semi-structured and unstructured) in their
original form from source systems.
Data Warehouse: Captures structured information and organizes it in schemas as defined for data
warehouse purposes.

Parameter: Data Timeline
Data Lake: Data lakes can retain all data. This includes not only the data that is in use but also
data that might be used in the future. Also, data is kept for all time, to go back in time and do an
analysis.
Data Warehouse: In the data warehouse development process, significant time is spent on
analyzing various data sources.

Parameter: Users
Data Lake: A data lake is ideal for users who indulge in deep analysis. Such users include data
scientists who need advanced analytical tools with capabilities such as predictive modeling and
statistical analysis.
Data Warehouse: The data warehouse is ideal for operational users because it is well structured
and easy to use and understand.

Parameter: Storage Costs
Data Lake: Storing data in big data technologies is relatively inexpensive compared with storing
data in a data warehouse.
Data Warehouse: Storing data in a data warehouse is costlier and more time-consuming.

Parameter: Task
Data Lake: Data lakes can contain all data and data types; they empower users to access data
before it has been transformed, cleansed and structured.
Data Warehouse: Data warehouses can provide insights into pre-defined questions for pre-defined
data types.

Parameter: Processing Time
Data Lake: Data lakes empower users to access data before it has been transformed, cleansed and
structured. Thus, they allow users to get to their result more quickly compared to the traditional
data warehouse.
Data Warehouse: Data warehouses offer insights into pre-defined questions for pre-defined data
types. So, any changes to the data warehouse need more time.

Parameter: Position of Schema
Data Lake: Typically, the schema is defined after data is stored. This offers high agility and ease of
data capture but requires work at the end of the process.
Data Warehouse: Typically, the schema is defined before data is stored. This requires work at the
start of the process, but offers performance, security, and integration.

Parameter: Data Processing
Data Lake: Data lakes use the ELT (Extract, Load, Transform) process.
Data Warehouse: A data warehouse uses the traditional ETL (Extract, Transform, Load) process.

Parameter: Complaint
Data Lake: Data is kept in its raw form. It is only transformed when it is ready to be used.
Data Warehouse: The chief complaint against data warehouses is the inability, or the problems
faced, when trying to make changes to them.

Parameter: Key Benefits
Data Lake: Data lakes integrate different types of data so that users can come up with entirely
new questions. These users are not likely to use data warehouses because they may need to go
beyond their capabilities.
Data Warehouse: Most users in an organization are operational. These types of users only care
about reports and key performance metrics.
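
The "position of schema" difference can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the JSON events, the two-field schema (`user`, `amount`) and both function names are hypothetical.

```python
import json

# Hypothetical raw events landed in a data lake as-is (one JSON document per line).
RAW_EVENTS = [
    '{"user": "u1", "amount": "12.5", "device": "mobile"}',
    '{"user": "u2", "amount": 3}',
    '{"user": "u3", "note": "no amount recorded"}',
]


def read_with_schema(lines):
    """Schema-on-read (lake): apply types and defaults only when the data is queried."""
    for line in lines:
        record = json.loads(line)
        yield {
            "user": str(record["user"]),
            "amount": float(record.get("amount", 0.0)),
        }


def validate_for_warehouse(record):
    """Schema-on-write (warehouse): accept a record only if it already fits the schema."""
    return isinstance(record.get("user"), str) and isinstance(record.get("amount"), (int, float))


lake_view = list(read_with_schema(RAW_EVENTS))
accepted = [line for line in RAW_EVENTS if validate_for_warehouse(json.loads(line))]
print(lake_view)
print(len(accepted), "of", len(RAW_EVENTS), "records fit the warehouse schema unchanged")
```

The lake keeps every record and pays the cost of interpretation at query time, while the warehouse check rejects anything that was not cleaned before loading.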

Q) List out Data Warehousing techniques.


A) Data warehousing techniques refer to the various methods and strategies used to design,
develop, and maintain data warehouses, which are large repositories of data used for reporting,
analysis, and business intelligence purposes. These techniques can vary in terms of data modeling,
storage, and processing. Here are some different data warehousing techniques:

1. Dimensional Modeling:
• Star Schema: In this technique, data is organized into a central fact table containing
quantitative measures and surrounding dimension tables that describe the context of the
measures.
• Snowflake Schema: It's an extension of the star schema, where dimension tables are
normalized into multiple related tables, reducing redundancy.
2. Data Integration:
• ETL (Extract, Transform, Load): ETL processes involve extracting data from source
systems, transforming it into a suitable format, and then loading it into the data warehouse.
• ELT (Extract, Load, Transform): ELT reverses the ETL process by first loading data
into the data warehouse and then transforming it as needed.
3. Data Storage:
• Data Warehouses: Traditional data warehousing systems store data in structured
databases optimized for analytical queries.
• Data Lakes: These store data in its raw, unstructured form, and offer flexibility for storing
both structured and unstructured data.
4. Data Processing:
• Batch Processing: Data is processed in batches at scheduled intervals, which is suitable for
historical reporting.
• Real-time Processing: Data is processed as it arrives, allowing for near real-time analytics
and decision-making.
5. Data Partitioning and Indexing:
• Partitioning: Data can be divided into partitions based on specific criteria like date, region,
or product. This enhances query performance and maintenance.
• Indexing: Indexes are created on columns to speed up data retrieval operations.
6. Data Compression and Archiving:
• Data Compression: Reduces the storage requirements of data while maintaining query
performance.
• Data Archiving: Moves older, less frequently accessed data to lower-cost storage to
optimize costs.
7. Data Security and Governance:
• Techniques to ensure data privacy, compliance with regulations (e.g., GDPR), and access
controls.
8. Data Quality and Cleansing:
• Ensuring data accuracy and consistency through processes like data profiling, data cleansing,
and data validation.
9. Scalability and Performance Optimization:
• Techniques like sharding, clustering, and distributed computing to scale data warehouses for
increased performance.
10. Cloud Data Warehousing:
• Utilizing cloud-based platforms and services like Amazon Redshift, Google BigQuery, or
Snowflake for flexible and scalable data warehousing.
11. Hybrid Data Warehousing:
• Combining on-premises and cloud-based data warehousing to leverage existing investments
while benefiting from cloud scalability.
12. Data Visualization and Reporting:
• Tools and techniques for creating dashboards, reports, and data visualizations to make data
insights accessible to business users.
13. Data Warehouse Automation:
• Using automation tools to accelerate the design, development, and maintenance of data
warehouses.
14. Data Warehouse as a Service (DWaaS):
• Outsourcing data warehousing to third-party providers who manage the infrastructure and
maintenance.
15. Streaming Data Warehousing:
• Handling real-time data streams for immediate analysis and decision-making.

The choice of data warehousing techniques depends on factors like data volume, complexity,
performance requirements, budget, and the specific needs of the organization. Data warehousing is
an evolving field, with new techniques and technologies continually emerging to meet the growing
demands of data-driven businesses.
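
To make the ELT technique from the list above concrete, here is a minimal sketch in which raw data is loaded first and transformed afterwards with SQL inside the store. The table names `raw_orders` and `orders` and the sample rows are assumptions, and sqlite3 is only a stand-in for a warehouse engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: land the source rows untouched in a raw staging table (everything as text).
conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("1", " alice ", "10.50"), ("2", "BOB", ""), ("3", "Carol", "7.25")],
)

# Transform: clean and type the data later, inside the warehouse, using SQL.
conn.execute(
    """
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           LOWER(TRIM(customer))     AS customer,
           CAST(amount AS REAL)      AS amount
    FROM raw_orders
    WHERE TRIM(amount) <> ''
    """
)

orders = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
print(orders)
print(raw_count)
```

Unlike ETL, the raw rows remain available after the transformation, so a different cleaning rule can be applied later without going back to the source system.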

Q) What is Data Integration?


A) Data integration is the process of combining data from different sources into a single, unified
view. Integration begins with the ingestion process, and includes steps such as cleansing, ETL
mapping, and transformation.


Need for Data Integration

• It is done to provide data in a specific view as requested by users, applications, etc.
• The bigger the organization gets, the more data there is and the more data needs integration.
• The need for integration increases with the need for data sharing.
• Data integration is a key component of data-driven decision-making and the success of a
business.

Q) Compare techniques for Data Integration with regard to warehousing.

A) Data integration can be considered as one of the main components in the data
management process. It is the process of collecting and consolidating data from all sources into one
single dataset or data warehouse. The ultimate goal of data management is to provide users with
consistent access and delivery of data and to meet the different needs of all business applications
and processes.
There are several techniques and approaches to achieve data integration in the context of data
warehousing, each with its own advantages and disadvantages.

Here's a comparison of some common techniques for data integration with regards to data
warehousing:
Data Integration Techniques
The following are the technologies used for data integration:
1. Data Interchange
a. It is the structured transmission of organizational data between two or more organizations
through electronic means; it is used for the transfer of electronic documents from one computer
to another (i.e., from one corporate trading partner to another).
b. Data interchange must not be seen merely as email. For instance, organizations might want to
do away with bills of lading (or even checks), and use appropriate EDI messages instead.
2. Object Brokering
a. An ORB (Object Request Broker) is a certain variety of middleware software. It gives
programmers the freedom to make calls from one computer to another over a computer
network.
b. It handles the transformation of in-process data structure to and from the byte sequence.
3. Modeling Techniques: There are two logical design techniques:
a. ER Modeling: Entity Relationship (ER) Modeling is a logical design technique whose main focus
is to reduce data redundancy. It is basically used for transaction capture and can contribute in the
initial stages of constructing a data warehouse. The reduction in the data redundancy solves the
problems of inserting, deleting, and updating data but it leads to yet another problem. In our bid to
keep redundancy to the minimum extent possible, we end up creating a whole lot of tables.
These huge numbers of tables imply dozens of joins between them. The result is a massive spider web
of joins between tables.

What could be the problems posed by ER Modeling?
• End-users find it difficult to comprehend and traverse through the ER model.
• Not many software tools exist that can query a general ER model.
• ER Modeling cannot be used for data warehousing where the focus is on performance access
and satisfying ad hoc, unanticipated queries.
Example: Consider a library transaction system of a department of DIIT. Every transaction (issue of
a book to a student or return of a book by a student) is recorded. Let us draw an ER model to represent
the above-stated scenario.
Steps to drawing an ER model:
• Identify entities.
• Identify relationships between various entities.
• Identify the key attribute.
• Identify the other relevant attributes for the entities.
• Draw the ER diagram.
• Review the ER diagram with business users and get their sign-off.

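[Figure 6.15: ER diagram of the library transaction system. Book has the attributes Book_ID,
Technology, Book_Name, Author, Publisher, and No. of Copies; Issue_Return has Stud_ID, Item_ID,
Transaction_ID, Issue_Date, Due_Date, Return_Date, and Damage_Fine; the two entities are joined by
an "Issued to" relationship.]

Figure 6.15 considers 2 entities: Book and Issue_Return.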
Book entity has the attributes: Book_ID (key attribute), Technology, Book_Name, Author, Publisher,
NoOfCopies.
Issue_Return entity has the attributes: Stud_ID, Item_ID, Transaction_ID (key attribute),
Issue_Date, Due_Date, Return_Date, Damage_Fine, etc.
The relationship between the two entities (Book, Issue_Return) is 1:1.
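
As a rough sketch, the two entities can be written down as relational tables. The column types, the sample rows, and in particular the assumption that Item_ID refers to a Book_ID are not stated in the notes; they are illustrative guesses only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Book (
        Book_ID     INTEGER PRIMARY KEY,  -- key attribute
        Technology  TEXT,
        Book_Name   TEXT,
        Author      TEXT,
        Publisher   TEXT,
        NoOfCopies  INTEGER
    );
    CREATE TABLE Issue_Return (
        Transaction_ID INTEGER PRIMARY KEY,               -- key attribute
        Stud_ID        INTEGER,
        Item_ID        INTEGER REFERENCES Book(Book_ID),  -- assumed link to Book
        Issue_Date     TEXT,
        Due_Date       TEXT,
        Return_Date    TEXT,
        Damage_Fine    REAL
    );
    """
)

# Hypothetical sample data: one book and one issue transaction for it.
conn.execute("INSERT INTO Book VALUES (1, 'Databases', 'SQL Basics', 'A. Author', 'P. Press', 3)")
conn.execute(
    "INSERT INTO Issue_Return VALUES (100, 7, 1, '2023-01-02', '2023-01-16', NULL, 0.0)"
)

# Even this tiny model needs a join to answer "which student has which book?".
issued = conn.execute(
    "SELECT b.Book_Name, t.Stud_ID FROM Issue_Return t JOIN Book b ON b.Book_ID = t.Item_ID"
).fetchall()
print(issued)
```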
b. Dimensional Modelling: It is a logical design technique, the main focus of which is to present
data in a standard format for end-user consumption. It is used for data warehouses having either a
Star schema or a Snowflake schema. Every dimensional model is composed of one large table called
the fact table and a number of relatively smaller tables called the dimension tables. The fact
table has a multipart primary key. Each dimension table has a single-part primary key. The dimension
primary key corresponds to precisely one component of the multipart key of the fact table. In addition
to the multipart primary key, the fact table also contains a few facts which are numeric and additive.
The dimension table generally contains textual data. The fact table maintains many-to-many
relationships.
What are the perceived benefits of the dimensional modeling?
• End-users find it easy to comprehend and traverse/navigate through the model.
• If designed appropriately, it can give quick responses to ad hoc query for information.
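
A star schema of the kind described above might look like the following sketch. The table and column names (`fact_sales`, `dim_product`, `dim_date`) and the sample figures are assumptions chosen for illustration, with sqlite3 standing in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: single-part primary keys, mostly textual attributes.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);

    -- Fact table: multipart primary key built from the dimension keys, plus additive facts.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        units_sold  INTEGER,
        revenue     REAL,
        PRIMARY KEY (product_key, date_key)
    );

    INSERT INTO dim_product VALUES (1, 'Laptop', 'Computers'), (2, 'Mouse', 'Accessories');
    INSERT INTO dim_date    VALUES (202301, 'Jan', 2023), (202302, 'Feb', 2023);
    INSERT INTO fact_sales  VALUES (1, 202301, 2, 2000.0), (2, 202301, 10, 150.0),
                                   (1, 202302, 1, 1000.0);
    """
)

# An ad hoc question needs only one join from the fact table to a dimension.
revenue_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales f JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
print(revenue_by_category)
```

Because every dimension joins directly to the fact table, most queries need only a handful of joins, in contrast to the web of joins a fully normalized ER model can require.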

Q) List out Data Warehousing tools.

A) Data warehousing tools are software platforms designed to facilitate the process of creating,
managing, and querying data warehouses. These tools are essential for organizations looking to
store, consolidate, and analyze large volumes of data from various sources to support business
intelligence, reporting, and data analytics. Here are some popular data warehousing tools:

1. Amazon Redshift: A fully managed, petabyte-scale data warehouse service offered by AWS.
It's known for its scalability, cost-effectiveness, and integration with other AWS services.
2. Snowflake: A cloud-based data warehousing platform that provides features like data
sharing, data lakes integration, and support for structured and semi-structured data.
3. Google BigQuery: Google Cloud's data warehousing solution that allows you to run super-
fast SQL queries on large datasets. It's serverless and integrates well with other Google Cloud
services.
4. Microsoft Azure Synapse Analytics (formerly SQL Data Warehouse): Part of the
Microsoft Azure ecosystem, Synapse Analytics is designed for data warehousing and analytics
workloads. It supports both data warehousing and big data analytics.
5. Teradata: Teradata offers a powerful on-premises and cloud-based data warehousing
solution known for its performance and scalability. It's often used by enterprises for data
analytics.
6. IBM Db2 Warehouse: IBM's data warehousing solution that supports hybrid cloud
deployments and provides advanced analytics capabilities.
7. Oracle Exadata: Oracle's engineered system for data warehousing and analytics. It offers a
combination of hardware and software optimized for performance and scalability.
8. SAP BW/4HANA: SAP's data warehousing solution, built on the HANA in-memory database
platform. It's designed for real-time data processing and analytics.
9. Yellowbrick Data: A data warehouse platform known for its high performance and hybrid
cloud capabilities. It's designed for data-intensive workloads.
10. Vertica: A columnar database and data warehousing platform known for its speed and
scalability, especially for real-time analytics.
11. Couchbase: While primarily known as a NoSQL database, Couchbase offers a multi-
dimensional scaling feature that allows it to be used as a data warehousing solution for JSON
and semi-structured data.
12. Exasol: A high-performance, in-memory data warehousing solution known for its speed and
efficiency in processing large volumes of data.
13. Actian Avalanche: A cloud-native data warehousing platform designed for high-speed
analytics and data integration.
14. Panoply: A cloud data platform that automates the ETL (Extract, Transform, Load) process
and offers a data warehouse as a service.
15. HPE Vertica: Hewlett Packard Enterprise's data warehousing and analytics platform known
for its speed and scalability.

When choosing a data warehousing tool, organizations should consider factors such as
scalability, cost, integration capabilities, security, and the specific needs of their data analytics
projects. Many organizations also opt for cloud-based data warehousing solutions due to their
flexibility and scalability, but on-premises options are still prevalent for certain use cases.

Q) What is a Relational Data Warehouse (RDW)? Explain utility of RDW.


A) A relational data warehouse is a central store for large volumes of structured data copied from
multiple data sources. The data is used for historical and trend-analysis reporting so that the
company can make better business decisions.

It is called relational because it is based on the relational model, a widely used approach to data
representation and organization for databases.

In the relational model, data is organized into tables (also known as relations, hence the name).
These tables consist of rows and columns, where each row represents an entity (such as a customer
or product), and each column represents an attribute of that entity (like name, price, or quantity).
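
The row-and-column idea can be sketched in a few lines of Python using the standard sqlite3
module. This is only an illustration: the table name `product` and its sample values are
assumptions made up for the example, not data from these notes.

```python
import sqlite3

# Hypothetical "product" relation: each row is one entity, each column one attribute.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price REAL, quantity INTEGER)")
conn.executemany(
    "INSERT INTO product (name, price, quantity) VALUES (?, ?, ?)",
    [("T-Shirt", 12.5, 40), ("Jeans", 45.0, 15), ("Jacket", 80.0, 5)],
)

# Columns (attributes) of the relation, read from the table definition.
columns = [row[1] for row in conn.execute("PRAGMA table_info(product)")]

# Rows (entities) of the relation, sorted by name so the order is predictable.
rows = conn.execute(
    "SELECT name, price, quantity FROM product ORDER BY name"
).fetchall()

print(columns)
for row in rows:
    print(row)
```

Each tuple printed is one row (one product), and each position in the tuple corresponds to one
column (name, price, quantity).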

It is called a data warehouse because it collects, stores, and manages massive volumes of structured
data from various sources, such as transactional databases, application systems, and external data
feeds.

In a relational data warehouse, you will do a lot of work up front to get the data to where you can use
it to create reports. Doing all this work beforehand is a design and implementation methodology
referred to as a top-down approach. This approach works well for historical-type reporting, in
which you’re trying to determine what happened (descriptive analytics) and why it happened
(diagnostic analytics).

In the top-down approach, you establish the overall planning, design, and architecture of the data
warehouse first, then develop specific components. This method emphasizes the importance of
defining an enterprise-wide vision and understanding the organization’s strategic goals and
information requirements before diving into the development of the data warehouse.

The major benefits (utilities) of using a relational data warehouse are:

• Reduce stress on the production system
• Optimize for read access
• Integrate multiple sources of data
• Run accurate historical reports
• Restructure and rename tables
• Protect against application upgrades
• Reduce security concerns
• Keep historical data
• Support Master Data Management (MDM)
• Improve data quality by plugging holes in source systems
• Create reports without IT involvement

Q) What is a Data Cube? Explain the utilities of a Data Cube.
A) A data cube is a multi-dimensional data structure. That is, each value stored in the data
cube is identified by a specific combination of dimension values.

Elements of a Data Cube

• A data cube is a multi-dimensional data structure.
• A data cube is characterized by its dimensions (e.g., Products, States, Date).
• Each dimension is associated with corresponding attributes (for example, the attributes of the
Products dimension are T-Shirt, Shirt, Jeans and Jackets).
• The dimensions of a cube allow for a concept hierarchy (e.g., the T-shirt attribute in the Products
dimension can have its own sub-levels, such as T-shirt Brands).
• All dimensions connect in order to create a certain fact, the finest-grained part of the cube.
• A fact has a corresponding measure in the data cube. Typically, the fact measure in a data cube
for a chain retail business is the revenue (such as the $900 revenue from jeans purchases in Indiana
during the second quarter).

Data cubes are a very convenient tool whenever one needs to build summaries or extract certain
portions of the entire dataset. We will cover the following:

• Rollup – decreases dimensionality by aggregating data along a certain dimension
• Drill-down – increases dimensionality by splitting the data further
• Slicing – decreases dimensionality by choosing a single value from a particular dimension
• Dicing – picks a subset of values from each dimension
• Pivoting – rotates the data cube
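
The operations listed above can be sketched in plain Python. This is a minimal illustration, not
how a real OLAP engine is built: the cube is assumed to be a dictionary keyed by
(product, state, quarter), the function names are hypothetical, and only the 900 revenue figure
for Jeans in Indiana in Q2 comes from these notes; the other numbers are invented.

```python
from collections import defaultdict

# Hypothetical cube: (product, state, quarter) -> revenue (the measure).
# Only the Jeans/Indiana/Q2 = 900 cell is from the notes; the rest is made up.
cube = {
    ("Jeans", "Indiana", "Q2"): 900,
    ("Jeans", "Indiana", "Q1"): 700,
    ("Jeans", "Ohio", "Q2"): 400,
    ("T-Shirt", "Indiana", "Q2"): 300,
    ("T-Shirt", "Ohio", "Q1"): 250,
}
DIMS = ("product", "state", "quarter")


def rollup(cube, drop):
    """Aggregate away one dimension by summing the measure over it."""
    i = DIMS.index(drop)
    out = defaultdict(int)
    for key, revenue in cube.items():
        out[key[:i] + key[i + 1:]] += revenue
    return dict(out)


def slice_(cube, dim, value):
    """Fix a single value on one dimension and drop that dimension."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def dice(cube, allowed):
    """Keep cells whose value on each listed dimension is in the allowed subset."""
    return {
        k: v
        for k, v in cube.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }


def pivot(cube, order):
    """Rotate the cube by reordering its dimensions."""
    idx = [DIMS.index(d) for d in order]
    return {tuple(k[i] for i in idx): v for k, v in cube.items()}


by_product_state = rollup(cube, "quarter")
q2_only = slice_(cube, "quarter", "Q2")
sub_cube = dice(cube, {"product": {"Jeans"}, "state": {"Indiana", "Ohio"}})
rotated = pivot(cube, ("quarter", "state", "product"))
```

Drill-down is the reverse of rollup: moving from `by_product_state` back to the full `cube`
restores the quarter dimension and splits each total into its quarterly parts.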

Q) What are in-memory scenarios? Explain the utilities of in-memory scenarios.
A) An in-memory database keeps all data in the main memory or RAM of a computer. A traditional
database retrieves data from disk drives. In-memory databases are faster than traditional databases
because they require fewer CPU instructions. They also eliminate the time it takes to access data from
a disk.

Advantages of in-memory databases

Low latency, providing real-time responses
Latency is the lag between the request to access data and the application's response. In-memory
databases offer predictable low latencies irrespective of scale. They deliver microsecond read
latency, single-digit millisecond write latency, and high throughput.

As a result, in-memory storage allows enterprises to make data-based decisions in real-time. You
can design applications that process data and respond to changes before it's too late. For example,
in-memory computing of sensor data from self-driving vehicles can give the desired split-second
response time for emergency braking.

High throughput
In-memory databases are known for their high throughput. Throughput refers to the number of read
(read throughput) or write (write throughput) operations over a given period of time. Typical units
are bytes per minute or transactions per second.

High scalability
You can scale your in-memory database to meet fluctuating application demands. Both write and
read scaling are possible without adversely impacting performance. The database stays online and
supports read-and-write operations during resizing.
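
The definitions of latency and throughput given above can be made concrete with a small timing
sketch. It assumes a plain Python dictionary as a stand-in for an in-memory key-value store, which
is a simplification and not a real database; the numbers it prints depend entirely on the machine
it runs on.

```python
import time

store = {}       # assumption: a dict stands in for an in-memory key-value store
n_ops = 100_000  # number of write operations to time

start = time.perf_counter()
for i in range(n_ops):
    store[i] = i  # each write stays entirely in RAM
elapsed = time.perf_counter() - start

throughput = n_ops / elapsed             # operations per second
avg_latency_us = elapsed / n_ops * 1e6   # mean microseconds per operation

print(f"{throughput:,.0f} writes/s, {avg_latency_us:.3f} us per write")
```

Throughput and average latency are two views of the same measurement here: one divides the
operation count by the elapsed time, the other divides the elapsed time by the operation count.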

In-memory database examples

In-memory databases can find their place in many different scenarios. Some of the typical use cases
could include:

• IoT data: IoT sensors can provide large amounts of data. An in-memory database could be
used for storing and computing data to later be stored in a traditional database.
• E-commerce: Some parts of e-commerce applications, such as the shopping cart, can be
stored in an in-memory database for faster retrieval on each page view, while the product
catalogue could be stored in a traditional database.
• Gaming: Leader boards require quick updates and fast reads when millions of players are
accessing a game at the same time. In-memory databases can help to sort the results more
quickly than traditional databases.
• Session management: In stateful web applications, a session is created to keep track of a
user identity and recent actions. Storing this information in an in-memory database avoids a
round trip to the central database with each web request.
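
The session management use case can be sketched as follows. This is a hypothetical, minimal
in-memory store written for illustration: the class name `SessionStore`, the expiry (time-to-live)
policy, and the sample session data are all assumptions, not part of any particular product.

```python
import time


class SessionStore:
    """Hypothetical in-memory session store; names and expiry policy are illustrative."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._data = {}  # session_id -> (expires_at, payload)

    def put(self, session_id, payload, now=None):
        # 'now' can be passed explicitly so the behaviour is easy to demonstrate.
        now = time.monotonic() if now is None else now
        self._data[session_id] = (now + self.ttl, payload)

    def get(self, session_id, now=None):
        now = time.monotonic() if now is None else now
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, payload = entry
        if now >= expires_at:
            del self._data[session_id]  # session expired: remove it
            return None
        return payload


sessions = SessionStore(ttl_seconds=1800)
sessions.put("abc123", {"user": "user42", "cart": ["jeans"]}, now=0)

fresh = sessions.get("abc123", now=60)    # within 30 minutes: session is returned
stale = sessions.get("abc123", now=1800)  # at expiry time: session is gone
print(fresh, stale)
```

Because every lookup is a dictionary access in RAM, the web application does not need to query
the central database on each request to find out who the user is.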
