
Data warehouse fourth unit notes

The Multi Dimensional Data Model enables users to analyze business trends through data cubes, which represent data across multiple dimensions. It outlines two process architectures: Centralized, which processes data in a single location, and Distributed, which spreads processing across data centers. Additionally, it discusses types of database parallelism, including intraquery and interquery parallelism, to enhance query execution speed and efficiency.

Uploaded by

dhanushmaame2629

Unit - IV

Multi Dimensional Data Model


The Multi Dimensional Data Model allows users to ask analytical questions about
market or business trends, unlike relational databases, which give users access to
data through queries. It lets users receive answers to their requests quickly,
because the data can be created and examined comparatively fast.
OLAP (online analytical processing) and data warehousing use multi dimensional
databases, which present multiple dimensions of the data to users.
The model represents data in the form of data cubes. A data cube allows data to be
modeled and viewed from many dimensions and perspectives. It is defined by
dimensions and facts and is represented by a fact table. Facts are numerical
measures, and the fact table contains the names of the facts (the measures) along
with keys to each of the related dimension tables.
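
As a rough illustration of how a fact table gives rise to views over different sets of dimensions, the sketch below aggregates a tiny fact table into cuboids. The rows, the dimension names (time, item, location), and the helper `cuboid` are all hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical fact rows: (time, item, location, units_sold).
# The values are made up for illustration only.
FACTS = [
    ("Q1", "phone", "Chicago", 10),
    ("Q1", "phone", "Toronto", 5),
    ("Q1", "laptop", "Chicago", 3),
    ("Q2", "phone", "Chicago", 7),
    ("Q2", "laptop", "Toronto", 4),
]

DIMENSIONS = ("time", "item", "location")


def cuboid(facts, keep):
    """Sum units_sold grouped by the dimensions listed in `keep`.

    Dimensions that are not listed are aggregated away.
    """
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[p] for p in positions)
        totals[key] += row[3]  # index 3 holds the measure
    return dict(totals)


by_time_item = cuboid(FACTS, ("time", "item"))  # 2D view
by_time = cuboid(FACTS, ("time",))              # 1D view
grand_total = cuboid(FACTS, ())                 # single overall total
```

Each call keeps a different subset of the dimensions, so the same facts can be viewed in 2D, 1D, or as one summarized total.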

[Figure: 2D view of sales data]

[Figure: 3D view of sales data for the AllElectronics company]

[Figure: the corresponding data cube for the 3D data]

[Figure: 4D view of sales data]

[Figure: cuboid for the above sales data]

Process Architecture

The process architecture defines how the data from the data warehouse is processed
for a particular computation.

Following are the two fundamental process architectures:


Centralized Process Architecture

In this architecture, the data is collected into a single centralized store and,
once collection completes, processed by a single machine with large memory,
processor, and storage capacity.

Centralized process architecture evolved with transaction processing and is well
suited for small organizations with one location of service.

It requires minimal resources from both the people and system perspectives.

It is very successful when the collection and consumption of data occur at the same
location.

Distributed Process Architecture

In this architecture, information and its processing are distributed across data
centers. Data is processed locally at each data center, and the results are then
grouped into centralized storage. Distributed architectures are used to overcome
the limitations of centralized process architectures, where all the information
needs to be collected in one central location and results are available only in
that central location.
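
The idea of processing data locally and then grouping the results centrally can be sketched as follows. This is a minimal single-process illustration, not a real distributed system: the data center names, the records, and the functions `process_locally` and `collect_centrally` are assumptions made for the example.

```python
from collections import Counter

# Hypothetical per-data-center sales records: (item, units_sold).
DATA_CENTERS = {
    "dc_east": [("phone", 4), ("laptop", 2)],
    "dc_west": [("phone", 6), ("tablet", 1)],
}


def process_locally(records):
    """Stands in for work done at one data center: aggregate its own records."""
    partial = Counter()
    for item, units in records:
        partial[item] += units
    return partial


def collect_centrally(partials):
    """Stands in for the central store: merge the partial results."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts together
    return dict(merged)


partials = [process_locally(records) for records in DATA_CENTERS.values()]
totals = collect_centrally(partials)
```

Only the small partial results travel to the central location; the raw records stay where they were collected.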

There are several architectures of the distributed process:

Client-Server

In this architecture, the client handles all the information collection and
presentation, while the server handles the processing and management of data.

Three-tier Architecture

With client-server architecture, the client machines need to be connected to a
server machine, which mandates finite states and introduces latencies and
overhead in terms of the records carried between clients and servers. A three-tier
architecture places a middle (application) tier between the clients and the data
server.

N-tier Architecture

The n-tier or multi-tier architecture is where clients, middleware, applications, and
servers are isolated into tiers.

Cluster Architecture

In this architecture, machines are connected in a network (by software or
hardware) and work together to process information or compute requirements in
parallel. Each device in a cluster is associated with a function that is processed
locally, and the result sets are collected by a master server that returns them to
the user.

Peer-to-Peer Architecture

This is a type of architecture where there are no dedicated servers and clients.
Instead, all the processing responsibilities are allocated among all machines, called
peers. Each machine can perform the function of a client or server or just process
data.

Types of Database parallelism


Parallelism is used to support speedup, where queries are executed faster because
more resources, such as processors and disks, are provided. Parallelism is also used
to provide scale-up, where increasing workloads are managed without an increase in
response time, via an increase in the degree of parallelism.
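
Speedup and scale-up are commonly expressed as ratios of elapsed times. The sketch below uses those usual ratio definitions. The function names and the timings are hypothetical, not measurements from any system.

```python
def speedup(time_small_system, time_large_system):
    """Same workload on more resources: a larger ratio means a faster run."""
    return time_small_system / time_large_system


def scaleup(time_small_load_small_system, time_large_load_large_system):
    """Workload and resources grow together: 1.0 means response time held steady."""
    return time_small_load_small_system / time_large_load_large_system


# Hypothetical timings in seconds, made up for illustration.
query_speedup = speedup(120.0, 40.0)    # one processor vs. four processors
workload_scaleup = scaleup(120.0, 150.0)  # 4x the data on 4x the processors
```

With these made-up numbers, four processors give a speedup of 3 rather than the ideal 4, and the scale-up of 0.8 shows that response time grew slightly as the workload grew.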

Different architectures for parallel database systems are shared-memory,
shared-disk, shared-nothing, and hierarchical structures.

(a) Horizontal Parallelism: It means that the database is partitioned across
multiple disks, and parallel processing occurs within a specific task (e.g., table
scan) that is performed concurrently on different processors against different sets
of data.

(b) Vertical Parallelism: It occurs among various tasks. All component query
operations (e.g., scan, join, and sort) are executed in parallel in a pipelined
fashion. In other words, an output from one function (e.g., scan) becomes an input
to another function (e.g., join) as soon as records become available.
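
The two forms above can be sketched in a few lines. This is an illustration under stated assumptions, not how a DBMS implements them: the partitioned table and all function names are hypothetical, threads stand in for separate processors, and the generator pipeline shows only the record-at-a-time hand-off of vertical parallelism, since the stages share a single thread here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table split across three "disks": rows of (order_id, amount).
PARTITIONS = [
    [(1, 50), (2, 200)],
    [(3, 120), (4, 30)],
    [(5, 300)],
]


def scan_partition(rows, min_amount):
    """Horizontal: the same scan task, applied to one partition."""
    return [row for row in rows if row[1] >= min_amount]


def parallel_scan(partitions, min_amount):
    """Run the scan on every partition concurrently, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        limits = [min_amount] * len(partitions)
        results = pool.map(scan_partition, partitions, limits)
        return [row for part in results for row in part]


def scan(partitions):
    """Vertical, stage 1: emit rows one at a time."""
    for rows in partitions:
        yield from rows


def select(rows, min_amount):
    """Vertical, stage 2: consume each row as soon as stage 1 yields it."""
    for row in rows:
        if row[1] >= min_amount:
            yield row


def total_amount(rows):
    """Vertical, stage 3: aggregate rows as they arrive."""
    return sum(amount for _, amount in rows)


matched = parallel_scan(PARTITIONS, 100)
pipelined_total = total_amount(select(scan(PARTITIONS), 100))
```

In the pipeline, no stage waits for the previous one to finish the whole table; each row flows through scan, select, and aggregation in turn.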

Intraquery Parallelism

Intraquery parallelism is the execution of a single query in parallel on multiple
processors and disks. Using intraquery parallelism is essential for speeding up
long-running queries.

Interquery parallelism does not help with such queries, since each query is still
run sequentially.

To improve the situation, many DBMS vendors developed versions of their products
that use intraquery parallelism.

This application of parallelism decomposes the serial SQL query into lower-level
operations such as scan, join, sort, and aggregation.

These lower-level operations are executed concurrently, in parallel.

Interquery Parallelism

In interquery parallelism, different queries or transactions execute in parallel with
one another.
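
A minimal sketch of this idea, assuming two unrelated queries over the same data: each query is submitted as its own task and still runs serially inside that task. The sales rows, the query functions, and `run_concurrently` are hypothetical, and threads here only stand in for the separate processors a parallel DBMS would use.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sales rows: (region, amount).
SALES = [("east", 100), ("west", 250), ("east", 50), ("north", 75)]


def query_total(rows):
    """First query: total amount over all rows."""
    return sum(amount for _, amount in rows)


def query_count_by_region(rows, region):
    """Second query: number of rows for one region."""
    return sum(1 for r, _ in rows if r == region)


def run_concurrently(queries):
    """Submit each query as a separate task and return results in submit order."""
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        futures = [pool.submit(fn, *args) for fn, args in queries]
        return [future.result() for future in futures]


results = run_concurrently([
    (query_total, (SALES,)),
    (query_count_by_region, (SALES, "east")),
])
```

Running queries side by side raises overall throughput, but no individual query finishes sooner, which is why long-running queries need intraquery parallelism instead.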
