FINAL DW Record PDF
HINDUSTHAN INSTITUTE OF TECHNOLOGY
COIMBATORE 641 032
Name : GURUMOORTHY M S
Class/Sec: CSE-A
HINDUSTHAN
INSTITUTE OF TECHNOLOGY
COIMBATORE 641 032
…………………………………………………………………………………………………
…………………………………….
in the 22CS509L – DATA WAREHOUSING LABORATORY of this
B.E COMPUTER SCIENCE AND ENGINEERING for the V Semester during
the year 2024.
Place:
Date:
……………………………….
CONTENTS
S.No Date Experiment Page No Marks Sign
AIM:
To explore the data and perform data integration with WEKA.
PROCEDURE:
WEKA Installation
To install WEKA on your machine, visit WEKA’s official website and download the installation
file. WEKA supports installation on Windows, Mac OS X and Linux. You just need to follow the
instructions on this page to install WEKA for your OS.
The WEKA GUI Chooser application will then start.
The GUI Chooser application allows you to run five different types of applications as listed
here:
• Explorer
• Experimenter
• KnowledgeFlow
• Workbench
• Simple CLI
The Explorer is the application used throughout this experiment. Besides the Preprocess tab described below, it provides the following tabs:
• Classify
• Cluster
• Associate
• Select Attributes
• Visualize
Under these tabs, there are several pre-implemented machine learning algorithms. Let us
look into each of them in detail now.
Preprocess Tab
Initially as you open the explorer, only the Preprocess tab is enabled. The first step in machine
learning is to preprocess the data. Thus, in the Preprocess option, you will select the data file,
process it and make it fit for applying the various machine learning algorithms.
Classify Tab
The Classify tab provides you several machine learning algorithms for the classification of your
data. To list a few, you may apply algorithms such as Linear Regression, Logistic Regression,
Support Vector Machines, Decision Trees, RandomTree, RandomForest, NaiveBayes, and so on.
The list is very exhaustive and provides both supervised and unsupervised machine learning
algorithms.
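As an illustration, the same classifiers can be driven through the WEKA Java API instead of the GUI. The following is a minimal sketch, assuming weka.jar is on the classpath and the iris sample dataset is available at the path shown (the file location is an assumption):
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ClassifyExample {
    public static void main(String[] args) throws Exception {
        // Load the iris sample dataset (file location is an assumption)
        Instances data = DataSource.read("data/iris.arff");
        data.setClassIndex(data.numAttributes() - 1);   // last attribute is the class

        J48 tree = new J48();                            // C4.5-style decision tree
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1)); // 10-fold cross-validation
        System.out.println(eval.toSummaryString());
    }
}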
Cluster Tab
Under the Cluster tab, there are several clustering algorithms provided - such as SimpleKMeans,
FilteredClusterer, HierarchicalClusterer, and so on.
Associate Tab
Under the Associate tab, you would find Apriori, FilteredAssociator and FPGrowth.
Select Attributes Tab
Select Attributes allows you to perform feature selection based on several algorithms such as
ClassifierSubsetEval, PrincipalComponents, etc.
Visualize Tab
Lastly, the Visualize option allows you to visualize your processed data for analysis. As you
noticed, WEKA provides several ready-to-use algorithms for testing and building your machine
learning applications. To use WEKA effectively, you must have a sound knowledge of these
algorithms, how they work, which one to choose under what circumstances, what to look for in
their processed output, and so on. In short, you must have a solid foundation in machine learning
to use WEKA effectively in building your apps.
Loading Data
The Preprocess tab lets you load data from the following sources:
• Local file system (Open file ...)
• Web (Open URL ...)
• Database (Open DB ...)
Click on the Open file ... button. A directory navigator window opens, from which you can browse
to and select the data file.
As you will notice, it supports several formats, including CSV and JSON. The default file type
is ARFF.
Arff Format
An ARFF file contains two sections - header and data.
• The header describes the attribute types.
• The @data tag starts the list of data rows, each containing comma-separated fields.
• The attributes can take nominal values, as in the case of outlook shown here:
@attribute outlook {sunny, overcast, rainy}
• The attributes can take real values, as in this case: @attribute temperature real
• You can also set a Target or a Class variable called play as shown here:
@attribute play {yes, no}
• The Target assumes two nominal values, yes or no.
As an example of the ARFF format, the Weather data file loaded from the WEKA sample databases is
shown below.
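The following is a minimal sketch of such a file (abridged; the values are illustrative and follow the weather.numeric sample that ships with WEKA):
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes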
Understanding Data
Let us first look at the highlighted Current relation sub window. It shows the name of the database
that is currently loaded. You can infer two points from this sub window:
• There are 14 instances - the number of rows in the table.
• The table contains 5 attributes - the fields, which are discussed in the upcoming sections.
On the left side, notice the Attributes sub window that displays the various fields in the
database.
The weather database contains five fields - outlook, temperature, humidity, windy and play.
When you select an attribute from this list by clicking on it, further details on the attribute
itself are displayed on the right-hand side.
Let us select the temperature attribute first. When you click on it, the Selected attribute panel on the right shows the details of the attribute:
• The panel shows the attribute's name, type, and the number of missing, distinct, and unique values.
• The table underneath this information shows the distribution of values for this field.
Removing Attributes
Many a time, the data that you want to use for model building comes with many irrelevant fields.
For example, a customer database may contain the customer's mobile number, which is irrelevant in
analysing their credit rating.
To remove one or more attributes, select them and click on the Remove button at the bottom.
The selected attributes would be removed from the database. After you fully preprocess the data,
you can save it for model building.
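The same removal can also be scripted outside the GUI. Below is a minimal sketch using the Remove filter of the WEKA Java API; the dataset path and the attribute index are hypothetical placeholders:
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class RemoveAttributeExample {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/customer.arff"); // hypothetical dataset
        Remove remove = new Remove();
        remove.setAttributeIndices("3");   // hypothetical 1-based index of the irrelevant column
        remove.setInputFormat(data);
        Instances cleaned = Filter.useFilter(data, remove);
        System.out.println("Attributes before: " + data.numAttributes()
                + ", after: " + cleaned.numAttributes());
    }
}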
Next, you will learn to preprocess the data by applying filters on this data.
Data Integration
Suppose you have 2 datasets and need to merge them together. WEKA can do this from the command line:
• java -cp weka.jar weka.core.Instances merge <path to file1> <path to file2> > <path to output file>
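For example, assuming two ARFF files named dataset1.arff and dataset2.arff in the current directory (hypothetical file names) and weka.jar in the same directory, the merged dataset can be written to merged.arff as follows:
java -cp weka.jar weka.core.Instances merge dataset1.arff dataset2.arff > merged.arff
Note that merge combines the two files attribute-wise, so both files are expected to contain the same number of instances.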
AIM:
To perform data validation and preprocessing (sampling, duplicate removal, reduction and transformation) using WEKA.
PROCEDURE:
Data validation is the process of verifying and validating data that is collected before it is
used. Any type of data handling task, whether it is gathering data, analyzing it, or structuring it
for presentation, must include data validation to ensure accurate results.
1. Data Sampling
• Click on Choose (certain sample datasets do not allow this operation; the breast-cancer dataset is used for this experiment).
• Filters -> supervised -> instance -> Resample
• Click on the name of the algorithm to change its parameters.
• Change biasToUniformClass to obtain a biased sample. If you set it to 1, the resulting dataset
will have an equal number of instances for each class, e.g. breast-cancer: 20 positive and 20 negative.
• Change noReplacement accordingly.
• Change sampleSizePercent accordingly (self-explanatory).
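The same sampling can be done programmatically. A minimal sketch using the supervised Resample filter of the WEKA Java API is given below (the dataset path is an assumption; the parameter values mirror the steps above):
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.instance.Resample;

public class ResampleExample {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/breast-cancer.arff"); // path is an assumption
        data.setClassIndex(data.numAttributes() - 1);  // supervised filter needs the class set

        Resample resample = new Resample();
        resample.setBiasToUniformClass(1.0);  // 1.0 -> equal number of instances per class
        resample.setNoReplacement(false);     // default: sample with replacement; change as needed
        resample.setSampleSizePercent(50.0);  // keep 50% of the dataset size
        resample.setInputFormat(data);

        Instances sampled = Filter.useFilter(data, resample);
        System.out.println("Original: " + data.numInstances()
                + " instances, sampled: " + sampled.numInstances());
    }
}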
2. Removing Duplicates
• Filters -> unsupervised -> instance -> RemoveDuplicates removes duplicate instances from the loaded dataset.
3. Data Reduction
PCA
• Load the iris dataset.
• Filters -> unsupervised -> attribute -> PrincipalComponents
• The original iris dataset has 5 columns (4 data + 1 class). Let us reduce that to 3 columns (2 data + 1 class).
• The PCA algorithm calculates 4 principal components for this dataset. From these we select the 2
components that cover the most variance (PC1 and PC2) and re-represent the data using only the
selected components (reducing the 4D data to 2D). The number of principal components kept when
regenerating the values can be set on the filter, while maximumAttributeNames only controls how
many of the original attributes appear in the name of each generated component. In the final
result, the new columns are linear combinations of the original attributes, each multiplied by its
respective loading.
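A minimal sketch of the same reduction through the WEKA Java API (the dataset path is an assumption; here the two strongest components are kept):
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.PrincipalComponents;

public class PcaExample {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/iris.arff"); // path is an assumption
        data.setClassIndex(data.numAttributes() - 1);       // class is passed through untouched

        PrincipalComponents pca = new PrincipalComponents();
        pca.setVarianceCovered(1.0);   // rank all components before truncating
        pca.setMaximumAttributes(2);   // keep only PC1 and PC2
        pca.setInputFormat(data);

        Instances reduced = Filter.useFilter(data, pca);
        System.out.println("Attributes after PCA: " + reduced.numAttributes()); // 2 components + class
    }
}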
4. Data Transformation
Normalization
• Load the iris dataset.
• Filters -> unsupervised -> attribute -> Normalize
• Normalization is important when you don't know the distribution of the data beforehand.
• Scale is the length of the target interval and translation is its lower bound.
• Ex: scale 2 and translation -1 => [-1, 1]; scale 4 and translation -2 => [-2, 2]
• This filter gets applied to all numeric columns; you can't selectively normalize.
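A minimal sketch of the Normalize filter via the Java API, using the scale and translation values from the example above (the dataset path is an assumption):
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Normalize;

public class NormalizeExample {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/iris.arff"); // path is an assumption

        Normalize norm = new Normalize();
        norm.setScale(2.0);         // length of the target interval
        norm.setTranslation(-1.0);  // lower bound -> numeric values end up in [-1, 1]
        norm.setInputFormat(data);

        Instances normalized = Filter.useFilter(data, norm);
        System.out.println(normalized.firstInstance()); // all numeric columns now lie in [-1, 1]
    }
}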
Standardization
• Load the iris dataset.
• Used when the dataset is known to follow a Gaussian (bell curve) distribution.
• Filters -> unsupervised -> attribute -> Standardize
• This filter gets applied to all numeric columns; you can't selectively standardize.
Discretization
• Load the diabetes dataset.
• Discretization comes in handy when using decision trees.
• Suppose you need to change a numeric column such as weight into two values, low and high.
• Filters -> unsupervised -> attribute -> Discretize, then set bins to 2 and attributeIndices to the required column.
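A minimal sketch of the Discretize filter via the Java API (the dataset path and the attribute index are assumptions):
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;

public class DiscretizeExample {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/diabetes.arff"); // path is an assumption

        Discretize disc = new Discretize();
        disc.setBins(2);                // two intervals, e.g. "low" and "high"
        disc.setAttributeIndices("6");  // hypothetical 1-based index of the column to discretize
        disc.setInputFormat(data);

        Instances discretized = Filter.useFilter(data, disc);
        System.out.println(discretized.attribute(5)); // the discretized attribute is now nominal
    }
}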
AIM:
To plan the architecture for a real-time application.
PROCEDURE:
DESIGN STEPS:
1. Gather Requirements: Aligning the business goals and needs of different departments
with the overall data warehouse project.
2. Set Up Environments: This step is about creating three environments for data warehouse
development, testing, and production, each running on separate servers.
3. Data Modeling: Design the data warehouse schema, including the fact tables and
dimension tables, to support the business requirements.
4. Develop Your ETL Process: ETL stands for Extract, Transform, and Load. This process
is how data gets moved from its source into your warehouse.
5. OLAP Cube Design: Design OLAP cubes to support analysis and reporting requirements.
6. Reporting & Analysis: Developing and deploying the reporting and analytics tools that
will be used to extract insights and knowledge from the data warehouse.
7. Optimize Queries: Optimizing queries ensures that the system can handle large amounts
of data and respond quickly to queries.
8. Establish a Rollout Plan: Determine how the data warehouse will be introduced to the
organization, which groups or individuals will have access to it, and how the data will be
presented to these users.
AIM:
To write queries for Star, Snowflake and Galaxy schema definitions.
PROCEDURE:
STAR SCHEMA
• A star schema has a single central fact table (sales) linked directly to a set of denormalized dimension tables.
SNOWFLAKE SCHEMA
• A snowflake schema normalizes some of the dimension tables into additional tables (for example, item -> supplier and location -> city).
GALAXY (FACT CONSTELLATION) SCHEMA
• A fact constellation has multiple fact tables. It is also known as a galaxy schema.
• The sales fact table is the same as that in the star schema.
• The shipping fact table also contains two measures, namely dollars cost and units shipped.
SYNTAX:
Cube Definition:
define cube <cube_name> [<dimension_list>]: <measure_list>
Dimension Definition:
define dimension <dimension_name> as (<attribute_or_dimension_list>)
SAMPLE PROGRAM:
Star Schema:
define cube sales star [time, item, branch, location]:
dollars sold = sum(sales in dollars), units sold = count(*)
define dimension time as (time key, day, day of week, month, quarter, year)
define dimension item as (item key, item name, brand, type, supplier type)
define dimension branch as (branch key, branch name, branch type)
define dimension location as (location key, street, city, province or state, country)
Snowflake Schema:
define cube sales snowflake [time, item, branch, location]:
dollars sold = sum(sales in dollars), units sold = count(*)
define dimension time as (time key, day, day of week, month, quarter, year)
define dimension item as (item key, item name, brand, type, supplier (supplier key, supplier type))
define dimension branch as (branch key, branch name, branch type)
define dimension location as (location key, street, city (city key, city, province or state, country))
Galaxy (Fact Constellation) Schema:
define cube sales [time, item, branch, location]:
dollars sold = sum(sales in dollars), units sold = count(*)
define dimension time as (time key, day, day of week, month, quarter, year)
define dimension item as (item key, item name, brand, type, supplier type)
define dimension branch as (branch key, branch name, branch type)
define dimension location as (location key, street, city, province or state, country)
define cube shipping [time, item, shipper, from location, to location]:
dollars cost = sum(cost in dollars), units shipped = count(*)
RESULT:
Thus the queries for the Star, Snowflake and Galaxy schemas were written successfully.
AIM:
To design a data warehouse for real-time applications.
PROCEDURE:
Dropping Tables
Since decision-making is concerned with the trends related to students' history, behavior, and
academic performance, the tables "assets" and "item" are not needed; therefore, they are discarded
and excluded from the data warehouse.
DROP TABLE assets;
DROP TABLE item;
Merging Tables
Based on the design assumptions, the three tables "department", "section", and "course" do not
separately constitute important parameters for extracting relevant patterns and discovering
knowledge. Therefore, they are merged altogether with the "transcript_fact_table" table.
SELECT co_name FROM course, section, transcript
WHERE tr_id = n AND str_semester/year = se_semester/year AND tr_se_num = se_num AND se_code = co_code;
ALTER TABLE transcript_fact_table ADD co_course TEXT;
DROP TABLE department;
DROP TABLE section;
DROP TABLE course;
Furthermore, table "Activities" is merged with table "RegistrationActivities" and a new table is
produced called "RegisteredActivities".
SELECT act_name FROM activities, registrationActivities WHERE reg_act_id = act_id;
New Columns
During transformation, new columns can be added. In fact, tr_courseDifficulty is added to table
"transcript_fact_table" in order to increase the degree of knowledge and information.
ALTER TABLE transcript_fact_table ADD tr_courseDifficulty TEXT;
Moreover, a Boolean column called re_paidOnDueDate is added to table "receipt":
ALTER TABLE receipt ADD re_paidOnDueDate BOOLEAN;
Removing Columns
Unnecessary columns can be removed too during the transformation process. Below is a list of
useless columns that were discarded during the transformation process from tables "Account",
"Student", "Receipt" and "Activities" respectively:
ALTER TABLE Receipt REMOVE re_dueDate REMOVE re_dateOfPayment;
ALTER TABLE Activities REMOVE ac_supervisor;
ALTER TABLE Student REMOVE st_phone REMOVE st_email;
Conceptual Schema – The Snowflake Schema
The proposed data warehouse is a snowflake-type design with one central fact table and seven dimensions.
Output:
AIM:
To perform dimensional data modelling for a data warehouse.
PROCEDURE:
Step-1: Identifying the business objective: The first step is to identify the business objective.
Sales, HR, Marketing, etc., are some examples of organizational needs. Since this is the most
important step of data modelling, the selection of the business objective also depends on the
quality of data available for the process.
Step-2: Identifying Granularity: Granularity is the lowest level of information stored in the table.
The grain describes the level of detail for the business problem and its solution.
Step-3: Identifying Dimensions and their Attributes: Dimensions are objects or things.
Dimensions categorize and describe data warehouse facts and measures in a way that supports
meaningful answers to business questions. A data warehouse organizes descriptive attributes as
columns in dimension tables. For example, the date dimension may contain data like year, month,
and weekday.
Step-4: Identifying the Fact: The measurable data is held by the fact table. Most of the fact table
rows are numerical values like price or cost per unit, etc.
Step-5: Building of Schema: We implement the Dimension Model in this step. A schema is a
database structure. There are two popular schemes: Star Schema and
Dimensional data modeling is a technique used in data warehousing to organize and structure data in
a way that makes it easy to analyze and understand. In a dimensional data model, data is organized
into dimensions and facts.
Overall, dimensional data modeling is an effective technique for organizing and structuring data in a
data warehouse for analysis and reporting. By providing a simple and intuitive structure for the data,
the dimensional model makes it easy for users to access and understand the data they need to make
informed business decisions.
• Simplified Data Access: Dimensional data modeling enables users to easily access data
through simple queries, reducing the time and effort required to retrieve and analyze data.
• Enhanced Query Performance: The simple structure of dimensional data modeling allows
for faster query performance, particularly when compared to relational models.
• Ease of Understanding: Dimensional data modeling uses simple, intuitive structures that are easy to understand, even for non-technical users.
• Limited Complexity: Dimensional data modeling may not be suitable for very complex data
relationships, as it relies on simple structures to organize data.
• Limited Integration: Dimensional data modeling may not integrate well with other data
models, particularly those that rely on normalization techniques.
OUTPUT
RESULT:
Thus the dimensional data model for the data warehouse was designed successfully.
AIM:
To study how a real-time application (WhatsApp) uses Online Transaction Processing (OLTP).
INTRODUCTION:
WhatsApp is one of the most popular messaging applications worldwide, known for its real-time
communication capabilities. To manage its vast amount of data generated by millions of users, WhatsApp
employs Online Transaction Processing (OLTP) systems. This case study explores how WhatsApp utilizes
OLTP to enhance user experience, maintain data integrity, and ensure scalability.
Objectives
• Data Integrity: Maintain accuracy and consistency of user data (messages, contacts, etc.).
How WhatsApp Uses OLTP
1. Real-Time Transaction Processing
- WhatsApp's architecture relies on OLTP to handle a large number of transactions per second. Each
message sent or received is processed as a transaction, ensuring that messages are delivered promptly.
2. ACID Compliance
- WhatsApp's OLTP system adheres to ACID (Atomicity, Consistency, Isolation, Durability) properties to
ensure that every message transaction is reliable. This is crucial for maintaining the integrity of message data,
especially in scenarios where messages are sent or received while users are offline.
3. User Authentication and Account Management
- User authentication is a critical OLTP function. WhatsApp uses its OLTP system to manage user
accounts, including registration, login, and account recovery processes. This ensures secure and consistent
user experiences.
4. Scalable Architecture
- WhatsApp's OLTP system is designed to scale horizontally, allowing it to handle increased loads by
adding more servers. This architecture helps maintain performance during peak times when user activity
spikes.
5. Data Synchronization
- The app syncs messages across devices using an OLTP approach, ensuring that users have access to their
chat history regardless of the device being used. This requires real-time updates to the database as messages
are sent and received.
Challenges and Solutions
1. High Transaction Volume
- Solution: Distributed database systems and load balancing are used to spread the load across multiple
servers, ensuring smooth operation.
2. Network Reliability
- Solution: WhatsApp implements message queuing to store messages temporarily during poor
connectivity, ensuring messages are delivered once the connection is restored.
3. Multi-Device Consistency
- Challenge: Users often switch between devices, leading to potential data inconsistencies.
- Solution: The OLTP system synchronizes data across all devices in real-time, utilizing unique
identifiers for messages to ensure that all devices reflect the same information.
Results
• User Satisfaction: The implementation of OLTP has resulted in minimal delays in message delivery,
leading to high user satisfaction and retention rates.
• Robust Security: By ensuring data integrity and secure user authentication, WhatsApp maintains user
trust, which is vital for any messaging platform.
• Growth Management: The scalable nature of the OLTP architecture has allowed WhatsApp to
accommodate rapid growth in user numbers without sacrificing performance.
Conclusion
WhatsApp's effective use of OLTP systems is crucial for its success as a leading messaging application. By
ensuring real-time processing, data integrity, and scalability, WhatsApp can provide a seamless and reliable
user experience, positioning itself as a trusted platform for communication worldwide. This case study
highlights the importance of OLTP in handling high-volume transactions and maintaining consistent,
real-time data across a vast user base.
AIM:
To study how a core banking application uses OLTP for large-scale, real-time transaction processing (a case study of the Finacle Core Banking Solution).
INTRODUCTION:
A major global bank with millions of customers and thousands of branches across various countries
needed a robust solution to manage its high volume of daily transactions. The bank's primary
challenge was processing millions of banking transactions in real-time while ensuring data
accuracy, security, and high availability. It required an efficient Online Transaction Processing
(OLTP) system to handle transactions such as deposits, withdrawals, money transfers, and account
balance inquiries.
Objective:
To implement an OLTP system that could handle large- scale, real-time banking transactions while
providing high availability, reliability, and scalability.
The bank decided to implement the Finacle Core Banking Solution, an OLTP system developed by
Infosys that supports real-time transaction processing and helps banks manage day-to-day
operations efficiently.
Finacle’s OLTP capabilities enabled the bank to process millions of transactions in real-time across
various banking channels, such as branch banking, online banking, mobile banking, and ATMs.
Each transaction (such as a cash deposit, loan disbursement, or account balance update) was
processed instantly, ensuring immediate updates to the system.
Transactions included:
• Customer Deposits and Withdrawals
- Finacle ensured 24/7 availability, which was critical for the bank’s global operations. Its
architecture supported real-time replication, ensuring that all transactions were instantly recorded in
the database, even during high-traffic periods.
- To enhance data integrity, Finacle’s OLTP system maintained ACID (Atomicity, Consistency,
Isolation, Durability) properties to guarantee that every transaction was processed accurately and
securely. This prevented issues like double spending or data corruption.
- As the bank grew, Finacle’s OLTP system could scale horizontally and vertically to accommodate
the increasing volume of transactions. The system could handle thousands of simultaneous
transactions without any performance degradation.
- The system's database (often integrated with Oracle or Microsoft SQL Server) allowed for
concurrent access by thousands of users, ensuring smooth operations even during peak hours.
Multi-Channel Support:
- Finacle supported seamless integration across multiple banking channels:
- Branches: Bank staff could quickly process customer transactions, reducing wait times.
- Online and Mobile Banking: Real-time balance inquiries, bill payments, and transfers were made
possible for millions of users.
- All these channels accessed the same underlying database, ensuring consistency across the entire
system. For example, if a customer deposited money at an ATM, their account balance would be
immediately updated, and the same information would be available at the branch and on mobile
banking.
- As a banking system, Finacle adhered to strict security protocols to protect sensitive financial data.
The OLTP system included encryption, authentication and fraud detection mechanisms to ensure
that only authorized users could access or modify data.
- The system also ensured regulatory compliance with banking regulations in various countries,
allowing the bank to operate globally without risk of non- compliance.
Finacle’s OLTP system provided robust backup and disaster recovery mechanisms, ensuring that
data could be recovered in case of system failures or disasters. This minimized downtime and
prevented any potential data loss.
CONCLUSION:
By implementing the Finacle Core Banking Solution, the bank was able to efficiently manage its
high-volume, real-time transactions across various channels. The system's OLTP capabilities
ensured fast, accurate, and secure transaction processing, leading to a better customer experience
and improved operational efficiency.
Finacle’s OLTP system not only addressed the bank’s immediate needs but also provided a
scalable, secure, and reliable solution that could grow with the bank’s expanding operations. This
case study highlights how OLTP systems like Finacle can transform banking operations by
ensuring real-time processing, high availability, and enhanced security in a complex, fast-paced
environment.