Data Warehousing Lab Exercise
Ex.No:1
Date:
Data exploration and integration with Weka
Aim:
To implement data exploration and integration with Weka
Procedure:
Step 1: Launch Weka Explorer
- Open Weka and select the "Explorer" from the Weka GUI Chooser.
Step 2: Load the dataset
- Click on the "Open file" button and select "datasets" > "iris.arff" from the Weka
installation directory. This will load the Iris dataset.
Step 3: To know more about the Iris dataset, open iris.arff in Notepad++ or a similar tool
and read the comments (an excerpt of the file header is shown after Step 4).
Step 4: Fill in the following tables:
Flower Type        Count
Sepal length
Sepal width
Petal length
Petal width
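To help with Steps 3 and 4, the header of Weka's bundled iris.arff looks roughly like the
following (the exact attribute declarations and comments may vary slightly between Weka
versions):

% Iris plants database (Fisher, 1936) - see the comments in the file for details
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa

The attribute names listed after @ATTRIBUTE are the columns to summarize in the tables above.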
Ex.No:2
Date:
Data validation using Weka
Aim:
To implement data validation using Weka
Procedure:
Step 1: Launch Weka Explorer
- Open Weka and select the "Explorer" from the Weka GUI Chooser.
Step 2: Load the dataset
- Click on the "Open file" button and select "datasets" > "iris.arff" from the Weka
installation directory. This will load the Iris dataset.
Step 3: Split your data into training and testing sets. Under the "Classify" tab, use the
"Test options" panel to select a testing method. Weka offers options such as cross-validation,
percentage split, and a supplied test set. Configure the options according to your needs.
Step 4: Select a classifier algorithm. Weka offers a wide range of algorithms for
classification, regression, clustering, and other tasks. Under the "Classify" tab, click on the
"Choose" button next to the "Classifier" area and choose an algorithm. Configure its
parameters, if needed.
Step 5: Click on the "Start" button under the "Classify" tab to run the training and testing
process. Weka will train the model on the training set and test its performance on the testing
set using the selected algorithm.
Validation Techniques:
Cross-Validation: Go to the "Classify" tab and choose a classifier. Then, under the "Test
options," select the type of cross-validation you want to perform (e.g., 10-fold cross-
validation). Click "Start" to run the validation.
Train-Test Split: You can also split your data into a training set and a test set. Use the
"Supervised" tab to train a model on the training set and evaluate its performance on the test
set.
Step 6: Evaluate the model's performance. Once the process finishes, Weka will display
various performance measures such as accuracy, precision, recall, and the ROC area (for
classification tasks) or RMSE and MAE (for regression tasks). These measures appear in the
"Classifier output" panel, and each run is added to the "Result list" for later comparison.
Step 7: Analyze the results and interpret them. Examine the performance measures to assess
the model's quality and suitability for your dataset. Compare different models or validation
methods if you have tried more than one.
Step 8: Repeat steps 4-7 with different algorithms or validation methods if desired. This will
help you compare the performance of different models and choose the best one.
Output
Result:
Thus data validation and testing of a dataset using Weka were implemented successfully.
Ex.No:3
Date:
Plan the architecture for a real-time application
AIM:
To plan the Web Services based Real time Data Warehouse Architecture
Procedure:
A web services-based real-time data warehouse architecture enables the integration of data from
various sources in near real-time using web services as a communication mechanism. Here's an
overview of such an architecture:
Data Sources: These are the systems or applications where the raw data originates from. They
could include operational databases, external APIs, logs, etc.
Web Service Clients (WS Client): These components are responsible for extracting data changes
from the data sources using techniques such as Change Data Capture (CDC) and sending them to
the web service provider. They make use of web service calls to transmit data.
Web Service Provider: The web service provider receives data from the clients and processes
it for further integration into the real-time data warehouse. It decomposes the received data,
performs the necessary transformations, generates SQL statements, and interacts with the data
warehouse for insertion.
Concretely, this is a web service that receives data from the WS Client and adds it to the
Real-Time Partition. It decomposes the received Data Transfer Object into data and metadata.
It then uses the metadata to generate SQL via an SQL-Generator to insert the data into RTDW
log tables and executes the generated SQL on the RTDW database.
Metadata: Metadata describes the structure and characteristics of the data. In this context, it is
used by the Web Service Provider to generate SQL for inserting data into RTDW log tables. In a
web services-based architecture, metadata plays a crucial role in understanding data formats,
schemas, and transformations. It is often managed centrally to ensure consistency across the
system.
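As an illustration only, the SQL that the SQL-Generator might emit for an RTDW log table could
look like the sketch below. The table and column names are hypothetical, not taken from the
exercise:

-- Hypothetical RTDW log table for captured sales changes
CREATE TABLE rtdw_sales_log (
    log_id        BIGINT IDENTITY(1,1) PRIMARY KEY,
    source_system VARCHAR(50),
    change_type   CHAR(1),          -- 'I' = insert, 'U' = update, 'D' = delete
    product_id    INT,
    customer_id   INT,
    sale_amount   DECIMAL(10,2),
    change_time   DATETIME2
);

-- Statement the SQL-Generator might produce for one captured change
INSERT INTO rtdw_sales_log (source_system, change_type, product_id, customer_id, sale_amount, change_time)
VALUES ('OrderDB', 'I', 101, 5001, 249.99, SYSDATETIME());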
ETL (Extract, Transform, Load): ETL processes are employed to collect data from various
sources, transform it into a consistent format, and load it into the data warehouse. In a
real-time context, this process may involve continuous or near real-time transformations to
ensure that data is available for analysis without significant delays.
Real-Time Partition: This is a section of the data warehouse dedicated to storing real-time or
near real-time data. It may utilize techniques such as in-memory databases or specialized storage
structures optimized for high-speed data ingestion and query processing. There are three stages:
Real-Time Data Integration: This component facilitates the integration of real-time data into
the data warehouse. It ensures that data from various sources are combined seamlessly and made
available for analysis in real-time or near real-time.
Query Interface: Users interact with the system through a query interface, which could be a
web-based dashboard, API endpoints, or other client applications. The query interface allows
users to retrieve and analyze data stored in the data warehouse, including both historical and
real-time data.
Result:
Thus the web services based real-time data warehouse architecture has been studied successfully.
Ex.No:4
Date:
Write the Query for Schema Definition
Ex.No.4.1 Query for Star schema using SQL Server Management Studio
Aim:
To execute and verify query for star schema using SQL Server Management Studio
Procedure:
Step 1: Install SQL Server Express (SQLEXPR) and SQL Server Management Studio
Step 2: Launch SQL Server Management Studio
Step 3: Create a new database and write the query for creating the Star schema tables (a sample query is sketched after Step 5)
Step 4: Execute the query for schema
Step 5: Explore the database diagram for Star schema
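For Step 3, a minimal star schema sketch is given below. The table and column names (FactSales,
DimDate, DimProduct, DimSalesperson) are illustrative assumptions, not names prescribed by the
exercise:

CREATE TABLE DimDate (
    DateKey         INT PRIMARY KEY,
    FullDate        DATE,
    CalendarMonth   INT,
    CalendarQuarter INT,
    CalendarYear    INT
);

CREATE TABLE DimProduct (
    ProductKey  INT PRIMARY KEY,
    ProductName VARCHAR(100),
    Category    VARCHAR(50)
);

CREATE TABLE DimSalesperson (
    SalespersonKey  INT PRIMARY KEY,
    SalespersonName VARCHAR(100),
    Region          VARCHAR(50)   -- geography kept inside the dimension (denormalized)
);

CREATE TABLE FactSales (
    SalesKey       INT IDENTITY(1,1) PRIMARY KEY,
    DateKey        INT FOREIGN KEY REFERENCES DimDate(DateKey),
    ProductKey     INT FOREIGN KEY REFERENCES DimProduct(ProductKey),
    SalespersonKey INT FOREIGN KEY REFERENCES DimSalesperson(SalespersonKey),
    Quantity       INT,
    SalesAmount    DECIMAL(12,2)
);

In the database diagram of Step 5, FactSales should appear in the centre with the three
dimension tables linked directly to it.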
Ex.No.4.2 Query for Snowflake schema using SQL Server Management Studio
Aim:
To execute and verify the query for a Snowflake schema using SQL Server Management Studio
Procedure:
Step 1: Install SQL Server Express (SQLEXPR) and SQL Server Management Studio
Step 2: Launch SQL Server Management Studio
Step 3: Create a new database and write the query for creating the Snowflake schema tables (a sample query is sketched after Step 6)
Step 4: Execute the query
Step 5: Explore the database diagram for the Snowflake schema
Step 6: Connect the Geography table to the Salesperson and Product tables through the Geography key
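A sketch of how Step 6's Geography table might be normalized out of the Salesperson and Product
dimensions is shown below; all table and column names are illustrative assumptions:

CREATE TABLE DimGeography (
    GeographyKey INT PRIMARY KEY,
    City         VARCHAR(50),
    Region       VARCHAR(50),
    Country      VARCHAR(50)
);

CREATE TABLE DimSalesperson (
    SalespersonKey  INT PRIMARY KEY,
    SalespersonName VARCHAR(100),
    GeographyKey    INT FOREIGN KEY REFERENCES DimGeography(GeographyKey)
);

CREATE TABLE DimProduct (
    ProductKey   INT PRIMARY KEY,
    ProductName  VARCHAR(100),
    GeographyKey INT FOREIGN KEY REFERENCES DimGeography(GeographyKey)
);

The fact table references DimSalesperson and DimProduct as in the star schema, while geography
attributes are reached through the shared DimGeography table; this extra level of normalization
is what distinguishes the snowflake schema from the star schema.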
Output
Result:
Thus the query for the Snowflake schema was created and executed successfully.
Ex.No:5
Date:
Design Data Warehouse for Real Time Applications
Aim:
To design and execute a data warehouse for a real-time application using SQL Server Management
Studio
Procedure:
Step 1: Launch SQL Server Management Studio
Step 2: Explore the created database
Step 3: 3.1 Right-click on the table name and click on the "Edit Top 200 Rows" option.
3.2 Enter the data directly into the table, or use the "Select Top 1000 Rows" option and enter an INSERT query.
Step 4: Execute the query, and the data will be updated in the table.
Step 5: Right-click on the database and click on the tasks option. Use the import data option to
import files to the database.
Sample Query
INSERT INTO dbo.person(first_name,last_name,gender) VALUES
('Kavi','S','M'), ('Nila','V','F'), ('Nirmal','B','M'), ('Kaviya','M','F');
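The sample INSERT above assumes that a dbo.person table already exists; a matching definition
might look like the following (the column types are a guess for illustration, not taken from
the exercise):

CREATE TABLE dbo.person (
    person_id  INT IDENTITY(1,1) PRIMARY KEY,   -- surrogate key generated automatically
    first_name VARCHAR(50) NOT NULL,
    last_name  VARCHAR(50) NOT NULL,
    gender     CHAR(1)
);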
Aim:
To analyze and execute the data warehouse design for a real-time application using SQL Server
Management Studio
Procedure:
Dimensional Tables (sample definitions for two of these follow the list):
Date Dimension:
Product Dimension:
Order Dimension:
Customer Dimension:
Promotion Dimension:
Warehouse Dimension:
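As an illustration only (column names are assumptions, not specified in the exercise), two of
these dimension tables could be defined as follows:

CREATE TABLE DimPromotion (
    PromotionKey    INT PRIMARY KEY,
    PromotionName   VARCHAR(100),
    DiscountPercent DECIMAL(5,2),
    StartDate       DATE,
    EndDate         DATE
);

CREATE TABLE DimWarehouse (
    WarehouseKey  INT PRIMARY KEY,
    WarehouseName VARCHAR(100),
    City          VARCHAR(50),
    Capacity      INT
);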
Aim:
To evaluate the implementation and impact of OLAP technology in a real-world business
context, analyzing its effectiveness in enhancing data analysis, decision-making, and overall
operational efficiency.
Introduction:
OLAP stands for On-Line Analytical Processing. OLAP is a category of
software technology that enables analysts, managers, and executives to gain insight into
information through fast, consistent, interactive access to a wide variety of possible views of
data that has been transformed from raw information to reflect the real dimensionality of the
enterprise as understood by the user. It is used to analyze business data from different
points of view. Organizations collect and store data from multiple data sources, such as
websites, applications, smart meters, and internal systems.
Methodology
OLAP (Online Analytical Processing) methodology refers to the approach and techniques
used to design, create, and use OLAP systems for efficient multidimensional data analysis. Here
are the key components and steps involved in the OLAP methodology:
1. Requirement Analysis:
The process begins with understanding the specific analytical requirements of the
users. Analysts and stakeholders define the dimensions, measures, hierarchies, and data sources
that will be part of the OLAP system. This step is crucial to ensure that the OLAP system meets
the business needs.
2. Dimensional Modeling:
Dimension tables are designed to represent attributes like time, geography, and
product categories. Fact tables contain the numerical data (measures) and the keys to
dimension tables.
3. Star Schema:
This is a common design in OLAP systems where the fact table is at the center, connected to
dimension tables.
Operations in OLAP
In OLAP (Online Analytical Processing), operations are the fundamental actions performed on
multidimensional data cubes to retrieve, analyze, and present data in a way that facilitates
decision-making and data exploration. The main operations in OLAP, illustrated with a short SQL
sketch after this list, are:
1. Slice: Slicing selects a single value for one dimension, producing a sub-cube with one fewer
dimension. For example, you can slice the cube to view sales data for a single quarter across
all products and regions.
2. Dice: Dicing is the process of selecting specific values from two or more dimensions to
create a subcube. It allows you to focus on a particular combination of attributes. For
example, you can dice the cube to view sales data for a specific product category and region
within a certain time frame.
3. Roll-up (Drill-up): Roll-up allows you to move from a more detailed level of data to a
higher-level summary. For instance, you can roll up from daily sales data to monthly or yearly
sales data, aggregating the information.
4. Drill-down (Drill-through): Drill-down is the opposite of roll-up, where you move from
a higher-level summary to a more detailed view of the data. For example, you can drill
down from yearly sales data to quarterly, monthly, and daily data, getting more granularity.
5. Pivot (Rotate): Pivoting involves changing the orientation of the cube, which means
swapping dimensions to view the data from a different perspective. This operation is useful for
exploring data in various ways.
6. Slice and Dice: Combining slicing and dicing allows you to select specific values from
different dimensions to create subcubes. This operation helps you focus on a highly specific
subset of the data.
7. Drill-across: Drill-across involves navigating between cubes that are related but have
different dimensions or hierarchies. It allows users to explore data across different OLAP cubes.
8. Data Filtering: In OLAP, you can filter data to view only specific data points or subsets
that meet certain criteria. This operation is useful for narrowing down data to what is most
relevant for analysis.
(Diagrams: Slice, Dice, Roll-up, Pivot, Drill-down)
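On a relational (ROLAP) star schema, several of these operations map onto ordinary SQL. The
queries below are a rough sketch against the hypothetical FactSales, DimDate, and DimProduct
tables used earlier, not a full OLAP implementation:

-- Slice: fix one value of the time dimension (year 2023)
SELECT p.Category, SUM(f.SalesAmount) AS TotalSales
FROM FactSales f
JOIN DimDate d ON f.DateKey = d.DateKey
JOIN DimProduct p ON f.ProductKey = p.ProductKey
WHERE d.CalendarYear = 2023
GROUP BY p.Category;

-- Dice: restrict two dimensions (product category and quarters 1-2)
SELECT d.CalendarQuarter, p.ProductName, SUM(f.SalesAmount) AS TotalSales
FROM FactSales f
JOIN DimDate d ON f.DateKey = d.DateKey
JOIN DimProduct p ON f.ProductKey = p.ProductKey
WHERE p.Category = 'Laptops' AND d.CalendarQuarter IN (1, 2)
GROUP BY d.CalendarQuarter, p.ProductName;

-- Roll-up: aggregate quarterly figures up to yearly totals
SELECT d.CalendarYear, d.CalendarQuarter, SUM(f.SalesAmount) AS TotalSales
FROM FactSales f
JOIN DimDate d ON f.DateKey = d.DateKey
GROUP BY ROLLUP (d.CalendarYear, d.CalendarQuarter);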
3. Data Loading:
Load the integrated and preprocessed transaction data into the OLAP cube. Ensure that the
cube is regularly updated to reflect the most recent data.
4. OLAP Cube Design:
Define hierarchies and relationships within the cube to enable effective analysis. For instance,
you might have hierarchies that allow drilling down from product categories to individual
products.
5. Market Basket Analysis:
Although OLAP cubes are not designed for direct market basket analysis, they can
facilitate it in several ways.
Conclusion
OLAP is a powerful technology for businesses and organizations seeking data insights,
informed decisions, and performance improvement. It enables multidimensional data
analysis, especially in complex, data-intensive environments, and it empowers businesses to
analyze data efficiently and effectively, offering a competitive advantage in today's
data-driven world.
Ex.No:8
Date:
Case Study Using OLTP
Aim:
To develop an OLTP system that enables the e-commerce company to process a high volume of
online orders, track inventory, manage customer information, and handle financial
transactions in real-time, ensuring data integrity and providing a seamless shopping
experience for customers.
Introduction:
In today's digital age, businesses across various industries are relying heavily on technology to
streamline their operations and provide seamless services to their customers. One crucial
aspect of this technological transformation is the development and implementation of
efficient Online Transaction Processing (OLTP) systems. This case study delves into the
design and implementation of an OLTP system for a fictional e-commerce company,
"TechTrend Electronics," and examines the key considerations, challenges, and aims
associated with such a project.
This case study aims to showcase the process of developing an OLTP system tailored to
TechTrend Electronics' unique requirements. The objective is to ensure that the company can
efficiently handle a multitude of real-time transactions while maintaining data accuracy and
providing a seamless shopping experience for its customers.
Methodology:
The methodology for developing an OLTP (Online Transaction Processing) system for a case
study involves a systematic approach to designing, implementing, and testing the system.
Below is a step-by-step methodology for creating an OLTP system for a case study, using the
fictional e-commerce company "TechTrend Electronics" as an example:
1. Database Design:
Develop a well-structured relational database schema that aligns with the business
requirements (a minimal schema sketch follows this list).
Normalize the data to eliminate redundancy and ensure data consistency.
Create entity-relationship diagrams and define data models for key entities like customers,
products, orders, payments, and inventory.
2. Technology Selection:
Choose appropriate technologies for the database management system (e.g., MySQL,
PostgreSQL, Oracle) and programming languages (e.g., Java, Python, C#) for the OLTP
system.
Evaluate and select suitable frameworks, libraries, and tools that align with the chosen
technologies.
3. System Architecture:
Design the system's architecture, which may include multiple application layers, a web
interface, and a database layer.
Implement a layered architecture, separating concerns for scalability, maintainability, and
security.
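A minimal sketch of such a normalized schema for the fictional TechTrend Electronics scenario
is given below; the table and column names are assumptions for illustration only:

CREATE TABLE customers (
    customer_id INT IDENTITY(1,1) PRIMARY KEY,
    full_name   VARCHAR(100) NOT NULL,
    email       VARCHAR(100) UNIQUE
);

CREATE TABLE products (
    product_id     INT IDENTITY(1,1) PRIMARY KEY,
    product_name   VARCHAR(100) NOT NULL,
    unit_price     DECIMAL(10,2) NOT NULL,
    stock_quantity INT NOT NULL              -- inventory tracked per product
);

CREATE TABLE orders (
    order_id    INT IDENTITY(1,1) PRIMARY KEY,
    customer_id INT NOT NULL FOREIGN KEY REFERENCES customers(customer_id),
    order_date  DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
    status      VARCHAR(20) NOT NULL
);

CREATE TABLE order_items (
    order_id   INT NOT NULL FOREIGN KEY REFERENCES orders(order_id),
    product_id INT NOT NULL FOREIGN KEY REFERENCES products(product_id),
    quantity   INT NOT NULL,
    line_total DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (order_id, product_id)
);

Splitting order_items from orders avoids repeating order and product details on every line
item and keeps the data normalized, as called for in the design step.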
Conclusion:
In conclusion, OLTP systems play a pivotal role in modern business operations, facilitating
real-time transaction processing, data integrity, and customer interactions. These systems are
designed for high concurrency, low-latency, and consistent data access, making them
essential for day-to-day operations in various industries, such as finance, e-commerce,
healthcare, and more.
Overall, OLTP systems are the backbone of modern business operations, ensuring the
seamless execution of day-to-day transactions and delivering a positive customer experience.
Ex.No:9
Date:
Implementation of Warehouse Testing.
Aim:
To perform load testing using JMeter and interact with a SQL Server database using SQL
Management Studio, you'll need to set up JMeter to send SQL queries to the database
and collect the results for analysis.
Procedure:
1. Install Required Software:
Install JMeter: Download and install JMeter from the official Apache JMeter website.
Install SQL Server and SQL Management Studio: If you haven't already, set up SQL
Server and SQL Management Studio to manage your database.
2. Create a Test Plan in JMeter:
Launch JMeter and create a new Test Plan.
3. Add Thread Group:
Add a Thread Group to your Test Plan to simulate the number of users and requests.
4. Add JDBC Connection Configuration:
Add a JDBC Connection Configuration element to your Thread Group. Configure it
with the database connection details, such as the JDBC URL, username, and password
(sample values are shown after Step 8). This element allows JMeter to connect to your
SQL Server database.
5. Add a JDBC Request Sampler:
Add a JDBC Request sampler to the Thread Group. Set its pool variable name to match the one
defined in the JDBC Connection Configuration, choose the query type (for example, Select
Statement), and enter the SQL query to execute (an example is also shown after Step 8).
6. Add Listeners:
Add listeners to your Test Plan to collect and view the test results. Common listeners
include View Results Tree, Summary Report, and Response Times Over Time.
7. Configure Your Test Plan:
Configure the number of threads (virtual users), ramp-up time, and loop count in the
Thread Group to simulate the desired load.
8. Run the Test:
Start the test by clicking the green Start button on the JMeter toolbar (or choose Run > Start).
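For reference, and purely as an illustration, the JDBC settings from Step 4 and the query from
Step 5 might look like the following. The server address, database name, pool name, credentials,
and the dbo.person table are placeholder assumptions, not values prescribed by this exercise.
Note that the Microsoft JDBC driver jar (mssql-jdbc) must be copied into JMeter's lib directory
before the connection will work.

JDBC Connection Configuration (Step 4):
    Variable Name for created pool:  sqlPool
    Database URL:       jdbc:sqlserver://localhost:1433;databaseName=SalesDW;encrypt=false
    JDBC Driver class:  com.microsoft.sqlserver.jdbc.SQLServerDriver
    Username:           jmeter_user
    Password:           ********

JDBC Request (Step 5):
    Pool variable name: sqlPool
    Query Type:         Select Statement
    Query:
        SELECT TOP 100 first_name, last_name, gender
        FROM dbo.person;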
Conclusion
Using JMeter in conjunction with SQL Server Management Studio is a powerful
combination for load testing and performance analysis of applications that rely on SQL
Server databases. This approach allows you to simulate a realistic user load, send SQL
queries to the database, and evaluate the system's performance under various conditions.
Through thorough testing, analysis, and optimization, you can ensure your application is
capable of delivering a reliable and responsive experience to users even under heavy load
conditions.