B9DA102 Data Storage Solutions for Data Analytics
DATA STORAGE
ASSIGNMENT
Submitted by Trushita Prashant Redij
Student No: 10504099
INDEX
Table of Contents
1. Introduction to Data Warehouse
1.1 Business Intelligence
1.2 Data Warehouse
2. IOWA Liquor Sales Analysis
2.1 Why IOWA Liquor Sales Analysis
2.2 Data Warehouse Vision and Goals
2.3 Source Data Table Description
2.4 Area of Analysis
2.5 Key Stakeholders
2.6 Tools Used
2.7 Data Warehouse Architecture
3. Data Model of Data Warehouse
3.1 Designing Data Warehouse
4. Data Warehouse Implementation
4.1 Extract, Transform and Load
5. SSRS Reports and Visualization
5.1 SSRS Report
5.2 Tableau Visualization
6. XML, XSD and DTD
7. Neo4J and Cypher Queries
7.1 Relation between Nodes
7.2 Relation between Multiple Nodes
8. Bibliography
1. Introduction
In recent years, rapid changes in the marketplace have increased the need for data
analysis and data warehousing in every organization.
Data generated by an organization is extracted, transformed and analysed to derive
useful insights, which in turn influence the organization's business decisions.
1.1 BUSINESS INTELLIGENCE:
Business Intelligence (BI) is the practice of transforming raw data into useful information
for business analysis. BI based on data warehouse technology extracts information from a
company's operational systems. The data is transformed (cleaned and integrated) and loaded
into data warehouses. Because the data has been cleaned and integrated, it is credible
enough to be used for business insights.
1.2 DATA WAREHOUSE:
Data collected from various sources and stored in various databases cannot be visualized
directly. It first needs to be integrated and processed before visualization can take place.
A data warehouse is a central location where consolidated data from multiple locations
(databases) is stored. It is maintained separately from an organization's operational database.
[Figure: several Data inputs combined into INFORMATION]
2. IOWA Liquor Sales Analysis
2.1 Why IOWA Liquor Sales?
The IOWA Liquor Sales data set was released by the IOWA state government Department of
Commerce. It records the product details and date of purchase of spirits bought by holders
of class "E" liquor licenses. The size of the dataset is 3.2 GB, so it can be used effectively
for data modelling.
There are 55,564 rows, each capturing a purchase order. Each observation holds 18 features
describing the product, vendor, invoice and store.
2.2 Data Warehouse Vision and Goals
Vision
Utilize the accessible data to provide insights on liquor sales in different cities, the most
popular brands and types of liquor sold, and price distribution and variance across different
stores.
Goals
• Study trends in the data and provide answers to strategic questions.
• Provide subject-oriented analysis, wherein data is categorized and stored by
business subject rather than by application.
• Integrate data from disparate sources and store it in a single place.
• Ensure the data is non-volatile and time-variant.
2.3 Source Data Tables Description
Invoice_Source: Contains details about the purchase order.
Category_Table: Lists the liquor brands (categories).
Vendor_Table: Summarizes vendor details, including the vendor number.
Store_Table: Describes store details such as address, city, zip code and store number.
Item_Table: Contains details about liquor products, such as item number and ID.
2.4 Area of Analysis
As liquor brands and vendors are key stakeholders of the liquor business, it is important
for a store to assess its liquor sales. We can analyse liquor sales according to the
geographical location of the stores, the liquor brands purchased, the sales generated in
dollars, the volume of liquor sold, and the availability of vendors to supply liquor bottles.
2.5 Key Stakeholders
Vendor: The team that supplies liquor of different brands to the stores.
Store keeper: Updates records and monitors the stock of liquor bottles in the store.
Customer: Purchases liquor from the store.
2.6 Tools Used
• SQL Server Management Studio
• SQL Server Integration Services
• SQL Server Reporting Services
• R Studio
• Tableau
2.7 Data Warehouse Architecture
Approach used: Kimball's bottom-up approach
Advantages:
• Data is easily available and performance is high, since the source is an operational
system.
• Dimensional modelling is used, wherein the data is combined from multiple sources,
cleaned and conformed.
• Faster execution of queries.
• Use of a star schema helps in building an agile, decentralized warehouse.
• Data modelling using business intelligence tools helps users access data easily.
Steps used in dimensional modelling:
• Selecting the business process
• Declaring the grain
• Identifying the dimensions
• Designing the fact table
Grain:
The number of liquor bottles sold by the store or purchased from the vendor, which is
recorded in the fact table as the measurable unit. There are five dimension tables
that give context to the measurable units in the fact table.
STAGING AREA: Data is fetched from the different data sources and cleaned, with
duplicate and null values removed or fixed.
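The staging clean-up described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows, the `clean_staging` helper and the `"UNKNOWN"` default are hypothetical and are not taken from the original SSIS packages.

```python
# Hypothetical staging rows; column names mirror the store table, values are invented.
raw_rows = [
    {"STORE_NO": 2501, "STORE_NAME": "Store A", "STORE_CITY": "AMES"},
    {"STORE_NO": 2501, "STORE_NAME": "Store A", "STORE_CITY": "AMES"},        # exact duplicate
    {"STORE_NO": 2502, "STORE_NAME": "Store B", "STORE_CITY": None},          # missing city
    {"STORE_NO": None, "STORE_NAME": "Store C", "STORE_CITY": "DES MOINES"},  # missing key
]


def clean_staging(rows, key, defaults):
    """Drop rows without a key, fill remaining nulls from defaults, remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        if row[key] is None:
            continue  # a row without its business key cannot be loaded
        fixed = {col: (defaults.get(col) if val is None else val) for col, val in row.items()}
        fingerprint = tuple(sorted(fixed.items()))
        if fingerprint in seen:
            continue  # identical row already kept
        seen.add(fingerprint)
        cleaned.append(fixed)
    return cleaned


staged = clean_staging(raw_rows, key="STORE_NO", defaults={"STORE_CITY": "UNKNOWN"})
```

Whether a null should be dropped or replaced with a default depends on the column; here the key column is treated as mandatory and descriptive columns are filled.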
[Figure: IOWA LIQUOR SALES DATA WAREHOUSE ARCHITECTURE. Source tables (Invoice, Category, Store, Vendor, Item) -> Staging Area (cleaning of data) -> ETL (loading of data) -> dimensional modelling -> Star Schema/Cube (building the data warehouse) -> Business Queries using Tableau (data visualization)]
3. DATA MODEL of DATA WAREHOUSE
STAR SCHEMA: A schema is a logical description of the entire database. It describes the
constraints placed on the tables, the values they hold, and how key values link the
different tables.
The star schema plays a fundamental role in a data warehouse and consists of a fact table
and dimension tables. The fact table contains all measurable values and the foreign keys.
Dimension Table: Primary keys in the dimension tables are designated as foreign keys in the
fact table. Dimension tables describe the details of the business processes.
Advantages of Star Schema:
• It is a stable, recoverable and extensible approach to building an OLAP cube.
• Extraction, transformation and loading can be performed swiftly.
• It keeps the number of foreign keys and the overall complexity to a minimum.
[Figure: Star Schema for IOWA Liquor Sales]
3.1 Designing the Data warehouse:
DIMENSION TABLES
ITEM_DIM:
This dimension contains the product Id as primary key, the product number and
the product description. All measurable values, such as items sold, volume
sold and number of bottles sold, are held in the fact table.
CATEGORY_DIM:
This dimension consists of the brand category ID as primary key, the category
number and the category name.
INVOICE_DIM:
This dimension contains the invoice Id, invoice item no, invoice date,
bottles sold, sales in dollars, volume sold in litres and volume sold in gallons.
VENDOR_DIM:
This dimension consists of the vendor Id (primary key), vendor number and
vendor name.
STORE_DIM:
This dimension contains the store Id as primary key, the store number and the
store name.
Dimension tables are interlinked, and all the primary keys in the dimension
tables are declared as foreign keys in the fact table.
FACT TABLE
LIQUOR_FACT:
It comprises records related to Invoice, Vendor, Item, Category and
Store that are used to answer business queries.
It contains two types of attributes: foreign keys and measures.
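The split between foreign keys and measures described above is what makes a star-schema query simple: join the fact table to one dimension on its key, then aggregate a measure. A minimal sketch follows, using Python's built-in `sqlite3` as a stand-in for SQL Server; the reduced table definitions and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CATEGORY_DIM (CATEGORY_ID INTEGER PRIMARY KEY, CATEGORY_NAME TEXT);
CREATE TABLE LIQUOR_FACT  (CATEGORY_ID INTEGER, TOTAL_BOTTLE_SOLD INTEGER);
INSERT INTO CATEGORY_DIM VALUES (1, 'BLENDED WHISKIES'), (5, 'CANADIAN WHISKIES');
INSERT INTO LIQUOR_FACT  VALUES (1, 12), (5, 6), (5, 24);
""")

# The foreign key joins fact to dimension; the measure is summed per dimension attribute.
bottles_by_category = conn.execute("""
SELECT c.CATEGORY_NAME, SUM(f.TOTAL_BOTTLE_SOLD)
FROM LIQUOR_FACT AS f
JOIN CATEGORY_DIM AS c ON f.CATEGORY_ID = c.CATEGORY_ID
GROUP BY c.CATEGORY_NAME
ORDER BY c.CATEGORY_NAME
""").fetchall()
```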
4. Data warehouse Implementation
Create STORE_DIM Table:
CREATE TABLE STORE_DIM
(
STORE_ID INT PRIMARY KEY NOT NULL,
STORE_NO INT,
STORE_NAME NVARCHAR(50),
ADRRESS NVARCHAR(50),
STORE_CITY NVARCHAR(50),
STORE_COUNTY NVARCHAR(50),
STORE_COUNTY_NO INT,
STORE_ZIP NVARCHAR(50)
)
Create CATEGORY_DIM Table:
CREATE TABLE CATEGORY_DIM
(
CATEGORY_ID INT PRIMARY KEY NOT NULL,
CATEGORY NVARCHAR(50),
CATEGORY_NAME NVARCHAR(50)
)
Create VENDOR_DIM Table:
CREATE TABLE VENDOR_DIM
(
VENDOR_ID INT PRIMARY KEY NOT NULL,
VENDOR_NO INT,
VENDOR_NAME NVARCHAR(50)
)
Create ITEM_DIM Table:
CREATE TABLE ITEM_DIM
(
ITEM_ID INT PRIMARY KEY NOT NULL,
ITEM_NO INT ,
ITEM_DESCRIPTION NVARCHAR(100),
CATEGORY_ID INT NOT NULL,
VENDOR_ID INT NOT NULL
)
Create INVOICE_DIM Table:
CREATE TABLE INVOICE_DIM
(
INVOICE_ID INT PRIMARY KEY NOT NULL,
ITEM_ID INT NOT NULL,
STORE_ID INT NOT NULL,
INVOICE_ITEM_NO NVARCHAR(50),
INVOICE_DATE NVARCHAR(50) ,
BOTTLE_SOLD INT,
SALES_DOLLARS NVARCHAR(50),
VOL_SOLD_LITRES FLOAT,
VOL_SOLD_GALLONS FLOAT
)
Create LIQUOR_FACT Table:
CREATE TABLE LIQUOR_FACT
(
INVOICE_ID INT NOT NULL,
STORE_ID INT NOT NULL,
VENDOR_ID INT NOT NULL,
ITEM_ID INT NOT NULL,
CATEGORY_ID INT NOT NULL,
PACK_SIZE INT NULL, -- nullable: the fact load query does not supply a pack size
TOTAL_BOTTLE_SOLD INT NOT NULL,
TOTAL_SALES NVARCHAR(50) NOT NULL,
TOTAL_VOL_SOLD_LITRES NVARCHAR(50) NOT NULL
)
USE lowa_liquor_sales_star
ALTER TABLE LIQUOR_FACT ADD CONSTRAINT FK_STORE_ID FOREIGN KEY (STORE_ID) REFERENCES STORE_DIM (STORE_ID)
ALTER TABLE LIQUOR_FACT ADD CONSTRAINT FK_VENDOR_ID FOREIGN KEY (VENDOR_ID) REFERENCES VENDOR_DIM (VENDOR_ID)
ALTER TABLE LIQUOR_FACT ADD CONSTRAINT FK_INVOICE_ID FOREIGN KEY (INVOICE_ID) REFERENCES INVOICE_DIM (INVOICE_ID)
ALTER TABLE LIQUOR_FACT ADD CONSTRAINT FK_ITEM_ID FOREIGN KEY (ITEM_ID) REFERENCES ITEM_DIM (ITEM_ID)
ALTER TABLE LIQUOR_FACT ADD CONSTRAINT FK_CATEGORY_ID FOREIGN KEY (CATEGORY_ID) REFERENCES CATEGORY_DIM (CATEGORY_ID)
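What the foreign key constraints above buy can be shown with a small experiment. The sketch below uses Python's built-in `sqlite3` rather than SQL Server, so the syntax differs slightly (SQLite needs `PRAGMA foreign_keys = ON`, and the constraint is declared inline); the reduced tables and sample values are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE STORE_DIM (STORE_ID INTEGER PRIMARY KEY, STORE_NAME TEXT);
CREATE TABLE LIQUOR_FACT (
    STORE_ID INTEGER NOT NULL REFERENCES STORE_DIM (STORE_ID),
    TOTAL_BOTTLE_SOLD INTEGER NOT NULL
);
INSERT INTO STORE_DIM VALUES (1, 'Hypothetical Store');
""")

# A fact row that points at an existing store is accepted.
conn.execute("INSERT INTO LIQUOR_FACT VALUES (1, 12)")

# A fact row that points at a missing store is rejected by the constraint.
try:
    conn.execute("INSERT INTO LIQUOR_FACT VALUES (99, 5)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

fact_count = conn.execute("SELECT COUNT(*) FROM LIQUOR_FACT").fetchone()[0]
```

The same idea applies to each of the five constraints: a fact row can only be stored if every dimension key it carries already exists.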
4.1 Extract, Transform and Load
Data warehouse construction involves extracting data from different sources,
transforming it and loading it into the data warehouse.
SSIS and SSMS are used to perform the ETL tasks.
Extraction:
Data has been extracted from the source database, which contains five datasets, and
loaded into the corresponding dimensions.
Source Database:
Iowa_Liquor_sales: This is the source database containing all the raw data of
Iowa liquor sales, loaded into five tables.
The extracted data has been cleaned and reconstructed into a structured dataset,
which serves as the staging area prior to transformation and loading.
Transformation and Loading
In this process the data is converted into meaningful information.
All the dimension tables and the fact table are populated with relevant data from
the staging tables using the SSIS tool.
The fact table is populated from the dimension tables. The primary key of each
dimension table serves as a foreign key in the fact table, which is what makes the
design a star schema.
Populating the Dimension Tables using SSIS
[Screenshots: SSIS data flows populating VENDOR_DIM, ITEM_DIM, STORE_DIM and CATEGORY_DIM]
POPULATING FACT TABLE USING SQL QUERY
USE [lowa_liquor_sales_star]
GO
-- Load one fact row per invoice, resolving vendor and category through the item.
INSERT INTO [dbo].[LIQUOR_FACT]
([INVOICE_ID]
,[STORE_ID]
,[VENDOR_ID]
,[ITEM_ID]
,[CATEGORY_ID]
,[TOTAL_BOTTLE_SOLD]
,[TOTAL_SALES]
,[TOTAL_VOL_SOLD_LITRES])
SELECT
i.INVOICE_ID, i.STORE_ID, v.VENDOR_ID, it.ITEM_ID, c.CATEGORY_ID,
i.BOTTLE_SOLD, i.SALES_DOLLARS, i.VOL_SOLD_LITRES
FROM dbo.INVOICE_DIM AS i
JOIN dbo.ITEM_DIM AS it ON i.ITEM_ID = it.ITEM_ID
JOIN dbo.STORE_DIM AS s ON i.STORE_ID = s.STORE_ID
JOIN dbo.VENDOR_DIM AS v ON it.VENDOR_ID = v.VENDOR_ID
JOIN dbo.CATEGORY_DIM AS c ON it.CATEGORY_ID = c.CATEGORY_ID
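One property of the fact load above is worth seeing on a small example: because it uses inner joins, an invoice whose item, store, vendor or category key has no matching dimension row is silently left out of the fact table. The sketch below reproduces the query with Python's built-in `sqlite3` as a stand-in for SQL Server; the trimmed table definitions and all sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VENDOR_DIM   (VENDOR_ID INTEGER PRIMARY KEY, VENDOR_NAME TEXT);
CREATE TABLE CATEGORY_DIM (CATEGORY_ID INTEGER PRIMARY KEY, CATEGORY_NAME TEXT);
CREATE TABLE STORE_DIM    (STORE_ID INTEGER PRIMARY KEY, STORE_NAME TEXT);
CREATE TABLE ITEM_DIM     (ITEM_ID INTEGER PRIMARY KEY, CATEGORY_ID INTEGER, VENDOR_ID INTEGER);
CREATE TABLE INVOICE_DIM  (INVOICE_ID INTEGER PRIMARY KEY, ITEM_ID INTEGER, STORE_ID INTEGER,
                           BOTTLE_SOLD INTEGER, SALES_DOLLARS TEXT, VOL_SOLD_LITRES REAL);
CREATE TABLE LIQUOR_FACT  (INVOICE_ID INTEGER, STORE_ID INTEGER, VENDOR_ID INTEGER,
                           ITEM_ID INTEGER, CATEGORY_ID INTEGER, TOTAL_BOTTLE_SOLD INTEGER,
                           TOTAL_SALES TEXT, TOTAL_VOL_SOLD_LITRES REAL);

INSERT INTO VENDOR_DIM   VALUES (10, 'Hypothetical Vendor');
INSERT INTO CATEGORY_DIM VALUES (5, 'CANADIAN WHISKIES');
INSERT INTO STORE_DIM    VALUES (1, 'Hypothetical Store');
INSERT INTO ITEM_DIM     VALUES (100, 5, 10);
INSERT INTO INVOICE_DIM  VALUES (1000, 100, 1, 12, '162.84', 9.0);
INSERT INTO INVOICE_DIM  VALUES (1001, 999, 1, 6, '81.42', 4.5);  -- item 999 has no ITEM_DIM row

INSERT INTO LIQUOR_FACT
SELECT i.INVOICE_ID, i.STORE_ID, v.VENDOR_ID, it.ITEM_ID, c.CATEGORY_ID,
       i.BOTTLE_SOLD, i.SALES_DOLLARS, i.VOL_SOLD_LITRES
FROM INVOICE_DIM AS i
JOIN ITEM_DIM     AS it ON i.ITEM_ID = it.ITEM_ID
JOIN STORE_DIM    AS s  ON i.STORE_ID = s.STORE_ID
JOIN VENDOR_DIM   AS v  ON it.VENDOR_ID = v.VENDOR_ID
JOIN CATEGORY_DIM AS c  ON it.CATEGORY_ID = c.CATEGORY_ID;
""")

fact_rows = conn.execute("SELECT * FROM LIQUOR_FACT").fetchall()
invoice_count = conn.execute("SELECT COUNT(*) FROM INVOICE_DIM").fetchone()[0]
```

Comparing the invoice count with the number of fact rows after a load is a cheap check for dimension rows that were lost during cleaning.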
5. SSRS REPORTS and VISUALIZATION
5.1 SSRS Report
Analysis: The report lists each invoice by Invoice ID, together with its date and the
name and city of the liquor store that generated it.
Query:
SELECT a.INVOICE_ID, b.STORE_NAME, a.INVOICE_DATE, b.STORE_CITY
FROM INVOICE_DIM a
JOIN STORE_DIM b ON a.STORE_ID = b.STORE_ID
ORDER BY b.STORE_CITY
Analysis: The report shows each product together with the vendor that supplies it.
It comprises Item Id, Vendor Id, Item description and Vendor Name.
Query:
-- ITEM_DIM carries the VENDOR_ID foreign key; VENDOR_DIM has no ITEM_ID column.
SELECT a.ITEM_ID, b.VENDOR_ID, a.ITEM_DESCRIPTION, b.VENDOR_NAME
FROM ITEM_DIM a
JOIN VENDOR_DIM b ON a.VENDOR_ID = b.VENDOR_ID
ORDER BY b.VENDOR_NAME
Analysis:
The above report shows the details of each invoice generated, comprising
Invoice Id, bottles sold, volume in litres and sales in dollars.
5.2 TABLEAU VISUALIZATION:
CASE STUDY OF LIQUOR VENDOR
• The above packed bubble figure illustrates the vendors with the largest
business in the state of Iowa.
• The figure shows that Diageo Americas has the highest sales in
the state of Iowa.
• Jim Beam Brands, Bacardi USA, Inc., Luxco St Louis, Sazerac
Co., Inc. and Laird and Company are strong competitors in the liquor
business.
• It gives vendors considerable input for developing new
strategies to grow their business in the liquor industry.
• It can also be used to view the top ten vendors in the liquor industry in
the state of Iowa.
CASE STUDY OF BOTTLES SOLD ACCORDING TO LIQUOR BRANDS
• The above pie chart illustrates the number of bottles sold for each
liquor brand.
• The chart clearly shows that Vodka 80 is the highest selling liquor brand
in the state of Iowa.
• Blended Whiskies, Spiced Rum and Canadian Whiskies are the prominent
competitors among the liquor brands.
• It also gives a brief overview of customers' preferences for liquor
brands.
• This will help vendors and stores plan their inventory in order to
increase their sales in the liquor industry.
BUSINESS CASE STUDY OF VENDOR AND SALES
• The above horizontal bar graph illustrates the liquor sales for each
liquor vendor.
• It highlights the liquor brand that sells the highest number of bottles for
a particular vendor.
• For example, Tennessee Whiskies is the highest selling liquor brand of
Brown-Forman Corporation.
• The illustration helps vendors to plan their business and
achieve profits by selling the brands that are in high demand.
CASE STUDY OF SALES vs CATEGORY of LIQUOR BOTTLES
• The above bar graph shows the sales of liquor bottles for each
brand name.
• It gives the bottle cost and the sales in dollars for each
category of liquor brand.
• The graph gives the vendor a detailed view of bottle sales
by brand.
6. eXtensible Markup Language, XSD and DTD
XML:
XML stands for eXtensible Markup Language.
It stores data in text format, which helps in storing, sharing and
accessing data across different platforms.
It is widely used in web development and is complementary to HTML.
XSD:
XSD (XML Schema Definition) is a formal description of the structure of an XML document.
Programmers can use it to validate documents.
DTD:
DTD stands for Document Type Definition.
It defines the structure, legal elements and attributes of an XML document.
XML Document:
<CATEGORY>
<CATEGORY category_id="1">
<CATEGORY_DIM>
<category>1011100</category>
<category_name>BLENDED WHISKIES</category_name>
</CATEGORY_DIM>
</CATEGORY>
<CATEGORY category_id="2">
<CATEGORY_DIM>
<category>1011200</category>
<category_name>STRAIGHT BOURBON WHISKIES</category_name>
</CATEGORY_DIM>
</CATEGORY>
<CATEGORY category_id="3">
<CATEGORY_DIM>
<category>1011300</category>
<category_name>TENNESSEE WHISKIES</category_name>
</CATEGORY_DIM>
</CATEGORY>
<CATEGORY category_id="4">
<CATEGORY_DIM>
<category>1011500</category>
<category_name>STRAIGHT RYE WHISKIES</category_name>
</CATEGORY_DIM>
</CATEGORY>
<CATEGORY category_id="5">
<CATEGORY_DIM>
<category>1012100</category>
<category_name>CANADIAN WHISKIES</category_name>
</CATEGORY_DIM>
</CATEGORY>
</CATEGORY>
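As a quick check that a document of this shape can be consumed by a program, the sketch below parses a shortened copy of the XML above (two of the five categories) with Python's standard `xml.etree.ElementTree` and builds a lookup from `category_id` to category name. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the category document: outer CATEGORY root, one CATEGORY per row.
xml_text = """
<CATEGORY>
  <CATEGORY category_id="1">
    <CATEGORY_DIM>
      <category>1011100</category>
      <category_name>BLENDED WHISKIES</category_name>
    </CATEGORY_DIM>
  </CATEGORY>
  <CATEGORY category_id="5">
    <CATEGORY_DIM>
      <category>1012100</category>
      <category_name>CANADIAN WHISKIES</category_name>
    </CATEGORY_DIM>
  </CATEGORY>
</CATEGORY>
""".strip()

root = ET.fromstring(xml_text)

# findall("CATEGORY") returns only the direct CATEGORY children of the root element.
categories = {
    int(node.get("category_id")): node.findtext("CATEGORY_DIM/category_name")
    for node in root.findall("CATEGORY")
}
category_numbers = [node.findtext("CATEGORY_DIM/category") for node in root.findall("CATEGORY")]
```

Note that `ElementTree` only checks well-formedness; validating against the XSD or DTD needs a separate validator.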
XSD :
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">
<xs:element name="TRUSHI">
<xs:complexType>
<xs:sequence>
<xs:element ref="CATEGORY"/>
<xs:element ref="INVOICE"/>
<xs:element ref="ITEM"/>
<xs:element ref="STORE"/>
<xs:element ref="VENDOR"/>
<xs:element ref="LIQUOR_FACT"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="CATEGORY">
<xs:complexType>
<xs:sequence minOccurs="0">
<xs:element ref="CATEGORY"/>
<xs:element ref="CATEGORY_DIM"/>
</xs:sequence>
<xs:attribute name="category_id" type="xs:integer"/>
</xs:complexType>
</xs:element>
<xs:element name="CATEGORY_DIM">
<xs:complexType>
<xs:sequence>
<xs:element ref="category"/>
<xs:element ref="category_name"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="category" type="xs:integer"/>
<xs:element name="category_name" type="xs:string"/>
<xs:element name="INVOICE">
<xs:complexType>
<xs:sequence minOccurs="0">
<xs:element ref="INVOICE"/>
<xs:element ref="INVOICE_DIM"/>
</xs:sequence>
<xs:attribute name="invoice_id" type="xs:integer"/>
</xs:complexType>
</xs:element>
<xs:element name="INVOICE_DIM">
<xs:complexType>
<xs:sequence>
DTD:
<?xml encoding="UTF-8"?>
<!ELEMENT TRUSHI (CATEGORY,INVOICE,ITEM,STORE,VENDOR,LIQUOR_FACT)>
<!ATTLIST TRUSHI
xmlns CDATA #FIXED ''>
<!ELEMENT CATEGORY (CATEGORY,CATEGORY_DIM)?>
<!ATTLIST CATEGORY
xmlns CDATA #FIXED ''
category_id CDATA #IMPLIED>
<!ELEMENT CATEGORY_DIM (category,category_name)>
<!ATTLIST CATEGORY_DIM
xmlns CDATA #FIXED ''>
<!ELEMENT category (#PCDATA)>
<!ATTLIST category
xmlns CDATA #FIXED ''>
<!ELEMENT category_name (#PCDATA)>
<!ATTLIST category_name
xmlns CDATA #FIXED ''>
<!ELEMENT INVOICE (INVOICE,INVOICE_DIM)?>
<!ATTLIST INVOICE
xmlns CDATA #FIXED ''
invoice_id CDATA #IMPLIED>
<!ELEMENT INVOICE_DIM (item_id,store_id,invoice_item_id,invoice_date,
bottle_sold,sale_dollars,vol_sold_litres,
vol_sold_gallons)>
<!ATTLIST INVOICE_DIM
xmlns CDATA #FIXED ''>
<!ELEMENT item_id (#PCDATA)>
<!ATTLIST item_id
xmlns CDATA #FIXED ''>
<!ELEMENT store_id (#PCDATA)>
<!ATTLIST store_id
xmlns CDATA #FIXED ''>
<!ELEMENT invoice_item_id (#PCDATA)>
<!ATTLIST invoice_item_id
xmlns CDATA #FIXED ''>
<!ELEMENT invoice_date (#PCDATA)>
<!ATTLIST invoice_date
xmlns CDATA #FIXED ''>
7. Neo4J and Cypher Queries
Step 1: Load all the .csv files into the Neo4j workspace and index them.
Step 2: Create the relationships between the nodes.
Upload Category Dimension File:
load csv with headers from "file:///Category_dim.csv" as row
create (CT:Category)
set CT=row{CATEGORY_ID:row.CATEGORY_ID,CATEGORY_NAME:row.CATEGORY_Name}
return CT
Upload Item Dimension File:
load csv with headers from "file:///Item_dim.csv" as row
create (IT:Item)
set IT=row{ITEM_ID:row.ITEM_ID,ITEM_NO:row.ITEM_NO,ITEM_DESCRIPTION:row.ITEM_DESCRIPTION,
CATEGORY_ID:row.CATEGORY_ID,VENDOR_ID:row.VENDOR_ID}
return IT
Upload Invoice Dimension File:
load csv with headers from "file:///Invoice_dim.csv" as row
create (INV:Invoice)
set INV=row{INVOICE_ID:row.INVOICE_ID,ITEM_ID:row.ITEM_ID,STORE_ID:row.STORE_ID,
INVOICE_ITEM_NO:row.INVOICE_ITEM_NO,INVOICE_DATE:row.INVOICE_DATE,
BOTTLE_SOLD:row.BOTTLE_SOLD,SALES_DOLLARS:row.SALES_DOLLARS,
VOL_SOLD_LITRES:row.VOL_SOLD_LITRES,VOL_SOLD_GALLONS:row.VOL_SOLD_GALLONS}
return INV
Upload Liquor Fact File:
load csv with headers from "file:///Liquor_fact_dim.csv" as row
create (FA:LiquorSale)
set FA=row{TOTAL_SALES:row.TOTAL_SALES,CATEGORY_ID:row.CATEGORY_ID,
INVOICE_ID:row.INVOICE_ID,ITEM_ID:row.ITEM_ID}
return FA
7.1 RELATION BETWEEN NODES
Relationship between category and Liquor fact table
MATCH t=()-[r:has_CATEGORY_ID]->() RETURN t LIMIT 25
Relationship between invoice and Liquor fact table
MATCH t=()-[r:has_INVOICE_ID]->() RETURN t LIMIT 25
7.2 Relation between multiple nodes:
Explanation:
The above diagram illustrates the cardinality of the relationships between
multiple nodes.
• The above Neo4j graph depicts the relationships between the
Category, Invoice and Liquor fact nodes.
• The circles correspond to nodes and the lines correspond to
relationships.
• The Category node is shown in orange, the Invoice node
in blue and the Liquor fact node in
red.
Graph Database and Relational Database:
• Graph databases are indexed and consist of nodes and
relationships.
• Energy consumption is higher compared to relational databases.
• Captured data can be changed with ease in a graph database.
• Relationships are followed by traversal in a graph database, where a relational
database would use joins.
Code for relationship between multiple nodes
match (CT:Category),(FA:LiquorSale) where CT.CATEGORY_ID=FA.CATEGORY_ID
create (FA)-[r:has_CATEGORY_ID]->(CT) return CT,FA,r

match (INV:Invoice),(FA:LiquorSale) where INV.INVOICE_ID=FA.INVOICE_ID
create (FA)-[r:has_INVOICE_ID]->(INV) return INV,FA,r

match (FA:LiquorSale),(CT:Category),(INV:Invoice) where CT.CATEGORY_ID=FA.CATEGORY_ID
and INV.INVOICE_ID=FA.INVOICE_ID return CT,FA,INV
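The relationship-building queries in this section pair every LiquorSale node with the Category node that has the same CATEGORY_ID. The Python sketch below mimics that key matching on plain dictionaries so the logic can be followed without a Neo4j instance. It is not Neo4j code, and the sample rows are invented. The IDs are written as strings because LOAD CSV reads every field as a string.

```python
# Hypothetical node properties, shaped like the rows loaded from the CSV files.
categories = [
    {"CATEGORY_ID": "1", "CATEGORY_NAME": "BLENDED WHISKIES"},
    {"CATEGORY_ID": "5", "CATEGORY_NAME": "CANADIAN WHISKIES"},
]
sales = [
    {"INVOICE_ID": "1000", "CATEGORY_ID": "5", "TOTAL_SALES": "162.84"},
    {"INVOICE_ID": "1001", "CATEGORY_ID": "1", "TOTAL_SALES": "81.42"},
    {"INVOICE_ID": "1002", "CATEGORY_ID": "7", "TOTAL_SALES": "20.00"},  # no such category
]

# Index categories by key once, instead of comparing every sale with every category.
category_by_id = {c["CATEGORY_ID"]: c for c in categories}

# One (sale, relationship type, category) edge per sale whose key matches a category.
edges = [
    (s["INVOICE_ID"], "has_CATEGORY_ID", s["CATEGORY_ID"])
    for s in sales
    if s["CATEGORY_ID"] in category_by_id
]
```

As with an inner join, a sale whose key matches no category simply gets no relationship.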
8. Bibliography
Kimball, R. The Data Warehouse Toolkit. John Wiley, 1996.
XML and XSD validation tool.
Available at: http://www.utilitiesonline.info/xsdvalidation/#.XBqEuVz7TIV
Department of Commerce, IOWA State government. Iowa Liquor Sales.
Available at: https://data.iowa.gov/Sales-Distribution/Iowa-Liquor-Sales/m3tr-qhgy

More Related Content

PDF
Sales Data Analysis using SQL & Excel
PDF
Đề tài vận dụng mô hình binary logistic, ĐIỂM 8, RẤT HAY
PPTX
Mcmix_ThoTruong_2015
DOC
Đề tài Chiến lược CRM đề xuất cho mảng dịch vụ gọi xe của Grab tại Việt Nam
PDF
Hoàn thiện hệ thống quản lý chất lượng theo tiêu chuẩn toàn cầu về an toàn th...
DOC
Phân tích chiến lược kinh doanh của công ty cổ phần sữa việt nam vina...
PDF
Document Imaging and the SAP Content Server 101
PPTX
SAP BW - Master data load via flat file
Sales Data Analysis using SQL & Excel
Đề tài vận dụng mô hình binary logistic, ĐIỂM 8, RẤT HAY
Mcmix_ThoTruong_2015
Đề tài Chiến lược CRM đề xuất cho mảng dịch vụ gọi xe của Grab tại Việt Nam
Hoàn thiện hệ thống quản lý chất lượng theo tiêu chuẩn toàn cầu về an toàn th...
Phân tích chiến lược kinh doanh của công ty cổ phần sữa việt nam vina...
Document Imaging and the SAP Content Server 101
SAP BW - Master data load via flat file

What's hot (6)

PDF
Nearline storage for sap bw (nls)
PDF
Văn hoá Doanh Nghiệp tại ngân hàng Thương mại Cổ phần Bưu điện Liên Việt
PPTX
Heart Disease Prediction Analysis - Sushil Gupta.pptx
DOCX
Đánh giá mức độ hài lòng của khách hàng đối với dịch vụ của hệ thống rạp chiế...
DOCX
Danh Sách Các Đề Tài Tiểu Luận Môn Tư Duy Sáng Tạo Được Tổng Hợp Từ Sinh Viên
PDF
Phương pháp nghiên cứu khoa học - Business Research Methods - Lê Văn Huy - Le...
Nearline storage for sap bw (nls)
Văn hoá Doanh Nghiệp tại ngân hàng Thương mại Cổ phần Bưu điện Liên Việt
Heart Disease Prediction Analysis - Sushil Gupta.pptx
Đánh giá mức độ hài lòng của khách hàng đối với dịch vụ của hệ thống rạp chiế...
Danh Sách Các Đề Tài Tiểu Luận Môn Tư Duy Sáng Tạo Được Tổng Hợp Từ Sinh Viên
Phương pháp nghiên cứu khoa học - Business Research Methods - Lê Văn Huy - Le...
Ad

Similar to Iowa liquor sales (20)

PPT
Data warehousing
PDF
Data warehouse project on retail store
PPT
Intro to Data warehousing lecture 15
PPT
Ch1 data-warehousing
PPT
Ch1 data-warehousing
PDF
SALES BASED DATA EXTRACTION FOR BUSINESS INTELLIGENCE
PPTX
158001210111bapan data warehousepptse.pptx
PPT
bich-2.ngjfyjdkzxzkckzxzkxzkxkgxjgyityutxjgyutxppt
PPTX
module 1 DWDM (complete) chapter ppt.pptx
PDF
Dwbi Project
PPT
20IT501_DWDM_PPT_Unit_I.ppt
PDF
BI Chapter 03.pdf business business business business business business
PDF
DMDW 1st module.pdf
PDF
Data Warehousing
PPT
20IT501_DWDM_PPT_Unit_I.ppt
DOC
Dwdm unit 1-2016-Data ingarehousing
PPTX
Data warehouse presentaion
PPTX
Data Warehouse: Concepts and Architecture
PDF
DATA WAREHOUSING AND DATA MINING (R18A0524).pdf
PPT
Dataware housing
Data warehousing
Data warehouse project on retail store
Intro to Data warehousing lecture 15
Ch1 data-warehousing
Ch1 data-warehousing
SALES BASED DATA EXTRACTION FOR BUSINESS INTELLIGENCE
158001210111bapan data warehousepptse.pptx
bich-2.ngjfyjdkzxzkckzxzkxzkxkgxjgyityutxjgyutxppt
module 1 DWDM (complete) chapter ppt.pptx
Dwbi Project
20IT501_DWDM_PPT_Unit_I.ppt
BI Chapter 03.pdf business business business business business business
DMDW 1st module.pdf
Data Warehousing
20IT501_DWDM_PPT_Unit_I.ppt
Dwdm unit 1-2016-Data ingarehousing
Data warehouse presentaion
Data Warehouse: Concepts and Architecture
DATA WAREHOUSING AND DATA MINING (R18A0524).pdf
Dataware housing
Ad

Recently uploaded (20)

PDF
Mastering Query Optimization Techniques for Modern Data Engineers
PPTX
Data-Driven-Credit-Card-Launch-A-Wells-Fargo-Case-Study.pptx
PPTX
Understanding Prototyping in Design and Development
PPTX
Moving the Public Sector (Government) to a Digital Adoption
PDF
Linux OS guide to know, operate. Linux Filesystem, command, users and system
PPT
Chapter 2 METAL FORMINGhhhhhhhjjjjmmmmmmmmm
PPTX
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
PDF
Data Science Trends & Career Guide---ppt
PPTX
咨询新西兰毕业证(UCOL毕业证书)联合理工学院毕业证国外毕业证
PDF
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”
PPTX
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
PPT
Chapter 3 METAL JOINING.pptnnnnnnnnnnnnn
PPT
Reliability_Chapter_ presentation 1221.5784
PPTX
Major-Components-ofNKJNNKNKNKNKronment.pptx
PDF
.pdf is not working space design for the following data for the following dat...
PDF
Master Databricks SQL with AccentFuture – The Future of Data Warehousing
PPTX
advance b rammar.pptxfdgdfgdfsgdfgsdgfdfgdfgsdfgdfgdfg
PPTX
Measurement of Afordability for Water Supply and Sanitation in Bangladesh .pptx
PPTX
Business Ppt On Nestle.pptx huunnnhhgfvu
PPTX
Logistic Regression ml machine learning.pptx
Mastering Query Optimization Techniques for Modern Data Engineers
Data-Driven-Credit-Card-Launch-A-Wells-Fargo-Case-Study.pptx
Understanding Prototyping in Design and Development
Moving the Public Sector (Government) to a Digital Adoption
Linux OS guide to know, operate. Linux Filesystem, command, users and system
Chapter 2 METAL FORMINGhhhhhhhjjjjmmmmmmmmm
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
Data Science Trends & Career Guide---ppt
咨询新西兰毕业证(UCOL毕业证书)联合理工学院毕业证国外毕业证
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
Chapter 3 METAL JOINING.pptnnnnnnnnnnnnn
Reliability_Chapter_ presentation 1221.5784
Major-Components-ofNKJNNKNKNKNKronment.pptx
.pdf is not working space design for the following data for the following dat...
Master Databricks SQL with AccentFuture – The Future of Data Warehousing
advance b rammar.pptxfdgdfgdfsgdfgsdgfdfgdfgsdfgdfgdfg
Measurement of Afordability for Water Supply and Sanitation in Bangladesh .pptx
Business Ppt On Nestle.pptx huunnnhhgfvu
Logistic Regression ml machine learning.pptx

Iowa liquor sales

  • 1. B9DA102 Data Storage Solutions for Data Analytics DATA STORAGE ASSIGNMENT Submitted by Trushita Prashant Redij Student No: 10504099
  • 2. 1 INDEX Table of Contents 1.Introduction to Data Warehouse 1.1 Business Intelligence ..................................................................................................2 1.2 Data Warehouse ..........................................................................................................2 2. IOWA Liquor Sales Analysis 2.1 Why IOWA Liquor Sales Analysis.............................................................................3 2.2 Data Warehouse Visions and Goals............................................................................3 2.3 Source Data Table Description ............................................................................................3 2.4 Area of Analysis ..................................................................................................................4 2.5 Key Stakeholders .................................................................................................................4 2.6 Tools Used...................................................................................................................4 2.7 Data Warehouse Architecture......................................................................................5 3. Data Model of Data Warehouse 3.1 Designing Data warehouse...............................................................................................8 4. Data warehouse Implementation 4.1 Extract, Transform and Load..........................................................................................10 5.SSRS Reports and Visualization 5.1 SSRS Report...................................................................................................................13 5.2 Tableau Visualization ....................................................................................................16 6.XML, XSD and DTD………………………………………………………………...…....20 7. 
Neo4J and Cypher Queries 7.1 Relation between Nodes .................................................................................................25 7.2 Relation between Multiple Nodes..................................................................................25 8. Bibliography ........................................................................................................................29
  • 3. 2 1. Introduction In recent years, there has been a colossal upsurge in the business scenarios affecting today’s marketplace which has fostered the need of data analysis and data warehousing in every organization. Data generated by organization is effectively used to analyse, extract, transform and derive knowledgeable insights thereby influencing the organizations business decisions. 1.1 BUSINESS INTELLIGENCE: Business Intelligence is an act of transforming raw data into useful information for business analysis. BI based on Data Warehouse technology extracts information from a company’s operational systems. The data is transformed (cleaned and integrated) and loaded into Data Warehouses. Since Data is credible it is used for business insights. 1.2 DATA WAREHOUSE: Data collected from various sources and stored in various databases cannot be directly visualized. The data first needs to be integrated and processed before visualization take place. Data warehouse is a central location where consolidated data from multiple locations (databases) are stored. It’s maintained separately from an organizations operational database. Data Data Data INFORMATION
  • 4. 3 2 Iowa Liquor Sales Analysis 2.1 Why IOWA Liquor Sales? IOWA Liquor Sales data set was released by IOWA state government department of commerce. It highlights the product details and date of purchase of spirits holding class “E” Liquor licenses. The size of dataset is 3.2 GB thus can be used effectively for data modelling. There 55,564 rows capturing each purchase order. Each observation beholds 18 features describing about the product, vendor, invoice and store. 2.2 Data Warehouse Vision and Goals Vision Utilize the accessible data and provide insights on Liquor sales in different cities, most popular brands and types of liquor sold, price distribution and variance across different stores. Goals  Study trends of data and provide answers to strategic questions  To provide subject oriented analysis wherein data is categorized and stored by business subject rather than by application.  Integrate data from disparate sources and store in single place  To ensure the data is non-volatile and time variant. 2.3 Source Data Tables Description Invoice_Source: Contains details about the purchase order. Category_Table: Illustrates the liquor brands. Vendor_Table: Summarizes vendor details with vendor number Store_Table: Describes store details like address, city, zip code, store number etc. Item_Table: Contains details about Liquor products like number, ID.
  • 5. 2.4 Area of Analysis
    Because liquor brands and vendors are the key stakeholders of the liquor business, it is important for a store to assess its liquor sales. Sales can be analysed by the geographical location of the stores, the liquor brands purchased, the sales generated in dollars, the volume of liquor sold, and the availability of vendors to supply liquor bottles.
    2.5 Key Stakeholders
    Vendor: Team that supplies liquor of different brands to the stores.
    Store keeper: Updates records and monitors the stock of liquor bottles in the store.
    Customer: Purchases liquor from the store.
    2.6 Tools Used
    - SQL Server Management Studio
    - SQL Server Integration Services
    - SQL Server Reporting Services
    - R Studio
    - Tableau
    2.7 Data Warehouse Architecture
    Approach used: Kimball’s bottom-up approach
    Advantages:
    - Data is easily available and performance is high, since the source is an operational system.
    - Dimensional modelling is used, in which data is combined from multiple sources, cleaned and normalized.
    - Queries execute faster.
    - Use of a star schema helps in building an agile, decentralized warehouse.
    - Data modelling with business intelligence tools helps users access data easily.
  • 6. Steps used in dimensional modelling:
    - Selecting the business process
    - Declaring the grain
    - Identifying the dimensions
    - Designing the fact table
    Grain: The number of liquor bottles sold by the store or purchased from the vendor, recorded in the fact table as the measurable unit. There are five dimension tables corresponding to the measurable units in the fact table.
    STAGING AREA: Data is fetched from the different data sources and cleaned; duplicate and null values are removed or fixed.
    [Figure: IOWA LIQUOR SALES DATA WAREHOUSE ARCHITECTURE. Sources (Invoice, Category, Store, Vendor, Item) -> Staging Area (loading of data, cleaning of data) -> ETL and dimensional modelling -> Star Schema/Cube (building the data warehouse) -> Data Visualization (business queries using Tableau)]
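The staging-area cleaning described above (removing duplicates and fixing nulls) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the assignment performs this step with SSIS, and the sample rows, the `clean_rows` helper and the `"UNKNOWN"` default are hypothetical names introduced here.

```python
# Hypothetical staging rows: column names follow STORE_DIM, values are made up.
raw_rows = [
    {"STORE_NO": 2501, "STORE_NAME": "Store A", "STORE_CITY": "AMES"},
    {"STORE_NO": 2501, "STORE_NAME": "Store A", "STORE_CITY": "AMES"},        # exact duplicate
    {"STORE_NO": 2502, "STORE_NAME": "Store B", "STORE_CITY": None},          # missing city
    {"STORE_NO": None, "STORE_NAME": "Store C", "STORE_CITY": "DES MOINES"},  # missing key
]


def clean_rows(rows, key, defaults):
    """Drop rows with a missing key, fill other nulls, and remove exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        if row[key] is None:
            continue  # a row without its business key cannot be loaded
        # Replace remaining nulls with the configured default for that column.
        fixed = {col: defaults.get(col) if val is None else val for col, val in row.items()}
        fingerprint = tuple(sorted(fixed.items()))
        if fingerprint in seen:
            continue  # exact duplicate of a row already kept
        seen.add(fingerprint)
        cleaned.append(fixed)
    return cleaned


cleaned = clean_rows(raw_rows, key="STORE_NO", defaults={"STORE_CITY": "UNKNOWN"})
for row in cleaned:
    print(row)
```

Whether a null is dropped or replaced by a default is a per-column decision; the sketch drops rows only when the business key itself is missing.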
  • 7. 3. DATA MODEL of DATA WAREHOUSE
    STAR SCHEMA: A schema is a logical description of the entire database. It describes the constraints placed on the tables, the values they hold, and how key values link the different tables. The star schema plays a fundamental role in a data warehouse and consists of a fact table and dimension tables. The fact table contains all measurable values and the foreign keys.
    Dimension Table: The primary keys of the dimension tables are designated as foreign keys in the fact table. Dimension tables describe the details of the business processes.
    Advantages of Star Schema:
    - It is a steady, recoverable and reinforceable approach to building an OLAP cube.
    - Extraction, transformation and loading can be performed swiftly.
    - It has a minimum number of foreign keys and minimal complexity.
    [Figure: Star Schema for Iowa Liquor Sales]
  • 8. 3.1 Designing the Data warehouse:
    DIMENSION TABLES
    ITEM_DIM: This dimension contains the product ID as primary key, the product number and the product description. All measurable values, such as items sold, volume sold and number of bottles sold, are held in the fact table.
    CATEGORY_DIM: This dimension consists of the brand category ID as primary key, the category number and the category name.
    INVOICE_DIM: This dimension contains the invoice ID, invoice item number, invoice date, bottles sold, sales in dollars, volume sold in litres and volume sold in gallons.
    VENDOR_DIM: This dimension consists of the vendor ID (primary key), vendor number and vendor name.
    STORE_DIM: This dimension contains the store ID as primary key, the store number and the store name.
    The dimension tables are interlinked, and all primary keys of the dimension tables are declared as foreign keys in the fact table.
    FACT TABLE
    LIQUOR_FACT: It comprises records related to Invoice, Vendor, Item, Category and Store that are used to process business queries. It contains two types of attributes: foreign keys and measures.
  • 9. 4. Data warehouse Implementation
    Create STORE_DIM Table:
        CREATE TABLE STORE_DIM (
            STORE_ID INT PRIMARY KEY NOT NULL,
            STORE_NO INT,
            STORE_NAME NVARCHAR(50),
            ADRRESS NVARCHAR(50),
            STORE_CITY NVARCHAR(50),
            STORE_COUNTY NVARCHAR(50),
            STORE_COUNTY_NO INT,
            STORE_ZIP NVARCHAR(50)
        )
    Create CATEGORY_DIM Table:
        CREATE TABLE CATEGORY_DIM (
            CATEGORY_ID INT PRIMARY KEY NOT NULL,
            CATEGORY NVARCHAR(50),
            CATEGORY_NAME NVARCHAR(50)
        )
    Create VENDOR_DIM Table:
        CREATE TABLE VENDOR_DIM (
            VENDOR_ID INT PRIMARY KEY NOT NULL,
            VENDOR_NO INT,
            VENDOR_NAME NVARCHAR(50)
        )
    Create ITEM_DIM Table:
        CREATE TABLE ITEM_DIM (
            ITEM_ID INT PRIMARY KEY NOT NULL,
            ITEM_NO INT,
            ITEM_DESCRIPTION NVARCHAR(100),
            CATEGORY_ID INT NOT NULL,
            VENDOR_ID INT NOT NULL
        )
  • 10. Create INVOICE_DIM Table:
        CREATE TABLE INVOICE_DIM (
            INVOICE_ID INT PRIMARY KEY NOT NULL,
            ITEM_ID INT NOT NULL,
            STORE_ID INT NOT NULL,
            INVOICE_ITEM_NO NVARCHAR(50),
            INVOICE_DATE NVARCHAR(50),
            BOTTLE_SOLD INT,
            SALES_DOLLARS NVARCHAR(50),
            VOL_SOLD_LITRES FLOAT,
            VOL_SOLD_GALLONS FLOAT
        )
    Create LIQUOR_FACT Table:
        CREATE TABLE LIQUOR_FACT (
            INVOICE_ID INT NOT NULL,
            STORE_ID INT NOT NULL,
            VENDOR_ID INT NOT NULL,
            ITEM_ID INT NOT NULL,
            CATEGORY_ID INT NOT NULL,
            PACK_SIZE INT NULL, -- nullable: the INSERT that populates this table does not supply a pack size
            TOTAL_BOTTLE_SOLD INT NOT NULL,
            TOTAL_SALES NVARCHAR(50) NOT NULL,
            TOTAL_VOL_SOLD_LITRES NVARCHAR(50) NOT NULL
        )
    Add the foreign keys that link the fact table to each dimension:
        USE lowa_liquor_sales_star
        ALTER TABLE LIQUOR_FACT ADD CONSTRAINT FK_STORE_ID FOREIGN KEY (STORE_ID) REFERENCES STORE_DIM (STORE_ID)
        ALTER TABLE LIQUOR_FACT ADD CONSTRAINT FK_VENDOR_ID FOREIGN KEY (VENDOR_ID) REFERENCES VENDOR_DIM (VENDOR_ID)
        ALTER TABLE LIQUOR_FACT ADD CONSTRAINT FK_INVOICE_ID FOREIGN KEY (INVOICE_ID) REFERENCES INVOICE_DIM (INVOICE_ID)
        ALTER TABLE LIQUOR_FACT ADD CONSTRAINT FK_ITEM_ID FOREIGN KEY (ITEM_ID) REFERENCES ITEM_DIM (ITEM_ID)
        ALTER TABLE LIQUOR_FACT ADD CONSTRAINT FK_CATEGORY_ID FOREIGN KEY (CATEGORY_ID) REFERENCES CATEGORY_DIM (CATEGORY_ID)
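To see what the foreign key constraints above buy, the following sketch rebuilds a cut-down version of the schema in an in-memory SQLite database and tries to insert a fact row that points at a store that does not exist. SQLite is used here only as a stand-in for SQL Server so the example can run anywhere; the trimmed column lists and the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE STORE_DIM (
    STORE_ID   INTEGER PRIMARY KEY,
    STORE_NAME TEXT
);
CREATE TABLE LIQUOR_FACT (
    INVOICE_ID        INTEGER NOT NULL,
    STORE_ID          INTEGER NOT NULL REFERENCES STORE_DIM (STORE_ID),
    TOTAL_BOTTLE_SOLD INTEGER NOT NULL
);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO STORE_DIM VALUES (1, 'Hypothetical Store')")
conn.execute("INSERT INTO LIQUOR_FACT VALUES (100, 1, 12)")  # store 1 exists: accepted

try:
    conn.execute("INSERT INTO LIQUOR_FACT VALUES (101, 99, 6)")  # store 99 does not exist
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

fact_rows = conn.execute("SELECT COUNT(*) FROM LIQUOR_FACT").fetchone()[0]
print("orphan row rejected:", rejected)
print("rows in fact table:", fact_rows)
```

The constraint keeps the fact table from holding keys with no matching dimension row, which is what lets later reports join fact and dimension tables without losing or inventing records.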
  • 11. 4.1 Extract, Transform and Load
    Building the data warehouse involves extracting data from different sources, transforming it and loading it into the data warehouse. SSIS and SSMS are used to perform the ETL tasks.
    Extraction: Data is extracted from the source database, which contains five datasets, and loaded into the corresponding dimensions.
    Source Database: Iowa_Liquor_sales: This is the source database containing all the raw Iowa liquor sales data, loaded into five tables. The extracted data is cleaned and restructured into a structured dataset, which serves as the staging area prior to transformation and loading.
    Transformation and Loading: In this step the data is converted into meaningful information. All dimension and fact tables are populated with the relevant data from the staging tables using SSIS. The fact table is populated from the dimension tables. The primary key of each dimension table serves as a foreign key in the fact table, which is what makes the design a star schema.
  • 12. Populating the Dimension Tables using SSIS
    [Figures: VENDOR_DIM, ITEM_DIM]
  • 13. [Figures: STORE_DIM, CATEGORY_DIM]
    POPULATING FACT TABLE USING SQL QUERY
        USE [lowa_liquor_sales_star]
        GO
        INSERT INTO [dbo].[LIQUOR_FACT]
            ([INVOICE_ID], [STORE_ID], [VENDOR_ID], [ITEM_ID], [CATEGORY_ID],
             [TOTAL_BOTTLE_SOLD], [TOTAL_SALES], [TOTAL_VOL_SOLD_LITRES])
        SELECT i.INVOICE_ID, i.STORE_ID, v.VENDOR_ID, it.ITEM_ID, c.CATEGORY_ID,
               i.BOTTLE_SOLD, i.SALES_DOLLARS, i.VOL_SOLD_LITRES
        FROM dbo.INVOICE_DIM AS i
        JOIN dbo.ITEM_DIM AS it ON i.ITEM_ID = it.ITEM_ID
        JOIN dbo.STORE_DIM AS s ON i.STORE_ID = s.STORE_ID
        JOIN dbo.VENDOR_DIM AS v ON it.VENDOR_ID = v.VENDOR_ID
        JOIN dbo.CATEGORY_DIM AS c ON it.CATEGORY_ID = c.CATEGORY_ID
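The fact-loading query above can be tried out on a small scale. The sketch below runs the same INSERT ... SELECT against an in-memory SQLite database (a stand-in for SQL Server), using trimmed tables and hypothetical sample rows. It also shows one consequence of using inner joins: an invoice whose item points at a missing vendor never reaches the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CATEGORY_DIM (CATEGORY_ID INTEGER PRIMARY KEY, CATEGORY_NAME TEXT);
CREATE TABLE VENDOR_DIM   (VENDOR_ID INTEGER PRIMARY KEY, VENDOR_NAME TEXT);
CREATE TABLE STORE_DIM    (STORE_ID INTEGER PRIMARY KEY, STORE_NAME TEXT);
CREATE TABLE ITEM_DIM     (ITEM_ID INTEGER PRIMARY KEY, CATEGORY_ID INTEGER, VENDOR_ID INTEGER);
CREATE TABLE INVOICE_DIM  (INVOICE_ID INTEGER PRIMARY KEY, ITEM_ID INTEGER, STORE_ID INTEGER,
                           BOTTLE_SOLD INTEGER, SALES_DOLLARS REAL, VOL_SOLD_LITRES REAL);
CREATE TABLE LIQUOR_FACT  (INVOICE_ID INTEGER, STORE_ID INTEGER, VENDOR_ID INTEGER,
                           ITEM_ID INTEGER, CATEGORY_ID INTEGER, TOTAL_BOTTLE_SOLD INTEGER,
                           TOTAL_SALES REAL, TOTAL_VOL_SOLD_LITRES REAL);

-- Hypothetical sample rows; item 2 refers to vendor 99, which is not in VENDOR_DIM.
INSERT INTO CATEGORY_DIM VALUES (1, 'BLENDED WHISKIES');
INSERT INTO VENDOR_DIM   VALUES (10, 'Hypothetical Vendor');
INSERT INTO STORE_DIM    VALUES (5, 'Hypothetical Store');
INSERT INTO ITEM_DIM     VALUES (1, 1, 10), (2, 1, 99);
INSERT INTO INVOICE_DIM  VALUES (100, 1, 5, 12, 120.0, 9.0),
                                (101, 1, 5, 6, 60.0, 4.5),
                                (102, 2, 5, 3, 30.0, 2.25);
""")

# Same shape as the fact-loading query in the assignment.
conn.execute("""
INSERT INTO LIQUOR_FACT (INVOICE_ID, STORE_ID, VENDOR_ID, ITEM_ID, CATEGORY_ID,
                         TOTAL_BOTTLE_SOLD, TOTAL_SALES, TOTAL_VOL_SOLD_LITRES)
SELECT i.INVOICE_ID, i.STORE_ID, v.VENDOR_ID, it.ITEM_ID, c.CATEGORY_ID,
       i.BOTTLE_SOLD, i.SALES_DOLLARS, i.VOL_SOLD_LITRES
FROM INVOICE_DIM AS i
JOIN ITEM_DIM     AS it ON i.ITEM_ID = it.ITEM_ID
JOIN STORE_DIM    AS s  ON i.STORE_ID = s.STORE_ID
JOIN VENDOR_DIM   AS v  ON it.VENDOR_ID = v.VENDOR_ID
JOIN CATEGORY_DIM AS c  ON it.CATEGORY_ID = c.CATEGORY_ID
""")

loaded = conn.execute("SELECT COUNT(*) FROM LIQUOR_FACT").fetchone()[0]
# Invoices that the inner joins silently dropped.
missing = [row[0] for row in conn.execute(
    "SELECT INVOICE_ID FROM INVOICE_DIM "
    "WHERE INVOICE_ID NOT IN (SELECT INVOICE_ID FROM LIQUOR_FACT) ORDER BY INVOICE_ID")]
print("fact rows loaded:", loaded)
print("invoices not loaded:", missing)
```

A reconciliation query like the last one is a cheap check after each load: a non-empty result points at dimension rows that still need cleaning in the staging area.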
  • 14. 5. SSRS REPORTS and VISUALIZATION
    5.1 SSRS Report
    Analysis: The report shows each invoice, identified by invoice ID, together with its date and the liquor store that generated it.
    Query:
        SELECT a.INVOICE_ID, b.STORE_NAME, a.INVOICE_DATE, b.STORE_CITY
        FROM INVOICE_DIM a
        JOIN STORE_DIM b ON a.STORE_ID = b.STORE_ID
        ORDER BY b.STORE_CITY
  • 15. Analysis: The report lists each product together with the vendor that supplies it. It comprises item ID, vendor ID, item description and vendor name.
    Query:
        -- ITEM_DIM carries the VENDOR_ID foreign key; VENDOR_DIM has no ITEM_ID column
        SELECT a.ITEM_ID, b.VENDOR_ID, a.ITEM_DESCRIPTION, b.VENDOR_NAME
        FROM ITEM_DIM a
        JOIN VENDOR_DIM b ON a.VENDOR_ID = b.VENDOR_ID
        ORDER BY b.VENDOR_NAME
  • 16. Analysis: The report above shows the details of each generated invoice: invoice ID, bottles sold, volume in litres and sales in dollars.
  • 17. 5.2 TABLEAU VISUALIZATION: CASE STUDY OF LIQUOR VENDOR
    - The packed bubble chart above shows the vendors with the largest business in the state of Iowa.
    - It shows that Diageo Americas has the highest sales in Iowa.
    - Jim Beam Brands, Bacardi USA, Inc., Luxco St Louis, Sazerac Co., Inc. and Laird and Company are the strong competitors in the liquor business.
    - It gives vendors useful input for developing new strategies to grow their business in the liquor industry.
    - It can also be used to view the top ten vendors in the liquor industry in the state of Iowa.
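The vendor ranking behind a chart like this comes down to a grouped sum over the fact table. The sketch below shows one way to compute a top-ten list, again using in-memory SQLite as a stand-in for SQL Server. The vendor names and sales figures are hypothetical, and TOTAL_SALES is assumed to be numeric here, whereas the assignment's fact table stores it as NVARCHAR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VENDOR_DIM  (VENDOR_ID INTEGER PRIMARY KEY, VENDOR_NAME TEXT);
CREATE TABLE LIQUOR_FACT (INVOICE_ID INTEGER, VENDOR_ID INTEGER, TOTAL_SALES REAL);

-- Hypothetical vendors and sales amounts.
INSERT INTO VENDOR_DIM  VALUES (1, 'Vendor A'), (2, 'Vendor B'), (3, 'Vendor C');
INSERT INTO LIQUOR_FACT VALUES (100, 1, 100.0), (101, 1, 50.0),
                               (102, 2, 80.0),  (103, 3, 20.0);
""")

# Sum sales per vendor and keep the ten largest.
top_vendors = conn.execute("""
SELECT v.VENDOR_NAME, SUM(f.TOTAL_SALES) AS VENDOR_SALES
FROM LIQUOR_FACT AS f
JOIN VENDOR_DIM  AS v ON f.VENDOR_ID = v.VENDOR_ID
GROUP BY v.VENDOR_ID, v.VENDOR_NAME
ORDER BY VENDOR_SALES DESC
LIMIT 10
""").fetchall()

for name, sales in top_vendors:
    print(name, sales)
```

In SQL Server the same idea would use `SELECT TOP 10` instead of `LIMIT 10`, and the text-typed sales column would need converting to a numeric type before summing.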
  • 18. CASE STUDY OF BOTTLES SOLD ACCORDING TO LIQUOR BRANDS
    - The pie chart above shows the number of bottles sold for each liquor brand.
    - The chart clearly shows that Vodka 80 is the highest-selling liquor brand in the state of Iowa.
    - Blended Whiskies, Spiced Rum and Canadian Whiskies are the prominent competitors among the liquor brands.
    - It also gives a brief overview of customers’ preferences among liquor brands.
    - This helps vendors and stores plan their inventory in order to increase their liquor sales.
  • 19. BUSINESS CASE STUDY OF VENDOR AND SALES
    - The horizontal bar graph above shows the liquor sales for each liquor vendor.
    - It highlights the liquor brand that sells the highest number of bottles for a particular vendor.
    - For example, Tennessee Whiskies is the highest-selling liquor brand for Brown-Forman Corporation.
    - The illustration helps vendors plan their business and achieve profits by selling the brands that are in high demand.
  • 20. CASE STUDY OF SALES vs CATEGORY of LIQUOR BOTTLES
    - The bar graph above shows the sales of liquor bottles by brand name.
    - It gives the bottle cost and the sales in dollars for each category of liquor brand.
    - The graph gives vendors a detailed picture of bottle sales by brand.
  • 21. 6. eXtensible Markup Language, XSD and DTD
    XML: XML stands for eXtensible Markup Language. It stores data in text format, which helps in storing, sharing and accessing data across different platforms. It is widely used in web development and is complementary to HTML.
    XSD: XML Schema Definition (XSD) is a formal description of the structure of an XML document. Programmers can use it to validate documents.
    DTD: DTD stands for Document Type Definition. It defines the structure, legal elements and attributes of an XML document.
  • 22. XML Document:
        <CATEGORY>
          <CATEGORY category_id="1">
            <CATEGORY_DIM>
              <category>1011100</category>
              <category_name>BLENDED WHISKIES</category_name>
            </CATEGORY_DIM>
          </CATEGORY>
          <CATEGORY category_id="2">
            <CATEGORY_DIM>
              <category>1011200</category>
              <category_name>STRAIGHT BOURBON WHISKIES</category_name>
            </CATEGORY_DIM>
          </CATEGORY>
          <CATEGORY category_id="3">
            <CATEGORY_DIM>
              <category>1011300</category>
              <category_name>TENNESSEE WHISKIES</category_name>
            </CATEGORY_DIM>
          </CATEGORY>
          <CATEGORY category_id="4">
            <CATEGORY_DIM>
              <category>1011500</category>
              <category_name>STRAIGHT RYE WHISKIES</category_name>
            </CATEGORY_DIM>
          </CATEGORY>
          <CATEGORY category_id="5">
            <CATEGORY_DIM>
              <category>1012100</category>
              <category_name>CANADIAN WHISKIES</category_name>
            </CATEGORY_DIM>
          </CATEGORY>
        </CATEGORY>
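As a quick check that an XML document like the one above is well-formed and carries the expected fields, it can be parsed with Python's standard library. The sketch below uses the first two categories from the document; the variable names are illustrative. Note that `xml.etree.ElementTree` only checks well-formedness, it does not validate against the XSD or DTD.

```python
import xml.etree.ElementTree as ET

# First two categories from the XML document above.
xml_text = """
<CATEGORY>
  <CATEGORY category_id="1">
    <CATEGORY_DIM>
      <category>1011100</category>
      <category_name>BLENDED WHISKIES</category_name>
    </CATEGORY_DIM>
  </CATEGORY>
  <CATEGORY category_id="2">
    <CATEGORY_DIM>
      <category>1011200</category>
      <category_name>STRAIGHT BOURBON WHISKIES</category_name>
    </CATEGORY_DIM>
  </CATEGORY>
</CATEGORY>
"""

# fromstring raises ParseError if the document is not well-formed.
root = ET.fromstring(xml_text.strip())

# Each direct CATEGORY child holds one CATEGORY_DIM record.
rows = [
    (
        int(node.get("category_id")),
        int(node.findtext("CATEGORY_DIM/category")),
        node.findtext("CATEGORY_DIM/category_name"),
    )
    for node in root.findall("CATEGORY")
]

for row in rows:
    print(row)
```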
  • 23. XSD:
        <?xml version="1.0" encoding="UTF-8"?>
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
          <xs:element name="TRUSHI">
            <xs:complexType>
              <xs:sequence>
                <xs:element ref="CATEGORY"/>
                <xs:element ref="INVOICE"/>
                <xs:element ref="ITEM"/>
                <xs:element ref="STORE"/>
                <xs:element ref="VENDOR"/>
                <xs:element ref="LIQUOR_FACT"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
          <xs:element name="CATEGORY">
            <xs:complexType>
              <xs:sequence minOccurs="0">
                <xs:element ref="CATEGORY"/>
                <xs:element ref="CATEGORY_DIM"/>
              </xs:sequence>
              <xs:attribute name="category_id" type="xs:integer"/>
            </xs:complexType>
          </xs:element>
          <xs:element name="CATEGORY_DIM">
            <xs:complexType>
              <xs:sequence>
                <xs:element ref="category"/>
                <xs:element ref="category_name"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
          <xs:element name="category" type="xs:integer"/>
          <xs:element name="category_name" type="xs:string"/>
          <xs:element name="INVOICE">
            <xs:complexType>
              <xs:sequence minOccurs="0">
                <xs:element ref="INVOICE"/>
                <xs:element ref="INVOICE_DIM"/>
              </xs:sequence>
              <xs:attribute name="invoice_id" type="xs:integer"/>
            </xs:complexType>
          </xs:element>
          <xs:element name="INVOICE_DIM">
            <xs:complexType>
              <xs:sequence>
  • 24. DTD:
        <?xml encoding="UTF-8"?>
        <!ELEMENT TRUSHI (CATEGORY,INVOICE,ITEM,STORE,VENDOR,LIQUOR_FACT)>
        <!ATTLIST TRUSHI xmlns CDATA #FIXED ''>
        <!ELEMENT CATEGORY (CATEGORY,CATEGORY_DIM)?>
        <!ATTLIST CATEGORY xmlns CDATA #FIXED '' category_id CDATA #IMPLIED>
        <!ELEMENT CATEGORY_DIM (category,category_name)>
        <!ATTLIST CATEGORY_DIM xmlns CDATA #FIXED ''>
        <!ELEMENT category (#PCDATA)>
        <!ATTLIST category xmlns CDATA #FIXED ''>
        <!ELEMENT category_name (#PCDATA)>
        <!ATTLIST category_name xmlns CDATA #FIXED ''>
        <!ELEMENT INVOICE (INVOICE,INVOICE_DIM)?>
        <!ATTLIST INVOICE xmlns CDATA #FIXED '' invoice_id CDATA #IMPLIED>
        <!ELEMENT INVOICE_DIM (item_id,store_id,invoice_item_id,invoice_date,
                               bottle_sold,sale_dollars,vol_sold_litres,
                               vol_sold_gallons)>
        <!ATTLIST INVOICE_DIM xmlns CDATA #FIXED ''>
        <!ELEMENT item_id (#PCDATA)>
        <!ATTLIST item_id xmlns CDATA #FIXED ''>
        <!ELEMENT store_id (#PCDATA)>
        <!ATTLIST store_id xmlns CDATA #FIXED ''>
        <!ELEMENT invoice_item_id (#PCDATA)>
        <!ATTLIST invoice_item_id xmlns CDATA #FIXED ''>
        <!ELEMENT invoice_date (#PCDATA)>
        <!ATTLIST invoice_date xmlns CDATA #FIXED ''>
  • 25. 7. Neo4J and Cypher Queries
    Step 1: Load all the .csv files into the Neo4j workspace and index the files.
    Step 2: Create the relationships between the nodes.
    Upload Category Dimension File
        LOAD CSV WITH HEADERS FROM "file:///Category_dim.csv" AS row
        CREATE (CT:Category)
        SET CT = row{CATEGORY_ID: row.CATEGORY_ID, CATEGORY_NAME: row.CATEGORY_Name}
        RETURN CT
  • 26. Upload Item Dimension File
        LOAD CSV WITH HEADERS FROM "file:///Item_dim.csv" AS row
        CREATE (IT:Item)
        SET IT = row{ITEM_ID: row.ITEM_ID, ITEM_NO: row.ITEM_NO,
                     ITEM_DESCRIPTION: row.ITEM_DESCRIPTION,
                     CATEGORY_ID: row.CATEGORY_ID, VENDOR_ID: row.VENDOR_ID}
        RETURN IT
    Upload Invoice Dimension File:
        LOAD CSV WITH HEADERS FROM "file:///Invoice_dim.csv" AS row
        CREATE (INV:Invoice)
        SET INV = row{INVOICE_ID: row.INVOICE_ID, ITEM_ID: row.ITEM_ID, STORE_ID: row.STORE_ID,
                      INVOICE_ITEM_NO: row.INVOICE_ITEM_NO, INVOICE_DATE: row.INVOICE_DATE,
                      BOTTLE_SOLD: row.BOTTLE_SOLD, SALES_DOLLARS: row.SALES_DOLLARS,
                      VOL_SOLD_LITRES: row.VOL_SOLD_LITRES, VOL_SOLD_GALLONS: row.VOL_SOLD_GALLONS}
        RETURN INV
    Upload Liquor Fact File:
        LOAD CSV WITH HEADERS FROM "file:///Liquor_fact_dim.csv" AS row
        CREATE (FA:LiquorSale)
        SET FA = row{TOTAL_SALES: row.TOTAL_SALES, CATEGORY_ID: row.CATEGORY_ID,
                     INVOICE_ID: row.INVOICE_ID, ITEM_ID: row.ITEM_ID}
        RETURN FA
  • 27. 7.1 RELATION BETWEEN NODES
    Relationship between category and Liquor fact table
        MATCH t=()-[r:has_CATEGORY_ID]->() RETURN t LIMIT 25
    Relationship between invoice and Liquor fact table
        MATCH t=()-[r:has_INVOICE_ID]->() RETURN t LIMIT 25
  • 28. 7.2 Relation between multiple nodes:
    Explanation: The diagram above illustrates the cardinality of the relationships between multiple nodes.
    - The Neo4j graph above shows the relationships between the Category, Invoice and Liquor fact nodes.
    - Circles correspond to nodes and lines correspond to relationships.
    - Category nodes are shown in orange, Invoice nodes in blue and Liquor fact nodes in red.
  • 29. Code for relationship between multiple nodes
        MATCH (CT:Category), (FA:LiquorSale)
        WHERE CT.CATEGORY_ID = FA.CATEGORY_ID
        CREATE (FA)-[r:has_CATEGORY_ID]->(CT)
        RETURN CT, FA, r

        MATCH (INV:Invoice), (FA:LiquorSale)
        WHERE INV.INVOICE_ID = FA.INVOICE_ID
        CREATE (FA)-[r:has_INVOICE_ID]->(INV)
        RETURN INV, FA, r

        MATCH (FA:LiquorSale), (CT:Category), (INV:Invoice)
        WHERE CT.CATEGORY_ID = FA.CATEGORY_ID AND INV.INVOICE_ID = FA.INVOICE_ID
        RETURN CT, FA, INV
    Graph Database and Relational Database:
    - Graph databases are indexed and consist of nodes and relationships.
    - They consume more energy than relational databases.
    - Captured data can be changed with ease in a graph database.
    - Traversal is used to join relations in a graph database.
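The match-and-create pattern in the Cypher above pairs nodes whose key properties are equal and then follows the resulting relationships. The sketch below imitates that idea in plain Python, with dictionaries as nodes and tuples as relationships. It is a conceptual illustration only, not how Neo4j works internally; the `link` and `invoices_for_category` helpers and the sample nodes are hypothetical.

```python
# Hypothetical nodes; values are strings because LOAD CSV reads every field as text.
categories = [
    {"CATEGORY_ID": "1", "CATEGORY_NAME": "BLENDED WHISKIES"},
    {"CATEGORY_ID": "2", "CATEGORY_NAME": "STRAIGHT BOURBON WHISKIES"},
]
invoices = [{"INVOICE_ID": "100"}, {"INVOICE_ID": "101"}]
facts = [
    {"INVOICE_ID": "100", "CATEGORY_ID": "1", "TOTAL_SALES": "120.0"},
    {"INVOICE_ID": "101", "CATEGORY_ID": "2", "TOTAL_SALES": "60.0"},
]


def link(fact_nodes, target_nodes, key, rel_type):
    """Create (fact)-[rel_type]->(target) for every pair with an equal key property."""
    by_key = {node[key]: node for node in target_nodes}
    return [(fact, rel_type, by_key[fact[key]])
            for fact in fact_nodes if fact[key] in by_key]


edges = (link(facts, categories, "CATEGORY_ID", "has_CATEGORY_ID")
         + link(facts, invoices, "INVOICE_ID", "has_INVOICE_ID"))


def invoices_for_category(category_id):
    """Traverse category <- fact -> invoice and return the invoice IDs reached."""
    linked_facts = [fact for fact, rel, target in edges
                    if rel == "has_CATEGORY_ID" and target["CATEGORY_ID"] == category_id]
    return [target["INVOICE_ID"] for fact, rel, target in edges
            if rel == "has_INVOICE_ID" and any(fact is f for f in linked_facts)]


print(len(edges))
print(invoices_for_category("1"))
```

Once the relationships exist, answering "which invoices belong to this category" is a walk along stored edges rather than a fresh join on key columns, which is the point of the last bullet above.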
  • 30. 8. Bibliography
    Kimball, R. The Data Warehouse Toolkit. John Wiley, 1996.
    XML and XSD validation tool. Available at: http://www.utilitiesonline.info/xsdvalidation/#.XBqEuVz7TIV
    Department of Commerce, Iowa state government. Iowa Liquor Sales. Available at: https://data.iowa.gov/Sales-Distribution/Iowa-Liquor-Sales/m3tr-qhgy