Data Modeling
I. Introduction to Data Modeling
Data modeling is the process of creating a visual representation of data structures and
their relationships within a system. It's a crucial first step in database design and
application development. A well-designed data model ensures:
We'll focus on three primary types of data models: conceptual, logical, and physical.
These models represent different levels of abstraction and detail in the design process.
Star Schema
The star schema gets its name because its structure resembles a star, with a central fact
table connected to multiple dimension tables.
1. Fact Table: Contains quantitative data (measures) for analysis, such as sales,
revenue, or profit. Includes foreign keys linking to dimension tables. Columns
typically include:
Measures: Numerical data (e.g., Total_Sales, Units_Sold).
Foreign Keys: References to dimension tables (e.g., Product_ID, Date_ID).
Example:
Product Dimension:
Date Dimension:
Store Dimension:
Single Join Path: Queries involve joining the fact table to the relevant dimension
tables.
Denormalized Structure: Dimension tables are typically denormalized for faster
querying.
Simplicity: Easy to understand and use for business intelligence analysts.
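The fact-to-dimension join described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a reference design: the table names (Fact_Sales, Dim_Product, Dim_Date), the Product_Name and Year columns, and all sample rows are assumptions made up for the example. Only Total_Sales, Units_Sold, Product_ID, and Date_ID come from the text.

```python
import sqlite3

# In-memory database; every table name and row here is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Product (Product_ID INTEGER PRIMARY KEY, Product_Name TEXT);
CREATE TABLE Dim_Date    (Date_ID INTEGER PRIMARY KEY, Year INTEGER);

-- Central fact table: measures plus foreign keys to each dimension.
CREATE TABLE Fact_Sales (
    Product_ID  INTEGER REFERENCES Dim_Product(Product_ID),
    Date_ID     INTEGER REFERENCES Dim_Date(Date_ID),
    Units_Sold  INTEGER,
    Total_Sales REAL
);

INSERT INTO Dim_Product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO Dim_Date    VALUES (10, 2023), (11, 2024);
INSERT INTO Fact_Sales  VALUES (1, 10, 3, 30.0), (1, 11, 2, 20.0), (2, 11, 5, 100.0);
""")


def sales_by_product(connection):
    """Aggregate the measures per product with a single fact-to-dimension join."""
    return connection.execute("""
        SELECT p.Product_Name, SUM(f.Units_Sold), SUM(f.Total_Sales)
        FROM Fact_Sales AS f
        JOIN Dim_Product AS p ON p.Product_ID = f.Product_ID
        GROUP BY p.Product_Name
        ORDER BY p.Product_Name
    """).fetchall()


star_result = sales_by_product(conn)
print(star_result)
```

Each dimension is reached from the fact table by one join, which is the single join path the list above refers to.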
Use Cases
Hub_Customer
Customer_SK
--------------
1

Link_Customer_Order
Link_SK
----------
101

Sat_Customer
Customer_SK
--------------
1
1
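One way to read the three tables above is as a hub (one row per customer key), a link (one row per customer-to-order relationship), and a satellite (one row per version of the customer's descriptive attributes, which would explain why Customer_SK 1 appears twice in Sat_Customer). The sketch below uses Python's sqlite3 module to show that reading. Only the table names and the Customer_SK and Link_SK columns appear in the source; every other column (Customer_BK, Order_SK, Load_Date, Name) and all sample values are hypothetical.

```python
import sqlite3

dv = sqlite3.connect(":memory:")
dv.executescript("""
-- Hub: one row per customer. Customer_BK (business key) is an assumed column.
CREATE TABLE Hub_Customer (
    Customer_SK INTEGER PRIMARY KEY,
    Customer_BK TEXT UNIQUE,
    Load_Date   TEXT
);

-- Link: relates a customer to an order. Order_SK is an assumed column.
CREATE TABLE Link_Customer_Order (
    Link_SK     INTEGER PRIMARY KEY,
    Customer_SK INTEGER REFERENCES Hub_Customer(Customer_SK),
    Order_SK    INTEGER,
    Load_Date   TEXT
);

-- Satellite: descriptive attributes, one row per load (history is kept).
CREATE TABLE Sat_Customer (
    Customer_SK INTEGER REFERENCES Hub_Customer(Customer_SK),
    Load_Date   TEXT,
    Name        TEXT,
    PRIMARY KEY (Customer_SK, Load_Date)
);

INSERT INTO Hub_Customer        VALUES (1, 'CUST-001', '2024-01-01');
INSERT INTO Link_Customer_Order VALUES (101, 1, 5001, '2024-01-02');
INSERT INTO Sat_Customer        VALUES (1, '2024-01-01', 'Alice'),
                                       (1, '2024-03-15', 'Alice B.');
""")


def current_name(connection, customer_sk):
    """Return the most recently loaded Name for a customer, or None."""
    row = connection.execute(
        "SELECT Name FROM Sat_Customer WHERE Customer_SK = ? "
        "ORDER BY Load_Date DESC LIMIT 1",
        (customer_sk,),
    ).fetchone()
    return row[0] if row else None


print(current_name(dv, 1))
```

Because the satellite is append-only in this sketch, earlier attribute values stay queryable alongside the current one.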
Use Cases
Use Cases
1. Social Networks:
Nodes: Users
Edges: Friendships, Follows, Likes
Example Query: "Find mutual friends of two users."
2. Recommendation Engines:
Nodes: Users, Products
Edges: Buys, Rates, Likes
Example Query: "Suggest products that users similar to Alice have bought."
3. Fraud Detection:
Nodes: Accounts, Transactions
Edges: Transfers, Access
Example Query: "Identify suspicious account clusters based on shared IP
addresses."
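Two of the example queries above can be sketched in plain Python without a graph database. This is an illustration under stated assumptions: the graph is held in ordinary dictionaries and tuples, the function names mutual_friends and shared_ip_clusters are hypothetical, and the sample users, accounts, and IP addresses are invented. A graph database would express the same traversals in its own query language.

```python
from collections import defaultdict


def mutual_friends(friends, a, b):
    """Nodes adjacent to both a and b in an undirected friendship graph."""
    return friends.get(a, set()) & friends.get(b, set())


def shared_ip_clusters(logins):
    """Group accounts that are connected through any shared IP address.

    logins is an iterable of (account, ip) pairs. Accounts are merged with a
    small union-find; only clusters with more than one account are returned.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_account_for_ip = {}
    for account, ip in logins:
        find(account)  # register the account even if it shares nothing
        if ip in first_account_for_ip:
            parent[find(account)] = find(first_account_for_ip[ip])
        else:
            first_account_for_ip[ip] = account

    clusters = defaultdict(set)
    for account in parent:
        clusters[find(account)].add(account)
    return [members for members in clusters.values() if len(members) > 1]


# Illustrative data only.
friends = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol", "erin"},
}
logins = [
    ("acct1", "10.0.0.1"),
    ("acct2", "10.0.0.1"),
    ("acct2", "10.0.0.2"),
    ("acct3", "10.0.0.2"),
    ("acct4", "10.0.0.9"),
]

print(mutual_friends(friends, "alice", "bob"))
print(shared_ip_clusters(logins))
```

Note that acct1 and acct3 never share an IP directly; they end up in the same cluster only through acct2, which is the kind of indirect relationship a graph model makes easy to follow.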
Change Data Capture (CDC) is widely used for data replication, synchronization, and
integration in systems like data warehouses, event-driven architectures, and microservices.
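The replication use case can be sketched as a consumer that replays an ordered stream of change events against a replica. This is a toy model, not the format of any real CDC tool: the event shape ("op", "key", "row"), the function names apply_change and replicate, and the sample data are all assumptions made for illustration.

```python
def apply_change(replica, event):
    """Apply one change event to an in-memory replica (a dict keyed by primary key)."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        replica[key] = event["row"]
    elif op == "delete":
        replica.pop(key, None)  # deleting a missing key is a no-op
    else:
        raise ValueError(f"unknown operation: {op!r}")


def replicate(replica, change_log, offset=0):
    """Replay events from position `offset` onward and return the new offset.

    Persisting the returned offset lets a consumer resume without
    re-reading changes it has already applied.
    """
    for event in change_log[offset:]:
        apply_change(replica, event)
    return max(offset, len(change_log))


# Illustrative change log, in commit order.
change_log = [
    {"op": "insert", "key": 1, "row": {"name": "Alice"}},
    {"op": "insert", "key": 2, "row": {"name": "Bob"}},
    {"op": "update", "key": 1, "row": {"name": "Alice B."}},
    {"op": "delete", "key": 2},
]

replica = {}
offset = replicate(replica, change_log)
print(replica, offset)

# Later changes are picked up incrementally from the saved offset.
change_log.append({"op": "insert", "key": 3, "row": {"name": "Carol"}})
offset = replicate(replica, change_log, offset)
print(replica, offset)
```

Only the changes are shipped and applied, so the replica stays in sync without re-copying the full source table.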
Advantages of CDC
Debezium: Open-source tool for log-based CDC, integrates with Apache Kafka.
AWS Database Migration Service (DMS): Supports CDC for cloud-based
database migrations.
Talend: ETL tool with CDC capabilities.
Oracle GoldenGate: High-performance replication and CDC solution.
StreamSets: DataOps platform with CDC support.
Visualizations