ipl-project
1. Project Overview
Summary:
This project demonstrates an end-to-end data engineering pipeline on Azure that ingests,
cleans, processes, stores, and visualizes IPL data, taking it from raw CSV files to Power BI
dashboards.
Objective:
To build a scalable, automated, and efficient data pipeline that:
Stores data at multiple stages (Bronze, Silver, Gold) in Azure Data Lake Storage Gen2.
Technologies Used:
Power BI
2. Architecture Diagram
Summary:
The pipeline consists of multiple stages connected via Azure services. Each stage performs
specific tasks, from raw ingestion to advanced analytics.
3. Data Ingestion & Storage
Summary:
Set up cloud infrastructure to store raw and processed data in an organized manner.
Blob Storage:
o Container: raw
o player.csv
o match.csv
o stadium.csv
o player_match.csv
o team.csv
o player_team.csv
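As an illustration of how the files listed above could be addressed from Databricks, the sketch below builds one ADLS Gen2 URI per raw file. The storage account name `iplstorage` is a placeholder that does not appear in the project, and placing the files at the root of the container is an assumption.

```python
# File names as listed under the `raw` container.
RAW_FILES = [
    "player.csv",
    "match.csv",
    "stadium.csv",
    "player_match.csv",
    "team.csv",
    "player_team.csv",
]


def abfss_uri(container: str, account: str, path: str) -> str:
    """Build a URI of the form abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def raw_file_uris(account: str, container: str = "raw") -> dict:
    """Map each dataset name (file name without its extension) to its URI."""
    return {
        name.rsplit(".", 1)[0]: abfss_uri(container, account, name)
        for name in RAW_FILES
    }


# `iplstorage` is a hypothetical storage account name.
uris = raw_file_uris("iplstorage")
```

Keeping the file list in one place means the same names can be reused when writing the cleaned and aggregated outputs to other containers or folders.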
4. Data Transformation
Summary:
Used three Databricks notebooks to transform and process data through different layers
(Bronze, Silver, Gold).
o Drop nulls.
o Rename columns.
o Total Wins
o Player Stats
o Venue Analysis
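The steps above can be sketched in a few lines. This is a minimal illustration in plain Python, not the Spark code the notebooks would run (in PySpark the corresponding calls are `DataFrame.dropna` and `DataFrame.withColumnRenamed`). The column names `Match_Id` and `Match_Winner` and the sample rows are hypothetical, since the actual IPL schema is not shown here; treating the cleaning steps as Silver work and Total Wins as a Gold aggregate is also an assumption.

```python
from collections import Counter


def drop_nulls(rows):
    """Keep only rows in which no value is None or an empty string."""
    return [r for r in rows if all(v not in (None, "") for v in r.values())]


def rename_columns(rows, mapping):
    """Rename keys according to `mapping`; unmapped columns keep their names."""
    return [{mapping.get(k, k): v for k, v in r.items()} for r in rows]


def total_wins(rows, winner_col="match_winner"):
    """Aggregate the number of wins per team."""
    return dict(Counter(r[winner_col] for r in rows))


# Hypothetical raw rows standing in for match.csv.
bronze = [
    {"Match_Id": 1, "Match_Winner": "Team A"},
    {"Match_Id": 2, "Match_Winner": "Team B"},
    {"Match_Id": 3, "Match_Winner": None},  # removed by drop_nulls
    {"Match_Id": 4, "Match_Winner": "Team A"},
]

# Cleaning: drop incomplete rows, then standardize column names.
silver = rename_columns(
    drop_nulls(bronze),
    {"Match_Id": "match_id", "Match_Winner": "match_winner"},
)

# Aggregation: wins per team.
gold = total_wins(silver)
```

Each function takes rows in and returns new rows, so the stages can be chained or tested separately, which mirrors keeping one notebook per layer.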
5. Pipeline Orchestration
Summary:
Orchestrated the workflow with Azure Data Factory (ADF) pipelines that trigger the Databricks notebooks sequentially.
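Sequential triggering means each notebook starts only after the previous one has succeeded. The following is a conceptual sketch of that control flow in plain Python; it is not ADF or Databricks API code, and the notebook names and the `run_notebook` callable are hypothetical stand-ins.

```python
def run_sequential(notebooks, run_notebook):
    """Run notebooks in order and stop at the first failure.

    `run_notebook(name)` must return True on success and False on failure.
    Returns the list of notebooks that completed successfully.
    """
    completed = []
    for name in notebooks:
        if not run_notebook(name):
            break  # later notebooks depend on this one, so do not run them
        completed.append(name)
    return completed


# Hypothetical notebook names, one per layer.
NOTEBOOKS = ["bronze_ingest", "silver_clean", "gold_aggregate"]
```

For example, `run_sequential(NOTEBOOKS, lambda name: name != "silver_clean")` returns only `["bronze_ingest"]`, because the failure of the second notebook prevents the third from running.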
6. Data Loading to Azure SQL Database
Summary:
Used JDBC connections to transfer data from Databricks into Azure SQL DB for centralized
storage and Power BI access.
Total Tables:
Silver DB:
player_cleaned
match_cleaned
player_match_cleaned
team_cleaned
stadium_cleaned
player_team_cleaned
Gold DB:
team_performance_metrics
player_contribution
venue_analysis
player_efficiency_metrics
match_summary_insights
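A JDBC write from Databricks needs a connection URL and a set of writer options. The sketch below assembles both in plain Python. The server name, database name, and credentials are placeholders that do not come from the project, and in a real notebook the password would be read from a secret store rather than written as a literal.

```python
def azure_sql_jdbc_url(server: str, database: str) -> str:
    """Build a SQL Server JDBC URL for an Azure SQL Database logical server."""
    return (
        f"jdbc:sqlserver://{server}.database.windows.net:1433;"
        f"database={database};encrypt=true;trustServerCertificate=false;"
        "loginTimeout=30;"
    )


def jdbc_write_options(server, database, table, user, password):
    """Assemble options for Spark's JDBC data source."""
    return {
        "url": azure_sql_jdbc_url(server, database),
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


# Placeholder values for illustration only.
opts = jdbc_write_options("my-server", "ipl_db", "player_cleaned", "etl_user", "<secret>")

# In a Databricks notebook, with `df` a Spark DataFrame, the write could look like:
#   df.write.format("jdbc").options(**opts).mode("overwrite").save()
```

Building the options once per table keeps the connection details in a single function, so the same helper can serve every table listed above.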
7. Power BI Dashboard
Summary:
Connected Power BI to Azure SQL Database to visualize insights, performance, and key metrics
of the IPL dataset.
KPIs Created:
8. Challenges & Learnings
Summary:
The implementation involved handling multiple datasets, file formats, and orchestration steps.
Challenges Faced:
Small Dataset
The IPL data volume was limited in size, which may not fully capture the complexities of
large-scale, real-world sports analytics projects.
Key Learnings:
Understood the process of establishing JDBC connections between Azure Databricks and
Azure SQL Database for reading and writing data.
Power BI basics
9. Conclusion
Summary:
The project successfully showcases how cloud-native tools can be combined to create a
powerful, scalable, and automated data pipeline with meaningful analytics.