AC52010
(SEM 2 23/24)
Project Proposal
Introduction
With advances in technology, data processing has become an integral part of our
lives. Nearly every aspect of life is influenced in one way or another by processed data.
Real-time data processing has become especially critical to organisations and governments,
playing roles in weather prediction, air traffic control, and health monitoring.
This project proposes a Real-Time Data Warehousing pipeline using Apache Airflow for
weather data visualisation. The system aims to efficiently manage, process, and store real-time
weather data, enabling timely analysis and insights for various applications. By leveraging
Apache Airflow's workflow orchestration, the pipeline will process large volumes of weather
data from a weather API for continuous analysis and visualisation.
Features to be implemented
Data Ingestion:
Data Storage:
Raw Data Storage: Store raw weather data in a data lake (AWS S3).
Structured Database: Store cleaned and transformed data in a relational database
(MySQL).
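The two storage layers above can be sketched as follows. This is only an illustration under stated assumptions: the key layout, the table name `weather_observation`, and its columns are hypothetical, and Python's built-in `sqlite3` stands in for MySQL so that the sketch runs without a database server. No call to AWS is made; the function only builds the object key under which a raw response could be written to S3.

```python
import sqlite3
from datetime import datetime, timezone


def raw_object_key(station_id: str, observed_at: datetime) -> str:
    """Build a date-partitioned object key for one raw API response.

    The layout (raw/weather/date=.../hour=...) is an assumed convention,
    not something fixed by the proposal.
    """
    ts = observed_at.astimezone(timezone.utc)
    return (
        f"raw/weather/date={ts:%Y-%m-%d}/hour={ts:%H}/"
        f"{station_id}_{ts:%Y%m%dT%H%M%SZ}.json"
    )


# Hypothetical schema for the cleaned table; column names are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS weather_observation (
    station_id   TEXT NOT NULL,
    observed_at  TEXT NOT NULL,
    temp_c       REAL,
    humidity_pct REAL,
    PRIMARY KEY (station_id, observed_at)
)
"""


def store_observation(conn: sqlite3.Connection, row: dict) -> None:
    """Insert one cleaned row; reloading the same reading overwrites it."""
    conn.execute(
        "INSERT OR REPLACE INTO weather_observation "
        "(station_id, observed_at, temp_c, humidity_pct) "
        "VALUES (:station_id, :observed_at, :temp_c, :humidity_pct)",
        row,
    )
    conn.commit()
```

Keying the table on station and observation time keeps reloads idempotent, which matters when a pipeline run is retried. In MySQL the same effect would typically come from `INSERT ... ON DUPLICATE KEY UPDATE` rather than SQLite's `INSERT OR REPLACE`.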
Data Transformation:
ETL Process: Implement ETL (Extract, Transform, and Load) processes using
Airflow to clean and transform the raw data.
Data Quality Checks: Ensure data integrity and accuracy through validation and
quality checks.
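A minimal sketch of the transform and quality-check steps is given below. The raw field names (`station_id`, `timestamp`, `temp_k`, `humidity`) and the accepted value ranges are assumptions made for illustration; the real weather API may use a different shape. The functions are plain Python so they can be tested on their own before being wrapped in Airflow tasks.

```python
from datetime import datetime, timezone


def transform(raw: dict) -> dict:
    """Flatten one raw record and convert units (Kelvin to Celsius)."""
    return {
        "station_id": str(raw["station_id"]),
        "observed_at": datetime.fromtimestamp(
            raw["timestamp"], tz=timezone.utc
        ).isoformat(),
        "temp_c": round(raw["temp_k"] - 273.15, 2),
        "humidity_pct": float(raw["humidity"]),
    }


def quality_errors(row: dict) -> list[str]:
    """Return a list of validation problems; an empty list means the row passes."""
    errors = []
    if not row["station_id"]:
        errors.append("missing station_id")
    # Illustrative bounds, roughly the range of surface temperatures on Earth.
    if not -90.0 <= row["temp_c"] <= 60.0:
        errors.append("temp_c out of range")
    if not 0.0 <= row["humidity_pct"] <= 100.0:
        errors.append("humidity_pct out of range")
    return errors


def clean_batch(raw_records: list[dict]) -> tuple[list[dict], list[tuple]]:
    """Split a batch into accepted rows and (raw record, errors) rejects."""
    accepted, rejected = [], []
    for raw in raw_records:
        try:
            row = transform(raw)
        except (KeyError, TypeError, ValueError) as exc:
            rejected.append((raw, [f"malformed record: {exc!r}"]))
            continue
        errors = quality_errors(row)
        if errors:
            rejected.append((raw, errors))
        else:
            accepted.append(row)
    return accepted, rejected
```

Rejected records are kept together with their error messages rather than dropped silently, so data quality can be monitored over time.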
Real-Time Processing:
Stream Processing: Integrate Apache Kafka for real-time data streaming and
processing.
Airflow Integration: Use Airflow's Kafka sensors to handle real-time data ingestion
and processing.
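As one example of what the stream-processing step could compute, the sketch below averages temperature per station over fixed (tumbling) time windows. It is not Kafka code: a plain Python iterable of JSON-encoded bytes stands in for the messages a Kafka consumer would deliver, and the message fields (`station_id`, `timestamp`, `temp_c`) and the 5-minute window are assumptions. It also assumes messages arrive in timestamp order.

```python
import json
from collections import defaultdict
from typing import Iterable, Iterator

WINDOW_SECONDS = 300  # assumed 5-minute tumbling window


def _flush(window_start: int, sums: dict) -> Iterator[dict]:
    """Emit one result per station for a closed window."""
    for station_id, (total, count) in sorted(sums.items()):
        yield {
            "window_start": window_start,
            "station_id": station_id,
            "mean_temp_c": total / count,
        }


def windowed_mean_temp(
    messages: Iterable[bytes], window_seconds: int = WINDOW_SECONDS
) -> Iterator[dict]:
    """Yield the per-station mean temperature for each tumbling window."""
    current_window = None
    sums = defaultdict(lambda: [0.0, 0])  # station_id -> [sum, count]
    for payload in messages:
        reading = json.loads(payload)
        # Align the reading's timestamp to the start of its window.
        window_start = reading["timestamp"] - reading["timestamp"] % window_seconds
        if current_window is not None and window_start != current_window:
            # A reading from a new window closes the previous one.
            yield from _flush(current_window, sums)
            sums.clear()
        current_window = window_start
        entry = sums[reading["station_id"]]
        entry[0] += reading["temp_c"]
        entry[1] += 1
    if current_window is not None:
        # Emit the final, possibly partial, window when the input ends.
        yield from _flush(current_window, sums)
```

Because the function only depends on an iterable of message payloads, the same logic can be exercised with a hard-coded list in tests and later fed from a real consumer.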
Data Visualization:
Power BI Integration: Connect the processed data stored in the database to Power BI
for analysis.
Interactive Dashboards: Develop interactive dashboards and visualizations in Power
BI to provide insights into weather data trends and patterns.
Infrastructure Automation:
Project Timeline
Create a detailed design for the user interface and dashboard in Power BI.
Perform load testing to ensure the system can handle large volumes of data.
Conclusion
The proposed Real-Time Data Warehousing pipeline with Apache Airflow for
weather data on AWS will significantly enhance our ability to manage and analyse weather
data in real time. By integrating Power BI for data visualization, the system will provide