WeatherDataAnalysis

Mini project

Uploaded by

tanujashinde273
Copyright © All Rights Reserved
DSBDA REPORT
ON
"Weather Data Analysis"

SUBMITTED TO THE SAVITRIBAI PHULE PUNE UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE AWARD OF THE
BACHELOR OF IT ENGINEERING

BY
Mr. Hitesh Gopal Patil (11902708552)
Mr. Prathamesh Bajirao Chormale (11902708510)
Ms. Tanuja Babasaheb Shinde (11902708569)
Mr. Devidas Kakasaheb Tambe (11902708573)

UNDER THE GUIDANCE OF
Ms. Shital S. Patil

DEPARTMENT OF IT ENGINEERING
Sir Visvesvaraya Institute Of Technology, Nashik
A/p. Chincholi, Tal. Sinnar, Dist. Nashik - 422102 (MS) India
YEAR 2024-2025

DEPARTMENT OF INFORMATION TECHNOLOGY
Sir Visvesvaraya Institute Of Technology, Nashik
A/p. Chincholi, Tal. Sinnar, Dist. Nashik - 422102 (MS) India
Year 2024-25

CERTIFICATE

This is to certify that the DSBDAL report entitled "Weather Data Analysis" is submitted as partial fulfillment of the curriculum of the T.E. of IT Engineering

BY
Mr. Hitesh Gopal Patil (11902708552)
Mr. Prathamesh Bajirao Chormale (11902708510)
Ms. Tanuja Babasaheb Shinde (T1902708569)
Mr. Devidas Kakasaheb Tambe (11902708573)

(Ms. Shital S. Patil)                (Dr. Pratibha V. Kashid)
Guide                                Head Of Department
SVIT, Nashik

Certificate By Guide

This is to certify that
Mr. Hitesh Gopal Patil (11902708552)
Mr. Prathamesh Bajirao Chormale (11902708510)
Ms. Tanuja Babasaheb Shinde (11902708569)
Mr. Devidas Kakasaheb Tambe (11902708573)
have completed the DSBDA project under my guidance and that I have verified the work for its originality in documentation, problem statement, literature survey and conclusion presented in the DSBDA project.

Place: Nashik                        (Ms. Shital S. Patil)
Date:

Acknowledgement

It is our immense pleasure to work on this project, Weather Data Analysis. It is only the blessing of our divine master which has prompted and mentally equipped us to undertake the study of this project. We would like to thank Prof. Dr. G. B. Shinde, Principal, Sir Visvesvaraya Institute of Technology, for giving us such an opportunity to develop practical knowledge about the subject.
We are also thankful to Dr. Pratibha V. Kashid, Head of the IT Engineering Department, for valuable encouragement at every phase of our project and its completion. We offer our sincere thanks to our guide, Ms. Shital S. Patil, who encouraged us to work on the subject and gave her valuable guidance from time to time. While preparing this project we are very much thankful to her. We are also grateful to the entire staff of the IT Engineering Department for their kind co-operation, which helped us in the successful completion of the project.

SVIT, NASHIK.
Mr. Hitesh Gopal Patil (11902708552)
Mr. Prathamesh Bajirao Chormale (11902708510)
Tanuja Babasaheb Shinde (1902708569)
Mr. Devidas Kakasaheb Tambe (11902708573)

INDEX

SR. NO.   TITLE
1         Abstract
2         Introduction
3         Implementation
4         Conclusion

ABSTRACT

The aim of this project is to perform exploratory data analysis and predictive modeling on a weather dataset using Python. The dataset contains hourly weather records for the year 2012, including attributes such as temperature, humidity, wind speed, visibility, and atmospheric pressure. Through data preprocessing and visualization techniques, we uncover patterns, seasonal trends, and relationships among the variables. Additionally, a simple linear regression model is implemented to predict temperature based on selected features like humidity, wind speed, and pressure. The project highlights the importance of data-driven insights in understanding weather behavior and sets the foundation for building more accurate predictive systems in the future.

This project presents a comprehensive analysis of hourly weather data collected over the year 2012. The objective is to explore, understand, and predict weather patterns using data science tools and techniques. The dataset includes key weather parameters such as temperature, dew point, relative humidity, wind speed, visibility, and atmospheric pressure.
The analysis begins with data cleaning and preprocessing, followed by detailed exploratory data analysis (EDA) using visualizations like line graphs, scatter plots, histograms, and heatmaps. These visualizations help reveal trends such as seasonal temperature variation, the relationship between temperature and humidity, and correlations among various weather attributes.

INTRODUCTION

Weather has a significant impact on human life, affecting agriculture, transportation, health, and even the economy. With the growing availability of large weather datasets and powerful data analysis tools, it is now possible to understand and predict weather patterns using data science techniques.

This project focuses on analyzing hourly weather data collected throughout the year 2012. The dataset includes various parameters such as temperature, dew point, humidity, wind speed, visibility, and atmospheric pressure. By performing exploratory data analysis (EDA), we aim to uncover meaningful patterns and relationships among these weather attributes.

In addition to EDA, we also implement a basic machine learning model to predict temperature based on other environmental features. Python libraries like Pandas, Matplotlib, Seaborn, and Scikit-learn are used to handle data processing, visualization, and modeling.

The objective of this project is not only to gain insights from real-world weather data but also to apply fundamental data science techniques that are essential for solving practical problems.

IMPLEMENTATION

The implementation of this project was carried out in Python using Jupyter Notebook. It involved multiple steps including data loading, cleaning, analysis, visualization, and predictive modeling. Below is a detailed explanation of each phase:

1. Importing Required Libraries
We started by importing essential libraries:
• pandas and numpy for data manipulation,
• matplotlib.pyplot and seaborn for data visualization,
• scikit-learn for building the machine learning model.

2. Loading and Exploring the Dataset
The dataset Weather Data.csv was loaded using Pandas. We used functions like .info(), .head(), and .describe() to understand its structure and summary statistics.

3. Data Cleaning and Preprocessing
• Checked for missing values and found none.
• Removed any duplicate records.
• Converted the Date/Time column to datetime format and set it as the index for time-series analysis.

4. Data Visualization
Various plots were created to analyze trends and relationships:
• Line Plot: To visualize temperature trends throughout the year.
• Histogram: To observe temperature distribution.
• Heatmap: To understand correlation among numerical features.
• Scatter Plot: To examine the relationship between humidity and temperature.
• Daily & Monthly Trends: Focused analysis on May 1st and monthly averages.

5. Feature Engineering
• Extracted the month from the datetime index for seasonal analysis.

6. Machine Learning Model
A Linear Regression model was implemented to predict temperature using:
• Relative Humidity
• Wind Speed
• Pressure

Steps:
• Defined input (X) and output (y) features.
• Split the dataset into training and testing sets.
• Trained the model and evaluated its performance using R² score and Mean Squared Error (MSE).

Results:
• R² Score: 0.177
• Mean Squared Error: 119.12

An R² of 0.177 means the linear model explains only about 18% of the variation in temperature, so it could be improved with more features or more complex models.

CONCLUSION

In this project, we successfully analyzed a real-world weather dataset using Python. By applying data cleaning, preprocessing, and visualization techniques, we were able to uncover meaningful insights about temperature trends, humidity levels, seasonal patterns, and the relationships between different weather parameters.

We observed that temperature generally follows a seasonal trend and is influenced by factors like humidity and atmospheric pressure.
The data visualizations helped us better understand these patterns.

Furthermore, we implemented a simple linear regression model to predict temperature using humidity, wind speed, and pressure as input features. Although the model provided a basic prediction, the R² score indicated that more complex models or additional data would be needed to improve accuracy.

This project has strengthened our understanding of exploratory data analysis, time-series data handling, and regression modeling. It also demonstrates how data science techniques can be applied to gain valuable insights from environmental data, paving the way for more advanced forecasting systems in the future.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Load the hourly weather records
    df = pd.read_csv("Weather Data.csv")
    df

[Output: DataFrame preview, 8784 rows x 8 columns; hourly records from 1/1/2012 0:00 to 12/31/2012 23:00]

    df.info()

RangeIndex: 8784 entries, 0 to 8783
Data columns (total 8 columns):
 #   Column            Non-Null Count  Dtype
 0   Date/Time         8784 non-null   object
 1   Temp_C            8784 non-null   float64
 2   Dew Point Temp_C  8784 non-null   float64
 3   Rel Hum_%         8784 non-null   int64
 4   Wind Speed_km/h   8784 non-null   int64
 5   Visibility_km     8784 non-null   float64
 6   Press_kPa         8784 non-null   float64
 7   Weather           8784 non-null   object
dtypes: float64(4), int64(2), object(2)
memory usage: 549.1+ KB

    # Count missing values per column
    print(df.isnull().sum())

Date/Time           0
Temp_C              0
Dew Point Temp_C    0
Rel Hum_%           0
Wind Speed_km/h     0
Visibility_km       0
Press_kPa           0
Weather             0
dtype: int64

    # Drop duplicate rows, then summarize the numeric columns
    df = df.drop_duplicates()
    df.describe()

            Temp_C  Dew Point Temp_C    Rel Hum_%  Wind Speed_km/h  Visibility_km    Press_kPa
count  8784.000000       8784.000000  8784.000000      8784.000000    8784.000000  8784.000000
mean      8.798144          2.555294    67.431694        14.945469      27.664447   101.051623
std      11.687883         10.883072    16.918881         8.688696      12.622688     0.844005
min     -23.300000        -28.500000    18.000000         0.000000       0.200000    97.520000
25%       0.100000         -5.900000    56.000000         9.000000      24.100000   100.560000
50%       9.300000          3.300000    68.000000        13.000000      25.000000   101.070000
75%      18.800000         11.800000    81.000000        20.000000      25.000000   101.590000
max      33.000000         24.400000   100.000000        83.000000      48.300000   103.650000

    # Parse the timestamps and use them as the index for time-series analysis
    df['Formatted Date'] = pd.to_datetime(df['Date/Time'])
    df.set_index('Formatted Date', inplace=True)
    df

[Output: the same 8784 rows x 8 columns, now indexed by Formatted Date from 2012-01-01 00:00:00 to 2012-12-31 23:00:00]

    # Line plot of temperature over the whole year
    plt.figure(figsize=(12, 5))
    plt.plot(df.index, df['Temp_C'])
    plt.title("Temperature Over Time")
    plt.xlabel("Date")
    plt.ylabel("Temperature (C)")
    plt.grid()
    plt.show()

[Figure: Temperature Over Time; x-axis: Date, y-axis: Temperature (C)]

    # Correlation heatmap of the numeric columns
    numeric_df = df.select_dtypes(include=['float64', 'int64'])
    plt.figure(figsize=(10, 6))
    sns.heatmap(numeric_df.corr(), annot=True, cmap='coolwarm')
    plt.title("Correlation Heatmap")
    plt.show()

[Figure: Correlation Heatmap of Temp_C, Dew Point Temp_C, Rel Hum_%, Wind Speed_km/h, Visibility_km and Press_kPa]

    # Histogram of temperature with a kernel density estimate
    plt.figure(figsize=(8, 5))
    sns.histplot(df['Temp_C'], kde=True, color='orange')
    plt.title('Temperature Distribution')
    plt.xlabel('Temperature (C)')
    plt.ylabel('Frequency')
    plt.grid()
    plt.show()

[Figure: Temperature Distribution; x-axis: Temperature (C), y-axis: Frequency]

    # Scatter plot of relative humidity against temperature
    plt.figure(figsize=(8, 5))
    sns.scatterplot(data=df, x='Rel Hum_%', y='Temp_C')
    plt.title('Humidity vs Temperature')
    plt.grid()
    plt.show()

[Figure: Humidity vs Temperature; x-axis: Rel Hum_%, y-axis: Temp_C]

    # Get data for the entire day of 1st May 2012
    day_data = df.loc['2012-05-01']
    plt.figure(figsize=(12, 5))
    plt.plot(day_data.index, day_data['Temp_C'], marker='o', color='green')
    plt.title('Temperature Throughout the Day (1 May 2012)')
    plt.xlabel('Time')
    plt.ylabel('Temperature (C)')
    plt.xticks(rotation=45)
    plt.grid()
    plt.show()

[Figure: Temperature Throughout the Day (1 May 2012); x-axis: Time, y-axis: Temperature (C)]

    # Extract the month from the index and average the temperature per month
    df['Month'] = df.index.month
    monthly_avg = df.groupby('Month')['Temp_C'].mean()
    plt.figure(figsize=(10, 5))
    monthly_avg.plot(marker='o', color='purple')
    plt.title('Monthly Average Temperature')
    plt.xlabel('Month')
    plt.ylabel('Avg Temperature (C)')
    plt.grid()
    plt.show()

[Figure: Monthly Average Temperature; x-axis: Month, y-axis: Avg Temperature (C)]

    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error, r2_score

    # Input features and target
    X = df[['Rel Hum_%', 'Wind Speed_km/h', 'Press_kPa']]
    y = df['Temp_C']

    # Split data (the random_state argument is not legible in the source listing)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # Train model
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Predict and evaluate
    y_pred = model.predict(X_test)
    y_pred

array([ 8.31174767, 10.49252381,  1.6905262 , ...,  9.3846832 ,
       13.71053101, 14.93376871])

    print("R2 Score:", r2_score(y_test, y_pred))
    print("Mean Squared Error:", mean_squared_error(y_test, y_pred))

R2 Score: 0.17748486570306532
Mean Squared Error: 119.11967208953386
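
The two metrics printed above can be read more easily once they are put on the temperature scale. The sketch below is not part of the original notebook: it writes out the standard definitions of MSE and R² in plain Python, checks them on a small made-up example, and then derives the root mean squared error and the test-set variance implied by the reported figures. The helper names (mean_squared_error_manual, r2_score_manual) and the toy values are assumptions introduced here for illustration only.

```python
import math


def mean_squared_error_manual(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, i.e. the gain over always predicting the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy check (hypothetical values): predicting the mean everywhere gives R^2 = 0,
# and a perfect prediction gives R^2 = 1.
toy_true = [1.0, 2.0, 3.0]
baseline_r2 = r2_score_manual(toy_true, [2.0, 2.0, 2.0])
baseline_mse = mean_squared_error_manual(toy_true, [2.0, 2.0, 2.0])
perfect_r2 = r2_score_manual(toy_true, toy_true)

# Metrics copied from the notebook output above.
reported_mse = 119.11967208953386
reported_r2 = 0.17748486570306532

# RMSE is in the same unit as the target (degrees C).
rmse = math.sqrt(reported_mse)

# Since R^2 = 1 - MSE / Var(y_test), the test-set variance follows from the two metrics.
implied_test_variance = reported_mse / (1 - reported_r2)

print(f"RMSE: {rmse:.2f} C")
print(f"Implied test-set variance: {implied_test_variance:.1f}")
```

Read this way, the typical prediction error is roughly 10.9 °C, which is close to the 11.69 °C standard deviation of Temp_C shown in the describe() output. That is consistent with the conclusion that the three chosen features alone give only a modest improvement over predicting the average temperature.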
