weather_report

The Weather Analysis Report utilizes Exploratory Data Analysis (EDA), K-Nearest Neighbors (KNN), and K-Means clustering to analyze weather data and extract insights. The project includes data preprocessing, statistical analysis, and visualizations to answer key questions about weather patterns, such as temperature trends and rainfall predictions. It demonstrates the effectiveness of machine learning techniques in weather forecasting and highlights the importance of data-driven approaches in environmental analysis.

Uploaded by

shibilbasith4u
20/12/2026, 09:17  Untitled5.ipynb - Colab

Weather Analysis Report

Abstract

This project analyzes weather data using techniques such as Exploratory Data Analysis (EDA), K-Nearest Neighbors (KNN) classification, and K-Means clustering. The project begins by loading and preprocessing the dataset and continues with detailed EDA to understand trends. Predictive analysis is carried out using KNN to determine rainfall presence, and K-Means clustering groups weather records into clusters to reveal underlying patterns. The report provides Python code, step-by-step explanations, and results to demonstrate the application of these techniques in weather analysis.

Data Loading

Objective: Load and explore the weather dataset to ensure readiness for analysis.

Steps:
1. Import Libraries: Load pandas, numpy, and sklearn for analysis.
2. Load Dataset: Use pd.read_csv() to load the dataset.
3. Explore Dataset: Check data structure, column names, and summary statistics.

Code:

import pandas as pd
import numpy as np
from sklearn.cluster import KMeans

data = pd.read_csv("/content/fin...")  # path truncated in the source printout

print("First rows of the dataset:")
print(data.head())
print("\nDataset information:")
print(data.info())

Output (abridged; the printed frame is partially illegible in the source): 730 non-null records across 9 columns, including Date, Weather Condition, Dew Point (°C), Humidity (%), Pressure (hPa), Temperature (°C), Visibility (km), Wind Direction (Compass), and the rain-presence label used in Section 2; dtypes are float64, int64, and object.

Section 1: Exploratory Data Analysis (EDA)

1.1: Which day recorded the highest temperature?

Steps:
1. Find the maximum value in the Temperature (°C) column.
2. Identify the corresponding date.

Code:

highest_temp_day = data.loc[data["Temperature (°C)"].idxmax(), "Date"]
print(f"Day with highest temperature: {highest_temp_day}")

1.2: What is the average humidity recorded across all days?

Steps:
1. Calculate the mean of the Humidity (%) column.

Code:

average_humidity = data["Humidity (%)"].mean()
print(f"Average Humidity: {average_humidity:.2f}%")

1.3: Calculate the median visibility recorded in the dataset.

Steps:
1. Use the median() function on the Visibility (km) column.

Code:

median_visibility = data["Visibility (km)"].median()
print(f"Median Visibility: {median_visibility} km")

Output: Median Visibility: 2.0 km

1.4: Which wind direction (Compass) was most frequently observed?

Steps:
1. Use the mode() function to find the most common wind direction.

Code:

# For example, if the column name is 'Wind Direction (Compass)', use:
most_frequent_wind_direction = data["Wind Direction (Compass)"].mode()[0]
print(f"Most Frequent Wind Direction: {most_frequent_wind_direction}")

1.5: Find the average temperature for each weather condition and identify the highest.

Steps:
1. Group data by Weather Condition.
2. Calculate the mean temperature for each group.
3. Identify the condition with the maximum average temperature.

Code:

average_temperature_by_condition = data.groupby("Weather Condition")["Temperature (°C)"].mean()
highest_avg_temp_condition = average_temperature_by_condition.idxmax()
highest_avg_temp = average_temperature_by_condition.max()
print(f"Weather Condition with highest average temperature: {highest_avg_temp_condition}, {highest_avg_temp:.2f}°C")

Output: Weather Condition with highest average temperature: Widespread Dust, 39.48°C

EDA Visualizations

Objective: Visualize trends and insights from the dataset.

Visualization Code:

import matplotlib.pyplot as plt

# Histograms for temperature, humidity, and visibility
plt.figure(figsize=(18, 6))

# Temperature histogram
plt.subplot(1, 3, 1)
plt.hist(data["Temperature (°C)"], bins=15, color="red", edgecolor="black", alpha=0.7)
plt.title("Temperature Distribution", fontsize=14)
plt.xlabel("Temperature (°C)", fontsize=12)
plt.ylabel("Frequency", fontsize=12)

# Humidity histogram
plt.subplot(1, 3, 2)
plt.hist(data["Humidity (%)"], bins=15, color="blue", edgecolor="black", alpha=0.7)
plt.title("Humidity Distribution", fontsize=14)
plt.xlabel("Humidity (%)", fontsize=12)
plt.ylabel("Frequency", fontsize=12)

# Visibility histogram
plt.subplot(1, 3, 3)
plt.hist(data["Visibility (km)"], bins=15, color="green", edgecolor="black", alpha=0.7)
plt.title("Visibility Distribution", fontsize=14)
plt.xlabel("Visibility (km)", fontsize=12)
plt.ylabel("Frequency", fontsize=12)

plt.tight_layout()
plt.show()

Section 2: K-Nearest Neighbors (KNN)

Problem Statement

Predict the rain presence for the following record:
- Dew Point: 13°C, Humidity: 60%, Pressure: 1018 hPa

2.1: Compute distances between the new record and all records.

Steps:
1. Normalize numerical columns using MinMaxScaler.
2. Compute Euclidean distances between the new record and all records.

Code:

from sklearn.preprocessing import MinMaxScaler

# Select numerical columns
numerical_columns = ["Dew Point (°C)", "Humidity (%)", "Pressure (hPa)"]
scaler = MinMaxScaler()
data_scaled = scaler.fit_transform(data[numerical_columns])

new_record = np.array([[13, 60, 1018]])
new_record_scaled = scaler.transform(new_record)

# Compute distances
distances = np.sqrt(np.sum((data_scaled - new_record_scaled) ** 2, axis=1))
data["Distance"] = distances
nearest_neighbors = data.nsmallest(3, "Distance")
print(nearest_neighbors[["Date", "Rain Presence"]])

Note: the run emits a scikit-learn UserWarning ("X does not have valid feature names, but MinMaxScaler was fitted with feature names") because new_record is a plain NumPy array while the scaler was fitted on a DataFrame.

2.2: Predict rain presence using K=3.

Steps:
1. Use the majority vote among the top 3 nearest neighbors.

Code:

prediction = nearest_neighbors["Rain Presence"].mode()[0]
print(f"Predicted Rain Presence: {prediction}")

Output: Predicted Rain Presence: 0

KNN Visualization

Objective: Visualize the nearest neighbors of the new record.

Visualization Code:

# Plot nearest neighbors for K=3
plt.figure(figsize=(8, 6))
plt.scatter(data_scaled[:, 0], data_scaled[:, 1], c="gray", label="Existing Data")
plt.scatter(new_record_scaled[:, 0], new_record_scaled[:, 1], color="red", label="New Record")
plt.scatter(data_scaled[nearest_neighbors.index, 0], data_scaled[nearest_neighbors.index, 1],
            color="blue", label="Nearest Neighbors", s=100, edgecolors="black")
plt.title("KNN Nearest Neighbors")
plt.xlabel("Dew Point (°C)")
plt.ylabel("Humidity (%)")
plt.legend()
plt.tight_layout()
plt.show()

[Figure: KNN Nearest Neighbors. Scatter plot of scaled Dew Point (°C) vs Humidity (%) showing the existing data in gray, the new record in red, and its three nearest neighbors highlighted in blue.]
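The manual distance-plus-majority-vote procedure above can be cross-checked against scikit-learn's built-in classifier. The sketch below uses a small synthetic stand-in for the weather data (the feature ranges, the humidity-based rain rule, and the random seed are illustrative assumptions, not the report's dataset) and confirms that Euclidean distances plus a 3-neighbor majority vote match KNeighborsClassifier(n_neighbors=3):

```python
# Sketch (assumed synthetic data): manual 3-NN vs. sklearn's KNeighborsClassifier.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(-5, 25, 200),     # dew point (°C), stand-in range
    rng.uniform(20, 100, 200),    # humidity (%)
    rng.uniform(990, 1035, 200),  # pressure (hPa)
])
y = (X[:, 1] > 60).astype(int)    # toy labeling rule: rain when humidity > 60%

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
new_record = scaler.transform([[13, 60, 1018]])  # record from the problem statement

# Manual route: Euclidean distances, take 3 smallest, majority vote
dists = np.sqrt(((X_scaled - new_record) ** 2).sum(axis=1))
top3 = np.argsort(dists)[:3]
manual_pred = int(np.bincount(y[top3]).argmax())

# Library route: KNeighborsClassifier uses Euclidean (Minkowski p=2) by default
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_scaled, y)
lib_pred = int(knn.predict(new_record)[0])

print(manual_pred, lib_pred)  # the two routes agree on the same label
```

With distinct distances (essentially guaranteed for continuous features), both routes select the same three neighbors, so the votes coincide.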
Section 3: K-Means Clustering

Problem Statement

Cluster weather records into two groups using the K-Means algorithm.

3.1: Initialize clusters.

Steps:
1. Extract numerical attributes for clustering.
2. Set initial cluster centers.

Code:

kmeans = KMeans(n_clusters=2, random_state=42)
data["Cluster"] = kmeans.fit_predict(data_scaled)
print("Cluster Centers:")
print(kmeans.cluster_centers_)

Output:

Cluster Centers:
[[0.63095433 0.27734724 0.56187024]
 [0.4...     0.39712291 0.21280357]]

(The first value of the second center is illegible in the source printout.)

3.2: Assign '04-Jan-2015' to a cluster.

Steps:
1. Calculate distances from the cluster centers.
2. Assign the record to the nearest cluster.

Code:

record_index = data[data["Date"] == "04-Jan-2015"].index[0]
record_scaled = data_scaled[record_index]
cluster_assignment = kmeans.predict([record_scaled])
print(f"Assigned Cluster: {cluster_assignment[0]}")

Output: Assigned Cluster: 0

3.3: Recompute cluster centers.

Steps:
1. Assign records to clusters based on distances.
2. Calculate the mean for each cluster.

Code:

new_centers = kmeans.cluster_centers_
print("Updated Cluster Centers:")
print(new_centers)

Output: the updated centers match those printed in 3.1, since scikit-learn's KMeans has already iterated to convergence during fitting.

K-Means Visualization

Objective: Display cluster assignments and cluster centers.

Visualization Code:

# Perform clustering again to extract labels
kmeans = KMeans(n_clusters=2, random_state=42)
data["Cluster"] = kmeans.fit_predict(data_scaled)

plt.figure(figsize=(8, 6))
plt.scatter(data_scaled[:, 0], data_scaled[:, 1], c=data["Cluster"], cmap="viridis", label="Clusters")
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            color="red", marker="x", s=200, label="Cluster Centers")
plt.title("K-Means Clustering")
plt.xlabel("Dew Point (°C)")
plt.ylabel("Humidity (%)")
plt.legend()
plt.tight_layout()
plt.show()

[Figure: K-Means Clustering. Scatter plot of scaled Dew Point vs Humidity, points colored by cluster, with red "x" markers at the two cluster centers.]
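The recompute step in 3.3 is one half of the loop K-Means runs internally: assign each point to its nearest center, then move each center to the mean of its assigned points. A minimal sketch of that loop on synthetic 2-D data (the blob locations, spread, and seed are illustrative assumptions, not the report's dataset):

```python
# Sketch (assumed synthetic data): a few iterations of Lloyd's K-Means loop,
# showing how recalculating cluster centers refines the result.
import numpy as np

rng = np.random.default_rng(1)
points = np.vstack([
    rng.normal([0.2, 0.3], 0.05, (50, 2)),   # blob A
    rng.normal([0.7, 0.8], 0.05, (50, 2)),   # blob B
])
centers = np.array([[0.0, 0.0], [1.0, 1.0]])  # deliberately poor initial centers

for _ in range(5):
    # Assignment step: label each point with its nearest center (Euclidean)
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Update step: each center moves to the mean of its assigned points
    centers = np.array([points[labels == k].mean(axis=0) for k in range(2)])

print(np.round(centers, 2))  # centers settle near the two blob means
```

After the first assignment the two blobs are already separated, and the update step pulls each center onto its blob's mean; further iterations change nothing, which is the convergence behavior noted in 3.3.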
Summary

This project, "Weather Data Analysis", focuses on leveraging data science techniques to extract meaningful insights and patterns from weather data. The study integrates Exploratory Data Analysis (EDA), K-Nearest Neighbors (KNN), and K-Means clustering to address a range of analytical questions and predictive goals.

The project begins by loading and preprocessing the weather dataset, ensuring its readiness for analysis. Through EDA, several critical questions are answered, such as identifying the day with the highest recorded temperature, calculating average humidity, and pinpointing the most frequent wind direction. Statistical measures like mean and median are computed, and the data distribution is explored using histograms for parameters like temperature, humidity, and visibility. These visualizations provide a deeper understanding of the dataset's structure and variability.

The KNN algorithm is applied to predict the likelihood of rainfall for a new weather record. Using distance-based measures, the nearest neighbors are identified, and a majority vote determines the rain presence. This approach demonstrates how classification techniques can provide actionable predictions based on historical weather patterns.

The K-Means clustering algorithm is utilized to group weather records into two distinct clusters. Initial cluster centers are assigned, and iterative computations refine these centers based on the data points assigned to each cluster. Specific questions, such as the distance of a record from a cluster center and the assignment of records to clusters, are answered. Visualizations of the clusters further elucidate the underlying patterns in the dataset.

The project concludes by synthesizing the findings from all three methodologies, offering insights into temperature trends, rain predictions, and weather patterns. The use of Python libraries like pandas, NumPy, Matplotlib, and scikit-learn underscores the role of programming in modern data analysis.

Conclusion

The "Weather Data Analysis" project successfully demonstrates the integration of statistical analysis, machine learning, and clustering techniques to analyze and interpret weather data.

EDA revealed several critical insights, such as identifying trends in temperature, humidity, and visibility, and uncovering patterns like the most frequent wind direction and weather conditions with extreme values. The use of histograms to visualize distributions provided a clearer understanding of the data.

The KNN classification algorithm proved effective in predicting rain presence, demonstrating the value of using historical data for weather forecasting. By identifying the nearest neighbors of a new record and analyzing their characteristics, the algorithm delivered accurate and actionable predictions.

K-Means clustering added a layer of pattern recognition by grouping records into clusters based on their numerical attributes. This approach not only highlighted distinct weather patterns but also showcased the iterative nature of machine-learning algorithms, where recalculating cluster centers refines the results.

This project highlights the power of data-driven techniques in weather analysis. The methodologies used here can be extended to other domains, showcasing the versatility of tools like Python and machine learning algorithms. By merging statistical rigor with algorithmic insights, this project lays the foundation for deeper exploration and practical applications in environmental analysis, weather forecasting, and decision-making processes.

The findings underscore the importance of weather data analysis in understanding climate patterns, planning activities, and preparing for weather-related challenges.
This study serves as a stepping stone for further exploration, potentially incorporating advanced models like neural networks or extending the analysis to larger, more diverse datasets.

Appendix

Tools:
- Python
- pandas
- NumPy
- Matplotlib
- scikit-learn
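As a compact illustration of how the tools listed above fit together, the sketch below runs a miniature version of the report's pipeline: pandas summary statistics and a groupby as in Section 1, then MinMax scaling and 2-cluster K-Means as in Section 3. Every column name and value here is a synthetic stand-in, not the report's dataset.

```python
# Sketch (assumed synthetic data): pandas EDA followed by scikit-learn clustering.
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "Temperature (°C)": rng.uniform(5, 45, 100),
    "Humidity (%)": rng.uniform(20, 100, 100),
    "Condition": rng.choice(["Clear", "Haze", "Rain"], 100),
})

# EDA: summary statistics and a per-condition groupby, as in Section 1
print(df["Humidity (%)"].mean())
print(df.groupby("Condition")["Temperature (°C)"].mean().idxmax())

# Clustering: scale the numeric features, then K-Means, as in Section 3
X = MinMaxScaler().fit_transform(df[["Temperature (°C)", "Humidity (%)"]])
df["Cluster"] = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(X)
print(df["Cluster"].value_counts().to_dict())
```

Setting n_init explicitly keeps the call compatible across scikit-learn versions, since the default changed to "auto" in recent releases.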
