all code explanations

The document outlines a comprehensive data analysis process for fire incident data, including data cleaning, handling missing values, and preparing the data for modeling. It details various analyses such as trends in incidents over time, spatial clustering of incidents, and the financial impact of different incident types. The code also implements predictive modeling techniques to optimize resource utilization and improve incident response efficiency.

Uploaded by

tali66261

Code Explanation:

Data Cleaning:

1. Importing Libraries: The code starts by importing necessary libraries such as
pandas for data manipulation, scikit-learn for machine learning tasks, and geopy
for geocoding functionalities.

2. Reading Data: It reads the data from a CSV file using pandas' `read_csv` function.

3. Checking Empty Cells: The code checks for empty cells in each column of the
DataFrame using the `isnull().sum()` method.

4. Filtering Special Service Data: It filters rows where the 'StopCodeDescription' is
'Special Service' and counts NaN values in the 'SpecialServiceType' column within
the filtered rows.

5. Handling NaN Values: NaN values in the 'SpecialServiceType' column are replaced
with 'Not applicable' using the `fillna()` method.

6. Filling Data Efficiently: Several functions are defined to efficiently fill missing data:
- `fill_postcode_from_district`: Fills 'Postcode_full' values based on
'Postcode_district'.
- `fill_lat_lon_from_postcode_efficiently`: Fills 'Latitude' and 'Longitude' values
based on 'Postcode_full'.
- `fill_postcode_from_district_efficiently`: Another approach to fill 'Postcode_full'
based on 'Postcode_district'.
- `fill_incgeo_wardcode_from_propercase`: Fills 'IncGeo_WardCode' based on
'ProperCase'.
- `fill_easting_from_uprn`: Fills 'Easting_m' based on 'UPRN'.

7. Filling Blank Cells: Additional columns like 'IncGeo_WardCode',
'IncGeo_WardName', etc., are filled with 'Unknown' or 0 where appropriate.

8. Preparing for Linear Regression: The data is prepared for a linear regression
model by selecting relevant columns and handling missing values in 'Easting_m' and
'Northing_m' using linear regression.

9. Fitting Linear Regression Models: Two linear regression models are fitted:
- One predicts missing 'Easting_m' values based on 'Longitude'.
- Another predicts missing 'Northing_m' values based on 'Longitude'.

10. Predicting and Filling Missing Values: Missing 'Easting_m' and 'Northing_m'
values are predicted using the fitted models and filled in the DataFrame.

11. Saving Cleaned Data: Finally, the cleaned DataFrame is saved to a new CSV file.
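
The missing-value steps above (3 to 6) can be sketched as follows. This is a minimal illustration, not the original code: the sample rows are made up, and the district-to-postcode lookup is only one plausible reading of what a function such as `fill_postcode_from_district_efficiently` might do.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample standing in for the incident CSV (illustrative values only).
df = pd.DataFrame({
    "StopCodeDescription": ["Special Service", "Primary Fire", "Special Service", "AFA"],
    "SpecialServiceType": ["Flooding", np.nan, np.nan, np.nan],
    "Postcode_district": ["SE1", "SE1", "N1", "N1"],
    "Postcode_full": ["SE1 7PB", np.nan, np.nan, "N1 9AG"],
})

# Step 3: count empty cells in each column.
missing_per_column = df.isnull().sum()

# Step 4: count NaNs in 'SpecialServiceType' among 'Special Service' rows only.
special = df[df["StopCodeDescription"] == "Special Service"]
special_missing = special["SpecialServiceType"].isnull().sum()

# Step 5: replace the remaining NaNs with a fixed label.
df["SpecialServiceType"] = df["SpecialServiceType"].fillna("Not applicable")

# Step 6 (assumed approach): build a district -> first known full postcode
# lookup once, then fill the gaps with a vectorised map instead of a row loop.
lookup = (
    df.dropna(subset=["Postcode_full"])
      .groupby("Postcode_district")["Postcode_full"]
      .first()
)
df["Postcode_full"] = df["Postcode_full"].fillna(df["Postcode_district"].map(lookup))
```

Building the lookup once and applying it with `map` avoids iterating over rows, which is usually what "efficient" filling amounts to in pandas.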

In your report, you can explain each step briefly, highlighting the data cleaning and
preprocessing techniques used, the strategies for handling missing values, and the
use of linear regression for imputation. You can also mention the efficiency
considerations in filling missing data and the overall goal of preparing the data for
further analysis or modeling.
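
A hedged sketch of the regression-based imputation in steps 8 to 10 follows. The data is synthetic and exactly linear so the behaviour is easy to verify, and the helper name `impute_with_regression` is hypothetical. One deliberate assumption: the description above uses 'Longitude' as the predictor for both models, whereas this sketch pairs 'Northing_m' with 'Latitude', since northing measures north-south position.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic, exactly linear data (coefficients are arbitrary, not real grid conversions).
df = pd.DataFrame({
    "Longitude": [-0.30, -0.20, -0.10, 0.00, 0.10],
    "Latitude": [51.40, 51.45, 51.50, 51.55, 51.60],
})
df["Easting_m"] = 530000 + 70000 * df["Longitude"]
df["Northing_m"] = 180000 + 111000 * (df["Latitude"] - 51.5)
df.loc[1, "Easting_m"] = np.nan
df.loc[3, "Northing_m"] = np.nan


def impute_with_regression(frame, target, predictor):
    """Fit target ~ predictor on complete rows, then fill missing target values."""
    known = frame[frame[target].notna() & frame[predictor].notna()]
    todo = frame[target].isna() & frame[predictor].notna()
    if known.empty or not todo.any():
        return frame  # nothing to learn from, or nothing to fill
    model = LinearRegression().fit(known[[predictor]], known[target])
    frame.loc[todo, target] = model.predict(frame.loc[todo, [predictor]])
    return frame


df = impute_with_regression(df, "Easting_m", "Longitude")
df = impute_with_regression(df, "Northing_m", "Latitude")  # assumption: Latitude, not Longitude
```

On real data the relationship is only approximately linear, so it is worth checking the fit on rows where both values are known before trusting the imputed coordinates.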

Data Behavior:

1. Trend of Incidents Over Time:

- The code converts the 'DateOfCall' column to datetime format.
- It groups the data by month and calculates the number of incidents per month.
- The monthly trend of incidents is plotted using a line graph.

2. Trend of Incident Types:

- The code counts the frequency of different types of incidents ('IncidentGroup').
- It plots the frequency of incident types using a bar chart.

3. Trend of First Pump Arriving Attendance Time:

- Rows with 'Unknown' or missing values in 'FirstPumpArriving_AttendanceTime'
are filtered out.
- The average first pump arriving attendance time is calculated monthly.
- The trend of average attendance time over time is plotted using a line graph.

4. Incidents by Hour of Call:

- The code counts the number of incidents by hour of the day ('HourOfCall').
- It plots the incidents by hour using a bar chart.

5. Notional Cost of Incidents Over Time:

- The 'Notional Cost (£)' column is converted to numeric format, handling errors if
any.
- The total notional cost of incidents is calculated monthly.
- The trend of total notional cost over time is plotted using a line graph.

Each visualization provides valuable insights into different aspects of the incident
data, such as the overall trend of incidents over time, the frequency of incident
types, response time trends, hourly patterns of incidents, and the financial impact of
incidents. These visualizations can help in understanding patterns, identifying
trends, and making informed decisions based on the data.
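
The aggregations behind these plots can be sketched as below. The rows are invented for illustration, plotting is left out, and stripping thousands separators before the numeric conversion is an assumption about how the cost strings are formatted.

```python
import pandas as pd

# Made-up rows; only the column names come from the description above.
df = pd.DataFrame({
    "DateOfCall": ["2023-01-01", "2023-01-15", "2023-02-03", "2023-02-20", "2023-02-21"],
    "HourOfCall": [9, 14, 9, 23, 9],
    "FirstPumpArriving_AttendanceTime": ["300", "Unknown", "280", None, "340"],
    "Notional Cost (£)": ["364", "728", "n/a", "364", "1,092"],
})

# Convert to datetime and derive a monthly period to group on.
df["DateOfCall"] = pd.to_datetime(df["DateOfCall"])
month = df["DateOfCall"].dt.to_period("M")

# 1. Number of incidents per month.
incidents_per_month = df.groupby(month).size()

# 3. Average attendance time per month; 'Unknown' and missing values become
#    NaN, which mean() skips, so those rows are effectively filtered out.
attendance = pd.to_numeric(df["FirstPumpArriving_AttendanceTime"], errors="coerce")
avg_attendance_per_month = attendance.groupby(month).mean()

# 4. Incidents by hour of the day.
incidents_by_hour = df["HourOfCall"].value_counts().sort_index()

# 5. Total notional cost per month; unparseable entries are coerced to NaN.
cost = pd.to_numeric(
    df["Notional Cost (£)"].str.replace(",", "", regex=False), errors="coerce"
)
cost_per_month = cost.groupby(month).sum()
```

Each resulting Series can be passed to its `.plot()` method (for example `kind="line"` or `kind="bar"`) to produce the corresponding chart.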

Spatial Analysis for Fire Incident Hotspots:

1. Loading and Preprocessing Data:

- Reads the dataset from a CSV file, selecting specific columns.
- Filters out invalid latitude and longitude values.
- Fills missing values in the 'Postcode_district' column with 'Unknown'.
- Imputes missing values in 'Latitude' and 'Longitude' columns with their mean
values.

2. Applying K-means Clustering:

- Extracts latitude and longitude coordinates for clustering.
- Applies K-means clustering with a specified number of clusters (default is 5).
- Adds a new column 'Cluster' to the DataFrame indicating the cluster each data
point belongs to.

3. Analyzing Clusters:

- Prints detailed information about each cluster, including the number of
incidents, centroid coordinates, and bounding box of latitude and longitude values.

4. Plotting Clusters:

- Plots the clustered data points on a scatter plot, using seaborn for visualization.
- Includes centroids of the clusters as red stars for better understanding of
cluster centers.

5. Generating Heatmap:

- Creates a heatmap based on the density of incidents within each cluster.
- Utilizes seaborn's kdeplot to visualize the density distribution.

This analysis helps in understanding the spatial distribution and clustering patterns
of fire incidents, providing valuable insights for further investigation or
decision-making. You can customize parameters such as the number of clusters
and plot sizes based on your specific analysis needs.
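
The clustering and per-cluster summary can be sketched as follows, assuming scikit-learn's `KMeans`. The coordinates are two synthetic blobs, the function name `cluster_incidents` is hypothetical, and plotting is omitted.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Two well-separated synthetic blobs of (latitude, longitude) points.
rng = np.random.default_rng(0)
centres = np.array([[51.45, -0.30], [51.60, 0.05]])
points = np.vstack([c + rng.normal(scale=0.005, size=(50, 2)) for c in centres])
df = pd.DataFrame(points, columns=["Latitude", "Longitude"])


def cluster_incidents(frame, n_clusters=5, random_state=42):
    """Label each row with a K-means cluster and return the centroids."""
    coords = frame[["Latitude", "Longitude"]]
    coords = coords.fillna(coords.mean())  # mean imputation, as described above
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    out = frame.copy()
    out["Cluster"] = km.fit_predict(coords)
    return out, km.cluster_centers_


clustered, centroids = cluster_incidents(df, n_clusters=2)

# Per-cluster incident count and bounding box.
summary = clustered.groupby("Cluster").agg(
    incidents=("Latitude", "size"),
    lat_min=("Latitude", "min"),
    lat_max=("Latitude", "max"),
    lon_min=("Longitude", "min"),
    lon_max=("Longitude", "max"),
)
```

K-means here treats degrees of latitude and longitude as plain Euclidean coordinates, which is a reasonable approximation over a single city but distorts distances over larger areas.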

Impact of Incident Types on Resource Utilization:

This Python code performs several data analysis and modeling tasks on fire incident
data. Here's a summary of what each part of the code accomplishes:

1. Data Exploration and Visualization:

- Groups the data by 'IncidentGroup' and aggregates unique
'StopCodeDescription' values.
- Converts the series into a DataFrame for better readability.
- Computes frequency and distribution of stations and pumps attending incidents
by incident type.
- Plots bar charts and box plots to visualize the frequency and distribution of
stations and pumps attending incidents.

2. Feature Engineering and Modeling:

- Extracts the hour from the 'TimeOfCall' column and adds it as a new feature
'HourOfCall'.
- Selects features ('IncidentGroup', 'HourOfCall', 'PropertyType') and target
('NumPumpsAttending') for the regression model.
- Encodes categorical variables using OneHotEncoder.
- Splits the dataset into training and testing sets.
- Creates a pipeline with OneHotEncoder and DecisionTreeRegressor.
- Trains the model on the training data and evaluates it on the testing data using
mean absolute error (MAE).
- Plots actual vs. predicted values to visualize model performance.

3. Cost Analysis:

- Aggregates notional costs by incident type to assess the financial impact.
- Calculates the average cost per incident within each incident group.
- Plots bar charts to visualize total and average notional costs by incident type.

Overall, this code provides a comprehensive analysis of fire incident data, including
visualization of incident characteristics, modeling the number of pumps attending
incidents, and assessing the financial impact of different incident types.
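
The modeling step can be sketched as below on synthetic data. Wrapping the encoder in a `ColumnTransformer`, so that only the categorical columns are one-hot encoded and 'HourOfCall' stays numeric, is an assumption about how the pipeline is wired; the category values and pump counts are invented.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Synthetic incidents: the pump count depends only on the incident group.
groups = ["Fire", "Special Service", "False Alarm"]
pumps = {"Fire": 3, "Special Service": 1, "False Alarm": 2}
rows = []
for i in range(60):
    group = groups[i % 3]
    rows.append({
        "IncidentGroup": group,
        "TimeOfCall": f"{i % 24:02d}:15:00",
        "PropertyType": "Dwelling" if i % 2 else "Non Residential",
        "NumPumpsAttending": pumps[group],
    })
df = pd.DataFrame(rows)

# Feature engineering: extract the hour from the time string.
df["HourOfCall"] = pd.to_datetime(df["TimeOfCall"], format="%H:%M:%S").dt.hour

features = ["IncidentGroup", "HourOfCall", "PropertyType"]
X, y = df[features], df["NumPumpsAttending"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# One-hot encode the categorical columns; pass the numeric hour through unchanged.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["IncidentGroup", "PropertyType"])],
    remainder="passthrough",
)
model = Pipeline([
    ("preprocess", preprocess),
    ("tree", DecisionTreeRegressor(random_state=0)),
])
model.fit(X_train, y_train)

mae = mean_absolute_error(y_test, model.predict(X_test))
```

Setting `handle_unknown="ignore"` keeps prediction from failing when the test set contains a category that never appeared in training, which is common with sparse property types.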

Efficiency of Incident Call Processing:

1. Data Cleaning and Preprocessing

Numeric Conversion and Handling NaNs: Ensuring that time metrics are numeric
and handling missing values are crucial steps to maintain data integrity. Clean data
leads to more reliable models, which in turn supports sound decision-making.

2. Feature Engineering

Creating New Features: Features like CallToIncidentRatio provide new insights that
can help in understanding the factors affecting response times. Knowing which
variables influence response times the most can guide resource allocation and
process improvement.

3. Model Training and Evaluation

Predictive Modeling: Using RandomForest and XGBoost models to predict first pump
arriving attendance time based on various features can help in anticipating delays
and identifying areas for improvement.
Evaluation Metrics: Metrics such as Mean Squared Error (MSE), Mean Absolute Error
(MAE), and R-squared offer insights into model performance. A model with lower
MSE and MAE and higher R-squared is more reliable. Businesses can use these
models to simulate different scenarios and prepare more effectively for future
incidents.

4. Visualization of Actual vs Predicted Values

Understanding Model Accuracy: Visual comparisons between actual and predicted
response times illustrate the model's accuracy in real-world terms. This can help
build trust among stakeholders and refine the models for better accuracy.

5. Analysis of Call Volume Patterns

Call Volume by Time of Day/Week: Visualizing call volumes can reveal patterns and
trends, such as peak times or days when incidents are more likely to occur. This
insight allows businesses to allocate resources more effectively, ensuring that
adequate personnel and equipment are available when needed most.
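
The cleaning, training, and evaluation steps (1 and 3 above) can be sketched as follows. This is only an illustration: the data and its coefficients are synthetic, the two predictor columns are an assumed feature set, and only the random forest is shown. XGBoost is left out purely to avoid a third-party dependency.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: attendance time in seconds, stored as text with some 'Unknown' entries.
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "HourOfCall": rng.integers(0, 24, n),
    "NumPumpsAttending": rng.integers(1, 4, n),
})
target = "FirstPumpArriving_AttendanceTime"
seconds = 240 + 5 * df["HourOfCall"] + 30 * df["NumPumpsAttending"] + rng.normal(0, 10, n)
df[target] = seconds.round().astype(int).astype(str)
df.loc[::50, target] = "Unknown"

# Step 1: force the time metric to numeric and drop rows that cannot be parsed.
df[target] = pd.to_numeric(df[target], errors="coerce")
df = df.dropna(subset=[target])

# Step 3: train a random forest and score it on held-out data.
X = df[["HourOfCall", "NumPumpsAttending"]]
y = df[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

metrics = {
    "MSE": mean_squared_error(y_test, pred),
    "MAE": mean_absolute_error(y_test, pred),
    "R2": r2_score(y_test, pred),
}
```

MAE is in the same units as the target (seconds here), which makes it the easiest of the three metrics to explain to non-technical stakeholders.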

Business Benefits and Decision Support:

Resource Optimization: By understanding when and where incidents are more likely
to occur, businesses can optimize resource allocation, ensuring that response teams
are adequately staffed and equipped to handle peak times.

Process Improvement: Identifying factors that lead to delayed response times can
highlight areas for process improvement. For example, if certain times of day have
longer response times, it may indicate a need for process adjustments or additional
resources.

Strategic Planning: Predictive models can inform long-term strategic planning, such
as where to station new resources or how to design training programs for
responders based on the most impactful factors affecting response times.

Performance Monitoring: Continuously monitoring model predictions against actual
outcomes can help in setting and tracking performance benchmarks. It also
supports a culture of continuous improvement.

In summary, the script supports a data-driven approach to managing incident
response times, offering a foundation for making informed business decisions,
optimizing operations, and enhancing readiness for future incidents.

Incident Response Cost Analysis:

1. Data Cleaning and Preprocessing:

- Loads the dataset and inspects the 'Notional Cost (£)' column for data type and
missing values.
- Converts 'Notional Cost (£)' to a numeric format, handling potential
non-numeric characters like currency symbols and commas.
- Handles missing values in 'Notional Cost (£)' by filling them with the median.

2. Data Analysis and Visualization:

- Computes summary statistics and visualizes the average cost per incident type
using bar charts.

3. Feature Selection and Splitting:

- Defines relevant columns and separates features (X) and target variable (y).
- Splits the dataset into training and testing sets.

4. Pipeline Setup:

- Sets up preprocessing pipelines for numerical and categorical features using
SimpleImputer for missing values and StandardScaler for scaling numerical
features.
- Combines the preprocessing pipelines using ColumnTransformer.
- Creates a model pipeline with GradientBoostingRegressor as the estimator.

5. Model Training and Evaluation:

- Fits the model pipeline on the training data.
- Makes predictions on both training and testing sets.
- Evaluates the model using mean absolute error (MAE) and mean squared error
(MSE) for both training and testing data.

6. Visualization of Predictions:

- Plots a scatter plot of actual vs. predicted notional costs, along with a line of
best fit.

Overall, this code provides a comprehensive example of cleaning, preprocessing,
modeling, and evaluating a machine learning regression model for predicting the
notional cost of fire incidents.
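
A minimal sketch of the cost cleaning and the modeling pipeline follows. The helper `clean_cost`, the 'PumpHoursRoundUp' column, the hourly rate of 364, and all sample values are hypothetical and used only to make the example self-contained; the choice of a median imputer for numeric columns and a most-frequent imputer for categorical ones is likewise an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def clean_cost(series):
    """Strip currency symbols, commas and spaces, then convert to float (NaN on failure)."""
    cleaned = series.astype(str).str.replace(r"[£,\s]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")


# Step 1 on a few made-up raw strings: clean, then fill gaps with the median.
raw = pd.Series(["£364", "1,092", "728.00", None, "unknown"])
cost = clean_cost(raw)
cost = cost.fillna(cost.median())

# Synthetic modeling data: cost is an arbitrary rate times the pump hours.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "IncidentGroup": rng.choice(["Fire", "Special Service", "False Alarm"], n),
    "NumPumpsAttending": rng.integers(1, 5, n).astype(float),
    "PumpHoursRoundUp": rng.integers(1, 6, n).astype(float),  # hypothetical column
})
df["Notional Cost (£)"] = 364.0 * df["PumpHoursRoundUp"]
df.loc[::20, "NumPumpsAttending"] = np.nan  # inject gaps for the imputer to handle

numeric = ["NumPumpsAttending", "PumpHoursRoundUp"]
categorical = ["IncidentGroup"]

# Step 4: separate preprocessing per column type, combined with ColumnTransformer.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("gbr", GradientBoostingRegressor(random_state=0)),
])

# Steps 3 and 5: split, fit, and score on both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    df[numeric + categorical], df["Notional Cost (£)"], test_size=0.2, random_state=0
)
model.fit(X_train, y_train)

scores = {}
for name, X_part, y_part in [("train", X_train, y_train), ("test", X_test, y_test)]:
    pred = model.predict(X_part)
    scores[name] = {
        "MAE": mean_absolute_error(y_part, pred),
        "MSE": mean_squared_error(y_part, pred),
    }
```

Comparing the training and testing scores side by side is a quick check for overfitting: a test error far above the training error suggests the model has memorised the training data.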
