Crime Data Analysis in Toronto - Group 4
TORONTO
Crime Data Analysis and Exploratory Report
Toronto Police Services
Group 4
Dev Wadiker
Chinmay Wadhavkar
Hajera Unnisa
Deepanshi
AGENDA
•Introduction
•Data Description
•Methodology
•Data Cleaning and Preparation
•Exploratory Data Analysis (EDA)
•Correlation Analysis
•Model Analysis-Predictive Modeling
•Conclusions
•Recommendations
EXECUTIVE SUMMARY
Key Findings:
•Data cleaning improved dataset quality.
•Identified significant temporal and spatial trends.
•Assaults and theft are the most prevalent crimes.
•Certain neighborhoods are consistent crime hotspots.
Problem Statement:
The need to identify temporal and spatial trends in crime to
inform resource allocation and public safety measures.
Objectives:
1.Clean and preprocess the dataset.
2.Conduct exploratory data analysis (EDA).
3.Provide insights into crime distribution.
4.Offer recommendations for crime prevention and resource
allocation.
DATA DESCRIPTION
METHODOLOGY
DATA CLEANING AND PREPARATION
Process:
EXPLORATORY DATA ANALYSIS (EDA)
1. Descriptive Statistics
Objective: Provide an overview of the main
statistics of the dataset, such as the count, mean,
median, standard deviation, minimum, and
maximum values for key variables.
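As a minimal sketch of this step, the snippet below computes the listed summary statistics with Python's standard library. The hour values are made-up illustrative data, not figures from the Toronto dataset, and `describe` is a hypothetical helper name.

```python
import statistics


def describe(values):
    """Return count, mean, median, sample standard deviation, min, and max."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical occurrence hours (0-23) for a handful of incidents.
occ_hours = [0, 1, 1, 2, 12, 17, 18, 23]
summary = describe(occ_hours)
```

In practice the same helper would be applied to each numeric column of interest, one column at a time.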
Summary:
The crime_trends_over_years visualization shows an increase in crime rates up
until 2020, with a significant drop in 2024. This could indicate a data anomaly
or the effect of external factors such as the COVID-19 pandemic.
B. Crime Distribution by Day of the Week:
The crime_distribution_by_dow.png chart indicates a fairly uniform distribution
of crimes across the week, with slightly higher incidents on Fridays and
Saturdays.
C. Crime Distribution by Hour of the Day:
The crime_distribution_by_hour visualization highlights peak crime hours between
midnight and 2 AM, with another rise in the late afternoon to early evening.
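The day-of-week and hour-of-day distributions come down to counting incidents per time bucket. The sketch below shows one way to do that with the standard library; the timestamps and their format are assumptions made for illustration, since the dataset's actual column layout is not shown in this deck.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical occurrence timestamps (not taken from the real dataset).
timestamps = [
    "2019-03-01 00:30",  # Friday
    "2019-03-01 17:45",  # Friday
    "2019-03-02 01:10",  # Saturday
    "2019-03-02 01:50",  # Saturday
    "2019-03-04 09:00",  # Monday
]

parsed = [datetime.strptime(ts, "%Y-%m-%d %H:%M") for ts in timestamps]

# Count incidents per weekday name and per hour of the day (0-23).
by_day = Counter(DAY_NAMES[dt.weekday()] for dt in parsed)
by_hour = Counter(dt.hour for dt in parsed)
```

Plotting `by_day` and `by_hour` as bar charts would give views comparable to the two charts described above.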
3. SPATIAL ANALYSIS
A. Crime Hotspots:
The crime_hotspots.png map
reveals high concentrations of
crime in certain areas of Toronto,
with noticeable hotspots.
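One simple way to locate hotspots, shown here only as a sketch and not necessarily how the map was produced, is to bin incident coordinates into a regular grid and rank cells by count. The coordinates, the 0.01-degree cell size, and the `grid_cell` helper are all assumptions for illustration.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_size=0.01):
    """Map a coordinate to the integer (row, column) index of its grid cell."""
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))


# Hypothetical (LAT_WGS84, LONG_WGS84) pairs; not real incident locations.
incidents = [
    (43.6561, -79.3702),
    (43.6568, -79.3709),
    (43.6563, -79.3705),
    (43.7401, -79.6003),
]

cell_counts = Counter(grid_cell(lat, lon) for lat, lon in incidents)

# Cells ordered from most to fewest incidents; the first entries are hotspots.
hotspots = cell_counts.most_common()
```

A smaller cell size gives finer hotspots but sparser counts per cell, so the choice depends on how much data each area has.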
B. Neighborhood Analysis:
The Crime_distribution_top_neighborhoods chart identifies the top 20
neighborhoods with the highest crime rates, with West Humber-Clairville and
Moss Park leading.
4. CRIME CATEGORY ANALYSIS
A. Offense Types:
Frequency_of_offense_types
chart shows that assault is the
most frequent crime type,
followed by vehicle-related
offenses and theft.
B. Location and Premises Types:
The Distribution_of_crimes_by_location visualization shows that most crimes
occur in condos and mobile homes, followed by public spaces.
CORRELATION ANALYSIS
•Correlation Matrix:
The correlation_matrix.png
visualization shows strong
correlations between certain
variables, like OBJECT ID and
REPORT_YEAR, which could
indicate data recording patterns
rather than meaningful insights.
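A small worked example shows why a record identifier can correlate strongly with the report year: if IDs are assigned in the order reports are filed, later years simply receive larger IDs. The sketch below computes the Pearson correlation by hand on made-up values; the data and the `pearson` helper are assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical records: IDs assigned sequentially as reports are filed.
object_ids = [1, 2, 3, 4, 5, 6]
report_years = [2014, 2014, 2015, 2015, 2016, 2016]

# A value close to 1 here reflects record ordering, not crime behavior.
r = pearson(object_ids, report_years)
```

Identifier columns of this kind are usually dropped before modeling, since they carry no information about the incident itself.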
MODEL ANALYSIS
Objective:
Evaluate the performance of a Random Forest model used to predict crime types
based on temporal and spatial features.
Model Summary:
• Hyperparameter Tuning: Optimized using Randomized Search.
• Best Parameters: Number of Estimators: 200; Maximum Depth: 20; Minimum
Samples Split: 2; Minimum Samples Leaf: 1
• Overall Accuracy: 44.19% (Moderate Performance)
Performance Metrics:
• Precision, Recall, F1-Score: Significant variation across different crime
types. Higher performance for frequent categories (e.g., "Assault with Weapon").
• Macro Average: Precision: 0.29, Recall: 0.08, F1-Score: 0.11
• Weighted Average: Precision: 0.42, Recall: 0.44, F1-Score: 0.38
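The gap between the macro and weighted averages is what one would expect with imbalanced classes: the macro average treats every crime type equally, while the weighted average is dominated by the frequent types. The sketch below illustrates this for recall using made-up labels; the class names, counts, and helper functions are assumptions, not the model's actual output.

```python
def per_class_recall(y_true, y_pred):
    """Return {label: (recall, support)} for every label present in y_true."""
    stats = {}
    for label in set(y_true):
        support = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        stats[label] = (hits / support, support)
    return stats


def macro_average(stats):
    """Unweighted mean of per-class recall: every class counts equally."""
    return sum(recall for recall, _ in stats.values()) / len(stats)


def weighted_average(stats):
    """Mean of per-class recall weighted by each class's support."""
    total = sum(support for _, support in stats.values())
    return sum(recall * support for recall, support in stats.values()) / total


# Hypothetical labels: one frequent class and two rare ones.
y_true = ["Assault"] * 8 + ["Robbery"] + ["Theft"]
y_pred = ["Assault"] * 10  # a model that always predicts the majority class

stats = per_class_recall(y_true, y_pred)
macro = macro_average(stats)        # low: two of three classes are never found
weighted = weighted_average(stats)  # high: the frequent class dominates
```

In this toy case the weighted recall equals the overall accuracy, which matches the pattern in the reported figures (weighted recall 0.44, accuracy 44.19%).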
1. CONFUSION MATRIX
•Insight:
2. FEATURE IMPORTANCE:
• Insight:
Spatial features (LAT_WGS84,
LONG_WGS84) are the most critical
predictors, followed by temporal
features (OCC_DAY, OCC_HOUR).
• Conclusion:
The Random Forest model provides a baseline,
with an overall accuracy of 44.19%.
• Limitations:
The model struggles with less frequent
crime categories, indicating room for
improvement.
CONCLUSION
Summary of Findings:
•Temporal Trends: Crime rates increased notably post-2014, with a peak in
2019-2021.
•Spatial Trends: High crime concentration in specific neighborhoods like
Moss Park and West Humber-Clairville.
•Crime Categories: Assaults are the most prevalent crime type.
•Modeling: Random Forest effectively identifies key predictors such as
location and time.
Implications:
•Strategic Resource Deployment: Allocate more resources to high-risk
areas and peak times.
•Predictive Policing: Utilize and refine predictive models to anticipate crime
trends and allocate resources proactively.
RECOMMENDATIONS
Strategic Resource Deployment:
• Focus on high-crime neighborhoods such
as Moss Park and West Humber-Clairville.
• Optimize patrol schedules to cover peak
crime hours, especially late at night.
Predictive Policing:
• Model Integration: Incorporate the
Random Forest model into daily
operations for proactive crime prevention.
• Model Enhancement: Continuously
refine the model with additional data (e.g.,
socioeconomic factors, weather patterns).
Community Engagement:
• Strengthen community relations in high-
crime areas through increased presence
and outreach programs.
• Promote public awareness on crime
prevention strategies tailored to specific
neighborhoods.
THANK YOU