Crime Data Analysis in Toronto - Group 4

Uploaded by hajeraunnisa188
Copyright © All Rights Reserved

CRIME DATA ANALYSIS IN TORONTO
Crime Data Analysis and Exploratory Report
Toronto Police Services
Group 4
Dev Wadiker
Chinmay Wadhavkar
Hajera Unnisa
Deepanshi

AGENDA

• Introduction
• Data Description
• Methodology
• Data Cleaning and Preparation
• Exploratory Data Analysis (EDA)
• Correlation Analysis
• Model Analysis: Predictive Modeling
• Conclusions
• Recommendations

EXECUTIVE SUMMARY

Objective: Analyze Toronto crime data to uncover trends, patterns, and actionable insights.

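Key Findings:
• Data cleaning (imputation, standardization, and duplicate removal) improved dataset quality.
• Identified clear temporal and spatial trends.
• Assault is the most prevalent crime type, followed by vehicle-related offenses and theft.
• Certain neighborhoods, such as West Humber-Clairville and Moss Park, are crime hotspots.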

Conclusions: These insights can support better resource allocation and targeted interventions by Toronto Police Services.
INTRODUCTION
Background:
Overview of crime challenges in Toronto and the importance of
understanding crime patterns.

Problem Statement:
The need to identify temporal and spatial trends in crime to
inform resource allocation and public safety measures.

Objectives:
1. Clean and preprocess the dataset.
2. Conduct exploratory data analysis (EDA).
3. Provide insights into crime distribution.
4. Offer recommendations for crime prevention and resource allocation.
DATA DESCRIPTION

Data Source: Major Crime Indicators (MCI) Open Data from Toronto Police Services.
Variables:
• OCC_DATE: Date and time of crime occurrence.
• REPORT_DATE: Date and time the crime was reported.
• Geographical Data: Longitude (LONG_WGS84) and latitude (LAT_WGS84).
• Crime Types: OFFENCE and MCI_CATEGORY.
Data Quality: Cleaning steps included imputation, standardization, and removal of duplicates.

METHODOLOGY

Approach: Systematic analysis, from data cleaning through exploratory data analysis to predictive modeling.

Tools: Python (Pandas, Matplotlib, Seaborn), Jupyter Notebooks, Geopandas.

Assumptions and Limitations:
• Assumed data completeness and temporal consistency.
• Geographic resolution and the lack of socio-economic data were noted limitations.

DATA CLEANING AND PREPARATION

Process:

1. Handling Missing Data: Mode imputation, mean imputation, and backfilling.
2. Dropping Redundant Columns: Removed unnecessary or duplicate columns.
3. Data Standardization: Standardized categorical data for consistency.
4. Handling Duplicates: Ensured each record was unique.
• Outcome: A clean, well-structured dataset ready for analysis.
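The cleaning steps above can be sketched in pandas. This is a minimal illustration under stated assumptions, not the group's notebook code: mode imputation is applied to text columns, mean imputation to numeric columns, and backfilling to a date column assumed to be OCC_DATE; the helper name `clean_crime_data` is hypothetical.

```python
import pandas as pd


def clean_crime_data(df: pd.DataFrame, drop_cols: list[str],
                     date_col: str = "OCC_DATE") -> pd.DataFrame:
    """Hypothetical helper applying the four cleaning steps to a copy of df."""
    out = df.copy()

    # 1. Missing data (assumed split): backfill the date column, mean-impute
    #    numeric columns, and mode-impute the remaining categorical columns.
    for col in out.columns:
        s = out[col]
        if col == date_col:
            out[col] = s.bfill()
        elif pd.api.types.is_numeric_dtype(s):
            out[col] = s.fillna(s.mean())
        else:
            mode = s.mode(dropna=True)
            if not mode.empty:
                out[col] = s.fillna(mode.iloc[0])

    # 2. Drop redundant columns; names that are not present are ignored.
    out = out.drop(columns=drop_cols, errors="ignore")

    # 3. Standardize categorical text: trim whitespace and use title case.
    for col in out.columns:
        if col != date_col and pd.api.types.is_string_dtype(out[col]):
            out[col] = out[col].str.strip().str.title()

    # 4. Keep one copy of each fully duplicated record.
    return out.drop_duplicates().reset_index(drop=True)
```

Standardizing text before calling `drop_duplicates` matters: records that differ only in letter case or stray spaces are recognized as duplicates only after they have been normalized.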

EXPLORATORY DATA ANALYSIS (EDA)
1. Descriptive Statistics
Objective: Provide an overview of the dataset's main statistics, such as the count, mean, median, standard deviation, minimum, and maximum of key variables.

Summary:

REPORT_YEAR: The data spans 2000 to 2024, with a mean year of approximately 2019.

REPORT_DAY: The day of the month on which the crime was reported ranges from 1 to 31, with a mean of about 15.

OCC_YEAR: The occurrence year also ranges from 2000 to 2024, with statistics similar to those of the report year.

LONG_WGS84 and LAT_WGS84: The standard deviation of the coordinates reflects widely varying locations, but some data points are invalid (longitude 0), which also inflates the spread.
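These figures are what pandas' `DataFrame.describe()` reports (its `50%` row is the median). A minimal sketch, assuming the column names shown above and treating a coordinate of exactly 0 as invalid; `summarize` is a hypothetical helper name.

```python
import pandas as pd

KEY_COLS = ["REPORT_YEAR", "REPORT_DAY", "OCC_YEAR", "LONG_WGS84", "LAT_WGS84"]


def summarize(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return describe() output and the rows with invalid (zero) coordinates."""
    # count, mean, std, min, 25%, 50% (median), 75% and max per key column
    summary = df[KEY_COLS].describe()
    # Toronto lies near 43.7 N, 79.4 W, so a zero longitude or latitude
    # cannot be a real location in the city.
    invalid = df[(df["LONG_WGS84"] == 0) | (df["LAT_WGS84"] == 0)]
    return summary, invalid
```

Dropping the flagged rows before mapping or modeling keeps them from distorting the coordinate statistics.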
2. TEMPORAL ANALYSIS
A. Crime Trends Over Time:

The crime_trends_over_years visualization shows an increase in crime rates up until 2020, with a sharp drop in 2024. This drop could indicate a data anomaly (for example, an incomplete final year of records) or the effect of external factors such as the COVID-19 pandemic.
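A yearly trend like the one in crime_trends_over_years can be reproduced by counting records per occurrence year. The sketch below also checks whether the last year in the data reaches December, one simple way to test the data-anomaly explanation for the 2024 drop. The helper name `yearly_counts` and the assumption that OCC_DATE parses with `pd.to_datetime` are illustrative.

```python
import pandas as pd


def yearly_counts(df: pd.DataFrame, date_col: str = "OCC_DATE") -> tuple[pd.Series, bool]:
    """Count records per year and flag whether the final year looks partial."""
    dates = pd.to_datetime(df[date_col])
    counts = dates.dt.year.value_counts().sort_index()
    # If the latest record falls before December, the last year is probably
    # incomplete and its total is not comparable with earlier full years.
    last_year_partial = dates.max().month < 12
    return counts, bool(last_year_partial)
```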

B. Crime Distribution by Day of the Week:
The crime_distribution_by_dow.png chart indicates a fairly uniform distribution of crimes across the week, with slightly more incidents on Fridays and Saturdays.
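Weekday shares like those in crime_distribution_by_dow.png can be computed from the occurrence date. In this sketch (hypothetical helper `weekday_distribution`, OCC_DATE assumed parseable by `pd.to_datetime`), a perfectly uniform week would give each day a share of 1/7, about 0.143, which is a baseline for judging how much Fridays and Saturdays stand out.

```python
import pandas as pd

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def weekday_distribution(df: pd.DataFrame, date_col: str = "OCC_DATE") -> pd.Series:
    """Share of records per weekday, in calendar order (Monday first)."""
    names = pd.to_datetime(df[date_col]).dt.day_name()
    # reindex keeps calendar order and gives days with no records a count of 0
    counts = names.value_counts().reindex(DAYS, fill_value=0)
    return counts / counts.sum()
```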

C. Crime Distribution by Hour of the Day:

The crime_distribution_by_hour visualization highlights peak crime hours between midnight and 2 AM, with another rise from late afternoon to early evening.
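An hourly profile can be built from the OCC_HOUR field that is later used as a model feature. A minimal sketch with hypothetical helper names; reindexing over hours 0 to 23 keeps hours with no records in the output, so the chart has no gaps.

```python
import pandas as pd


def hourly_distribution(df: pd.DataFrame, hour_col: str = "OCC_HOUR") -> pd.Series:
    """Number of records for each hour 0-23 (hours with no records get 0)."""
    return df[hour_col].value_counts().reindex(range(24), fill_value=0)


def peak_hours(counts: pd.Series, k: int = 3) -> list[int]:
    """The k busiest hours, most frequent first."""
    return counts.nlargest(k).index.tolist()
```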

3. SPATIAL ANALYSIS

A. Crime Hotspots:
The crime_hotspots.png map shows crime concentrated in certain areas of Toronto, forming distinct hotspots.
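One simple way to locate hotspots like those on the crime_hotspots.png map, without GeoPandas, is to snap each coordinate to a grid cell and count records per cell. This is a swapped-in technique for illustration, not necessarily how the map was produced; the 0.01° cell size (roughly 1 km) and the helper name `grid_hotspots` are assumptions.

```python
import numpy as np
import pandas as pd


def grid_hotspots(df: pd.DataFrame, cell_deg: float = 0.01, top_n: int = 10) -> pd.Series:
    """Count records per lat/long grid cell and return the busiest cells."""
    # Exclude invalid points (coordinates of 0) noted in the descriptive statistics.
    valid = df[(df["LAT_WGS84"] != 0) & (df["LONG_WGS84"] != 0)]
    cells = pd.DataFrame({
        # Integer cell indices avoid floating-point noise in the grouping keys.
        "lat_cell": np.floor(valid["LAT_WGS84"] / cell_deg).astype(int),
        "lon_cell": np.floor(valid["LONG_WGS84"] / cell_deg).astype(int),
    })
    # DataFrame.value_counts counts each unique (lat_cell, lon_cell) pair,
    # sorted from most to least frequent.
    return cells.value_counts().head(top_n)
```

Multiplying a cell index by `cell_deg` gives the south-west corner of that cell, which is enough to place the hotspot on a map.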

B. Neighborhood Analysis:

The Crime_distribution_top_neighborhoods chart identifies the 20 neighborhoods with the highest crime rates, led by West Humber-Clairville and Moss Park.
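A top-20 ranking like this can be computed with `value_counts`. The sketch also reports the share of all records that the top neighborhoods account for, which quantifies how concentrated crime is. It assumes a neighborhood column named NEIGHBOURHOOD (a hypothetical name) and ranks raw counts; turning them into per-capita rates would need neighborhood population figures from a separate source.

```python
import pandas as pd


def top_neighborhoods(df: pd.DataFrame, col: str = "NEIGHBOURHOOD",
                      n: int = 20) -> tuple[pd.Series, float]:
    """Return the n neighborhoods with the most records and their combined share."""
    counts = df[col].value_counts()
    top = counts.head(n)
    share_of_total = top.sum() / counts.sum()
    return top, float(share_of_total)
```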

4. CRIME CATEGORY ANALYSIS

A. Offense Types:
The Frequency_of_offense_types chart shows that assault is the most frequent crime type, followed by vehicle-related offenses and theft.
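Category frequencies like those in Frequency_of_offense_types can be tabulated as counts plus percentage shares. A minimal sketch using MCI_CATEGORY (the same call works on OFFENCE for finer-grained offense descriptions); `category_frequency` is a hypothetical helper name.

```python
import pandas as pd


def category_frequency(df: pd.DataFrame, col: str = "MCI_CATEGORY") -> pd.DataFrame:
    """Counts and percentage share of each crime category, most frequent first."""
    counts = df[col].value_counts()
    percent = (100 * counts / counts.sum()).round(1)
    return pd.DataFrame({"count": counts, "percent": percent})
```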

B. Location and Premises Types:
The Distribution_of_crimes_by_location visualization shows that most crimes occur in condos and mobile homes, followed by public spaces.

CORRELATION ANALYSIS

• Correlation Matrix:
The correlation_matrix.png visualization shows strong correlations between certain variables, such as OBJECT ID and REPORT_YEAR. These most likely reflect how records are numbered and stored over time rather than meaningful relationships between crime characteristics.
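A strong OBJECT ID/REPORT_YEAR correlation is what one would expect if identifiers are assigned sequentially as records are added. A sketch of the usual remedy, dropping identifier columns before computing the matrix; the column name OBJECTID and the helper name `analytic_correlations` are assumptions, and the demo data is synthetic.

```python
import pandas as pd


def analytic_correlations(df: pd.DataFrame,
                          id_cols: tuple[str, ...] = ("OBJECTID",)) -> pd.DataFrame:
    """Pearson correlation matrix of numeric columns, excluding identifier columns."""
    numeric = df.select_dtypes(include="number")
    # Record identifiers say nothing about the crime itself, so any
    # correlation they show reflects how the data was recorded.
    numeric = numeric.drop(columns=list(id_cols), errors="ignore")
    return numeric.corr()


# Synthetic example: sequential IDs track the reporting year.
demo = pd.DataFrame({
    "OBJECTID": [1, 2, 3, 4, 5, 6],
    "REPORT_YEAR": [2019, 2019, 2020, 2021, 2022, 2023],
    "OCC_HOUR": [3, 22, 14, 1, 9, 18],
})
print(demo.corr().loc["OBJECTID", "REPORT_YEAR"])  # about 0.98, an artifact of sequential IDs
print(analytic_correlations(demo))
```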

MODEL ANALYSIS
Objective:
Evaluate the performance of a Random Forest model used to predict crime types based on temporal and spatial features.
Model Summary:
• Hyperparameter Tuning: Optimized using Randomized Search.
• Best Parameters: Number of Estimators: 200; Maximum Depth: 20; Minimum Samples Split: 2; Minimum Samples Leaf: 1.
• Overall Accuracy: 44.19% (moderate performance).
Performance Metrics:
• Precision, Recall, F1-Score: Significant variation across different crime types, with higher performance for frequent categories (e.g., "Assault with Weapon").
• Macro Average: Precision: 0.29; Recall: 0.08; F1-Score: 0.11.
• Weighted Average: Precision: 0.42; Recall: 0.44; F1-Score: 0.38.
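The tuning setup described above can be sketched with scikit-learn (an assumption; the slide does not name the library). The search space is hypothetical apart from containing the reported best values, and the data is synthetic, so the resulting scores say nothing about the Toronto model. The final lines show why the weighted-average recall (0.44) matches the overall accuracy (44.19%): weighting each class's recall by its support reduces to correct predictions divided by all predictions, whereas the macro average weights every class equally and is pulled down by rare classes.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import RandomizedSearchCV, train_test_split

FEATURES = ["LAT_WGS84", "LONG_WGS84", "OCC_DAY", "OCC_HOUR"]


def tune_random_forest(X: pd.DataFrame, y: np.ndarray, n_iter: int = 4,
                       seed: int = 42) -> RandomizedSearchCV:
    """Randomized hyperparameter search over a Random Forest classifier."""
    # Hypothetical search space that includes the reported best values
    # (200 trees, depth 20, min split 2, min leaf 1).
    param_distributions = {
        "n_estimators": [100, 200],
        "max_depth": [10, 20, None],
        "min_samples_split": [2, 5],
        "min_samples_leaf": [1, 2],
    }
    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=seed),
        param_distributions=param_distributions,
        n_iter=n_iter,
        cv=3,
        scoring="accuracy",
        random_state=seed,
    )
    return search.fit(X, y)  # fit returns the fitted search object


# Synthetic stand-in data (NOT the Toronto dataset).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "LAT_WGS84": rng.uniform(43.6, 43.8, n),
    "LONG_WGS84": rng.uniform(-79.6, -79.2, n),
    "OCC_DAY": rng.integers(1, 32, n),
    "OCC_HOUR": rng.integers(0, 24, n),
})[FEATURES]
y = np.where(X["OCC_HOUR"] < 4, "Assault",
             np.where(X["LAT_WGS84"] > 43.7, "Break and Enter", "Auto Theft"))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
search = tune_random_forest(X_train, y_train)
report = classification_report(y_test, search.predict(X_test),
                               output_dict=True, zero_division=0)

print(search.best_params_)
# Weighted-average recall equals accuracy; the macro average treats all classes equally.
print(report["accuracy"], report["weighted avg"]["recall"], report["macro avg"]["recall"])
```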
1. CONFUSION MATRIX

• Insight:
The matrix highlights the model's strength in predicting high-frequency crimes and its difficulty with less frequent ones. Normalizing the matrix gives a clearer view of relative performance across crime types of very different sizes.
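Normalizing by the true class makes each row sum to 1, so the diagonal holds each class's recall and rare classes are no longer visually swamped by frequent ones. A sketch using scikit-learn's `confusion_matrix` with `normalize="true"` on toy labels; whether the slide's matrix was normalized by row is an assumption.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels: "C" is a rare class the model never gets right.
y_true = ["A", "A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "A", "B", "A", "B", "A"]
labels = ["A", "B", "C"]

# Rows are true classes, columns are predicted classes; each row is divided
# by the number of true samples in that class.
cm = confusion_matrix(y_true, y_pred, labels=labels, normalize="true")
per_class_recall = np.diag(cm)
print(np.round(cm, 2))
print({k: float(v) for k, v in zip(labels, per_class_recall)})  # {'A': 0.75, 'B': 0.5, 'C': 0.0}
```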

2. FEATURE IMPORTANCE
• Insight:
Spatial features (LAT_WGS84, LONG_WGS84) are the most important predictors, followed by temporal features (OCC_DAY, OCC_HOUR).

• Conclusion:
The Random Forest model is a solid foundation, with an overall accuracy of 44.19%.
• Limitations:
The model struggles with less frequent crime categories (macro-average recall of 0.08), indicating room for improvement.
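Importance rankings like this one come from a fitted forest's `feature_importances_` attribute in scikit-learn (assumed library). One caveat worth checking: these impurity-based importances tend to favor continuous, high-cardinality features such as latitude and longitude over low-cardinality ones such as hour or day, so `sklearn.inspection.permutation_importance` on held-out data is a useful cross-check. A sketch on synthetic data where only latitude carries signal; `ranked_importances` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def ranked_importances(model: RandomForestClassifier, feature_names: list[str]) -> pd.Series:
    """Impurity-based importances of a fitted forest, largest first (they sum to 1)."""
    return pd.Series(model.feature_importances_, index=feature_names).sort_values(ascending=False)


# Synthetic data: only latitude determines the label.
rng = np.random.default_rng(1)
n = 400
X = pd.DataFrame({
    "LAT_WGS84": rng.uniform(43.6, 43.8, n),
    "LONG_WGS84": rng.uniform(-79.6, -79.2, n),
    "OCC_DAY": rng.integers(1, 32, n),
    "OCC_HOUR": rng.integers(0, 24, n),
})
y = (X["LAT_WGS84"] > 43.7).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = ranked_importances(model, list(X.columns))
print(importances)
```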

CONCLUSION
Summary of Findings:
• Temporal Trends: Crime rates increased notably after 2014, peaking in 2019-2021.
• Spatial Trends: Crime is highly concentrated in specific neighborhoods, such as Moss Park and West Humber-Clairville.
• Crime Categories: Assault is the most prevalent crime type.
• Modeling: The Random Forest model identifies key predictors such as location and time.

Implications:
• Strategic Resource Deployment: Allocate more resources to high-risk areas and peak times.
• Predictive Policing: Use and refine predictive models to anticipate crime trends and allocate resources proactively.
RECOMMENDATIONS
Strategic Resource Deployment:
• Focus on high-crime neighborhoods such as Moss Park and West Humber-Clairville.
• Optimize patrol schedules to cover peak crime hours, especially late at night.
Predictive Policing:
• Model Integration: Incorporate the Random Forest model into daily operations for proactive crime prevention.
• Model Enhancement: Continuously refine the model with additional data (e.g., socioeconomic factors, weather patterns).
Community Engagement:
• Strengthen community relations in high-crime areas through increased presence and outreach programs.
• Promote public awareness of crime prevention strategies tailored to specific neighborhoods.
THANK YOU
