Untitled5

Uploaded by

Vineet Paun

p9ptp45b9

February 4, 2025

[11]: # Import necessary libraries

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Increase default figure size for better visualization (optional)

plt.rcParams["figure.figsize"] = (10, 6)

# --------------- Step 0: Load the dataset ---------------

# Use the 'latin-1' encoding to avoid a UnicodeDecodeError on the accented
# characters in this file. ('iso-8859-1' is an alias for the same codec, so a
# fallback to it would be redundant.)
# Replace 'amazon.csv' with your actual file path if needed.
df = pd.read_csv('amazon.csv', encoding='latin-1')
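
A note on the encoding choice: in Python, `latin-1` and `iso-8859-1` name the same codec, and that codec maps every byte value to a character, so reading with it should not raise `UnicodeDecodeError` (it can, however, silently mis-decode a file that is really UTF-8). A minimal standard-library check of both points:

```python
import codecs

# Both names resolve to the same underlying codec.
latin1_name = codecs.lookup("latin-1").name
iso_name = codecs.lookup("iso-8859-1").name
same_codec = latin1_name == iso_name

# Every possible byte value decodes without error under latin-1.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

print(same_codec, len(decoded))
```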

[13]: # --------------- Task 1: Display Top 5 Rows of the Dataset ---------------

print("Top 5 rows:")
print(df.head(), "\n") # .head() displays the first 5 rows by default

Top 5 rows:
year state month number date
0 1998 Acre Janeiro 0.0 1998-01-01
1 1999 Acre Janeiro 0.0 1999-01-01
2 2000 Acre Janeiro 0.0 2000-01-01
3 2001 Acre Janeiro 0.0 2001-01-01
4 2002 Acre Janeiro 0.0 2002-01-01

[15]: # --------------- Task 2: Check Last 5 Rows ---------------

print("Last 5 rows:")
print(df.tail(), "\n") # .tail() displays the last 5 rows

Last 5 rows:
year state month number date
6449 2012 Tocantins Dezembro 128.0 2012-01-01
6450 2013 Tocantins Dezembro 85.0 2013-01-01
6451 2014 Tocantins Dezembro 223.0 2014-01-01
6452 2015 Tocantins Dezembro 373.0 2015-01-01
6453 2016 Tocantins Dezembro 119.0 2016-01-01

[17]: # --------------- Task 3: Find Shape of Our Dataset (Number of Rows and Columns) ---------------

print("Shape of the dataset (rows, columns):")

print(df.shape, "\n") # returns a tuple (number_of_rows, number_of_columns)

Shape of the dataset (rows, columns):

(6454, 5)

[19]: # --------------- Task 4: Getting Information About Our Dataset ---------------

print("Dataset information:")
df.info() # prints number of rows, columns, datatypes and memory usage
print("\n") # Newline for better separation

Dataset information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6454 entries, 0 to 6453
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 year 6454 non-null int64
1 state 6454 non-null object
2 month 6454 non-null object
3 number 6454 non-null float64
4 date 6454 non-null object
dtypes: float64(1), int64(1), object(3)
memory usage: 252.2+ KB
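
The `info()` output lists `date` as `object`, meaning the dates are stored as plain strings. If date arithmetic or `.dt` accessors are needed later, the column can be parsed with `pd.to_datetime`. The sketch below uses a small illustrative frame (the values are made up to mimic the layout above, not taken from the file) and assumes the `YYYY-MM-DD` format seen in the sample rows:

```python
import pandas as pd

# Toy frame mimicking the column layout shown above (illustrative values).
sample = pd.DataFrame({
    "year": [1998, 1999],
    "date": ["1998-01-01", "1999-01-01"],
})

# Parse the string column into datetime64 values.
sample["date"] = pd.to_datetime(sample["date"], format="%Y-%m-%d")

print(sample.dtypes)
```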

[21]: # --------------- Task 5: Check For Duplicate Data and Drop Them ---------------
# Check for duplicate rows
num_duplicates = df.duplicated().sum()
print(f"Number of duplicate rows before dropping: {num_duplicates}")

Number of duplicate rows before dropping: 32

[23]: # Drop duplicate rows if any exist

df.drop_duplicates(inplace=True)
print("Duplicates dropped (if any).\n")

Duplicates dropped (if any).
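
To make the duplicate check concrete, here is a small sketch on an illustrative frame (hypothetical values). `duplicated()` flags the second and later copies of a fully identical row, and `drop_duplicates()` keeps the first copy. Whether the 32 rows flagged above are true duplicates or legitimately identical records is not something the output alone can confirm.

```python
import pandas as pd

# Illustrative data: rows 0 and 1 are identical in every column.
toy = pd.DataFrame({
    "state": ["Acre", "Acre", "Bahia"],
    "month": ["Janeiro", "Janeiro", "Janeiro"],
    "number": [0.0, 0.0, 5.0],
})

dup_mask = toy.duplicated()      # True for repeats of an earlier row
n_dups = int(dup_mask.sum())     # number of rows that would be dropped

# Without inplace=True, drop_duplicates returns a new frame.
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_dups, len(deduped))
```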

[25]: # --------------- Task 6: Check Null Values in The Dataset ---------------
print("Null values in each column:")
print(df.isnull().sum(), "\n") # displays the count of null values per column

Null values in each column:

year 0
state 0
month 0
number 0
date 0
dtype: int64

[27]: # --------------- Task 7: Get Overall Statistics About the Dataframe ---------------

print("Overall statistics for numerical columns:")

print(df.describe(), "\n") # .describe() provides summary statistics

Overall statistics for numerical columns:

year number
count 6422.000000 6422.000000
mean 2007.490969 108.815178
std 5.731806 191.142482
min 1998.000000 0.000000
25% 2003.000000 3.000000
50% 2007.000000 24.497000
75% 2012.000000 114.000000
max 2017.000000 998.000000

[29]: # --------------- Task 8: Rename Month Names to English ---------------

# The month names in this dataset are Portuguese (e.g., 'Janeiro', 'Dezembro').
# Create a mapping dictionary from Portuguese to English.
month_mapping = {
    'Janeiro': 'January',
    'Fevereiro': 'February',
    'Março': 'March',
    'Abril': 'April',
    'Maio': 'May',
    'Junho': 'June',
    'Julho': 'July',
    'Agosto': 'August',
    'Setembro': 'September',
    'Outubro': 'October',
    'Novembro': 'November',
    'Dezembro': 'December'
}

# Check if the 'month' column exists and apply the mapping.
# Names missing from the mapping are left unchanged by the fillna.
if 'month' in df.columns:
    df['month'] = df['month'].map(month_mapping).fillna(df['month'])
    print("Month names have been renamed to English.\n")
else:
    print("Column 'month' not found in the dataset.\n")

Month names have been renamed to English.

[31]: # --------------- Task 9: Total Number of Fires Registered ---------------

# Note: len(df) counts rows. Each row is one state/month/year record, so this
# is the number of records, not the sum of the 'number' column.
total_fires = len(df)
print(f"Total number of fires registered: {total_fires}\n")

Total number of fires registered: 6422

[33]: # --------------- Task 10: In Which Month Maximum Number of Forest Fires Were Reported? ---------------

# value_counts() counts the records per month; idxmax() returns the first
# month among any that tie for the highest count.
fires_by_month = df['month'].value_counts()
max_fires_month = fires_by_month.idxmax()
print(f"Month with maximum number of fires: {max_fires_month}")
print("Fires reported by month:")
print(fires_by_month, "\n")

Month with maximum number of fires: August

Fires reported by month:
month
August 540
September 540
October 540
November 540
June 539
July 539
January 535
February 535
March 534
April 534
May 533
December 513
Name: count, dtype: int64
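
The figures in Tasks 9 and 10 are record counts. If the `number` column holds the fires reported for each state/month/year record (an assumption based on the column name and the summary statistics in Task 7), totals would instead come from summing that column. A minimal sketch on illustrative data (hypothetical values, not from the file):

```python
import pandas as pd

# Illustrative records: two per month, with made-up fire counts.
toy = pd.DataFrame({
    "month": ["January", "January", "August", "August"],
    "number": [0.0, 3.0, 120.0, 85.0],
})

row_count = len(toy)                 # number of records
total_fires = toy["number"].sum()    # sum of the reported fire counts

# Sum per month, then pick the month with the largest total.
fires_by_month = toy.groupby("month")["number"].sum()
peak_month = fires_by_month.idxmax()

print(row_count, total_fires, peak_month)
```

Here the record count and the fire total differ, which is why the choice between `len`/`value_counts` and `sum` matters for how the results are read.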

[35]: # --------------- Task 11: In Which Year Maximum Number of Forest Fires Was Reported? ---------------

if 'year' in df.columns:
    # value_counts() counts records per year; sort by year for clarity.
    fires_by_year = df['year'].value_counts().sort_index()
    # idxmax() returns the earliest year among any that tie for the maximum.
    max_fires_year = fires_by_year.idxmax()
    print(f"Year with maximum number of fires: {max_fires_year}")
    print("Fires reported by year:")
    print(fires_by_year, "\n")
else:
    print("Column 'year' not found in the dataset.\n")

Year with maximum number of fires: 1999

Fires reported by year:
year
1998 304
1999 324
2000 324
2001 321
2002 324
2003 324
2004 323
2005 324
2006 323
2007 322
2008 323
2009 323
2010 324
2011 324
2012 324
2013 323
2014 324
2015 324
2016 324
2017 296
Name: count, dtype: int64

[37]: # --------------- Task 12: In Which State Maximum Number of Forest Fires Was Reported? ---------------

if 'state' in df.columns:
    fires_by_state = df['state'].value_counts()
    max_fires_state = fires_by_state.idxmax()
    print(f"State with maximum number of fires: {max_fires_state}")
    print("Fires reported by state:")
    print(fires_by_state, "\n")
else:
    print("Column 'state' not found in the dataset.\n")

State with maximum number of fires: Rio
Fires reported by state:
state
Rio 697
Mato Grosso 473
Paraiba 472
Acre 239
Pará 239
Sergipe 239
Sao Paulo 239
Santa Catarina 239
Roraima 239
Rondonia 239
Piau 239
Pernambuco 239
Minas Gerais 239
Alagoas 239
Maranhao 239
Goias 239
Espirito Santo 239
Distrito Federal 239
Ceara 239
Bahia 239
Amazonas 239
Amapa 239
Tocantins 239
Name: count, dtype: int64

[39]: # --------------- Task 13: Find Total Number of Fires Reported in Amazonas ---------------

if 'state' in df.columns:
    # Filter rows where state is 'Amazonas'
    fires_amazonas = df[df['state'] == 'Amazonas']
    total_fires_amazonas = len(fires_amazonas)
    print(f"Total number of fires reported in Amazonas: {total_fires_amazonas}\n")
else:
    print("Column 'state' not found in the dataset.\n")

Total number of fires reported in Amazonas: 239

[41]: # --------------- Task 14: Display Number of Fires Reported in Amazonas (Year-Wise) ---------------

if 'year' in df.columns and 'state' in df.columns:
    # Reuses fires_amazonas from Task 13.
    amazonas_yearly = fires_amazonas.groupby('year').size()
    print("Number of fires in Amazonas (Year-wise):")
    print(amazonas_yearly, "\n")
else:
    print("Required columns ('state' and/or 'year') not found in the dataset.\n")

Number of fires in Amazonas (Year-wise):

year
1998 12
1999 12
2000 12
2001 12
2002 12
2003 12
2004 12
2005 12
2006 12
2007 12
2008 12
2009 12
2010 12
2011 12
2012 12
2013 12
2014 12
2015 12
2016 12
2017 11
dtype: int64
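
The year-wise figures above are record counts (`size()`), which is why nearly every year shows 12, one per month. Under the assumption that `number` holds the fires reported per record, a year-wise total for one state could be sketched as follows, using illustrative values:

```python
import pandas as pd

# Illustrative records (hypothetical values).
toy = pd.DataFrame({
    "year": [1998, 1998, 1999, 1999],
    "state": ["Amazonas", "Amazonas", "Amazonas", "Acre"],
    "number": [10.0, 5.0, 7.0, 2.0],
})

# Keep only the Amazonas rows, then sum the fire counts per year.
amazonas = toy[toy["state"] == "Amazonas"]
amazonas_by_year = amazonas.groupby("year")["number"].sum()

print(amazonas_by_year)
```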

[43]: # --------------- Task 15: Display Number of Fires Reported in Amazonas (Day-Wise) ---------------

# Assuming there is a 'day' column in the dataset.
if 'day' in df.columns and 'state' in df.columns:
    amazonas_daywise = fires_amazonas.groupby('day').size()
    print("Number of fires in Amazonas (Day-wise):")
    print(amazonas_daywise, "\n")
else:
    print("Required column 'day' (and/or 'state') not found in the dataset.\n")

Required column 'day' (and/or 'state') not found in the dataset.
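
The dataset has no `day` column, but it does have a `date` column from which a day could be derived. The sketch below shows one possible approach on illustrative data (hypothetical values). Note that the sample rows shown earlier all carry dates of the form `YYYY-01-01`, so a day-wise breakdown of the real file may not be informative.

```python
import pandas as pd

# Illustrative records (hypothetical values).
toy = pd.DataFrame({
    "state": ["Amazonas", "Amazonas", "Amazonas"],
    "date": ["1998-01-01", "1999-01-01", "2000-01-15"],
    "number": [4.0, 6.0, 1.0],
})

# Derive the day of the month from the parsed date strings.
toy["day"] = pd.to_datetime(toy["date"]).dt.day

# Sum the fire counts per day of the month.
by_day = toy.groupby("day")["number"].sum()

print(by_day)
```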

[45]: # --------------- Task 16: Find Total Number of Fires Reported In 2015 And Visualize Data Based on Each 'Month' ---------------

if 'year' in df.columns and 'month' in df.columns:
    fires_2015 = df[df['year'] == 2015]
    total_fires_2015 = len(fires_2015)
    print(f"Total number of fires reported in 2015: {total_fires_2015}\n")

    # Count the 2015 records per month. The reindex orders the months as in
    # month_mapping; any month absent from the data becomes NaN.
    fires_2015_by_month = fires_2015['month'].value_counts().reindex(
        list(month_mapping.values()))

    # Plot the bar chart for 2015. Passing the x values as `hue` with
    # legend=False keeps the per-bar colors and avoids seaborn's
    # FutureWarning about using `palette` without `hue`.
    plt.figure()
    sns.barplot(x=fires_2015_by_month.index, y=fires_2015_by_month.values,
                hue=fires_2015_by_month.index, palette="viridis", legend=False)
    plt.title("Number of Fires Reported in 2015 by Month")
    plt.xlabel("Month")
    plt.ylabel("Number of Fires")
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
else:
    print("Required columns ('year' and/or 'month') not found in the dataset.\n")

Total number of fires reported in 2015: 324
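
As an alternative to `reindex`, calendar order can be attached to the column itself with an ordered categorical, so that any later sort or group-by follows month order rather than alphabetical order. A minimal sketch, with a shortened month list and illustrative rows:

```python
import pandas as pd

# Shortened month list for illustration; a full list would have 12 entries.
month_order = ["January", "February", "March"]

toy = pd.DataFrame({"month": ["March", "January", "March", "February"]})

# Declare the column as an ordered categorical with an explicit order.
toy["month"] = pd.Categorical(toy["month"], categories=month_order, ordered=True)

# sort_index() now follows the category order, not the alphabet.
counts = toy["month"].value_counts().sort_index()

print(counts)
```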

[47]: # --------------- Task 17: Find Average Number of Fires Reported from Highest to Lowest (State-Wise) ---------------

# Here, we assume that we want to find the average number of fires per month
# for each state.
if 'state' in df.columns and 'month' in df.columns:
    # First, group by state and month. size() counts rows, so 'fires_count'
    # is the number of records in each state-month combination.
    monthly_fires = df.groupby(['state', 'month']).size().reset_index(
        name='fires_count')

    # Then, compute the average per month for each state, highest first.
    avg_fires_state = monthly_fires.groupby('state')['fires_count'].mean(
        ).sort_values(ascending=False)

    print("Average number of fires reported per month by state (from highest to lowest):")
    print(avg_fires_state, "\n")
else:
    print("Required columns ('state' and/or 'month') not found in the dataset.\n")

Average number of fires reported per month by state (from highest to lowest):
state
Rio 58.083333
Mato Grosso 39.416667
Paraiba 39.333333
Acre 19.916667
Sergipe 19.916667
Sao Paulo 19.916667
Santa Catarina 19.916667
Roraima 19.916667
Rondonia 19.916667
Piau 19.916667
Pernambuco 19.916667
Pará 19.916667
Minas Gerais 19.916667
Alagoas 19.916667
Maranhao 19.916667
Goias 19.916667
Espirito Santo 19.916667
Distrito Federal 19.916667
Ceara 19.916667
Bahia 19.916667
Amazonas 19.916667
Amapa 19.916667
Tocantins 19.916667
Name: fires_count, dtype: float64
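
The averages above are means of record counts per state-month, which is why most states share the value 239 / 12 ≈ 19.92. If the intent is the average of the reported fire counts, and assuming `number` holds those counts, the mean could be taken over that column directly. A sketch on illustrative data (hypothetical values):

```python
import pandas as pd

# Illustrative records (hypothetical values).
toy = pd.DataFrame({
    "state": ["Acre", "Acre", "Bahia", "Bahia"],
    "number": [0.0, 4.0, 10.0, 30.0],
})

# Mean reported fires per record for each state, highest first.
avg_by_state = (
    toy.groupby("state")["number"].mean().sort_values(ascending=False)
)

print(avg_by_state)
```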

[ ]:
