Untitled5
February 4, 2025
Top 5 rows:
year state month number date
0 1998 Acre Janeiro 0.0 1998-01-01
1 1999 Acre Janeiro 0.0 1999-01-01
2 2000 Acre Janeiro 0.0 2000-01-01
3 2001 Acre Janeiro 0.0 2001-01-01
4 2002 Acre Janeiro 0.0 2002-01-01
Last 5 rows:
year state month number date
6449 2012 Tocantins Dezembro 128.0 2012-01-01
6450 2013 Tocantins Dezembro 85.0 2013-01-01
6451 2014 Tocantins Dezembro 223.0 2014-01-01
6452 2015 Tocantins Dezembro 373.0 2015-01-01
6453 2016 Tocantins Dezembro 119.0 2016-01-01
[17]: # --------------- Task 3: Find Shape of Our Dataset (Number of Rows and Columns) ---------------
Dataset information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6454 entries, 0 to 6453
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 year 6454 non-null int64
1 state 6454 non-null object
2 month 6454 non-null object
3 number 6454 non-null float64
4 date 6454 non-null object
dtypes: float64(1), int64(1), object(3)
memory usage: 252.2+ KB
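The code that produced this cell's output is not visible in the export. As a minimal sketch, assuming `df` is a pandas DataFrame, `df.shape` returns a `(rows, columns)` tuple; for the full dataset the `info()` output above implies `(6454, 5)`. The stand-in frame below reuses only the first three rows shown earlier.

```python
import pandas as pd

# Stand-in built from the first three rows printed above (not the full dataset)
df = pd.DataFrame({
    "year": [1998, 1999, 2000],
    "state": ["Acre", "Acre", "Acre"],
    "month": ["Janeiro", "Janeiro", "Janeiro"],
    "number": [0.0, 0.0, 0.0],
    "date": ["1998-01-01", "1999-01-01", "2000-01-01"],
})

# shape is a (number_of_rows, number_of_columns) tuple
n_rows, n_cols = df.shape
print(f"Number of rows: {n_rows}")
print(f"Number of columns: {n_cols}")
```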
[21]: # --------------- Task 5: Check For Duplicate Data and Drop Them ---------------
# Check for duplicate rows
num_duplicates = df.duplicated().sum()
print(f"Number of duplicate rows before dropping: {num_duplicates}")
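The cell above only counts duplicates; the dropping step named in the task title is not visible in the export. A minimal sketch using `DataFrame.drop_duplicates()`, on a made-up toy frame (the `reset_index(drop=True)` call is an assumption, added so the index stays contiguous after dropping):

```python
import pandas as pd

# Toy frame with one exact duplicate row (illustrative values only)
df = pd.DataFrame({
    "year": [2015, 2015, 2016],
    "state": ["Acre", "Acre", "Acre"],
    "month": ["Janeiro", "Janeiro", "Janeiro"],
    "number": [5.0, 5.0, 3.0],
})

num_duplicates = df.duplicated().sum()
print(f"Number of duplicate rows before dropping: {num_duplicates}")

# Keep the first occurrence of each duplicated row, then renumber the index 0..n-1
df = df.drop_duplicates().reset_index(drop=True)
print(f"Number of duplicate rows after dropping: {df.duplicated().sum()}")
print(f"Rows remaining: {len(df)}")
```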
[25]: # --------------- Task 6: Check Null Values in The Dataset ---------------
print("Null values in each column:")
print(df.isnull().sum(), "\n") # displays the count of null values per column
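Since `info()` above reports 6454 non-null values in every column, this check should print 0 for each column on the real data. To show what a non-zero result looks like, here is a sketch on a made-up frame with one missing value per column:

```python
import numpy as np
import pandas as pd

# Made-up frame: None in an object column and NaN in a float column both count as null
df = pd.DataFrame({
    "state": ["Acre", "Bahia", None],
    "number": [1.0, np.nan, 3.0],
})

null_counts = df.isnull().sum()  # per-column count of missing values
print("Null values in each column:")
print(null_counts, "\n")

# True if at least one cell anywhere in the frame is missing
has_nulls = bool(df.isnull().values.any())
print(f"Any null values: {has_nulls}")
```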
# Check if the 'month' column exists and apply the mapping.
if 'month' in df.columns:
    df['month'] = df['month'].map(month_mapping).fillna(df['month'])
    print("Month names have been renamed to English.\n")
else:
    print("Column 'month' not found in the dataset.\n")
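`month_mapping` is defined in a cell that is not shown here. A hypothetical version, mapping the Portuguese month names to English, could look like this; the `fillna(df['month'])` fallback keeps any spelling that is not a key in the dictionary unchanged instead of turning it into NaN.

```python
import pandas as pd

# Hypothetical mapping; the notebook's own month_mapping may differ
month_mapping = {
    "Janeiro": "January", "Fevereiro": "February", "Março": "March",
    "Abril": "April", "Maio": "May", "Junho": "June",
    "Julho": "July", "Agosto": "August", "Setembro": "September",
    "Outubro": "October", "Novembro": "November", "Dezembro": "December",
}

df = pd.DataFrame({"month": ["Janeiro", "Dezembro", "Março", "Unknown"]})

# map() yields NaN for values missing from the dict; fillna() restores the original text
df["month"] = df["month"].map(month_mapping).fillna(df["month"])
print(df["month"].tolist())  # ['January', 'December', 'March', 'Unknown']
```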
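[33]: # --------------- Task 10: In Which Month Maximum Number of Forest Fires Were Reported? ---------------
# The 'number' column holds the reported fire count of each row, so total it per month;
# value_counts() would only count rows, not fires.
fires_by_month = df.groupby('month')['number'].sum().sort_values(ascending=False)
max_fires_month = fires_by_month.idxmax()
print(f"Month with maximum number of fires: {max_fires_month}")
print("Fires reported by month:")
print(fires_by_month, "\n")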
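On a toy frame (values made up), the difference between counting records with `value_counts()` and totalling the `number` column with `groupby(...).sum()` is easy to see: both months have the same number of rows, but very different fire totals.

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["Acre", "Bahia", "Acre", "Bahia"],
    "month": ["January", "January", "July", "July"],
    "number": [2.0, 3.0, 40.0, 55.0],
})

rows_per_month = df["month"].value_counts()            # counts records only
fires_per_month = df.groupby("month")["number"].sum()  # totals the reported fires

print(rows_per_month.to_dict())
print(fires_per_month.to_dict())  # {'January': 5.0, 'July': 95.0}
print(fires_per_month.idxmax())   # July
```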
[35]: # --------------- Task 11: In Which Year Maximum Number of Forest Fires Was Reported? ---------------
if 'year' in df.columns:
    # Total the reported fires ('number') per year; groupby() sorts the years for clarity.
    # value_counts() would only count rows, not fires.
    fires_by_year = df.groupby('year')['number'].sum()
    max_fires_year = fires_by_year.idxmax()
    print(f"Year with maximum number of fires: {max_fires_year}")
    print("Fires reported by year:")
    print(fires_by_year, "\n")
else:
    print("Column 'year' not found in the dataset.\n")
[37]: # --------------- Task 12: In Which State Maximum Number of Forest Fires Was Reported? ---------------
if 'state' in df.columns:
    # Note: value_counts() counts rows (records) per state, so the output below ranks
    # states by number of records, not by the fires reported in the 'number' column.
    fires_by_state = df['state'].value_counts()
    max_fires_state = fires_by_state.idxmax()
    print(f"State with maximum number of fires: {max_fires_state}")
    print("Fires reported by state:")
    print(fires_by_state, "\n")
else:
    print("Column 'state' not found in the dataset.\n")
State with maximum number of fires: Rio
Fires reported by state:
state
Rio 697
Mato Grosso 473
Paraiba 472
Acre 239
Pará 239
Sergipe 239
Sao Paulo 239
Santa Catarina 239
Roraima 239
Rondonia 239
Piau 239
Pernambuco 239
Minas Gerais 239
Alagoas 239
Maranhao 239
Goias 239
Espirito Santo 239
Distrito Federal 239
Ceara 239
Bahia 239
Amazonas 239
Amapa 239
Tocantins 239
Name: count, dtype: int64
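The series above is named `count` because it holds record counts per state, so the state with the most rows is not necessarily the state with the most reported fires. A sketch that ranks states by the summed `number` column instead, using made-up states and values:

```python
import pandas as pd

# Hypothetical states and values, chosen so the two rankings disagree
df = pd.DataFrame({
    "state": ["State A", "State A", "State A", "State B", "State B"],
    "number": [1.0, 2.0, 3.0, 50.0, 60.0],
})

records_by_state = df["state"].value_counts()  # rows per state
fires_by_state = df.groupby("state")["number"].sum().sort_values(ascending=False)

print("State with most records:", records_by_state.idxmax())       # State A
print("State with most reported fires:", fires_by_state.idxmax())  # State B
print(fires_by_state)
```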
[39]: # --------------- Task 13: Find Total Number of Fires Reported in Amazonas ---------------
if 'state' in df.columns:
    # Filter rows where state is 'Amazonas'
    fires_amazonas = df[df['state'] == 'Amazonas']
    # Sum the reported fire counts; len() would only give the number of rows
    total_fires_amazonas = fires_amazonas['number'].sum()
    print(f"Total number of fires reported in Amazonas: {total_fires_amazonas}\n")
else:
    print("Column 'state' not found in the dataset.\n")
    print("Number of fires in Amazonas (Year-wise):")
    print(amazonas_yearly, "\n")
else:
    print("Required columns ('state' and/or 'year') not found in the dataset.\n")
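The lines above are the end of a cell whose opening, where `amazonas_yearly` is built, is not shown. A minimal sketch of one way to build it, assuming the goal is the summed `number` per year for Amazonas (made-up data; the original cell may have done this differently):

```python
import pandas as pd

df = pd.DataFrame({
    "year": [1998, 1998, 1999, 1999, 1999],
    "state": ["Amazonas", "Acre", "Amazonas", "Amazonas", "Acre"],
    "number": [10.0, 7.0, 4.0, 6.0, 1.0],
})

if {"state", "year"}.issubset(df.columns):
    # Keep only Amazonas rows, then total the reported fires per year
    amazonas_yearly = (
        df[df["state"] == "Amazonas"]
        .groupby("year")["number"]
        .sum()
    )
    print("Number of fires in Amazonas (Year-wise):")
    print(amazonas_yearly, "\n")
else:
    print("Required columns ('state' and/or 'year') not found in the dataset.\n")
```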
[45]: # --------------- Task 16: Find Total Number of Fires Reported In 2015 And Visualize Data Based on Each 'Month' ---------------
fires_2015 = df[df['year'] == 2015]
# Sum the reported fire counts; len() would only give the number of rows
total_fires_2015 = fires_2015['number'].sum()
print(f"Total number of fires reported in 2015: {total_fires_2015}\n")
# The reindex ensures months are ordered as in our mapping (if all months are present)
C:\Users\vinee\AppData\Local\Temp\ipykernel_23096\1305613164.py:13: FutureWarning:
  sns.barplot(x=fires_2015_by_month.index, y=fires_2015_by_month.values, palette="viridis")
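The monthly aggregation and reindex behind `fires_2015_by_month` are not shown. A minimal sketch, assuming the months have already been renamed to English and using made-up values; `fill_value=0` is a choice made here so that a month with no rows shows as 0 rather than NaN. As for the warning above: seaborn 0.13 and later deprecate passing `palette` without `hue`, and also passing `hue=fires_2015_by_month.index` with `legend=False` to `sns.barplot` keeps the per-bar colours without it.

```python
import pandas as pd

# Calendar order of the English month names (assumed naming after the mapping step)
month_order = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]

df = pd.DataFrame({
    "year": [2015, 2015, 2015, 2014],
    "month": ["March", "January", "March", "January"],
    "number": [4.0, 1.0, 6.0, 100.0],
})

fires_2015 = df[df["year"] == 2015]
total_fires_2015 = fires_2015["number"].sum()

# Total per month, then reorder into calendar order; months without rows become 0
fires_2015_by_month = (
    fires_2015.groupby("month")["number"].sum()
    .reindex(month_order, fill_value=0)
)
print(f"Total number of fires reported in 2015: {total_fires_2015}")
print(fires_2015_by_month.head(3))
```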
[47]: # --------------- Task 17: Find Average Number of Fires Reported from Highest to Lowest (State-Wise) ---------------
# Here, we assume that we want to find the average number of fires per month for each state.
    # Then, compute the average fires per month for each state.
    # Note: 'fires_count' appears to hold row counts per state and month, not summed 'number' values.
    avg_fires_state = monthly_fires.groupby('state')['fires_count'].mean().sort_values(ascending=False)
    print("Average number of fires reported per month by state (from highest to lowest):")
    print(avg_fires_state, "\n")
else:
    print("Required columns ('state' and/or 'month') not found in the dataset.\n")
Average number of fires reported per month by state (from highest to lowest):
state
Rio 58.083333
Mato Grosso 39.416667
Paraiba 39.333333
Acre 19.916667
Sergipe 19.916667
Sao Paulo 19.916667
Santa Catarina 19.916667
Roraima 19.916667
Rondonia 19.916667
Piau 19.916667
Pernambuco 19.916667
Pará 19.916667
Minas Gerais 19.916667
Alagoas 19.916667
Maranhao 19.916667
Goias 19.916667
Espirito Santo 19.916667
Distrito Federal 19.916667
Ceara 19.916667
Bahia 19.916667
Amazonas 19.916667
Amapa 19.916667
Tocantins 19.916667
Name: fires_count, dtype: float64
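The averages above match record counts divided by 12 (239 / 12 = 19.916667, 697 / 12 = 58.083333), so this ranking mirrors the row counts from Task 12 rather than the reported fires. A minimal sketch that averages the `number` column per state instead, treating each row as one monthly record, on made-up data:

```python
import pandas as pd

# Hypothetical states and values
df = pd.DataFrame({
    "state": ["State A", "State A", "State B", "State B", "State B"],
    "month": ["January", "February", "January", "February", "March"],
    "number": [10.0, 30.0, 3.0, 6.0, 9.0],
})

# Mean reported fires over each state's monthly records, highest first
avg_fires_state = (
    df.groupby("state")["number"].mean()
    .sort_values(ascending=False)
)
print("Average number of fires reported per month by state (from highest to lowest):")
print(avg_fires_state)
```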