Untitled5

Uploaded by

Vineet Paun
p9ptp45b9

February 4, 2025

[11]: # Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Increase default figure size for better visualization (optional)
plt.rcParams["figure.figsize"] = (10, 6)

# --------------- Step 0: Load the dataset ---------------
# Use the 'latin-1' encoding to avoid the UnicodeDecodeError raised by the default UTF-8 decoding.
# 'latin-1' and 'iso-8859-1' are two names for the same codec, and that codec can decode
# any byte sequence, so a try/except fallback between the two is not needed.
# Replace 'amazon.csv' with your actual file path if needed.
df = pd.read_csv('amazon.csv', encoding='latin-1')
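As a quick check on the encoding choice, the sketch below uses only the standard library to show that `latin-1` and `iso-8859-1` resolve to the same codec and that this codec accepts every byte value. The sample string is made up for illustration; it is not read from `amazon.csv`.

```python
import codecs

# Both names resolve to the same codec in Python's registry.
latin1 = codecs.lookup("latin-1")
iso = codecs.lookup("iso-8859-1")
same_codec = latin1.name == iso.name

# Every byte value 0..255 maps to a code point, so decoding cannot fail.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

# Illustrative bytes: 'Março' encoded as latin-1 is not valid UTF-8.
sample = "Março".encode("latin-1")
try:
    sample.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(same_codec, len(decoded), utf8_ok)
```

Because the second encoding is only an alias of the first, retrying with it after a failure would repeat the same attempt.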

[13]: # --------------- Task 1: Display Top 5 Rows of the Dataset ---------------


print("Top 5 rows:")
print(df.head(), "\n") # .head() displays the first 5 rows by default

Top 5 rows:
year state month number date
0 1998 Acre Janeiro 0.0 1998-01-01
1 1999 Acre Janeiro 0.0 1999-01-01
2 2000 Acre Janeiro 0.0 2000-01-01
3 2001 Acre Janeiro 0.0 2001-01-01
4 2002 Acre Janeiro 0.0 2002-01-01

[15]: # --------------- Task 2: Check Last 5 Rows ---------------


print("Last 5 rows:")
print(df.tail(), "\n") # .tail() displays the last 5 rows

Last 5 rows:
year state month number date
6449 2012 Tocantins Dezembro 128.0 2012-01-01
6450 2013 Tocantins Dezembro 85.0 2013-01-01
6451 2014 Tocantins Dezembro 223.0 2014-01-01
6452 2015 Tocantins Dezembro 373.0 2015-01-01
6453 2016 Tocantins Dezembro 119.0 2016-01-01

[17]: # --------------- Task 3: Find Shape of Our Dataset (Number of Rows and Columns) ---------------
print("Shape of the dataset (rows, columns):")
print(df.shape, "\n") # returns a tuple (number_of_rows, number_of_columns)

Shape of the dataset (rows, columns):


(6454, 5)

[19]: # --------------- Task 4: Getting Information About Our Dataset ---------------


print("Dataset information:")
df.info() # prints number of rows, columns, datatypes and memory usage
print("\n") # Newline for better separation

Dataset information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6454 entries, 0 to 6453
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 year 6454 non-null int64
1 state 6454 non-null object
2 month 6454 non-null object
3 number 6454 non-null float64
4 date 6454 non-null object
dtypes: float64(1), int64(1), object(3)
memory usage: 252.2+ KB

[21]: # --------------- Task 5: Check For Duplicate Data and Drop Them ---------------
# Check for duplicate rows
num_duplicates = df.duplicated().sum()
print(f"Number of duplicate rows before dropping: {num_duplicates}")

Number of duplicate rows before dropping: 32

[23]: # Drop duplicate rows if any exist


df.drop_duplicates(inplace=True)
print("Duplicates dropped (if any).\n")

Duplicates dropped (if any).
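To make the duplicate check concrete, here is a minimal sketch on a made-up three-row frame (the values are illustrative, not taken from `amazon.csv`). It shows that `duplicated()` flags only the repeats after the first occurrence and that `drop_duplicates()` keeps one copy of each row.

```python
import pandas as pd

# Toy frame (made-up values) with one exact duplicate row.
toy = pd.DataFrame({
    "year": [1998, 1998, 1999],
    "state": ["Acre", "Acre", "Acre"],
    "number": [0.0, 0.0, 3.0],
})

# duplicated() marks every repeat after the first occurrence as True.
dup_mask = toy.duplicated()
n_dups = int(dup_mask.sum())

# drop_duplicates() returns a new frame; reset_index renumbers the rows.
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_dups, len(deduped))
```

In the notebook itself, 6454 rows minus 32 duplicates leaves 6422 rows, which matches the `count` that `describe()` reports in Task 7.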

[25]: # --------------- Task 6: Check Null Values in The Dataset ---------------
print("Null values in each column:")
print(df.isnull().sum(), "\n") # displays the count of null values per column

Null values in each column:


year 0
state 0
month 0
number 0
date 0
dtype: int64

[27]: # --------------- Task 7: Get Overall Statistics About the Dataframe ---------------
print("Overall statistics for numerical columns:")
print(df.describe(), "\n") # .describe() provides summary statistics

Overall statistics for numerical columns:


year number
count 6422.000000 6422.000000
mean 2007.490969 108.815178
std 5.731806 191.142482
min 1998.000000 0.000000
25% 2003.000000 3.000000
50% 2007.000000 24.497000
75% 2012.000000 114.000000
max 2017.000000 998.000000

[29]: # --------------- Task 8: Rename Month Names to English ---------------
# The month names in this dataset are Portuguese (e.g., 'Janeiro', 'Dezembro'),
# so the mapping dictionary goes from Portuguese to English.
month_mapping = {
    'Janeiro': 'January',
    'Fevereiro': 'February',
    'Março': 'March',
    'Abril': 'April',
    'Maio': 'May',
    'Junho': 'June',
    'Julho': 'July',
    'Agosto': 'August',
    'Setembro': 'September',
    'Outubro': 'October',
    'Novembro': 'November',
    'Dezembro': 'December'
}

# Check if the 'month' column exists and apply the mapping.
# Names that are missing from the mapping are left unchanged by fillna().
if 'month' in df.columns:
    df['month'] = df['month'].map(month_mapping).fillna(df['month'])
    print("Month names have been renamed to English.\n")
else:
    print("Column 'month' not found in the dataset.\n")

Month names have been renamed to English.

[31]: # --------------- Task 9: Total Number of Fires Registered ---------------
# Note: len(df) counts rows, where each row is a record for one state, month and year.
# The fire count of each record is stored in the 'number' column.
total_fires = len(df)
print(f"Total number of fires registered: {total_fires}\n")

Total number of fires registered: 6422

[33]: # --------------- Task 10: In Which Month Maximum Number of Forest Fires Were Reported? ---------------
# value_counts() counts records per month; it does not use the values in 'number'.
# When several months tie, idxmax() returns the first one.
fires_by_month = df['month'].value_counts()
max_fires_month = fires_by_month.idxmax()
print(f"Month with maximum number of fires: {max_fires_month}")
print("Fires reported by month:")
print(fires_by_month, "\n")

Month with maximum number of fires: August

Fires reported by month:
month
August 540
September 540
October 540
November 540
June 539
July 539
January 535
February 535
March 534
April 534
May 533
December 513
Name: count, dtype: int64
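The cells for Tasks 9 and 10 count rows. If `number` holds the fires recorded for each state, month and year (an assumption suggested by the column name and the `describe()` output, not something the notebook states), the totals would come from summing that column instead. A minimal sketch on made-up data:

```python
import pandas as pd

# Toy data (made-up values); 'number' is assumed to be fires per record.
toy = pd.DataFrame({
    "month": ["January", "January", "August", "August", "August"],
    "number": [10.0, 5.0, 40.0, 0.0, 2.5],
})

row_count = len(toy)               # number of records
total_fires = toy["number"].sum()  # sum of the per-record counts

# Sum per month, then pick the month with the largest total.
fires_by_month = toy.groupby("month")["number"].sum()
max_month = fires_by_month.idxmax()

print(row_count, total_fires, max_month)
```

On the notebook's data, the `describe()` output gives a cross-check for such a sum: 6422 records with a mean of 108.815178 imply a total of about 6422 × 108.815178 ≈ 698,811.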

[35]: # --------------- Task 11: In Which Year Maximum Number of Forest Fires Was Reported? ---------------
if 'year' in df.columns:
    # value_counts() counts records per year; sort by year for clarity
    fires_by_year = df['year'].value_counts().sort_index()
    # When several years tie for the maximum, idxmax() returns the first one
    max_fires_year = fires_by_year.idxmax()
    print(f"Year with maximum number of fires: {max_fires_year}")
    print("Fires reported by year:")
    print(fires_by_year, "\n")
else:
    print("Column 'year' not found in the dataset.\n")

Year with maximum number of fires: 1999


Fires reported by year:
year
1998 304
1999 324
2000 324
2001 321
2002 324
2003 324
2004 323
2005 324
2006 323
2007 322
2008 323
2009 323
2010 324
2011 324
2012 324
2013 323
2014 324
2015 324
2016 324
2017 296
Name: count, dtype: int64
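The output above has many years sharing the top count of 324, and `idxmax()` reports only the first of them. The sketch below rebuilds the series from the printed counts and lists every tied year; the variable names are illustrative.

```python
import pandas as pd

# Record counts per year, copied from the printed output above.
counts = {
    1998: 304, 1999: 324, 2000: 324, 2001: 321, 2002: 324,
    2003: 324, 2004: 323, 2005: 324, 2006: 323, 2007: 322,
    2008: 323, 2009: 323, 2010: 324, 2011: 324, 2012: 324,
    2013: 323, 2014: 324, 2015: 324, 2016: 324, 2017: 296,
}
fires_by_year = pd.Series(counts, name="count")

# idxmax() returns only the first label that reaches the maximum.
first_max_year = fires_by_year.idxmax()

# A boolean mask against the maximum keeps all tied labels.
top = fires_by_year.max()
tied_years = fires_by_year[fires_by_year == top].index.tolist()

print(first_max_year, len(tied_years), tied_years)
```

Eleven years share the maximum, so "1999" reflects the sort order rather than a year that stands out.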

[37]: # --------------- Task 12: In Which State Maximum Number of Forest Fires Was Reported? ---------------
if 'state' in df.columns:
    fires_by_state = df['state'].value_counts()
    max_fires_state = fires_by_state.idxmax()
    print(f"State with maximum number of fires: {max_fires_state}")
    print("Fires reported by state:")
    print(fires_by_state, "\n")
else:
    print("Column 'state' not found in the dataset.\n")

State with maximum number of fires: Rio
Fires reported by state:
state
Rio 697
Mato Grosso 473
Paraiba 472
Acre 239
Pará 239
Sergipe 239
Sao Paulo 239
Santa Catarina 239
Roraima 239
Rondonia 239
Piau 239
Pernambuco 239
Minas Gerais 239
Alagoas 239
Maranhao 239
Goias 239
Espirito Santo 239
Distrito Federal 239
Ceara 239
Bahia 239
Amazonas 239
Amapa 239
Tocantins 239
Name: count, dtype: int64

[39]: # --------------- Task 13: Find Total Number of Fires Reported in Amazonas ---------------
if 'state' in df.columns:
    # Filter rows where state is 'Amazonas'
    fires_amazonas = df[df['state'] == 'Amazonas']
    total_fires_amazonas = len(fires_amazonas)
    print(f"Total number of fires reported in Amazonas: {total_fires_amazonas}\n")
else:
    print("Column 'state' not found in the dataset.\n")

Total number of fires reported in Amazonas: 239

[41]: # --------------- Task 14: Display Number of Fires Reported in Amazonas (Year-Wise) ---------------
if 'year' in df.columns and 'state' in df.columns:
    amazonas_yearly = fires_amazonas.groupby('year').size()
    print("Number of fires in Amazonas (Year-wise):")
    print(amazonas_yearly, "\n")
else:
    print("Required columns ('state' and/or 'year') not found in the dataset.\n")

Number of fires in Amazonas (Year-wise):


year
1998 12
1999 12
2000 12
2001 12
2002 12
2003 12
2004 12
2005 12
2006 12
2007 12
2008 12
2009 12
2010 12
2011 12
2012 12
2013 12
2014 12
2015 12
2016 12
2017 11
dtype: int64
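The year-wise output above is 12 for almost every year because `size()` counts one record per month. A hedged sketch of the alternative, assuming `number` holds the fires per record, filters one state and sums `number` by year. The data below is made up for illustration.

```python
import pandas as pd

# Toy data (made-up values); 'number' is assumed to be fires per record.
toy = pd.DataFrame({
    "year": [2015, 2015, 2016, 2016, 2016],
    "state": ["Amazonas", "Acre", "Amazonas", "Amazonas", "Acre"],
    "number": [7.0, 1.0, 3.0, 4.0, 2.0],
})

# Boolean mask keeps only the rows for one state.
amazonas = toy[toy["state"] == "Amazonas"]

yearly_records = amazonas.groupby("year").size()         # rows per year
yearly_fires = amazonas.groupby("year")["number"].sum()  # fires per year

print(yearly_records.to_dict(), yearly_fires.to_dict())
```

The two series share an index but answer different questions: how many records exist per year versus how many fires those records report.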

[43]: # --------------- Task 15: Display Number of Fires Reported in Amazonas (Day-Wise) ---------------
# Assuming there is a 'day' column in the dataset.
if 'day' in df.columns and 'state' in df.columns:
    amazonas_daywise = fires_amazonas.groupby('day').size()
    print("Number of fires in Amazonas (Day-wise):")
    print(amazonas_daywise, "\n")
else:
    print("Required column 'day' (and/or 'state') not found in the dataset.\n")

Required column 'day' (and/or 'state') not found in the dataset.
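The dataset has no `day` column, but it does have a `date` column stored as text. One possible approach, sketched here on made-up rows, is to parse `date` and derive a weekday name from it. Note the caveat: in the rows printed in Tasks 1 and 2, every `date` falls on 1 January of the record's year, so a day derived this way may only reflect the year and not a real reporting day. The `number` column is again assumed to hold fires per record.

```python
import pandas as pd

# Toy rows (made-up values) in the same 'YYYY-MM-DD' text format as the dataset.
toy = pd.DataFrame({
    "state": ["Amazonas", "Amazonas", "Amazonas", "Acre"],
    "date": ["1998-01-01", "1999-01-01", "2000-01-01", "1998-01-01"],
    "number": [0.0, 2.0, 5.0, 1.0],
})

# Parse the text into datetimes, then derive the weekday name.
toy["date"] = pd.to_datetime(toy["date"], format="%Y-%m-%d")
toy["day"] = toy["date"].dt.day_name()

# Filter one state and sum the assumed fire counts per weekday.
amazonas = toy[toy["state"] == "Amazonas"]
daywise = amazonas.groupby("day")["number"].sum()

print(daywise.to_dict())
```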

[45]: # --------------- Task 16: Find Total Number of Fires Reported In 2015 And Visualize Data Based on Each 'Month' ---------------
if 'year' in df.columns and 'month' in df.columns:
    fires_2015 = df[df['year'] == 2015]
    total_fires_2015 = len(fires_2015)
    print(f"Total number of fires reported in 2015: {total_fires_2015}\n")

    # Group by month and count records in 2015.
    # The reindex orders the months as in our mapping; labels absent from the data become NaN.
    fires_2015_by_month = fires_2015['month'].value_counts().reindex(list(month_mapping.values()))

    # Plotting the bar chart for 2015.
    # Passing `palette` without `hue` is deprecated in seaborn, so the x variable
    # is also assigned to `hue` and the redundant legend is turned off.
    plt.figure()
    sns.barplot(x=fires_2015_by_month.index, y=fires_2015_by_month.values,
                hue=fires_2015_by_month.index, palette="viridis", legend=False)
    plt.title("Number of Fires Reported in 2015 by Month")
    plt.xlabel("Month")
    plt.ylabel("Number of Fires")
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
else:
    print("Required columns ('year' and/or 'month') not found in the dataset.\n")

Total number of fires reported in 2015: 324
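For the monthly chart, the series needs to be in calendar order and should not contain gaps. A minimal sketch, assuming English month names and assuming `number` holds fires per record, sums `number` per month for one year and reindexes against a fixed month list with `fill_value` so that missing months show as zero. The `MONTHS` list and the data are illustrative.

```python
import pandas as pd

# Calendar order used for the x-axis (illustrative helper list).
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Toy data (made-up values); 'number' is assumed to be fires per record.
toy = pd.DataFrame({
    "year": [2015, 2015, 2015, 2014],
    "month": ["March", "January", "March", "January"],
    "number": [4.0, 1.0, 6.0, 9.0],
})

fires_2015 = toy[toy["year"] == 2015]

# Sum per month, then force calendar order; absent months become 0.0.
by_month = (
    fires_2015.groupby("month")["number"].sum()
    .reindex(MONTHS, fill_value=0.0)
)

print(by_month.head(3).to_dict())
```

The resulting series can be passed to a bar chart as `x=by_month.index` and `y=by_month.values`.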

[47]: # --------------- Task 17: Find Average Number of Fires Reported from Highest to Lowest (State-Wise) ---------------
# Here, we average the number of records per month for each state.
# Note: this counts rows per state-month combination; it does not use the values in 'number'.
if 'state' in df.columns and 'month' in df.columns:
    # First, group by state and month to count the records in each state-month combination
    monthly_fires = df.groupby(['state', 'month']).size().reset_index(name='fires_count')

    # Then, compute the average count per month for each state
    avg_fires_state = monthly_fires.groupby('state')['fires_count'].mean().sort_values(ascending=False)

    print("Average number of fires reported per month by state (from highest to lowest):")
    print(avg_fires_state, "\n")
else:
    print("Required columns ('state' and/or 'month') not found in the dataset.\n")

Average number of fires reported per month by state (from highest to lowest):
state
Rio 58.083333
Mato Grosso 39.416667
Paraiba 39.333333
Acre 19.916667
Sergipe 19.916667
Sao Paulo 19.916667
Santa Catarina 19.916667
Roraima 19.916667
Rondonia 19.916667
Piau 19.916667
Pernambuco 19.916667
Pará 19.916667
Minas Gerais 19.916667
Alagoas 19.916667
Maranhao 19.916667
Goias 19.916667
Espirito Santo 19.916667
Distrito Federal 19.916667
Ceara 19.916667
Bahia 19.916667
Amazonas 19.916667
Amapa 19.916667
Tocantins 19.916667
Name: fires_count, dtype: float64
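Most states in the output above share the value 19.916667, which is simply 239 records divided by 12 months (and 697 / 12 ≈ 58.083333 for the top entry), so the ranking reflects record counts. If the goal is the average of the reported fires per state, and assuming `number` holds the fires per record, one possible sketch is to average `number` directly. The data below is made up.

```python
import pandas as pd

# Toy data (made-up values); 'number' is assumed to be fires per record.
toy = pd.DataFrame({
    "state": ["Acre", "Acre", "Bahia", "Bahia", "Ceara"],
    "number": [1.0, 3.0, 10.0, 20.0, 5.0],
})

# Mean of 'number' per state, ordered from highest to lowest.
avg_fires_by_state = (
    toy.groupby("state")["number"].mean()
    .sort_values(ascending=False)
)

print(avg_fires_by_state.to_dict())
```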
