L-2 (Data Frame Part 1).Ipynb - Colab

The document is a Jupyter notebook that demonstrates basic operations on data frames using Python's pandas library. It covers importing data, renaming columns, finding smallest and largest values, filtering data, and performing statistical functions like mean, median, mode, and standard deviation. Additionally, it includes examples of cumulative sums and products, as well as correlation and covariance calculations.

Uploaded by

ashishpal2804

3/14/25, 4:35 PM L-2 (Data Frame Part 1).ipynb - Colab

Python Data Frames Part 1


Basic Operations on Data

# import libraries
import numpy as np
import pandas as pd

from google.colab import drive

# Mount Google Drive (works only inside Colab), then read the CSV file into a dataframe
drive.mount('/content/drive', force_remount=True)
marks = pd.read_csv("/content/drive/MyDrive/Data_Analytics/Test_data.csv")
print(marks)

Mounted at /content/drive
   RollNo      Name  Eco  Maths
0       1     Arnab   18     57
1       2   Kritika   23     45
2       3    Divyam   51     37
3       4    Vivaan   40     60
4       5  Aaaroosh   18     27
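
The CSV file on Google Drive is not part of this notebook, so the cell above cannot be re-run elsewhere. A minimal sketch for following along outside Colab is to rebuild the same five rows by hand. The values are copied from the printed output above; constructing the dataframe in memory is an assumption that stands in for `read_csv`.

```python
import pandas as pd

# Rebuild the table from the printed output (stand-in for the Drive CSV).
marks = pd.DataFrame({
    "RollNo": [1, 2, 3, 4, 5],
    "Name": ["Arnab", "Kritika", "Divyam", "Vivaan", "Aaaroosh"],
    "Eco": [18, 23, 51, 40, 18],
    "Maths": [57, 45, 37, 60, 27],
})

print(marks.shape)   # (number of rows, number of columns)
print(marks.dtypes)  # data type of each column
```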

# Rename every column by assigning a new list of labels (one label per column, in order)
marks.columns = ['ROLLNO', 'NAME', 'ECONOMICS', 'MATHS']
marks

   ROLLNO      NAME  ECONOMICS  MATHS
0       1     Arnab         18     57
1       2   Kritika         23     45
2       3    Divyam         51     37
3       4    Vivaan         40     60
4       5  Aaaroosh         18     27
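
Assigning a list to `columns` replaces every label and relies on the order of the list. When only some columns need new names, `DataFrame.rename` with a mapping may be safer. The sketch below uses a two-row stand-in for the marks table, which is an assumption made to keep the example self-contained.

```python
import pandas as pd

# Two-row stand-in for the marks table (illustrative data only).
marks = pd.DataFrame({"RollNo": [1, 2], "Name": ["Arnab", "Kritika"],
                      "Eco": [18, 23], "Maths": [57, 45]})

# rename() returns a new dataframe; columns missing from the mapping keep their names.
renamed = marks.rename(columns={"Eco": "ECONOMICS", "Maths": "MATHS"})

print(list(renamed.columns))
print(list(marks.columns))  # the original dataframe is left unchanged
```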
 

# nsmallest(n, column_label) returns the n rows with the smallest values in that column, as a new dataframe
least2 = marks.nsmallest(2, "ECONOMICS")
print(least2)

   ROLLNO      NAME  ECONOMICS  MATHS
0       1     Arnab         18     57
4       5  Aaaroosh         18     27
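
Two students share the lowest ECONOMICS mark, so this result depends on how ties are handled. As a sketch, the same selection can be written as a stable sort followed by `head`, and the `keep="all"` option of `nsmallest` returns every tied row even when that exceeds `n`. The dataframe is rebuilt from the output above so that the example runs on its own.

```python
import pandas as pd

marks = pd.DataFrame({
    "NAME": ["Arnab", "Kritika", "Divyam", "Vivaan", "Aaaroosh"],
    "ECONOMICS": [18, 23, 51, 40, 18],
})

least2 = marks.nsmallest(2, "ECONOMICS")                         # default keep="first"
by_sort = marks.sort_values("ECONOMICS", kind="stable").head(2)  # same rows via sorting
tied = marks.nsmallest(1, "ECONOMICS", keep="all")               # asks for 1 row, gets both ties

print(least2)
print(len(tied))
```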

# nlargest(n, column_label) returns the n rows with the largest values in that column, as a new dataframe
great2 = marks.nlargest(2, "MATHS")
print(great2)

   ROLLNO    NAME  ECONOMICS  MATHS
3       4  Vivaan         40     60
0       1   Arnab         18     57

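# between(left, right, inclusive) flags the values that lie within a range;
# inclusive="both" keeps the two end points, 35 and 45, in the range
result = marks["MATHS"].between(35, 45, inclusive="both")
print(marks[result]) # Filter the dataframe with the resulting boolean Series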

   ROLLNO     NAME  ECONOMICS  MATHS
1       2  Kritika         23     45
2       3   Divyam         51     37

print(result)
print(type(result))

0    False
1     True
2     True
3    False
4    False
Name: MATHS, dtype: bool
<class 'pandas.core.series.Series'>
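
The boolean Series produced by `between` is equivalent to combining two comparisons with `&`. The sketch below checks that equivalence and shows how the `inclusive` argument changes the result. It assumes pandas 1.3 or later, where `inclusive` accepts the strings "both", "neither", "left" and "right".

```python
import pandas as pd

maths = pd.Series([57, 45, 37, 60, 27], name="MATHS")

both = maths.between(35, 45, inclusive="both")        # 35 <= x <= 45
manual = (maths >= 35) & (maths <= 45)                # the same test written by hand
neither = maths.between(35, 45, inclusive="neither")  # 35 < x < 45

print(both.equals(manual))
print(maths[both].tolist())
print(maths[neither].tolist())
```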

datadic = {"P": [2, 9, 8, 7],
           "Q": [1, 20, 12, 5],
           "R": [14, 30, 18, 52],
           "S": [52, 46, 12, 83]}
df = pd.DataFrame(datadic)
df

   P   Q   R   S
0  2   1  14  52
1  9  20  30  46
2  8  12  18  12
3  7   5  52  83
 

count = df['P'].count()
print(count)
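
`count()` reports the number of non-missing values, which can differ from the number of rows. A short sketch of the difference, using the values of the Anu column that appears later in this notebook:

```python
import pandas as pd

anu = pd.Series([12, 4, 5, None, 1], name="Anu")

n_rows = len(anu)               # every row, missing or not
n_present = anu.count()         # only the non-missing values
n_missing = anu.isna().sum()    # how many values are missing

print(n_rows, n_present, n_missing)
```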

max_val=df['P'].max()
print ("Maximum Value of one Column P\n", max_val)
max_row=df.max(axis=1)
print ("Maximum Value Rowwise\n", max_row)
max_col=df.max(axis=0)
print ("Maximum Value Columnwise\n", max_col)

Maximum Value of one Column P
9
Maximum Value Rowwise
0    52
1    46
2    18
3    83
dtype: int64
Maximum Value Columnwise
P     9
Q    20
R    52
S    83
dtype: int64

min_val=df['P'].min()
print ("Minimum Value of one Column P\n", min_val)
min_row=df.min(axis=1)
print ("Minimum Value Rowwise\n", min_row)
min_col=df.min(axis=0)
print ("Minimum Value Columnwise\n", min_col)

Minimum Value of one Column P
2
Minimum Value Rowwise
0    1
1    9
2    8
3    5
dtype: int64
Minimum Value Columnwise
P     2
Q     1
R    14
S    12
dtype: int64
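
`max()` and `min()` return the extreme values but not where they occur. As a possible companion, `idxmax()` and `idxmin()` return the label of the extreme value instead. The sketch rebuilds the same P, Q, R, S dataframe so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({"P": [2, 9, 8, 7], "Q": [1, 20, 12, 5],
                   "R": [14, 30, 18, 52], "S": [52, 46, 12, 83]})

row_of_max_p = df["P"].idxmax()      # row label holding the largest P
col_of_row_max = df.idxmax(axis=1)   # for each row, the column holding its maximum
col_of_row_min = df.idxmin(axis=1)   # for each row, the column holding its minimum

print(row_of_max_p)
print(col_of_row_max)
print(col_of_row_min)
```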

Basic Statistical Functions

mean_val=df['P'].mean()
print ("Mean Value of one Column P\n", mean_val)
mean_row=df.mean(axis=1)
print ("Mean Value Rowwise\n", mean_row)
mean_col=df.mean(axis=0)
print ("Mean Value Columnwise\n", mean_col)

Mean Value of one Column P
6.5
Mean Value Rowwise
0    17.25
1    26.25
2    12.50
3    36.75
dtype: float64
Mean Value Columnwise
P     6.50
Q     9.50
R    28.50
S    48.25
dtype: float64

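# mean() on a dataframe that contains NaN (missing) values.
# It is stored as df_na so that df (columns P, Q, R, S) stays available for the cells below.
df_na = pd.DataFrame({"Anu":[12, 4, 5, None, 1], "Bina":[7, 2, 54, 3, None],
                      "Chitra":[20, 16, 11, 3, 8], "Deep":[14, 3, None, 2, 6]})
print(df_na)
# skipna=True (the default) ignores the NaN values while finding the mean
df_na.mean(axis = 1, skipna = True) # axis=1: one mean per row, taken across the columns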

    Anu  Bina  Chitra  Deep
0  12.0   7.0      20  14.0
1   4.0   2.0      16   3.0
2   5.0  54.0      11   NaN
3   NaN   3.0       3   2.0
4   1.0   NaN       8   6.0
0    13.250000
1     6.250000
2    23.333333
3     2.666667
4     5.000000
dtype: float64

mode_val=df['P'].mode()
print ("Mode Value of one Column P\n", mode_val)
mode_row=df.mode(axis=1)
print ("Mode Value Rowwise\n", mode_row)
mode_col=df.mode(axis=0)
print ("Mode Value Columnwise\n", mode_col)

Mode Value of one Column P
0    2
1    7
2    8
3    9
Name: P, dtype: int64
Mode Value Rowwise
      0     1     2     3
0   1.0   2.0  14.0  52.0
1   9.0  20.0  30.0  46.0
2  12.0   NaN   NaN   NaN
3   5.0   7.0  52.0  83.0
Mode Value Columnwise
   P   Q   R   S
0  2   1  14  12
1  7   5  18  46
2  8  12  30  52
3  9  20  52  83
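
The mode output above looks unusual because `mode()` returns every value that ties for the highest frequency, in sorted order. When all values are distinct, each one is a mode; only row 2, where 12 appears twice, has a single mode. A small sketch of both cases, using values taken from column P and from row 2:

```python
import pandas as pd

all_distinct = pd.Series([2, 9, 8, 7])       # column P: no value repeats
one_repeat = pd.Series([8, 12, 18, 12])      # row 2: 12 occurs twice

modes_distinct = all_distinct.mode().tolist()
modes_repeat = one_repeat.mode().tolist()

print(modes_distinct)  # every value ties, so all are returned in sorted order
print(modes_repeat)    # a single most frequent value
```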

median_val=df['P'].median()
print ("Median Value of one Column P\n", median_val)
median_row=df.median(axis=1)
print ("Median Value Rowwise\n", median_row)
median_col=df.median(axis=0)
print ("Median Value Columnwise\n", median_col)

Median Value of one Column P
7.5
Median Value Rowwise
0     8.0
1    25.0
2    12.0
3    29.5
dtype: float64
Median Value Columnwise
P     7.5
Q     8.5
R    24.0
S    49.0
dtype: float64

std_val=df['P'].std()
print ("Standard Deviation Value of one Column P\n", std_val)
std_row=df.std(axis=1)
print ("Standard Deviation Value Rowwise\n", std_row)
std_col=df.std(axis=0)
print ("Standard Deviation Value Columnwise\n", std_col)

Standard Deviation Value of one Column P
3.1091263510296048
Standard Deviation Value Rowwise
0    23.907809
1    15.713582
2     4.123106
3    37.703890
dtype: float64
Standard Deviation Value Columnwise
P     3.109126
Q     8.346656
R    17.078251
S    29.101833
dtype: float64
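
The values above are sample standard deviations: pandas divides by n - 1 by default (`ddof=1`), whereas NumPy's `np.std` divides by n (`ddof=0`). A sketch of the difference for column P, whose squared deviations from the mean 6.5 add up to 29:

```python
import numpy as np
import pandas as pd

p = pd.Series([2, 9, 8, 7], name="P")

sample_std = p.std()                 # pandas default, ddof=1: sqrt(29 / 3)
population_std = p.std(ddof=0)       # divide by n instead: sqrt(29 / 4)
numpy_std = np.std(p.to_numpy())     # NumPy default is ddof=0

print(sample_std)
print(population_std)
print(numpy_std)
```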

print(df.cov())

P Q R S
P 9.666667 22.000000 21.666667 -19.833333
Q 22.000000 69.666667 2.333333 -100.833333
R 21.666667 2.333333 291.666667 379.833333
S -19.833333 -100.833333 379.833333 846.916667

df['P'].cov(df['Q']) # Covariance between two specific columns

print(df.corr())

P Q R S
P 1.000000 0.847758 0.408047 -0.219198
Q 0.847758 1.000000 0.016369 -0.415118
R 0.408047 0.016369 1.000000 0.764239
S -0.219198 -0.415118 0.764239 1.000000

df['P'].corr(df['Q']) # Correlation between two specific columns
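
The covariance and correlation tables are linked: the Pearson correlation is the covariance divided by the product of the two standard deviations. A sketch that checks this for columns P and Q, rebuilt here so that the example is self-contained:

```python
import pandas as pd

p = pd.Series([2, 9, 8, 7], name="P")
q = pd.Series([1, 20, 12, 5], name="Q")

cov_pq = p.cov(q)                              # sample covariance (divides by n - 1)
manual_corr = cov_pq / (p.std() * q.std())     # covariance scaled by both standard deviations
pandas_corr = p.corr(q)                        # Pearson correlation computed by pandas

print(cov_pq)
print(manual_corr)
print(pandas_corr)
```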

print(df)
print(df.cumsum(axis=0))

P Q R S
0 2 1 14 52
1 9 20 30 46
2 8 12 18 12
3 7 5 52 83
P Q R S
0 2 1 14 52
1 11 21 44 98
2 19 33 62 110
3 26 38 114 193

print(df)
print(df.cumsum(axis=1))

P Q R S
0 2 1 14 52
1 9 20 30 46
2 8 12 18 12
3 7 5 52 83
P Q R S
0 2 3 17 69
1 9 29 59 105
2 8 20 38 50
3 7 12 64 147

print(df)
print(df.cumprod(axis=0))

P Q R S
0 2 1 14 52
1 9 20 30 46
2 8 12 18 12
3 7 5 52 83
P Q R S
0 2 1 14 52
1 18 20 420 2392
2 144 240 7560 28704
3 1008 1200 393120 2382432
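
One way to sanity-check the cumulative results is to compare their last row with the plain totals: the final row of `cumsum(axis=0)` should match `sum()`, and the final row of `cumprod(axis=0)` should match `prod()`. A minimal sketch on the same dataframe:

```python
import pandas as pd

df = pd.DataFrame({"P": [2, 9, 8, 7], "Q": [1, 20, 12, 5],
                   "R": [14, 30, 18, 52], "S": [52, 46, 12, 83]})

last_cumsum = df.cumsum(axis=0).iloc[-1].tolist()    # running totals after the last row
last_cumprod = df.cumprod(axis=0).iloc[-1].tolist()  # running products after the last row

print(last_cumsum == df.sum().tolist())
print(last_cumprod == df.prod().tolist())
```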
