Prac3.ipynb (Auto-R) - JupyterLab

The document demonstrates how to handle missing values in pandas DataFrames. It shows how to identify and count missing values, drop columns or rows based on conditions, sort the DataFrame, remove duplicate values, compute the correlation and covariance between columns, and discretize a column into bins.

Uploaded by Aaryan Pandey


In [3]: import pandas as pd
import numpy as np

# Set seed for reproducibility
np.random.seed(0)

# Create a DataFrame with 3 columns and 50 rows of random numeric data
data = np.random.rand(50, 3)
df = pd.DataFrame(data, columns=['A', 'B', 'C'])

# Replace 10% of the values with nulls: pick 5 of the 50 rows at random
# (without replacement) and set every column in those rows to NaN,
# i.e. 15 of the 150 values
null_indices = np.random.choice(df.index, size=int(0.1 * len(df)), replace=False)
df.iloc[null_indices] = np.nan
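The cell above blanks whole rows: 5 randomly chosen rows x 3 columns gives 15 of the 150 values, which is 10% of the values but only in complete rows. If the intent is to scatter the missing values across individual cells instead, the following is a minimal sketch of that alternative. It is not what the notebook ran, and its counts would differ from the outputs shown in this document.

```python
import numpy as np
import pandas as pd

# Same setup as the notebook: 50 rows x 3 columns, seed 0
np.random.seed(0)
df = pd.DataFrame(np.random.rand(50, 3), columns=['A', 'B', 'C'])

# Pick 10% of all cells (15 of 150) by flat position, without replacement
n_cells = df.size
flat_positions = np.random.choice(n_cells, size=int(0.1 * n_cells), replace=False)

# Convert flat positions to (row, column) positions and blank those cells individually
rows, cols = np.unravel_index(flat_positions, df.shape)
values = df.to_numpy(copy=True)
values[rows, cols] = np.nan
df_cells = pd.DataFrame(values, index=df.index, columns=df.columns)

print(df_cells.isnull().sum())
print("Total missing:", df_cells.isnull().sum().sum())
```

With this variant the per-column counts are no longer forced to be equal, and very few rows end up entirely NaN.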

In [4]: # a. Identify and count missing values in a DataFrame

missing_values_count = df.isnull().sum()
print("Missing Values Count:")
print(missing_values_count)

Missing Values Count:

A 5
B 5
C 5
dtype: int64
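The same boolean mask returned by `isnull()` can also give per-row counts, a grand total, and the share of each column that is missing. Here is a small sketch on a hypothetical four-row frame:

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical values) with a few gaps
df = pd.DataFrame({
    'A': [1.0, np.nan, 3.0, np.nan],
    'B': [np.nan, 2.0, 3.0, 4.0],
    'C': [1.0, 2.0, 3.0, 4.0],
})

per_column = df.isnull().sum()        # missing values in each column
per_row = df.isnull().sum(axis=1)     # missing values in each row
total = int(df.isnull().sum().sum())  # missing values in the whole frame
percent = df.isnull().mean() * 100    # percentage of each column that is missing

print(per_column)
print(per_row)
print("Total:", total)
print(percent)
```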

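In [5]: # b. Drop the column having more than 5 null values
# thresh is the minimum number of non-null values a column must have to be kept,
# so len(df) - 5 keeps columns with at most 5 nulls (no column is dropped here,
# since each has exactly 5)
df = df.dropna(thresh=len(df) - 5, axis=1)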
print("\nDataFrame after dropping columns with more than 5 null values:")
print(df)
DataFrame after dropping columns with more than 5 null values:
A B C
0 0.548814 0.715189 0.602763
1 0.544883 0.423655 0.645894
2 0.437587 0.891773 0.963663
3 0.383442 0.791725 0.528895
4 0.568045 0.925597 0.071036
5 0.087129 0.020218 0.832620
6 NaN NaN NaN
7 0.799159 0.461479 0.780529
8 0.118274 0.639921 0.143353
9 0.944669 0.521848 0.414662
10 0.264556 0.774234 0.456150
11 0.568434 0.018790 0.617635
12 0.612096 0.616934 0.943748
13 0.681820 0.359508 0.437032
14 NaN NaN NaN
15 0.670638 0.210383 0.128926
16 NaN NaN NaN
17 0.438602 0.988374 0.102045
18 0.208877 0.161310 0.653108
19 0.253292 0.466311 0.244426
20 0.158970 0.110375 0.656330
21 0.138183 0.196582 0.368725
22 0.820993 0.097101 0.837945
23 0.096098 0.976459 0.468651
24 0.976761 0.604846 0.739264
25 0.039188 0.282807 0.120197
26 0.296140 0.118728 0.317983
27 0.414263 0.064147 0.692472
28 0.566601 0.265389 0.523248
29 0.093941 0.575946 0.929296
30 0.318569 0.667410 0.131798
31 0.716327 0.289406 0.183191
32 0.586513 0.020108 0.828940
33 0.004695 0.677817 0.270008
34 0.735194 0.962189 0.248753
35 0.576157 0.592042 0.572252
36 0.223082 0.952749 0.447125
37 0.846409 0.699479 0.297437
38 0.813798 0.396506 0.881103
39 0.581273 0.881735 0.692532
40 0.725254 0.501324 0.956084
41 0.643990 0.423855 0.606393
42 0.019193 0.301575 0.660174
43 NaN NaN NaN
44 0.135474 0.298282 0.569965
45 0.590873 0.574325 0.653201
46 0.652103 0.431418 0.896547
47 0.367562 0.435865 0.891923
48 0.806194 0.703889 0.100227
49 NaN NaN NaN
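The `thresh` argument is easy to misread: it is the number of non-null values a column needs in order to be kept, not the number of nulls allowed. Here is a sketch on a hypothetical 10-row frame that compares it with an explicit null-count mask, which states "more than 5 null values" directly:

```python
import numpy as np
import pandas as pd

# Hypothetical 10-row frame: column 'X' has 6 missing values, 'Y' has 2
df = pd.DataFrame({
    'X': [np.nan] * 6 + [1.0] * 4,
    'Y': [np.nan] * 2 + [1.0] * 8,
})

max_nulls = 5
# "At most max_nulls missing" means "at least len(df) - max_nulls present"
kept_thresh = df.dropna(axis=1, thresh=len(df) - max_nulls)

# Equivalent boolean-mask formulation: keep columns whose null count <= max_nulls
kept_mask = df.loc[:, df.isnull().sum() <= max_nulls]

print(list(kept_thresh.columns), list(kept_mask.columns))
```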

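In [7]: # c. Identify the row label having the maximum sum of all values in a row and drop that row
# sum(axis=1) skips NaN, so all-NaN rows sum to 0 and are never selected.
# This cell is not idempotent: each run drops the current maximum-sum row. The output
# below is missing both row 24 (sum ≈ 2.321) and row 2 (sum ≈ 2.293), which is
# consistent with the cell having been run twice (note the skipped In [6]).
max_sum_row_label = df.sum(axis=1).idxmax()
df = df.drop(index=max_sum_row_label)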
print("\nDataFrame after dropping row with maximum sum of values:")
print(df)
DataFrame after dropping row with maximum sum of values:
A B C
0 0.548814 0.715189 0.602763
1 0.544883 0.423655 0.645894
3 0.383442 0.791725 0.528895
4 0.568045 0.925597 0.071036
5 0.087129 0.020218 0.832620
6 NaN NaN NaN
7 0.799159 0.461479 0.780529
8 0.118274 0.639921 0.143353
9 0.944669 0.521848 0.414662
10 0.264556 0.774234 0.456150
11 0.568434 0.018790 0.617635
12 0.612096 0.616934 0.943748
13 0.681820 0.359508 0.437032
14 NaN NaN NaN
15 0.670638 0.210383 0.128926
16 NaN NaN NaN
17 0.438602 0.988374 0.102045
18 0.208877 0.161310 0.653108
19 0.253292 0.466311 0.244426
20 0.158970 0.110375 0.656330
21 0.138183 0.196582 0.368725
22 0.820993 0.097101 0.837945
23 0.096098 0.976459 0.468651
25 0.039188 0.282807 0.120197
26 0.296140 0.118728 0.317983
27 0.414263 0.064147 0.692472
28 0.566601 0.265389 0.523248
29 0.093941 0.575946 0.929296
30 0.318569 0.667410 0.131798
31 0.716327 0.289406 0.183191
32 0.586513 0.020108 0.828940
33 0.004695 0.677817 0.270008
34 0.735194 0.962189 0.248753
35 0.576157 0.592042 0.572252
36 0.223082 0.952749 0.447125
37 0.846409 0.699479 0.297437
38 0.813798 0.396506 0.881103
39 0.581273 0.881735 0.692532
40 0.725254 0.501324 0.956084
41 0.643990 0.423855 0.606393
42 0.019193 0.301575 0.660174
43 NaN NaN NaN
44 0.135474 0.298282 0.569965
45 0.590873 0.574325 0.653201
46 0.652103 0.431418 0.896547
47 0.367562 0.435865 0.891923
48 0.806194 0.703889 0.100227
49 NaN NaN NaN
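A tiny hypothetical frame can show two details of the row-sum step. With the default `skipna=True`, an all-NaN row sums to 0.0, while `min_count=1` makes it NaN. Assigning the result to a new name also keeps the cell safe to re-run. This is a sketch of one way to structure the step, not the notebook's code:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one all-NaN row (label 'r2')
df = pd.DataFrame(
    {'A': [0.5, 0.9, np.nan, 0.1], 'B': [0.2, 0.8, np.nan, 0.3]},
    index=['r0', 'r1', 'r2', 'r3'],
)

# Default skipna=True: the all-NaN row sums to 0.0
print(df.sum(axis=1))

# With min_count=1 the all-NaN row sums to NaN instead
row_sums = df.sum(axis=1, min_count=1)
print(row_sums)

# Write to a new name so re-running this cell does not drop a second row
max_label = row_sums.idxmax()
df_dropped = df.drop(index=max_label)
print(max_label, list(df_dropped.index))
```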

In [8]: # d. Sort the DataFrame on the basis of the first column
# Ascending by 'A'; rows where 'A' is NaN are placed last (na_position='last' by default)
df_sorted = df.sort_values(by='A')
print("\nDataFrame sorted on the basis of the first column:")
print(df_sorted)
DataFrame sorted on the basis of the first column:
A B C
33 0.004695 0.677817 0.270008
42 0.019193 0.301575 0.660174
25 0.039188 0.282807 0.120197
5 0.087129 0.020218 0.832620
29 0.093941 0.575946 0.929296
23 0.096098 0.976459 0.468651
8 0.118274 0.639921 0.143353
44 0.135474 0.298282 0.569965
21 0.138183 0.196582 0.368725
20 0.158970 0.110375 0.656330
18 0.208877 0.161310 0.653108
36 0.223082 0.952749 0.447125
19 0.253292 0.466311 0.244426
10 0.264556 0.774234 0.456150
26 0.296140 0.118728 0.317983
30 0.318569 0.667410 0.131798
47 0.367562 0.435865 0.891923
3 0.383442 0.791725 0.528895
27 0.414263 0.064147 0.692472
17 0.438602 0.988374 0.102045
1 0.544883 0.423655 0.645894
0 0.548814 0.715189 0.602763
28 0.566601 0.265389 0.523248
4 0.568045 0.925597 0.071036
11 0.568434 0.018790 0.617635
35 0.576157 0.592042 0.572252
39 0.581273 0.881735 0.692532
32 0.586513 0.020108 0.828940
45 0.590873 0.574325 0.653201
12 0.612096 0.616934 0.943748
41 0.643990 0.423855 0.606393
46 0.652103 0.431418 0.896547
15 0.670638 0.210383 0.128926
13 0.681820 0.359508 0.437032
31 0.716327 0.289406 0.183191
40 0.725254 0.501324 0.956084
34 0.735194 0.962189 0.248753
7 0.799159 0.461479 0.780529
48 0.806194 0.703889 0.100227
38 0.813798 0.396506 0.881103
22 0.820993 0.097101 0.837945
37 0.846409 0.699479 0.297437
9 0.944669 0.521848 0.414662
6 NaN NaN NaN
14 NaN NaN NaN
16 NaN NaN NaN
43 NaN NaN NaN
49 NaN NaN NaN
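`sort_values` places NaN rows last regardless of the sort direction unless you pass `na_position='first'`. Selecting the column with `df.columns[0]` follows the task wording "first column" even if the columns are later renamed. Here is a sketch on a hypothetical frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0.3, np.nan, 0.1, 0.2], 'B': [1, 2, 3, 4]})

first_col = df.columns[0]  # select the "first column" by position rather than by name
asc = df.sort_values(by=first_col)                          # NaN row goes last
nan_first = df.sort_values(by=first_col, na_position='first')
desc = df.sort_values(by=first_col, ascending=False)        # NaN row still last

print(list(asc.index), list(nan_first.index), list(desc.index))
```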

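In [9]: # e. Remove all duplicates from the first column
# Drops every row whose value in 'A' repeats an earlier row (keep='first' by default);
# NaN values count as equal, so only the first all-NaN row (6) is kept
df_unique = df.drop_duplicates(subset='A')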
print("\nDataFrame after removing duplicates from the first column:")
print(df_unique)
DataFrame after removing duplicates from the first column:
A B C
0 0.548814 0.715189 0.602763
1 0.544883 0.423655 0.645894
3 0.383442 0.791725 0.528895
4 0.568045 0.925597 0.071036
5 0.087129 0.020218 0.832620
6 NaN NaN NaN
7 0.799159 0.461479 0.780529
8 0.118274 0.639921 0.143353
9 0.944669 0.521848 0.414662
10 0.264556 0.774234 0.456150
11 0.568434 0.018790 0.617635
12 0.612096 0.616934 0.943748
13 0.681820 0.359508 0.437032
15 0.670638 0.210383 0.128926
17 0.438602 0.988374 0.102045
18 0.208877 0.161310 0.653108
19 0.253292 0.466311 0.244426
20 0.158970 0.110375 0.656330
21 0.138183 0.196582 0.368725
22 0.820993 0.097101 0.837945
23 0.096098 0.976459 0.468651
25 0.039188 0.282807 0.120197
26 0.296140 0.118728 0.317983
27 0.414263 0.064147 0.692472
28 0.566601 0.265389 0.523248
29 0.093941 0.575946 0.929296
30 0.318569 0.667410 0.131798
31 0.716327 0.289406 0.183191
32 0.586513 0.020108 0.828940
33 0.004695 0.677817 0.270008
34 0.735194 0.962189 0.248753
35 0.576157 0.592042 0.572252
36 0.223082 0.952749 0.447125
37 0.846409 0.699479 0.297437
38 0.813798 0.396506 0.881103
39 0.581273 0.881735 0.692532
40 0.725254 0.501324 0.956084
41 0.643990 0.423855 0.606393
42 0.019193 0.301575 0.660174
44 0.135474 0.298282 0.569965
45 0.590873 0.574325 0.653201
46 0.652103 0.431418 0.896547
47 0.367562 0.435865 0.891923
48 0.806194 0.703889 0.100227
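`drop_duplicates(subset='A')` compares rows on column `A` only and treats NaN values as equal to each other. That is why only one all-NaN row remains in the output above. The `keep` parameter controls which occurrence survives. Here is a sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 1.0, np.nan, np.nan], 'B': list('vwxyz')})

# Rows are compared on column 'A' only; NaN values count as equal to each other
first = df.drop_duplicates(subset='A')                  # keep='first' (default)
last = df.drop_duplicates(subset='A', keep='last')
no_repeats = df.drop_duplicates(subset='A', keep=False)  # drop every repeated value

print(list(first.index), list(last.index), list(no_repeats.index))
```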

In [10]: # f. Find the correlation between the first and second column and covariance between the second and third column
correlation_AB = df['A'].corr(df['B'])
covariance_BC = df['B'].cov(df['C'])
print("\nCorrelation between the first and second column:",correlation_AB)
print("Covariance between the second and third column:",covariance_BC)

Correlation between the first and second column: 0.05849765987946871

Covariance between the second and third column: -0.025965685609794554
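`Series.corr` computes the Pearson coefficient by default, and `Series.cov` computes the sample covariance (`ddof=1`). Both use only the positions where both values are present. Writing the formulas out on hypothetical data makes this explicit and lets you check the result against pandas:

```python
import numpy as np
import pandas as pd

# Hypothetical columns with one missing value in each
x = pd.Series([1.0, 2.0, 3.0, 4.0, np.nan])
y = pd.Series([2.0, 4.0, 5.0, np.nan, 1.0])

# Keep only positions where BOTH values are present (pairwise deletion)
mask = x.notna() & y.notna()
xv, yv = x[mask].to_numpy(), y[mask].to_numpy()

# Sample covariance (ddof=1) and Pearson correlation written out explicitly
cov_manual = ((xv - xv.mean()) * (yv - yv.mean())).sum() / (len(xv) - 1)
corr_manual = cov_manual / (xv.std(ddof=1) * yv.std(ddof=1))

print(cov_manual, x.cov(y))
print(corr_manual, x.corr(y))
```

Only the first three pairs are complete here, so the covariance works out to 1.5.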

In [16]: # g. Discretize the second column and create 5 bins
# pd.qcut builds 5 equal-frequency (quantile) bins labelled 0-4; rows where 'B'
# is NaN get NaN, which is why the bin labels print as floats
df['B_bins'] = pd.qcut(df['B'], q=5, labels=False)
print("\nDataFrame with discretized second column:")
print(df)
DataFrame with discretized second column:
A B C B_bins
0 0.548814 0.715189 0.602763 4.0
1 0.544883 0.423655 0.645894 2.0
3 0.383442 0.791725 0.528895 4.0
4 0.568045 0.925597 0.071036 4.0
5 0.087129 0.020218 0.832620 0.0
6 NaN NaN NaN NaN
7 0.799159 0.461479 0.780529 2.0
8 0.118274 0.639921 0.143353 3.0
9 0.944669 0.521848 0.414662 2.0
10 0.264556 0.774234 0.456150 4.0
11 0.568434 0.018790 0.617635 0.0
12 0.612096 0.616934 0.943748 3.0
13 0.681820 0.359508 0.437032 1.0
14 NaN NaN NaN NaN
15 0.670638 0.210383 0.128926 1.0
16 NaN NaN NaN NaN
17 0.438602 0.988374 0.102045 4.0
18 0.208877 0.161310 0.653108 0.0
19 0.253292 0.466311 0.244426 2.0
20 0.158970 0.110375 0.656330 0.0
21 0.138183 0.196582 0.368725 0.0
22 0.820993 0.097101 0.837945 0.0
23 0.096098 0.976459 0.468651 4.0
25 0.039188 0.282807 0.120197 1.0
26 0.296140 0.118728 0.317983 0.0
27 0.414263 0.064147 0.692472 0.0
28 0.566601 0.265389 0.523248 1.0
29 0.093941 0.575946 0.929296 3.0
30 0.318569 0.667410 0.131798 3.0
31 0.716327 0.289406 0.183191 1.0
32 0.586513 0.020108 0.828940 0.0
33 0.004695 0.677817 0.270008 3.0
34 0.735194 0.962189 0.248753 4.0
35 0.576157 0.592042 0.572252 3.0
36 0.223082 0.952749 0.447125 4.0
37 0.846409 0.699479 0.297437 3.0
38 0.813798 0.396506 0.881103 1.0
39 0.581273 0.881735 0.692532 4.0
40 0.725254 0.501324 0.956084 2.0
41 0.643990 0.423855 0.606393 2.0
42 0.019193 0.301575 0.660174 1.0
43 NaN NaN NaN NaN
44 0.135474 0.298282 0.569965 1.0
45 0.590873 0.574325 0.653201 2.0
46 0.652103 0.431418 0.896547 2.0
47 0.367562 0.435865 0.891923 2.0
48 0.806194 0.703889 0.100227 3.0
49 NaN NaN NaN NaN
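`pd.qcut` places the bin edges at quantiles, so each of the 5 bins holds roughly the same number of rows. `pd.cut` would instead split the range of the column into 5 equal-width intervals. Here is a sketch with hypothetical skewed data that shows the difference:

```python
import numpy as np
import pandas as pd

values = pd.Series([0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.95, np.nan])

# qcut: edges at the 0, 20, 40, 60, 80, 100% quantiles -> equal counts per bin
q_bins = pd.qcut(values, q=5, labels=False)

# cut: 5 bins of equal width over [min, max] -> counts can be very uneven
w_bins = pd.cut(values, bins=5, labels=False)

print(q_bins.value_counts().sort_index())
print(w_bins.value_counts().sort_index())
```

Here `qcut` puts 2 values in each bin, while `cut` puts 9 values in the lowest bin and the single outlier in the highest. In both cases the NaN entry stays NaN.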

In [17]: print('By- Aaryan Pandey 13591')

By- Aaryan Pandey 13591

