Session-25 - Jupyter Notebook
In [2]: df=pd.read_csv('Iris.csv')
        df
Out[2]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
In [3]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Id 150 non-null int64
1 SepalLengthCm 150 non-null float64
2 SepalWidthCm 150 non-null float64
3 PetalLengthCm 150 non-null float64
4 PetalWidthCm 150 non-null float64
5 Species 150 non-null object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB
1. Slicing :-
localhost:8888/notebooks/Desktop/Techpaathsala/Session-25.ipynb 1/20
3/15/24, 9:42 PM Session-25 - Jupyter Notebook
In [4]: df=pd.read_csv('Iris.csv')
        df
Out[4]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
In [5]: df[110:120]
Out[5]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
Out[6]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
In [10]: df[['SepalLengthCm','PetalLengthCm']]
Out[10]:
SepalLengthCm PetalLengthCm
0 5.1 1.4
1 4.9 1.4
2 4.7 1.3
3 4.6 1.5
4 5.0 1.4
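The two forms of [] used in this section can be compared side by side. This is a minimal sketch on a small hand-built frame that reuses the five rows printed above; the Species values and the variable names rows, cols and one_col are assumptions for illustration, not part of the session.

```python
import pandas as pd

# Small stand-in for the first rows of the Iris data (Species values assumed).
df = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 4.7, 4.6, 5.0],
    "PetalLengthCm": [1.4, 1.4, 1.3, 1.5, 1.4],
    "Species": ["Iris-setosa"] * 5,
})

# A slice inside [] selects rows by position; the stop position is excluded.
rows = df[1:3]

# A list of column names inside [] selects columns and returns a DataFrame.
cols = df[["SepalLengthCm", "PetalLengthCm"]]

# A single column name returns a Series instead of a DataFrame.
one_col = df["Species"]

print(rows)
print(cols.shape)
print(type(one_col).__name__)
```

So df[110:120] above returns rows 110 to 119, while the double brackets in df[[...]] keep the result two-dimensional.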
2. loc :-
In [ ]: # 1. It is used to access rows and columns by using the names of the rows and columns
In [12]: df.loc[10:20,['SepalLengthCm','PetalLengthCm','Species']]
Out[12]:
SepalLengthCm PetalLengthCm Species
In [14]: df.loc[110:120,'SepalLengthCm':'Species']
Out[14]:
SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
In [16]: df.loc[110:120,['SepalLengthCm','Species']]
Out[16]:
SepalLengthCm Species
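One detail of loc that is easy to miss is that label slices include the stop label, for rows and for columns. The sketch below shows this on a small hand-built frame; the data values and the names by_rows, by_cols and wide are illustrative assumptions.

```python
import pandas as pd

# Hand-built stand-in for a few Iris rows (values are illustrative).
df = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 4.7, 4.6, 5.0],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.1, 3.6],
    "Species": ["Iris-setosa"] * 5,
})

# Row labels 1 through 3: the stop label 3 is included.
by_rows = df.loc[1:3, ["SepalLengthCm", "Species"]]

# Column label slices are inclusive as well.
by_cols = df.loc[0:1, "SepalLengthCm":"Species"]

# loc also accepts a boolean mask in the row position.
wide = df.loc[df["SepalWidthCm"] > 3.1, "SepalLengthCm"]

print(by_rows)
print(by_cols)
print(wide)
```

By the same rule, df.loc[110:120, ...] above returns 11 rows (110 to 120), one more than df[110:120].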
3. iloc :-
In [17]: # 1. It is also used to access rows and columns, but on the basis of index position
         # The last index number will be excluded
In [18]: df=pd.read_csv('Iris.csv')
         df
Out[18]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
In [19]: df.iloc[1,0]
Out[19]: 2
Out[20]: 3.1
In [21]: df.iloc[2,5]
Out[21]: 'Iris-setosa'
In [22]: df.iloc[2:5]
Out[22]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
In [23]: df.iloc[2:5,1:3]
Out[23]:
SepalLengthCm SepalWidthCm
2 4.7 3.2
3 4.6 3.1
4 5.0 3.6
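The iloc calls above can be reproduced on a small frame to check the "stop is excluded" rule. This is only a sketch; the frame is hand-built and the names cell, block and last_row are assumptions.

```python
import pandas as pd

# Hand-built stand-in for the first five Iris rows (three columns only).
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "SepalLengthCm": [5.1, 4.9, 4.7, 4.6, 5.0],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.1, 3.6],
})

# Single cell: row position 1, column position 0.
cell = df.iloc[1, 0]

# Rows at positions 2, 3, 4 and columns at positions 1, 2; both stops excluded.
block = df.iloc[2:5, 1:3]

# Negative positions count from the end, as in a Python list.
last_row = df.iloc[-1]

print(cell)
print(block)
print(last_row)
```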
In [24]: df1=df.groupby('SepalWidthCm').get_group(3.2)
         df1
Out[24]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
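groupby(...).get_group(value) returns the rows whose grouping column equals that value. A boolean mask should select the same rows, which gives a quick way to check the result. The sketch below uses a hand-built frame; the Species labels and the names group and mask are hypothetical.

```python
import pandas as pd

# Hand-built frame; the Species labels are placeholders.
df = pd.DataFrame({
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.1, 3.2],
    "Species": ["a", "b", "c", "d", "e"],
})

# Rows that belong to the group whose key is 3.2.
group = df.groupby("SepalWidthCm").get_group(3.2)

# The same rows selected with a boolean mask.
mask = df[df["SepalWidthCm"] == 3.2]

print(group)
print(mask)
```

For a single value the mask is usually simpler; groupby becomes useful when every group is needed, for example in an aggregation.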
1. Delete columns :-
In [13]: import pandas as pd
         import numpy as np

         data={'Name':['A','B','C','D'],
               'Last_name':['E','F','G','H']}
         df=pd.DataFrame(data)
         df
Out[13]:
Name Last_name
0 A E
1 B F
2 C G
3 D H
In [14]: df['score']=[10,20,30,40]
         df['Zeros']=np.zeros(4)
         df['ones']=np.ones(4)
         df
Out[14]:
Name Last_name score Zeros ones
0 A E 10 0.0 1.0
1 B F 20 0.0 1.0
2 C G 30 0.0 1.0
3 D H 40 0.0 1.0
In [15]: df=df.drop(['Zeros'],axis=1)
         df
Out[15]:
Name Last_name score ones
0 A E 10 1.0
1 B F 20 1.0
2 C G 30 1.0
3 D H 40 1.0
In [17]: df=df.drop(['score','ones'],axis=1)
         df
Out[17]:
Name Last_name
0 A E
1 B F
2 C G
3 D H
In [21]: df1=df.T
         df1
Out[21]:
0 1 2 3
Name A B C D
Last_name E F G H
In [22]: df1=df1.drop('Last_name',axis=0)
         df1
Out[22]:
0 1 2 3
Name A B C D
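drop returns a new DataFrame and leaves the original unchanged, which is why the cells above assign the result back to df. The sketch below also uses the columns= and index= keywords, which are equivalent to axis=1 and axis=0. The names no_zeros, no_first and ignored, and the column name 'missing', are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": list("ABCD"), "Last_name": list("EFGH")})
df["score"] = [10, 20, 30, 40]
df["Zeros"] = np.zeros(4)

# Same as df.drop(['Zeros'], axis=1); df itself is not modified.
no_zeros = df.drop(columns=["Zeros"])

# Same as df.drop([0], axis=0): removes the row labelled 0.
no_first = df.drop(index=[0])

# errors='ignore' skips labels that do not exist instead of raising KeyError.
ignored = df.drop(columns=["missing"], errors="ignore")

print(no_zeros)
print(no_first)
print(list(df.columns))
```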
Out[23]:
Name Last_name
0 A E
1 B F
2 C G
3 D H
In [24]: df['score']=[10,20,30,40]
         df['Zeros']=np.zeros(4)
         df['ones']=np.ones(4)
         df
Out[24]:
Name Last_name score Zeros ones
0 A E 10 0.0 1.0
1 B F 20 0.0 1.0
2 C G 30 0.0 1.0
3 D H 40 0.0 1.0
loc :-
In [25]: df.loc[1:2]
Out[25]:
Name Last_name score Zeros ones
1 B F 20 0.0 1.0
2 C G 30 0.0 1.0
In [27]: df.loc[1:3,['score','Zeros']]
Out[27]:
score Zeros
1 20 0.0
2 30 0.0
3 40 0.0
In [29]: df.loc[1:3,'score':'ones']
Out[29]:
score Zeros ones
1 20 0.0 1.0
2 30 0.0 1.0
3 40 0.0 1.0
In [30]: # iloc
In [31]: df
Out[31]:
Name Last_name score Zeros ones
0 A E 10 0.0 1.0
1 B F 20 0.0 1.0
2 C G 30 0.0 1.0
3 D H 40 0.0 1.0
In [32]: df.iloc[0:2]
Out[32]:
Name Last_name score Zeros ones
0 A E 10 0.0 1.0
1 B F 20 0.0 1.0
In [33]: df.iloc[:,:]
Out[33]:
Name Last_name score Zeros ones
0 A E 10 0.0 1.0
1 B F 20 0.0 1.0
2 C G 30 0.0 1.0
3 D H 40 0.0 1.0
In [34]: df.iloc[2,2]
Out[34]: 30
In [35]: df.iloc[:,1:3]
Out[35]:
Last_name score
0 E 10
1 F 20
2 G 30
3 H 40
iloc >> integer position of rows and columns (stop position excluded)
loc >> names (labels) of rows and columns (stop label included)
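With the default 0, 1, 2, ... index, loc and iloc often look interchangeable. The difference shows once the row labels are not the same as the positions. The two frames below are hand-built for illustration and every name in the sketch is an assumption.

```python
import pandas as pd

# String labels: loc uses the labels, iloc uses the positions.
df = pd.DataFrame({"score": [10, 20, 30, 40]}, index=["A", "B", "C", "D"])
by_label = df.loc["B":"C", "score"]   # labels B and C, both included
by_pos = df.iloc[1:3, 0]              # positions 1 and 2; position 3 excluded

# Integer labels in reverse order: the same number means different rows.
shuffled = pd.DataFrame({"score": [10, 20, 30, 40]}, index=[3, 2, 1, 0])
label_3 = shuffled.loc[3, "score"]    # row whose label is 3 (the first row)
position_3 = shuffled.iloc[3, 0]      # row at position 3 (the last row)

print(by_label.tolist(), by_pos.tolist())
print(label_3, position_3)
```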
1. head()
2. tail()
3. info()
4. shape >> tuple of (rows, columns); an attribute, so no parentheses
5. describe() >> statistical calculations
6. index >> row labels (attribute)
7. columns >> column names (attribute)
8. T >> transpose
9. loc
10. iloc
11. drop()
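The items in this list can be tried together on a small frame. The sketch reuses the a/b data that appears later in this session; which items take parentheses is the main thing it shows.

```python
import pandas as pd

df = pd.DataFrame({"a": [100, 200, 300, 400], "b": [10, 20, 30, 40]})

print(df.head(2))          # first 2 rows (method)
print(df.tail(2))          # last 2 rows (method)
print(df.shape)            # attribute: (rows, columns)
print(df.index)            # attribute: row labels
print(list(df.columns))    # attribute: column names
print(df.describe())       # method: count, mean, std, min, quartiles, max
print(df.T.shape)          # T is an attribute that returns the transpose
```

Calling an attribute such as df.shape() raises a TypeError, because the tuple it returns is not callable.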
In [36]: df=pd.read_excel('Emp_Records.xlsx')
         df
Out[36]:
Emp ID  First Name  Age in Yrs  Weight in Kgs  Age in Company  Unnamed: 5  City  Salary
In [37]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Emp ID 100 non-null int64
1 First Name 100 non-null object
2 Age in Yrs 100 non-null float64
3 Weight in Kgs 100 non-null int64
4 Age in Company 100 non-null float64
5 Unnamed: 5 0 non-null float64
6 City 100 non-null object
7 Salary 100 non-null int64
dtypes: float64(3), int64(3), object(2)
memory usage: 6.4+ KB
In [38]: df.dtypes
In [39]: df.columns
Out[39]: Index(['Emp ID', 'First Name', 'Age in Yrs', 'Weight in Kgs', 'Age in Company',
       'Unnamed: 5', 'City', 'Salary'],
      dtype='object')
In [40]: df.axes
In [41]: df.describe()
Out[41]:
Emp ID  Age in Yrs  Weight in Kgs  Age in Company  Unnamed: 5  Salary
In [42]: df
Out[42]:
Emp ID  First Name  Age in Yrs  Weight in Kgs  Age in Company  Unnamed: 5  City  Salary
use of index_col
In [43]: df=pd.read_excel('Emp_Records.xlsx',index_col='Age in Company')
         df
Out[43]:
                Emp ID  First Name  Age in Yrs  Weight in Kgs  Unnamed: 5  City  Salary
Age in Company
Out[44]:
            Emp ID  Age in Yrs  Weight in Kgs  Age in Company  Unnamed: 5  City  Salary
First Name
In [45]: df
Out[45]:
            Emp ID  Age in Yrs  Weight in Kgs  Age in Company  Unnamed: 5  City  Salary
First Name
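index_col moves the chosen column out of the data columns and uses it as the row index. Since Emp_Records.xlsx is not available here, the sketch below uses pd.read_csv on an in-memory text, which accepts the same index_col argument. The two employee records and the names csv_text, default and by_name are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical two-row file held in memory instead of on disk.
csv_text = "Emp ID,First Name,City\n1,Asha,Pune\n2,Ravi,Delhi\n"

# Without index_col: a default 0, 1, ... index and three data columns.
default = pd.read_csv(io.StringIO(csv_text))

# With index_col: 'First Name' becomes the index and leaves the columns.
by_name = pd.read_csv(io.StringIO(csv_text), index_col="First Name")

print(default)
print(by_name)
print(by_name.loc["Ravi", "City"])
```

Once a column is the index, its values can be used as row labels in loc.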
In [46]: var1={'a':[100,200,300,400],
               'b':[10,20,30,40]}
         df=pd.DataFrame(var1)
         df
Out[46]:
a b
0 100 10
1 200 20
2 300 30
3 400 40
In [47]: df.to_csv('techie.csv')
In [48]: df.to_excel('techie1.xlsx')
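By default to_csv also writes the row index as an unnamed first column, so reading the file back adds a column called 'Unnamed: 0'. Passing index=False avoids that. The sketch below writes to a string instead of a file so it leaves nothing on disk; the variable names are assumptions.

```python
import io
import pandas as pd

df = pd.DataFrame({"a": [100, 200, 300, 400], "b": [10, 20, 30, 40]})

# With no path, to_csv returns the CSV text instead of writing a file.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

# Reading back: the saved index shows up as an extra 'Unnamed: 0' column.
back = pd.read_csv(io.StringIO(with_index))
clean = pd.read_csv(io.StringIO(without_index))

print(list(back.columns))
print(list(clean.columns))
```

to_excel takes the same index=False argument.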
sorting :-
In [49]: df=pd.read_excel('Emp_Records.xlsx')
         df
Out[49]:
Emp ID  First Name  Age in Yrs  Weight in Kgs  Age in Company  Unnamed: 5  City  Salary
In [51]: df.sort_index(axis=1)
Out[51]:
Age in Company  Age in Yrs  City  Emp ID  First Name  Salary  Unnamed: 5  Weight in Kgs
In [52]: df1=pd.read_excel('Emp_Records.xlsx',index_col='City')
         df1
Out[52]:
      Emp ID  First Name  Age in Yrs  Weight in Kgs  Age in Company  Unnamed: 5  Salary
City
In [53]: df1.sort_index(axis=0)
Out[53]:
                         Emp ID  First Name  Age in Yrs  Weight in Kgs  Age in Company  Unnamed: 5  Salary
City
Whiteman Air Force Base  969964  Janice      37.57       56             0.93            NaN         147641
In [54]: df.sort_index(axis=1,ascending=False)
Out[54]:
Weight in Kgs  Unnamed: 5  Salary  First Name  Emp ID  City  Age in Yrs  Age in Company
In [55]: df1.sort_index(axis=0,ascending=False)
Out[55]:
                         Emp ID  First Name  Age in Yrs  Weight in Kgs  Age in Company  Unnamed: 5  Salary
City
Whiteman Air Force Base  969964  Janice      37.57       56             0.93            NaN         147641
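sort_index sorts by labels, not by values: axis=0 sorts the row labels and axis=1 sorts the column names. The sketch below shows both on a small frame with city names as the index; the data and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with city names as row labels.
df = pd.DataFrame(
    {"Salary": [3, 1, 2], "Emp ID": [30, 10, 20]},
    index=["Pune", "Agra", "Delhi"],
)

rows_sorted = df.sort_index(axis=0)                  # row labels A to Z
rows_desc = df.sort_index(axis=0, ascending=False)   # row labels Z to A
cols_sorted = df.sort_index(axis=1)                  # column names A to Z

print(rows_sorted)
print(rows_desc)
print(cols_sorted)
```

Each call returns a new DataFrame; df keeps its original order unless the result is assigned back.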
In [56]: df1=pd.read_excel('Emp_Records.xlsx',index_col='City')
         df1
Out[56]:
      Emp ID  First Name  Age in Yrs  Weight in Kgs  Age in Company  Unnamed: 5  Salary
City
In [59]: df1.reset_index()
Out[59]:
City  Emp ID  First Name  Age in Yrs  Weight in Kgs  Age in Company  Unnamed: 5  Salary
In [60]: df1=pd.read_excel('Emp_Records.xlsx',index_col='City')
         df1
Out[60]:
      Emp ID  First Name  Age in Yrs  Weight in Kgs  Age in Company  Unnamed: 5  Salary
City
In [61]: df1.reset_index(drop=True)
         # df1.reset_index(drop=False) >> the old index is not dropped; it is kept as a column (default)
Emp ID First Name Age in Yrs Weight in Kgs Age in Company Unnamed: 5 Salary
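The two reset_index calls differ only in what happens to the old index. A minimal sketch on a hand-built frame indexed by City; the data and the names kept and dropped are assumptions.

```python
import pandas as pd

# Hypothetical frame whose index is named 'City'.
df1 = pd.DataFrame(
    {"Salary": [3, 1]},
    index=pd.Index(["Pune", "Agra"], name="City"),
)

# Default (drop=False): the old index comes back as a regular column.
kept = df1.reset_index()

# drop=True: the old index is discarded; only 0, 1, ... remains.
dropped = df1.reset_index(drop=True)

print(kept)
print(dropped)
```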
sort_values() :-
In [62]: df
Out[62]:
Emp ID  First Name  Age in Yrs  Weight in Kgs  Age in Company  Unnamed: 5  City  Salary
Out[63]:
Emp ID  First Name  Age in Yrs  Weight in Kgs  Age in Company  Unnamed: 5  City  Salary
Out[64]:
    Emp ID  First Name  Age in Yrs  Weight in Kgs  Age in Company  Unnamed: 5  City          Salary
41  227922  Amanda      35.02       40             10.28           NaN         Lake Charles  114257
82  761821  Ernest      32.77       87             2.49            NaN         Saranac Lake  176675
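The input cells for the two outputs above are not shown, so the following is only a general sketch of how sort_values works, not a reconstruction of those cells. It sorts by the values in one or more columns; the frame and all names in it are made up for illustration.

```python
import pandas as pd

# Hypothetical data with repeated city names.
df = pd.DataFrame({
    "City": ["Pune", "Agra", "Pune", "Agra"],
    "Salary": [300, 100, 200, 400],
})

# Sort by one column, lowest value first (the default).
by_salary = df.sort_values("Salary")

# Highest value first.
by_salary_desc = df.sort_values("Salary", ascending=False)

# Sort by City A to Z, then by Salary high to low inside each city.
by_city_then_salary = df.sort_values(["City", "Salary"], ascending=[True, False])

print(by_salary)
print(by_salary_desc)
print(by_city_then_salary)
```

The original row labels travel with the rows; reset_index(drop=True) can be chained if a fresh 0, 1, 2, ... index is wanted.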