Pandas
0.1 Import
import pandas as pd
If we don't specify a sheet name (the sheet_name parameter of pd.read_excel), pandas reads the first sheet by default.
0.6.2 Tail
To see the last rows of a DataFrame (5 rows if no number is given):
df.tail(<number>)
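A minimal sketch on a made-up DataFrame, showing the default of 5 rows and an explicit count:

```python
import pandas as pd

# A small DataFrame with 10 rows holding the values 0..9
df = pd.DataFrame({"n": range(10)})

last_five = df.tail()    # no argument: the last 5 rows
last_three = df.tail(3)  # the last 3 rows

print(last_three)
```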
0.9.2 !!!!
df[<column name>] (> | < | ==) <value>
df[<column name>] (> | < | ==) <value> returns a boolean Series with the same length (and index) as 'df': each element is True if that row's value in the column satisfies the condition, otherwise False.
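The boolean Series is usually passed straight back into df[...] to keep only the matching rows. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c", "d"], "age": [15, 22, 37, 8]})

# Comparing a column with a value gives one True/False per row
mask = df["age"] > 18
print(mask)

# Indexing df with the mask keeps the rows where the mask is True
adults = df[mask]
print(adults)

# Conditions combine with & (and) and | (or); each condition needs parentheses
young_not_a = df[(df["age"] < 20) & (df["name"] != "a")]
```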
0.9.3 isin
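Series.isin returns a boolean Series marking which values appear in a given list, so it fits the same df[mask] pattern as the comparisons above. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"city": ["Paris", "Rome", "Oslo", "Lima"], "pop": [2.1, 2.8, 0.7, 9.7]})

# True where the city is one of the listed values
mask = df["city"].isin(["Rome", "Lima"])
selected = df[mask]

# ~ negates the mask: rows whose city is NOT in the list
others = df[~df["city"].isin(["Rome", "Lima"])]
```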
0.9.5 filter
Subset the dataframe rows or columns according to the specified index labels.
It does not filter a DataFrame on its contents: the filter is applied to the labels of the index (axis=0) or of the columns (axis=1).
df.filter(items=[<list of column or row labels>], axis=0 | 1)
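Besides items, filter also accepts like (substring match) and regex (regular expression) on the labels. A minimal sketch with made-up data; note that the cell values play no role in what is kept:

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 4], "two": [2, 5], "three": [3, 6]},
    index=["mouse", "rabbit"],
)

# Keep columns by exact label (axis=1: columns)
cols = df.filter(items=["one", "three"], axis=1)

# Keep rows whose index label contains "bbi" (axis=0: rows)
rows = df.filter(like="bbi", axis=0)

# Keep columns whose label matches a regular expression (ends with "e")
regex_cols = df.filter(regex="e$", axis=1)
```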
0.10.2 iloc
It selects rows (and/or columns) by integer position, ignoring the index labels.
df.iloc[0]     # first row
df.iloc[5:]    # rows from position 5 to the end
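A minimal sketch with letter labels, so positions and labels differ. iloc follows Python slicing (the stop is excluded), while loc selects by label and includes the stop label:

```python
import pandas as pd

# Index labels are letters so that positions and labels are not the same
df = pd.DataFrame({"a": [10, 20, 30, 40], "b": [1, 2, 3, 4]}, index=["w", "x", "y", "z"])

first_row = df.iloc[0]        # first row as a Series (its label is "w")
from_two = df.iloc[2:]        # rows at positions 2 and 3
cell = df.iloc[1, 0]          # row position 1, column position 0
block = df.iloc[[0, 3], [1]]  # rows 0 and 3, column 1, as a DataFrame

# Contrast with loc
by_position = df.iloc[0:1]    # 1 row: the stop position is excluded
by_label = df.loc["w":"x"]    # 2 rows: the stop label "x" is included
```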
0.11 Indexing
0.11.1 set index
When reading a file
or
df.reset_index(drop=True, inplace=True)   # restore the default 0..n-1 index and discard the old one
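The notes do not show the code for the two ways of setting the index. A minimal sketch, assuming they refer to the index_col parameter of pd.read_csv and to DataFrame.set_index; the CSV is an in-memory string so the example runs without a file, and results are reassigned instead of using inplace=True:

```python
import io

import pandas as pd

csv_text = "id,name\n7,alice\n3,bob\n"

# Option 1: choose the index column while reading the file
df_a = pd.read_csv(io.StringIO(csv_text), index_col="id")

# Option 2: read first, then promote a column to the index
df_b = pd.read_csv(io.StringIO(csv_text)).set_index("id")

# reset_index(drop=True) throws the index away and restores 0..n-1;
# without drop=True the old index comes back as a regular column
df_c = df_b.reset_index(drop=True)
df_d = df_b.reset_index()
```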
0.14 Syntax
DataFrame.describe(percentiles=None, include=None, exclude=None)
0.15 Parameters
• percentiles: A list-like of numbers between 0 and 1 specifying which percentiles to include in the output. Default is [0.25, 0.5, 0.75].
0.16 Output
By default, describe() returns the following statistics for numeric columns: count, mean, std, min, 25%, 50%, 75% and max.
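A minimal sketch with made-up data, showing the default output, custom percentiles, and the summary pandas produces for a text column:

```python
import pandas as pd

df = pd.DataFrame({"score": [1, 2, 3, 4, 5], "group": ["a", "b", "a", "b", "a"]})

# Default: numeric columns only, so "group" is left out
summary = df.describe()

# Custom percentiles; the 50% (median) row is added automatically
summary_p = df.describe(percentiles=[0.1, 0.9])

# A frame with only non-numeric columns gets count, unique, top and freq instead
summary_text = df[["group"]].describe()
```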
0.17 apply
Example: transform a column from numbers to strings.
df['column'] = df['column'].apply(lambda x: str(x))
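A minimal sketch with made-up data. Series.apply calls the function once per element; DataFrame.apply with axis=1 calls it once per row, passing the row as a Series. For a plain type conversion, df['column'].astype(str) is the vectorized alternative.

```python
import pandas as pd

df = pd.DataFrame({"column": [1, 2, 3], "other": [10, 20, 30]})

# Element-wise: convert each number to a string and assign the result back
df["column"] = df["column"].apply(lambda x: str(x))

# Row-wise: each row arrives as a Series, so columns are accessed by name
df["total"] = df.apply(lambda row: int(row["column"]) + row["other"], axis=1)
```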
0.19 Merge
The merge function combines two DataFrames based on a common key.
import pandas as pd

df1 = pd.DataFrame({
    'key': ['A', 'B', 'C', 'D'],
    'value': [1, 2, 3, 4]
})

df2 = pd.DataFrame({
    'key': ['B', 'D', 'E', 'F'],
    'value': [5, 6, 7, 8]
})

result = pd.merge(df1, df2, on='key', how='inner')
print(result)
Listing 1: Merge DataFrames
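The how parameter decides which keys survive, and because both inputs have a "value" column, pandas adds suffixes to tell them apart ("_x" and "_y" by default). A minimal sketch using the same data as Listing 1:

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": [1, 2, 3, 4]})
df2 = pd.DataFrame({"key": ["B", "D", "E", "F"], "value": [5, 6, 7, 8]})

inner = pd.merge(df1, df2, on="key", how="inner")  # keys in both: B, D
left = pd.merge(df1, df2, on="key", how="left")    # every key of df1
outer = pd.merge(df1, df2, on="key", how="outer")  # keys from either side

# Custom suffixes for the overlapping "value" column
named = pd.merge(df1, df2, on="key", suffixes=("_left", "_right"))
```

Keys missing on one side get NaN in that side's columns (for example A and C in left).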
0.20 Join
The join function combines two DataFrames based on their indices.
import pandas as pd

df1 = pd.DataFrame({
    'value1': [1, 2, 3, 4]
}, index=['A', 'B', 'C', 'D'])

df2 = pd.DataFrame({
    'value2': [5, 6, 7, 8]
}, index=['B', 'D', 'E', 'F'])

result = df1.join(df2, how='inner')
print(result)
Listing 2: Join DataFrames
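Without how, join defaults to how="left", keeping every index label of the calling DataFrame. If both frames share a column name, lsuffix/rsuffix are required (otherwise pandas raises a ValueError). A minimal sketch:

```python
import pandas as pd

df1 = pd.DataFrame({"value1": [1, 2, 3, 4]}, index=["A", "B", "C", "D"])
df2 = pd.DataFrame({"value2": [5, 6, 7, 8]}, index=["B", "D", "E", "F"])

# Default how="left": A, B, C, D are kept; A and C get NaN in value2
left = df1.join(df2)

# Overlapping column name "value1" is disambiguated with suffixes
df3 = pd.DataFrame({"value1": [9, 9]}, index=["A", "B"])
both = df1.join(df3, lsuffix="_a", rsuffix="_b")
```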
0.21 Concat
The concat function combines two DataFrames along a specified axis.
import pandas as pd

df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
})

result = pd.concat([df1, df2])
print(result)
Listing 3: Concat DataFrames
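By default concat stacks rows (axis=0) and keeps each input's original index, so labels can repeat. A minimal sketch with the same data as Listing 3, showing ignore_index and axis=1:

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1", "A2"], "B": ["B0", "B1", "B2"]})
df2 = pd.DataFrame({"A": ["A3", "A4", "A5"], "B": ["B3", "B4", "B5"]})

# Default axis=0: rows stacked, index is 0, 1, 2, 0, 1, 2
stacked = pd.concat([df1, df2])

# ignore_index=True renumbers the result 0..n-1
renumbered = pd.concat([df1, df2], ignore_index=True)

# axis=1: frames side by side, rows aligned on the index
side_by_side = pd.concat([df1, df2], axis=1)
```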
0.22 Plotting
We use .plot() to plot, or .plot.<plot name>() for a specific plot type (plot.scatter, plot.barh, ...).
• x:
– Description: Column name(s) or position(s) for the x-axis.
– Type: str or list of str or int or list of int
– Default: The index of the DataFrame.
• y:
– Description: Column name(s) or position(s) for the y-axis.
– Type: str or list of str or int or list of int
– Default: The columns not specified in x.
• kind:
– Description: Type of plot to be generated.
– Type: str
– Options: ’line’, ’bar’, ’barh’, ’hist’, ’box’, ’kde’, ’density’,
’area’, ’pie’, ’scatter’, etc.
– Default: ’line’
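• ax:
– Description: Matplotlib axes object to draw the plot on.
– Type: matplotlib Axes
– Default: None (a new figure and axes are created)
• figsize:
– Description: Size of the figure in inches, as (width, height).
– Type: tuple
– Default: None
• subplots:
– Description: Plot each column in its own subplot.
– Type: bool
– Default: False
• title:
– Description: Title of the plot.
– Type: str or list
– Default: None
• grid:
– Description: Show axis grid lines.
– Type: bool
– Default: None (Matplotlib default)
• legend:
– Description: Show the legend.
– Type: bool or ’reverse’
– Default: True
• xlabel:
– Description: Label for the x-axis.
– Type: str
– Default: None (the index name, or the x column name for planar plots such as scatter)
• ylabel:
– Description: Label for the y-axis.
– Type: str
– Default: None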
• color:
– Description: Color of the plot elements.
– Type: str or list of str
– Default: Cycle through Matplotlib default colors.
• style:
– Description: Line style or marker for the plot.
– Type: str or list of str
– Default: Matplotlib default styles.
• alpha:
– Description: Transparency level of the plot elements.
– Type: float (0.0 to 1.0)
– Default: None
• rot:
– Description: Rotation angle for the x-axis labels.
– Type: int or float
– Default: None
• logx:
– Description: Use logarithmic scaling for the x-axis.
– Type: bool
– Default: False
• logy:
– Description: Use logarithmic scaling for the y-axis.
– Type: bool
– Default: False
• loglog:
– Description: Use logarithmic scaling for both x and y axes.
– Type: bool
– Default: False
• sharex:
– Description: When subplots=True, share the x-axis among the subplots.
– Type: bool
– Default: True if ax is None, otherwise False
• sharey:
– Description: When subplots=True, share the y-axis among the subplots.
– Type: bool
– Default: False
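A minimal sketch of a few of these parameters on made-up data. It assumes Matplotlib is installed (pandas delegates the drawing to it) and selects the non-interactive "Agg" backend so it runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: no window is opened
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [1, 4, 9, 16], "z": [2, 3, 5, 7]})

# Generic form: kind selects the plot type; a Matplotlib Axes is returned
ax1 = df.plot(x="x", y="y", kind="line", title="y vs x", grid=True, xlabel="x value")

# Accessor form: df.plot.scatter(...) is equivalent to df.plot(kind="scatter", ...)
ax2 = df.plot.scatter(x="x", y="z", color="red", alpha=0.5)

# subplots=True draws each y column in its own Axes (returned as an array)
axes = df.plot(x="x", y=["y", "z"], subplots=True, figsize=(6, 4))
```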
Chapter 1
Data cleaning
1.5 Fillna
It is used to fill NaN values. A fill value must be given: fillna() with no arguments raises an error.
# replace NaN with blank (an empty string)
df = df.fillna('')
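Besides a single value for the whole DataFrame, fillna accepts a dict with one fill value per column, and the fill value can be computed from the data itself. A minimal sketch with made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["a", None, "c"], "score": [1.0, np.nan, 3.0]})

# Blank (empty string) for a text column
names_blank = df["name"].fillna("")

# A dict fills each column with its own value
per_column = df.fillna({"name": "unknown", "score": 0})

# Fill with a statistic of the column itself (here: the mean of 1.0 and 3.0)
mean_filled = df["score"].fillna(df["score"].mean())
```

fillna returns a new object, so the original df still contains its missing values unless the result is assigned back.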