Pandas

Uploaded by

zam.pfe

0.1 Import
import pandas as pd

0.2 Read files

0.2.1 Read csv
We can read any delimited text file (.csv, .txt, ...) whose columns are separated by the same delimiter. In the templates below, | separates alternative values.
data = pd.read_csv(r"<data path>", header=None | <row index>, names=<list of header columns>, sep=<delimiter>)
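The template above can be tried without a file on disk. This is a minimal sketch that assumes a made-up, semicolon-separated table with no header row; the column names id, name and age are illustrative only.

```python
import io

import pandas as pd

# Hypothetical in-memory "file": no header line, fields separated by ';'
raw = io.StringIO("1;Alice;30\n2;Bob;25\n")

# header=None says the file has no header line; names supplies the labels
data = pd.read_csv(raw, header=None, names=["id", "name", "age"], sep=";")
print(data)
```

Passing a real path instead of the StringIO object works the same way.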

0.2.2 Read json

data = pd.read_json(r"<json file path>")
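As a small sketch, read_json can also parse JSON held in memory. The records below are invented for illustration, and the text is wrapped in StringIO so that it is treated as file content rather than as a path.

```python
import io

import pandas as pd

# Hypothetical JSON document: a list of records with the same keys
raw = io.StringIO('[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]')

# Each record becomes a row, each key becomes a column
data = pd.read_json(raw)
print(data)
```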

0.2.3 Read excel

data = pd.read_excel(r"<excel file path>", sheet_name=<sheet name>)

If we don't specify a sheet name, pandas reads the first sheet by default.

0.3 Max rows and columns

0.3.1 Max rows
pd.set_option('display.max_rows', <number>)

0.3.2 Max columns

pd.set_option('display.max_columns', <number>)
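A short sketch of how these display options behave, using the canonical option names display.max_rows and display.max_columns. The numbers are arbitrary, and option_context is shown as one way to change a setting only temporarily.

```python
import pandas as pd

# Change the display limits for the rest of the session
pd.set_option("display.max_rows", 100)
pd.set_option("display.max_columns", 20)

# option_context applies a setting only inside the with-block
with pd.option_context("display.max_rows", 5):
    inside = pd.get_option("display.max_rows")

# Outside the block the earlier value is back in effect
after = pd.get_option("display.max_rows")
print(inside, after)
```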

0.4 Dataframe info

We use it to see the DataFrame's columns and row count, the number of non-null values in each column (which reveals missing values), and the memory usage.
df.info()

0.5 Dataframe shape

Returns the shape of the DataFrame as a (rows, columns) tuple.
df.shape

0.6 Head and Tail of dataframe

0.6.1 Head
Shows the first rows of the DataFrame (5 if no number is given).
df.head(<number>)

0.6.2 Tail
Shows the last rows of the DataFrame (5 if no number is given).
df.tail(<number>)
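The inspection calls from the last sections (info, shape, head, tail) can be sketched together on a small made-up DataFrame. The column names and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data with one missing score
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "score": [1.0, None, 3.0, 4.0],
})

df.info()            # prints dtypes, non-null counts and memory usage
print(df.shape)      # (number of rows, number of columns)

first_two = df.head(2)   # first two rows
last_two = df.tail(2)    # last two rows
```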

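0.7 Set index

We can set the row labels from a list that has one entry per row of df. A plain list passed to set_index is interpreted as column names, so we assign to df.index or wrap the list in pd.Index:
df.index = [<list of labels, one per row of df>]
df = df.set_index(pd.Index([<list of labels, one per row of df>]))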

0.8 Add columns to dataframe

df[<new column name>] = <values, one per row of df>
df[[<column names>]] = <columns with the same number of rows as df>
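A minimal sketch of adding columns, assuming a made-up price table. It shows a column built from a list and a column derived from existing ones.

```python
import pandas as pd

df = pd.DataFrame({"price": [10, 20, 30]})

# New column from a list with one value per row
df["quantity"] = [1, 2, 3]

# New column computed from existing columns
df["total"] = df["price"] * df["quantity"]
print(df)
```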

0.9 Column filtering

0.9.1 Basic
Suppose our dataset has three rows:
df[[True, False, True]]

This returns the first and third rows.

0.9.2 Conditions
df[df[<column name>] > | < | == <value>]

The inner expression df[<column name>] > <value> (or <, ==) returns a boolean Series with as many entries as df has rows: True where the row's value in that column satisfies the condition, False otherwise.
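The boolean-mask idea can be sketched as follows. The data is invented, and the combined condition with & is an extra illustration rather than something from the notes above.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "age": [20, 35, 50]})

# The comparison yields one boolean per row
mask = df["age"] > 30

# Indexing with the mask keeps the rows where it is True
older = df[mask]

# An explicit list of booleans works the same way
first_and_third = df[[True, False, True]]

# Conditions can be combined with & (and) or | (or); parentheses are required
combined = df[(df["age"] > 30) & (df["name"] != "c")]
```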

0.9.3 isin

df[df[column].isin([item1, ...])]

0.9.4 string operations

These operations only work on string values. Through the .str accessor we can use functions such as contains, endswith, ...
df[df[column].str.contains("<string>")]
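A short sketch of isin and the .str accessor on an invented column of city names. Note that str.contains is case-sensitive by default.

```python
import pandas as pd

df = pd.DataFrame({"city": ["Paris", "Rome", "Berlin", "Porto"]})

# Keep rows whose value is one of the listed items
in_list = df[df["city"].isin(["Rome", "Berlin"])]

# Keep rows whose string contains a lowercase "r"
contains_r = df[df["city"].str.contains("r")]

# Other string methods are used the same way
starts_p = df[df["city"].str.startswith("P")]
```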

0.9.5 filter
Subsets the DataFrame's rows or columns according to the specified index labels. It does not filter a DataFrame on its contents; the filter is applied to the labels of the index.
df.filter(items=[<list of column or row labels>], axis=0 | 1)

df.filter(like="<value>", axis=0 | 1)

• axis 0: rows axis

• axis 1: columns axis
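To make the label-based behaviour concrete, here is a small sketch. The row labels row_x and row_y and the column names are assumptions chosen for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [20, 35], "age_group": ["young", "adult"], "name": ["a", "b"]},
    index=["row_x", "row_y"],
)

# Select columns by exact label
by_items = df.filter(items=["name", "age"], axis=1)

# Select columns whose label contains "age"
like_cols = df.filter(like="age", axis=1)

# Select rows whose label contains "_x"
like_rows = df.filter(like="_x", axis=0)
```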

0.10 loc vs iloc

0.10.1 loc
It gets rows (and/or columns) by label.
df.loc[0]
df.loc[5:]

0.10.2 iloc
It gets rows (and/or columns) by integer position.
df.iloc[0]
df.iloc[5:]
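With the default index (0, 1, 2, ...) labels and positions coincide, which hides the difference. The sketch below uses made-up data with a shuffled integer index and a string index to show where loc and iloc diverge.

```python
import pandas as pd

# Integer labels that do not match the row positions
df = pd.DataFrame({"value": [10, 20, 30]}, index=[2, 0, 1])

by_label = df.loc[0, "value"]    # row whose label is 0
by_position = df.iloc[0, 0]      # first row, first column

# Slices: loc includes the end label, iloc excludes the end position
letters = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "c"])
label_slice = letters.loc["a":"b"]
position_slice = letters.iloc[0:1]
```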

0.11 Indexing
0.11.1 Set index
When reading the file:

df = pd.read_csv(path, index_col="<column name>" | [<list of columns>] | [<list of column positions>])

After reading the file:

df = df.set_index('<column name>' | [<list of columns>])
# or
df.set_index('<column name>', inplace=True)

0.11.2 Reset index

Resets the index to the default values (0, 1, 2, ...). With drop=True the old index is discarded instead of being kept as a column.
df = df.reset_index(drop=True)

or
df.reset_index(drop=True, inplace=True)

0.11.3 Sort index

df = df.sort_index(ascending=<boolean> | [<list of booleans>])
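The three index operations can be sketched in sequence on an invented table whose id column becomes the index.

```python
import pandas as pd

df = pd.DataFrame({"id": [3, 1, 2], "name": ["c", "a", "b"]})

# Use the "id" column as the row index
indexed = df.set_index("id")

# Order the rows by their index labels
ordered = indexed.sort_index(ascending=True)

# Bring the index back as a regular column...
restored = ordered.reset_index()

# ...or discard it and fall back to 0, 1, 2, ...
dropped = ordered.reset_index(drop=True)
```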

0.12 Sort values

df = df.sort_values(by=[<list of columns>], ascending=[<boolean>, ...])
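As a sketch with made-up data, sorting by two columns with a different direction for each looks like this.

```python
import pandas as pd

df = pd.DataFrame({"team": ["x", "y", "x", "y"], "score": [1, 4, 3, 2]})

# Sort by team ascending, then by score descending within each team
ordered = df.sort_values(by=["team", "score"], ascending=[True, False])
print(ordered)
```

The original row labels travel with the rows; reset_index(drop=True) renumbers them if needed.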

0.13 describe dataframe (or grouped dataframe)

The describe() function in Pandas provides a quick summary of the central tendencies, dispersion, and shape of a dataset's distribution, excluding NaN values. This function is particularly useful for getting an overview of the dataset's numerical columns, but it can also be applied to object (string) columns.
Here's how the describe() function works and what it provides:

0.14 Syntax
DataFrame.describe(percentiles=None, include=None, exclude=None)

0.15 Parameters
• percentiles: A list-like structure specifying which percentiles to include in the output. Default is [0.25, 0.5, 0.75].

• include: A white-list of data types to include in the result. Can be a string or a list-like structure.

• exclude: A black-list of data types to exclude from the result. Can be a string or a list-like structure.

0.16 Output
By default, describe() returns the following statistics for numeric columns:

• count: The number of non-null entries.

• mean: The average (mean) value.

• std: The standard deviation.

• min: The minimum value.

• 25%: The 25th percentile (first quartile).

• 50%: The 50th percentile (median or second quartile).

• 75%: The 75th percentile (third quartile).

• max: The maximum value.

For object (string) columns, it returns:
• count: The number of non-null entries.
• unique: The number of unique values.
• top: The most frequent value.
• freq: The frequency of the most frequent value.
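The statistics listed above can be checked on a tiny invented dataset. The sketch calls describe() on the whole frame for the numeric summary and on a single string column for the text summary; the column names are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "score": [1.0, 2.0, 3.0, 4.0],
    "label": ["a", "b", "a", "a"],
})

# Numeric summary: count, mean, std, min, percentiles, max
numeric = df.describe()

# Summary of a string column: count, unique, top, freq
text = df["label"].describe()

# Custom percentiles
custom = df.describe(percentiles=[0.1, 0.9])
print(numeric, text, custom, sep="\n\n")
```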

0.17 apply
Example: convert a column from numbers to strings.
df['column'].apply(lambda x: str(x))
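A runnable version of the example, on made-up numbers. The astype(str) line is an alternative added here for comparison, not part of the notes above.

```python
import pandas as pd

df = pd.DataFrame({"column": [1, 2, 3]})

# apply calls the function once per value and collects the results
as_text = df["column"].apply(lambda x: str(x))

# For a plain type conversion, astype gives the same values
same = df["column"].astype(str)
```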

0.18 Group by and Aggregate functions

To group a DataFrame:
groupedFrame = df.groupby([<list of columns>])

Then we can apply aggregate functions:

groupedFrame.mean(numeric_only=True)
groupedFrame.count()  # count() takes no numeric_only argument
groupedFrame.min(numeric_only=True)
groupedFrame.max(numeric_only=True)  # there is no avg(); use mean() for the average
# etc.
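A minimal sketch of these aggregate calls on an invented table with one numeric and one text column; numeric_only=True keeps the text column out of mean and min.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["x", "x", "y"],
    "score": [1, 3, 5],
    "city": ["P", "R", "B"],
})

grouped = df.groupby("team")

means = grouped.mean(numeric_only=True)    # average score per team
counts = grouped.count()                   # non-null values per column and team
smallest = grouped.min(numeric_only=True)  # lowest score per team
```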

0.18.1 agg function

Using agg() with groupby in Pandas allows you to perform aggregation operations on groups of data within a DataFrame. This is particularly useful for summarizing data by categories or groups.
import pandas as pd

# Sample DataFrame
data = {
    'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Values1': [1, 2, 3, 4, 5, 6],
    'Values2': [10, 20, 30, 40, 50, 60]
}

df = pd.DataFrame(data)

# Group by 'Category' and aggregate with multiple functions
result = df.groupby('Category').agg({
    'Values1': ['mean', 'sum'],
    'Values2': ['min', 'max']
})

print(result)

Result:

         Values1     Values2
            mean sum     min max
Category
A            1.5   3      10  20
B            3.5   7      30  40
C            5.5  11      50  60

0.19 Merge
The merge function combines two DataFrames based on a common key.
import pandas as pd

df1 = pd.DataFrame({
    'key': ['A', 'B', 'C', 'D'],
    'value': [1, 2, 3, 4]
})

df2 = pd.DataFrame({
    'key': ['B', 'D', 'E', 'F'],
    'value': [5, 6, 7, 8]
})

# Keep only the keys present in both DataFrames
result = pd.merge(df1, df2, on='key', how='inner')
print(result)
Listing 1: Merge DataFrames
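The how parameter decides which keys survive the merge. As a sketch on the same kind of data as Listing 1, the following compares inner, left and outer merges; the suffixes argument is an optional extra shown here to rename the clashing value columns.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": [1, 2, 3, 4]})
df2 = pd.DataFrame({"key": ["B", "D", "E", "F"], "value": [5, 6, 7, 8]})

# inner: only keys found in both frames; clashing columns get _x / _y
inner = pd.merge(df1, df2, on="key", how="inner")

# left: every key of df1; missing matches become NaN
left = pd.merge(df1, df2, on="key", how="left", suffixes=("_left", "_right"))

# outer: every key from either frame
outer = pd.merge(df1, df2, on="key", how="outer")
```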

0.20 Join
The join function combines two DataFrames based on their indices.
import pandas as pd

df1 = pd.DataFrame({
    'value1': [1, 2, 3, 4]
}, index=['A', 'B', 'C', 'D'])

df2 = pd.DataFrame({
    'value2': [5, 6, 7, 8]
}, index=['B', 'D', 'E', 'F'])

# Keep only the index labels present in both DataFrames
result = df1.join(df2, how='inner')
print(result)
Listing 2: Join DataFrames
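For comparison, here is a sketch of the same join with and without how='inner'. When how is omitted, join performs a left join, so every row of the calling DataFrame is kept.

```python
import pandas as pd

df1 = pd.DataFrame({"value1": [1, 2, 3, 4]}, index=["A", "B", "C", "D"])
df2 = pd.DataFrame({"value2": [5, 6, 7, 8]}, index=["B", "D", "E", "F"])

# Only index labels shared by both frames
inner = df1.join(df2, how="inner")

# Default (left) join: all labels of df1, NaN where df2 has no match
left = df1.join(df2)
```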

0.21 Concat
The concat function combines two DataFrames along a specified axis.
import pandas as pd

df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
})

# Stack df2 below df1 (axis=0 is the default)
result = pd.concat([df1, df2])
print(result)
Listing 3: Concat DataFrames
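The axis and the index handling are the two choices that matter most with concat. This sketch, on the same kind of data as Listing 3, shows the default stacking, stacking with a fresh index, and side-by-side concatenation.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1", "A2"], "B": ["B0", "B1", "B2"]})
df2 = pd.DataFrame({"A": ["A3", "A4", "A5"], "B": ["B3", "B4", "B5"]})

# Default: rows are stacked and the original row labels are kept
stacked = pd.concat([df1, df2])

# ignore_index=True renumbers the rows 0..n-1
renumbered = pd.concat([df1, df2], ignore_index=True)

# axis=1 places the frames side by side, aligned on the row labels
side_by_side = pd.concat([df1, df2], axis=1)
```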

0.22 Plotting
We use .plot to plot, or .plot.<plot name>() (plot.scatter, plot.barh, ...).

The df.plot method in Pandas offers a variety of parameters to customize plots. Here is a detailed explanation of these parameters:

• x:
– Description: Column name or position for the x-axis.
– Type: label or position
– Default: None (the index of the DataFrame is used).
• y:
– Description: Column name(s) or position(s) for the y-axis.
– Type: label, position, or list of labels or positions
– Default: None (the columns not specified in x are used).
• kind:
– Description: Type of plot to be generated.
– Type: str
– Options: ’line’, ’bar’, ’barh’, ’hist’, ’box’, ’kde’, ’density’,
’area’, ’pie’, ’scatter’, etc.
– Default: ’line’
• ax:
– Description: Matplotlib axes object to which the plot is added.

– Type: matplotlib.axes.Axes or None
– Default: None

• figsize:

– Description: Size of the figure (width, height) in inches.

– Type: tuple of (width, height)
– Default: None (Matplotlib's figure.figsize setting is used, which is 6.4 by 4.8 unless changed)

• subplots:

– Description: Create a separate subplot for each column.

– Type: bool
– Default: False

• title:

– Description: Title of the plot.

– Type: str
– Default: None

• grid:

– Description: Whether to show grid lines.

– Type: bool
– Default: None (follows the Matplotlib style settings; pass True to force grid lines)

• legend:

– Description: Whether to show the legend.

– Type: bool
– Default: True if the plot contains multiple series.

• xlabel:

– Description: Label for the x-axis.

– Type: str
– Default: None

• ylabel:

– Description: Label for the y-axis.

– Type: str
– Default: None

• color:
– Description: Color of the plot elements.
– Type: str or list of str
– Default: Cycle through Matplotlib default colors.

• style:
– Description: Line style or marker for the plot.
– Type: str or list of str
– Default: Matplotlib default styles.

• alpha:
– Description: Transparency level of the plot elements.
– Type: float (0.0 to 1.0)
– Default: None

• rot:
– Description: Rotation angle for the x-axis labels.
– Type: int or float
– Default: None

• logx:
– Description: Use logarithmic scaling for the x-axis.
– Type: bool
– Default: False

• logy:
– Description: Use logarithmic scaling for the y-axis.
– Type: bool
– Default: False

• loglog:
– Description: Use logarithmic scaling for both x and y axes.
– Type: bool
– Default: False

• xerr and yerr:

– Description: Error bars for the x and y data.
– Type: float or DataFrame or Series
– Default: None
• sharex:
– Description: Share the x-axis among subplots (used when subplots=True).
– Type: bool
– Default: True if no ax is passed, otherwise False
• sharey:
– Description: Share the y-axis among subplots (used when subplots=True).
– Type: bool
– Default: False
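Several of the parameters described above can be combined in one call. The following is a sketch only: it assumes Matplotlib is installed, selects the non-interactive Agg backend so that it runs without a display, and uses invented monthly figures.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: render off-screen, no window needed

import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "month": [1, 2, 3, 4],
    "sales": [10, 12, 9, 15],
    "costs": [7, 8, 8, 10],
})

# One line per y column, drawn on a single set of axes
ax = df.plot(
    x="month",
    y=["sales", "costs"],
    kind="line",
    figsize=(8, 4),
    title="Sales vs costs",
    grid=True,
    xlabel="Month",
    ylabel="Amount",
)
```

df.plot returns the Matplotlib Axes object, so further tweaks or ax.figure.savefig("<file name>") can follow.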
Chapter 1

Data cleaning

We can clean our data with functions such as the following.

1.1 Delete duplicates

df = df.drop_duplicates()

1.2 Drop rows

df = df.drop(<row index label>)

1.3 Drop columns

df = df.drop(columns=<column name> | [<list of columns>])
# or
df.drop(columns=<column name> | [<list of columns>], inplace=True)

1.4 Drop NaN

df = df.dropna(subset=['<column>'])

1.5 Fillna
It is used to fill NaN values. A fill value must be given.
# replace NaN with an empty string
df = df.fillna('')
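The cleaning calls of this chapter can be sketched on one small made-up DataFrame. The per-column dictionary passed to fillna is an illustrative choice so that text and numeric columns receive different fill values.

```python
import pandas as pd

# Hypothetical data: one duplicated row and two missing values
df = pd.DataFrame({
    "name": ["a", "a", "b", None],
    "score": [1.0, 1.0, None, 4.0],
})

deduped = df.drop_duplicates()                # removes the repeated row
without_first = deduped.drop(0)               # drops the row labelled 0
without_score = deduped.drop(columns="score") # drops a column
scored = deduped.dropna(subset=["score"])     # drops rows with no score

# Fill missing values, with a different value per column
filled = deduped.fillna({"name": "", "score": 0.0})
```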
