Exp 25_26
Installing Pandas
The first step in working with Pandas is to check whether it is installed on the system.
If it is not, we need to install it using the pip command.
pip install pandas
Importing Pandas
After Pandas has been installed on the system, we need to import the library. The
module is generally imported as follows:
import pandas as pd
Note: Here, pd is an alias for Pandas. It is not necessary to import the library under an
alias; the alias simply means less typing every time a method or property is called.
Data Structures in Pandas Library
Pandas provides two main data structures for manipulating data. They are:
Series
DataFrame
# simple array
data = [1, 2, 3, 4]
ser = pd.Series(data)
print(ser)
Output
0 1
1 2
2 3
3 4
dtype: int64
A Pandas Series is a one-dimensional labeled array that can hold data of any type; it
resembles a single column in an Excel sheet. The axis labels are collectively called the index.
Labels need not be unique but must be of a hashable type. The object supports both integer-
and label-based indexing and provides a host of methods for performing operations
involving the index.
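As a small illustration of the points above, the sketch below builds a Series whose labels are strings, one of them repeated, and looks values up by label and by position. The labels and values are made up for this example and do not come from the original text.

```python
import pandas as pd

# Hypothetical data: labels need not be unique ('a' appears twice).
ser = pd.Series([10, 20, 30], index=['a', 'b', 'a'])

print(ser.index)      # the axis labels, collectively called the index
print(ser.loc['b'])   # label-based lookup of a label that occurs once
print(ser.loc['a'])   # a repeated label returns every matching entry
print(ser.iloc[0])    # position-based lookup
```

Using .loc for labels and .iloc for positions keeps the two kinds of lookup explicit, which helps when the labels themselves are integers.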
Creating a Series from a list:
To create a Series from a list, we first create the list and then pass it to pd.Series().
import pandas as pd
# a simple list (named lst to avoid shadowing the built-in list)
lst = ['g', 'e', 'e', 'k', 's']
# create a Series from the list
ser = pd.Series(lst)
print(ser)
Output
0 g
1 e
2 e
3 k
4 s
dtype: object
For more details refer to Creating a Pandas Series.
Accessing element of Series
There are two ways to access the elements of a Series:
Accessing Element from Series with Position
Accessing Element Using Label (index)
Accessing Element from Series with Position: To access a Series element by position, refer to
its index number. Use the index operator [ ] to access an element in a Series. The index
must be an integer. To access multiple elements from a Series, we use the slice
operation.
Accessing the first 5 elements of a Series.
# import pandas and numpy
import pandas as pd
import numpy as np
# creating simple array
data = np.array(['g','e','e','k','s','f', 'o','r','g','e','e','k','s'])
ser = pd.Series(data)
# retrieve the first five elements
print(ser[:5])
Output
0 g
1 e
2 e
3 k
4 s
dtype: object
Accessing Element Using Label (index) :
To access an element of a Series by label, we pass its index label to the index operator. A
Series is like a fixed-size dictionary in that you can get and set values by index label.
Accessing a single element using its index label.
# import pandas and numpy
import pandas as pd
import numpy as np
# creating simple array
data = np.array(['g','e','e','k','s','f', 'o','r','g','e','e','k','s'])
ser = pd.Series(data, index=[10,11,12,13,14,15,16,17,18,19,20,21,22])
# access the element stored under label 16
print(ser[16])
Output
o
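The dictionary analogy also covers setting values and testing for labels. The following is a minimal sketch with made-up labels and values (not taken from the example above) that reads a value, overwrites it, and checks whether a label exists.

```python
import pandas as pd

# Hypothetical labels and values, chosen only for illustration.
ser = pd.Series(['g', 'e', 'k'], index=[10, 11, 12])

print(ser[11])        # read the value stored under label 11
ser[11] = 'E'         # overwrite the value under an existing label
print(11 in ser)      # membership tests check the labels, as in a dict
print(ser.get(99))    # .get returns None when the label is absent
```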
For more details refer to Accessing element of Series
Indexing and Selecting Data in Series
Indexing in pandas simply means selecting particular data from a Series. Indexing could
mean selecting all of the data or only some of it. Indexing is also known as subset
selection.
Indexing a Series using the indexing operator []:
The term indexing operator refers to the square brackets following an object.
The .loc and .iloc indexers also use the indexing operator to make selections. In this section,
the indexing operator refers to plain square brackets applied directly to the object, as in df[ ].
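For readers without a data file at hand, here is a self-contained sketch of the three ways of selecting from a Series mentioned above. The Series and its labels are assumptions made for the example.

```python
import pandas as pd

# Hypothetical Series with string labels.
ser = pd.Series([5, 2, 3, 7], index=['a', 'b', 'c', 'd'])

print(ser['b'])          # plain [] with a label
print(ser.loc['b':'c'])  # label slice: both endpoints are included
print(ser.iloc[1:3])     # position slice: the end position is excluded
```

Note that the label slice and the position slice select the same two entries here, even though their end points are written differently.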
# importing pandas module
import pandas as pd
# making data frame
df = pd.read_csv("nba.csv")
ser = pd.Series(df['Name'])
data = ser.head(10)
data
Output (the second series, data1, with the same values as in Code #2 below):
a 1
b 6
d 4
e 9
dtype: int64
Now we add the two series using the .add() function.
# adding two series using
# .add
data.add(data1, fill_value=0)
Output :
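Since the output is not reproduced here, the sketch below reruns the addition as a self-contained script, using the data and data1 values from Code #2. The variable names total and plain are introduced only for this illustration.

```python
import pandas as pd

data = pd.Series([5, 2, 3, 7], index=['a', 'b', 'c', 'd'])
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])

# Labels are aligned first; a label missing from one series
# is treated as 0 because of fill_value=0.
total = data.add(data1, fill_value=0)
print(total)
# a     6.0
# b     8.0
# c     3.0
# d    11.0
# e     9.0
# dtype: float64

# Without fill_value, labels present in only one series give NaN.
plain = data.add(data1)
print(plain)
```

The result is float64 rather than int64 because alignment introduces missing values before they are filled.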
Code #2:
# importing pandas module
import pandas as pd
# creating a series
data = pd.Series([5, 2, 3,7], index=['a', 'b', 'c', 'd'])
# creating a series
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])
print(data, "\n\n", data1)
Output
a 5
b 2
c 3
d 7
dtype: int64
a 1
b 6
d 4
e 9
dtype: int64
Now we subtract the two series using the .sub() function.
# subtracting two series using
# .sub
data.sub(data1, fill_value=0)
Output :
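The subtraction can be checked the same way. This sketch reuses the data and data1 values from Code #2 and also compares .sub() with the - operator, which offers no fill_value; the names diff and with_operator are hypothetical.

```python
import pandas as pd

data = pd.Series([5, 2, 3, 7], index=['a', 'b', 'c', 'd'])
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])

# Missing labels are replaced by 0 before subtracting.
diff = data.sub(data1, fill_value=0)
print(diff)
# a    4.0
# b   -4.0
# c    3.0
# d    3.0
# e   -9.0
# dtype: float64

# The - operator aligns labels too, but leaves NaN for 'c' and 'e'.
with_operator = data - data1
print(with_operator)
```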
Pandas DataFrame
A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous
tabular data structure with labeled axes (rows and columns); that is, data is aligned in a
tabular fashion in rows and columns. A DataFrame consists of three principal components:
the data, the rows, and the columns.
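The three components can be inspected directly on any DataFrame. The sketch below uses a tiny made-up table; the column names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical two-row table.
df = pd.DataFrame({'name': ['Ann', 'Bob'], 'age': [31, 27]})

print(df.index)    # the row labels
print(df.columns)  # the column labels
print(df.values)   # the underlying data as a NumPy array
print(df.shape)    # (number of rows, number of columns)
```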
Creating a Pandas DataFrame
In practice, a Pandas DataFrame is usually created by loading a dataset from existing storage,
such as a SQL database, a CSV file, or an Excel file. A DataFrame can also be created from lists,
a dictionary, a list of dictionaries, and so on.
Here are some ways to create a DataFrame:
Creating a DataFrame using a list: A DataFrame can be created from a single list or a list of lists.
import pandas as pd
# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is', 'portal', 'for', 'Geeks']
# create a DataFrame from the list
df = pd.DataFrame(lst)
print(df)
Output:
For more details refer to Creating a Pandas DataFrame
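The other in-memory sources mentioned above (a list of lists, a dictionary, and a list of dictionaries) can be sketched as follows. All names and values here are invented for the example.

```python
import pandas as pd

# From a list of lists: each inner list becomes one row.
df_rows = pd.DataFrame([['tom', 10], ['nick', 15]], columns=['Name', 'Age'])

# From a dictionary of lists: each key becomes one column.
df_cols = pd.DataFrame({'Name': ['tom', 'nick'], 'Age': [10, 15]})

# From a list of dictionaries: keys become columns,
# and a key missing from a record becomes NaN.
df_recs = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}])

print(df_rows.equals(df_cols))  # both constructions describe the same table
print(df_recs)
```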
Row Selection: Pandas provides dedicated indexers for retrieving rows from a DataFrame.
DataFrame.loc[] retrieves rows by label, and rows can also be selected by integer position
with DataFrame.iloc[].
Output:
As shown in the output image, two Series were returned, since only one row label was passed
each time.
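Because the original data file and output image are not available here, the following sketch shows the same idea on a small made-up table: a single label or position returns a Series, while a list of labels returns a DataFrame. The names in the index are hypothetical.

```python
import pandas as pd

# Hypothetical table indexed by name.
df = pd.DataFrame(
    {'Age': [25, 30, 22], 'Team': ['A', 'B', 'A']},
    index=['Avery', 'Blake', 'Casey'],
)

first = df.loc['Avery']            # one row by label -> Series
second = df.iloc[1]                # one row by integer position -> Series
both = df.loc[['Avery', 'Casey']]  # a list of labels -> DataFrame

print(type(first).__name__, type(both).__name__)
```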
For more Details refer to Dealing with Rows and Columns
Indexing and Selecting Data in Pandas
Indexing in pandas means simply selecting particular rows and columns of data from a
DataFrame. Indexing could mean selecting all the rows and some of the columns, some of
the rows and all of the columns, or some of each of the rows and columns. Indexing can also
be known as Subset Selection.
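The three kinds of subset described above can be written with .loc, which takes a row selector and a column selector. This is a sketch on invented data; the variable names are chosen only to mirror the wording of the paragraph.

```python
import pandas as pd

# Hypothetical table; names and values are made up.
df = pd.DataFrame(
    {'Age': [25, 30, 22], 'Team': ['A', 'B', 'A'], 'Salary': [50, 60, 55]},
    index=['Avery', 'Blake', 'Casey'],
)

all_rows_some_cols = df.loc[:, ['Age', 'Team']]      # all rows, some columns
some_rows_all_cols = df.loc[['Avery', 'Casey'], :]   # some rows, all columns
some_of_each = df.loc[['Avery', 'Casey'], ['Age']]   # some rows, some columns

print(some_of_each)
```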
Indexing a DataFrame using the indexing operator []
The term indexing operator refers to the square brackets following an object.
The .loc and .iloc indexers also use the indexing operator to make selections; in this section,
the indexing operator refers to df[].
To select a single column, we simply put the name of the column between the
brackets.
# importing pandas package
import pandas as pd
Output:
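The rest of the original example is not available, so here is a minimal stand-in on a made-up DataFrame. It selects one column by name, which gives a Series, and then a list of columns, which gives a DataFrame.

```python
import pandas as pd

# Hypothetical data in place of the original data file.
df = pd.DataFrame({'Name': ['Avery', 'Blake'], 'Age': [25, 30]})

age = df['Age']                # one column name -> Series
subset = df[['Name', 'Age']]   # a list of column names -> DataFrame

print(age)
print(subset)
```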