Pandas
Pandas Introduction
Pandas is an open-source library that provides high-performance data manipulation in Python. The name
Pandas is derived from "panel data", an econometrics term for multidimensional data sets. It is used
for data analysis in Python, and its development was started by Wes McKinney in 2008.
Data analysis requires a lot of processing, such as restructuring, cleaning, or merging. Several tools are
available for fast data processing, such as NumPy, SciPy, Cython, and Pandas. We prefer Pandas because working
with it is fast, simple, and often more expressive than the alternatives.
Pandas is built on top of the NumPy package, which means NumPy is required for Pandas to work.
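The relationship with NumPy can be seen directly in code. The following is a minimal sketch, assuming both libraries are installed; the variable names and sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration.
prices = pd.Series([10.5, 20.0, 30.25])

# to_numpy() returns the data of the Series as a NumPy array.
values = prices.to_numpy()
print(type(values))

# NumPy functions can be applied to a Series directly.
roots = np.sqrt(prices)
print(roots)
```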
Before Pandas, Python was capable of data preparation but offered only limited support for data analysis.
Pandas filled this gap and extended Python's data analysis capabilities. It supports the five significant steps
required for processing and analyzing data, irrespective of the origin of the data: load, manipulate, prepare,
model, and analyze.
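The five steps can be sketched on a tiny data set. This is only an illustration under stated assumptions: the CSV text and column names are made up, and the "model" step uses NumPy's polyfit for a straight-line fit, since Pandas itself does not fit models.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV text standing in for a real file.
csv_text = """hours,score
1,52
2,55
3,
4,61
5,64
"""

# Load: read the CSV data into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Manipulate: add a derived column.
df["score_pct"] = df["score"] / 100

# Prepare: drop the row with the missing score.
clean = df.dropna()

# Model: fit a straight line (an assumption chosen for this sketch).
slope, intercept = np.polyfit(clean["hours"].to_numpy(), clean["score"].to_numpy(), 1)

# Analyze: summary statistics of the cleaned data.
print(clean.describe())
print(slope, intercept)
```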
Benefits of Pandas
The benefits of using Pandas over other languages and tools are as follows:
Data Representation: It represents the data in a form that is suited for data analysis through its DataFrame
and Series.
Clear code: The clear API of Pandas allows you to focus on the core part of the code, so the resulting code is
clear and concise.
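Both points can be illustrated with a short sketch. The data values below are made up, and the variable names are only for illustration.

```python
import pandas as pd

# Small illustrative data set (values are made up).
df = pd.DataFrame({
    "cars": ["BMW", "Volvo", "Ford"],
    "passings": [3, 7, 2],
})

# Data representation: a single column of a DataFrame is a Series.
passings = df["passings"]

# Clear code: filtering and aggregating each take one line.
frequent = df[df["passings"] > 2]
average = passings.mean()

print(frequent)
print(average)
```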
Installation of Pandas
If you have Python and pip already installed on a system, installing Pandas is very easy.
Install it using this command:
C:\Users\Your Name>pip install pandas
Import Pandas
Once Pandas is installed, import it in your applications with the import statement:
import pandas
#example
import pandas
mydataset = {
    'cars': ["BMW", "Volvo", "Ford"],
    'passings': [3, 7, 2]
}
myvar = pandas.DataFrame(mydataset)
print(myvar)
Pandas as pd
Pandas is usually imported under the pd alias.
import pandas as pd
print(pd.__version__)
1) Series
A Series is a one-dimensional labeled array that is capable of storing various data types. The row labels of a
Series are called the index. We can easily convert a list, tuple, or dictionary into a Series using the Series()
constructor. A Series cannot contain multiple columns. Its main parameter is:
data: It can be any list, dictionary, or scalar value.
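As a brief sketch of the accepted inputs, the snippet below builds a Series from a list, a tuple, a dictionary, and a scalar. The variable names and values are illustrative only.

```python
import pandas as pd

# From a list and from a tuple: a default integer index is created.
from_list = pd.Series([1, 2, 3])
from_tuple = pd.Series((4, 5, 6))

# From a dictionary: the keys become the index labels.
from_dict = pd.Series({"a": 10, "b": 20})

# From a scalar: the value is repeated for every label in the given index.
from_scalar = pd.Series(7, index=["x", "y", "z"])

print(from_dict)
print(from_scalar)
```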
Input
import pandas as pd
import numpy as np
info = np.array(['P','a','n','d','a','s'])
a = pd.Series(info)
print(a)
Output
0    P
1    a
2    n
3    d
4    a
5    s
dtype: object
Explanation: In this code, we first import the pandas and numpy libraries under the pd and np aliases. Then we
create a variable named "info" that holds an array of values. We pass info to the Series constructor and assign
the result to the variable "a". Finally, the Series is printed by calling print(a).
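The numbers on the left of the output are the default index. As a small sketch, the index can be inspected, or replaced with custom labels through the index argument; the labels chosen below are arbitrary.

```python
import numpy as np
import pandas as pd

info = np.array(['P', 'a', 'n', 'd', 'a', 's'])

# Default index: integers starting at 0.
a = pd.Series(info)
print(a.index)

# Custom index: one label per value (labels chosen arbitrarily here).
labeled = pd.Series(info, index=list("uvwxyz"))

# Values can then be looked up by label.
print(labeled["u"])
```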
2) DataFrame
A DataFrame is a two-dimensional data structure whose columns can be of heterogeneous types such as int, bool,
and so on. It can be seen as a dictionary of Series in which both the rows and the columns are indexed. The labels
are denoted as "columns" for the columns and "index" for the rows.
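The "dictionary of Series" view of a DataFrame can be sketched as follows. The names, values, and row labels are made up for illustration.

```python
import pandas as pd

rows = ["r1", "r2"]

# Each dictionary entry is a Series that becomes one column.
df = pd.DataFrame({
    "name": pd.Series(["Ann", "Bob"], index=rows),
    "age": pd.Series([31, 45], index=rows),
    "member": pd.Series([True, False], index=rows),
})

print(df.columns)  # column labels
print(df.index)    # row labels
print(df.dtypes)   # each column keeps its own type
```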
import pandas as pd
# a list of strings
x = ['Python', 'Pandas']
# Calling the DataFrame constructor on the list
df = pd.DataFrame(x)
print(df)
Output
        0
0  Python
1  Pandas
Explanation: In this code, we have defined a variable named "x" that consists of string values. The DataFrame
constructor is called on the list, and the resulting DataFrame is printed.
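In the output above, the column label 0 is a default, because the list carries no column name. As a small sketch, a name can be supplied through the columns argument; the name "language" is chosen only for illustration.

```python
import pandas as pd

x = ['Python', 'Pandas']

# Give the single column an explicit label instead of the default 0.
named = pd.DataFrame(x, columns=["language"])
print(named)
```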