Pandas
By MS SAIKUMAR
Package overview
pandas is a Python package providing fast, flexible,
and expressive data structures designed to make
working with “relational” or “labeled” data both
easy and intuitive.
It aims to be the fundamental high-level building
block for doing practical, real-world data analysis in
Python.
Additionally, it has the broader goal of becoming the
most powerful and flexible open source data
analysis/manipulation tool available in any language.
It is already well on its way toward this goal.
pandas is well suited for many different kinds of data:
• Tabular data with heterogeneously-typed columns, as in an SQL
table or Excel spreadsheet
• Ordered and unordered (not necessarily fixed-frequency) time
series data.
• Arbitrary matrix data (homogeneously typed or heterogeneous)
with row and column labels
• Any other form of observational / statistical data sets. The data
need not be labeled at all to be placed into a pandas data
structure.
The two primary data structures of pandas,
1. Series (1-dimensional) and
2. DataFrame (2-dimensional),
handle the vast majority of typical use cases in finance, statistics,
social science, and many areas of engineering.
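As a minimal sketch of the two structures, the snippet below builds one of each. The labels, column names, and values are made up for illustration and are not from the original slides.

```python
import pandas as pd

# Series: one-dimensional, labeled values sharing a single dtype.
# The index labels 'a', 'b', 'c' are illustrative.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# DataFrame: two-dimensional, labeled rows and columns;
# each column can have its own dtype.
df = pd.DataFrame({'name': ['Ann', 'Bob'], 'score': [91.5, 78.0]})

print(s.ndim, df.ndim)  # number of dimensions of each object
print(s['b'])           # label-based lookup in the Series
print(df.shape)         # (rows, columns)
```

The Series is addressed by label, while the DataFrame behaves like a table whose columns are themselves Series.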
Here are just a few of the things that pandas does well:
• Easy handling of missing data (represented as NaN) in floating point
as well as non-floating point data
• Size mutability: columns can be inserted and deleted from
DataFrame and higher dimensional objects
• Automatic and explicit data alignment: objects can be explicitly
aligned to a set of labels, or the user can simply ignore the labels and
let Series, DataFrame, etc. automatically align the data for you in
computations
• Powerful, flexible group by functionality to perform split-apply-
combine operations on data sets, for both aggregating and
transforming data
• Easy conversion of ragged, differently-indexed data in other
Python and NumPy data structures into DataFrame objects
• Intelligent label-based slicing, fancy indexing, and subsetting of large
data sets
• Intuitive merging and joining of data sets
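A few of the features listed above can be sketched briefly. All data and variable names here are hypothetical, chosen only to keep the example small.

```python
import numpy as np
import pandas as pd

# Missing data: NaN marks the absent value and is skipped by default in reductions.
prices = pd.Series([1.0, np.nan, 3.0])
filled = prices.fillna(0.0)
mean_skipping_nan = prices.mean()

# Automatic alignment: values are matched by label, not by position.
left = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
right = pd.Series([10, 20, 30], index=['b', 'c', 'd'])
aligned_sum = left + right  # 'a' and 'd' have no partner, so they become NaN

# Group by: split rows by key, aggregate each group, combine the results.
sales = pd.DataFrame({'dept': ['x', 'y', 'x', 'y'], 'amount': [5, 7, 11, 13]})
totals = sales.groupby('dept')['amount'].sum()

# Merge: SQL-style join on a shared column.
names = pd.DataFrame({'dept': ['x', 'y'], 'label': ['Books', 'Toys']})
merged = sales.merge(names, on='dept', how='left')

print(aligned_sum)
print(totals)
print(merged)
```

Note that the alignment step produces the union of both indexes, so labels present on only one side appear in the result with NaN.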
Data structures
Dimensions    Name      Description
1             Series    1D labeled homogeneously-typed array
Team Contributors
https://pandas.pydata.org/about/team.html
Creating a Series:
We can create a Series in two ways:
• Create an empty Series
• Create a Series using inputs.
Create an Empty Series:
• An empty Series in pandas is a Series that holds no values.
The syntax for creating an empty Series:
<series object> = pandas.Series()
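Both ways of creating a Series can be sketched as follows. Passing dtype explicitly for the empty case is an assumption made here because there is no data to infer a type from, and some pandas versions warn when it is omitted. The variable names are illustrative.

```python
import pandas as pd

# Empty Series: no values; dtype is stated explicitly since nothing can be inferred.
empty = pd.Series(dtype='float64')
print(empty.empty)

# Series from inputs: a list gets a default integer index 0, 1, 2, ...
from_list = pd.Series([4, 5, 6])

# Series from a dict: the keys become the index labels.
from_dict = pd.Series({'x': 1.5, 'y': 2.5})

print(from_list)
print(from_dict)
```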
Create a DataFrame:
A DataFrame can be created from any of the following inputs:
• dict
• Lists
• NumPy ndarrays
• Series
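A minimal sketch of the list, ndarray, and Series inputs follows. The column names and values are illustrative assumptions, reusing the ID/Department theme of the slides.

```python
import numpy as np
import pandas as pd

# From a list of rows: each inner list is one row.
from_lists = pd.DataFrame([[101, 'B.Sc'], [102, 'B.Tech']],
                          columns=['ID', 'Department'])

# From a 2-D NumPy ndarray: rows and columns follow the array's shape.
from_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['a', 'b'])

# From a dict of Series: columns are aligned on the union of the indexes,
# so the shorter Series is padded with NaN.
from_series = pd.DataFrame({
    'one': pd.Series([1.0, 2.0], index=['r1', 'r2']),
    'two': pd.Series([3.0, 4.0, 5.0], index=['r1', 'r2', 'r3']),
})

print(from_lists)
print(from_array)
print(from_series)
```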
Create an empty DataFrame
Calling pd.DataFrame() with no arguments creates an empty DataFrame. The code below instead builds a populated DataFrame from a dictionary:
# Build a DataFrame from a dictionary: each key becomes a column name,
# and each list holds that column's values.
import pandas as pd
info = {'ID': [101, 102, 103], 'Department': ['B.Sc', 'B.Tech', 'M.Tech']}
df = pd.DataFrame(info)
print(df)
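For comparison, here is a minimal sketch of an empty DataFrame, assuming the no-argument pd.DataFrame() constructor is what the heading has in mind. The names empty_df and with_columns are illustrative.

```python
import pandas as pd

# No arguments: no rows and no columns.
empty_df = pd.DataFrame()
print(empty_df.empty)
print(empty_df.shape)

# Columns declared up front, but still no rows.
with_columns = pd.DataFrame(columns=['ID', 'Department'])
print(with_columns.shape)
```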
Example:
import pandas as pd