pandas is a Python library for working with
DataFrames and for data science
analytics
Prepared by,
VJ Solutions 2.0
Install pandas from the command prompt with the following command:
pip install pandas
Then import pandas in your program:
import pandas as pd
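A quick way to confirm that the installation worked is to import the library and print its version. This is only a minimal check; the version string printed depends on your environment.

```python
import pandas as pd

# pd.__version__ holds the installed pandas version as a string
print(pd.__version__)
```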
INTRODUCTION TO PANDAS:
At the most basic level, Pandas objects can be thought of as enhanced
versions of NumPy structured arrays in which the rows and columns are
identified with labels rather than simple integer indices.
Pandas is a Python library used for working with data sets. It has functions for
analyzing, cleaning, exploring, and manipulating data. The name "Pandas" refers
to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in
2008.
Benefits of Pandas
The benefits of using Pandas include the following:
Data representation: It represents data in a form that is
well suited to data analysis, through its DataFrame and Series objects.
Clear code: The clear API of Pandas lets you focus on
the core part of the code, so the result is clear and concise
for the user.
CREATING SERIES FROM ARRAYS:
Before creating a Series from an array, we first import the numpy module and then build the
array with its array() function.
import pandas as pd
import numpy as np
info = np.array(['P','a','n','d','a','s'])
a = pd.Series(info)
print(a)
Output:
0 P
1 a
2 n
3 d
4 a
5 s
dtype: object
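As a small sketch of the same idea, a Series built from a NumPy array keeps the values in order and picks its dtype from the array. The variable names below are illustrative and not part of the original slides.

```python
import numpy as np
import pandas as pd

# A Series built from an array of strings and one built from an array of floats
letters = pd.Series(np.array(['P', 'a', 'n']))
numbers = pd.Series(np.array([0.5, 1.5, 2.5]))

# The dtype of each Series is inferred from the underlying array
print(letters.dtype)
print(numbers.dtype)

# Values keep the order they had in the array
print(numbers.tolist())
```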
Create a DataFrame using List:
We can easily create a DataFrame in Pandas using list.
import pandas as pd
# a list of strings
x = ['Python', 'Pandas']
# Calling DataFrame constructor on list
df = pd.DataFrame(x)
print(df)
Output:
        0
0  Python
1  Pandas
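The column in the output above is labelled 0 because no column name was given. A minimal sketch of naming it through the columns argument follows; the label 'name' is a hypothetical choice made for illustration.

```python
import pandas as pd

# Same list of strings as in the program above
x = ['Python', 'Pandas']

# 'name' is an illustrative column label, not taken from the slides
df = pd.DataFrame(x, columns=['name'])

print(df)
print(df.shape)  # (number of rows, number of columns)
```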
PROGRAM:
import numpy as np
import pandas as pd
data = pd.Series([0.25, 0.5, 0.75, 1.0])
print(data)
OUTPUT:
0 0.25
1 0.50
2 0.75
3 1.00
dtype: float64
As we see in the preceding output, the Series wraps both a sequence of values and a sequence of
indices, which we can access with the values and index attributes. The values are simply a familiar
NumPy array:
In[3]: data.values
Out[3]: array([ 0.25, 0.5 , 0.75, 1. ])
The index is an array-like object of type pd.Index, which we’ll discuss in more detail momentarily:
In[4]: data.index
Out[4]: RangeIndex(start=0, stop=4, step=1)
Like with a NumPy array, data can be accessed by the associated index via the familiar Python
square-bracket notation. We can also slice a Series like a NumPy array, using index values within
square brackets:
In[5]: data[1]
Out[5]: 0.5
In[6]: data[1:3]
Out[6]: 1 0.50
2 0.75
dtype: float64
As we will see, though, the Pandas Series is much more general and flexible
than the one-dimensional NumPy array that it emulates.
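The two parts of a Series described above can be pulled apart in a short sketch. It uses to_numpy(), which returns the same data as the values attribute; the variable names are illustrative.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0])

# The values as a NumPy array (same data as data.values)
values = data.to_numpy()

# The index object; with no index given, pandas creates a RangeIndex
index = data.index

print(type(values).__name__)
print(type(index).__name__)
print(list(index))
```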
In[11]:
population_dict = {'California': 38332521, 'Texas': 26448193,'New York': 19651127,
'Florida': 19552860,'Illinois': 12882135}
population = pd.Series(population_dict)
print(population)
Out[11]: California 38332521
Florida 19552860
Illinois 12882135
New York 19651127
Texas 26448193
dtype: int64
In the output shown here, the index is drawn from the sorted keys, as older pandas releases did by default; recent releases keep the dictionary's insertion order instead.
From here, typical dictionary-style item access can be performed:
In[12]: population['California']
Out[12]: 38332521
Unlike a dictionary, though, the Series also supports array-style operations such as
slicing:
In[13]: population['California':'Illinois']
Out[13]: California 38332521
Florida 19552860
Illinois 12882135
dtype: int64
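Because the order of the index can depend on the pandas version, a label slice such as the one above is easiest to reason about on a sorted index. The sketch below sorts explicitly with sort_index() before slicing; the names population_sorted and subset are illustrative.

```python
import pandas as pd

population_dict = {'California': 38332521, 'Texas': 26448193,
                   'New York': 19651127, 'Florida': 19552860,
                   'Illinois': 12882135}
population = pd.Series(population_dict)

# Sort the labels so the result does not depend on insertion order
population_sorted = population.sort_index()

# Label-based slice: both endpoints are included
subset = population_sorted['California':'Illinois']
print(subset)
```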
Data Indexing and Selection:
In Chapter 2, we looked in detail at methods and tools to access, set, and modify
values in NumPy arrays.
These included indexing (e.g., arr[2, 1]), slicing (e.g., arr[:,
1:5]), masking (e.g., arr[arr > 0]), fancy indexing (e.g., arr[0, [1, 5]]), and
combinations thereof (e.g., arr[:, [1, 5]]). Here we’ll look at similar means of
accessing and modifying values in Pandas Series and DataFrame objects.
If you have used the NumPy patterns, the corresponding patterns in Pandas will
feel very familiar, though there are a few quirks to be aware of. We’ll start with the
simple case of the one-dimensional Series object, and then move on to the more
complicated two-dimensional DataFrame object.
Data Selection in Series
As we saw in the previous section, a Series object acts in many ways like a one-
dimensional NumPy array, and in many ways like a standard Python dictionary.
If we keep these two overlapping analogies in mind, it will help us to understand
the patterns of data indexing and selection in these arrays.
Series as dictionary
Like a dictionary, the Series object provides a mapping from a collection of keys
to a collection of values:
In[1]: import pandas as pd
data = pd.Series([0.25, 0.5, 0.75, 1.0],
index=['a', 'b', 'c', 'd'])
print(data)
Out[1]:
a 0.25
b 0.50
c 0.75
d 1.00
dtype: float64
In[2]: print(data['b'])
Out[2]: 0.5
We can also use dictionary-like Python expressions and methods to examine
the keys/indices and values:
In[3]: 'a' in data
Out[3]: True
In[4]: data.keys()
Out[4]: Index(['a', 'b', 'c', 'd'], dtype='object')
In[5]: list(data.items())
Out[5]: [('a', 0.25), ('b', 0.5), ('c', 0.75), ('d', 1.0)]
Series objects can even be modified with a dictionary-like syntax. Just as you can
extend a dictionary by assigning to a new key, you can extend a Series by assigning
to a new index value:
In[6]: data['e'] = 1.25
data
Out[6]:
a 0.25
b 0.50
c 0.75
d 1.00
e 1.25
dtype: float64
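The dictionary-like behaviour can be gathered into one short sketch. Besides adding a new label, it overwrites an existing one and uses get() to look up a label that may be missing; the label 'z' is a made-up example.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Assigning to a new label extends the Series
data['e'] = 1.25

# Assigning to an existing label overwrites its value
data['a'] = 0.3

# Membership tests look at the index labels, as with dictionary keys
print('e' in data)

# get() returns a default instead of raising for a missing label
print(data.get('z', 'missing'))
```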
Series as one-dimensional array
A Series builds on this dictionary-like interface and provides array-style item
selection via the same basic mechanisms as NumPy arrays—that is, slices,
masking, and fancy indexing.
Examples of these are as follows:
In[7]: # slicing by explicit index
data['a':'c']
Out[7]:
a 0.25
b 0.50
c 0.75
dtype: float64
In[8]: # slicing by implicit integer index
data[0:2]
Out[8]:
a 0.25
b 0.50
dtype: float64
In[9]: # masking
data[(data > 0.3) & (data < 0.8)]
Out[9]:
b 0.50
c 0.75
dtype: float64
In[10]: # fancy indexing
data[['a', 'e']]
Out[10]:
a 0.25
e 1.25
dtype: float64
Among these, slicing may be the source of the most confusion. Notice that
when you are slicing with an explicit index (i.e., data['a':'c']), the final index is
included in the slice, while when you’re slicing with an implicit index
(i.e., data[0:2]), the final index is excluded from the slice.
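The difference between the two slicing conventions can be checked directly. This sketch writes the label slice with loc and the positional slice with iloc so that the intent is explicit; the variable names are illustrative.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Label-based slice: the end label 'c' is included
by_label = data.loc['a':'c']

# Position-based slice: the end position 2 is excluded
by_position = data.iloc[0:2]

print(len(by_label), len(by_position))
```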
Indexers: loc, iloc, and ix
These slicing and indexing conventions can be a source of confusion. For example,
if your Series has an explicit integer index, an indexing operation such as data[1] will
use the explicit indices, while a slicing operation like data[1:3] will use the implicit
Python-style index.
In[11]: data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])
data
Out[11]:
1 a
3 b
5 c
dtype: object
In[12]: # explicit index when indexing
data[1]
Out[12]:
'a'
In[13]: # implicit index when slicing
data[1:3]
Out[13]:
3 b
5 c
dtype: object
Because of this potential confusion in the case of integer indexes, Pandas
provides some special indexer attributes that explicitly expose certain indexing
schemes.
These are not functional methods, but attributes that expose a particular slicing
interface to the data in the Series.
First, the loc attribute allows indexing and slicing that always references the
explicit index:
In[14]: data.loc[1]
Out[14]:
'a'
In[15]: data.loc[1:3]
Out[15]:
1 a
3 b
dtype: object
The iloc attribute allows indexing and slicing that always references the implicit
Python-style index:
In[16]: data.iloc[1]
Out[16]:
'b'
In[17]: data.iloc[1:3]
Out[17]:
3 b
5 c
dtype: object
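Putting the two indexers side by side on the same integer-indexed Series makes the contrast easy to verify. This is a small sketch that reuses the data from the examples above.

```python
import pandas as pd

data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

# loc always uses the explicit labels 1, 3, 5
label_item = data.loc[1]                 # value stored under label 1
label_slice = data.loc[1:3].tolist()     # labels 1 through 3, end included

# iloc always uses the implicit positions 0, 1, 2
position_item = data.iloc[1]             # value at position 1
position_slice = data.iloc[1:3].tolist() # positions 1 and 2, end excluded

print(label_item, label_slice)
print(position_item, position_slice)
```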
Data Selection in DataFrame
Recall that a DataFrame acts in many ways like a two-dimensional or
structured array, and in other ways like a dictionary of Series structures sharing
the same index.
These analogies can be helpful to keep in mind as we explore data
selection within this structure.
DataFrame as a dictionary
The first analogy we will consider is the DataFrame as a dictionary of related
Series objects. Let’s return to our example of areas and populations of states:
In[18]:
area = pd.Series({'California': 423967, 'Texas': 695662, 'New York': 141297,
                  'Florida': 170312, 'Illinois': 149995})
pop = pd.Series({'California': 38332521, 'Texas': 26448193, 'New York': 19651127,
                 'Florida': 19552860, 'Illinois': 12882135})
data = pd.DataFrame({'area':area, 'pop':pop})
data
Out[18]:
area pop
California 423967 38332521
Florida 170312 19552860
Illinois 149995 12882135
New York 141297 19651127
Texas 695662 26448193
The individual Series that make up the columns of the DataFrame can be accessed
via dictionary-style indexing of the column name:
In[19]: v=data['area']
print(v)
Out[19]:
California 423967
Florida 170312
Illinois 149995
New York 141297
Texas 695662
Name: area, dtype: int64
In[20]: data.area
Out[20]:
California 423967
Florida 170312
Illinois 149995
New York 141297
Texas 695662
Name: area, dtype: int64
In[21]: data.area is data['area']
Out[21]:
True
Though this is a useful shorthand, keep in mind that it does not work for all cases!
For example, if the column names are not strings, or if the column names conflict
with methods of the DataFrame, this attribute-style access is not possible.
For example, the DataFrame has a pop() method, so data.pop will point to this rather
than the "pop" column:
In[22]: data.pop is data['pop']
Out[22]: False
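The name clash can be seen in a short sketch: data.pop resolves to the DataFrame method, while data['pop'] always returns the column. Only two states are used here to keep the example small.

```python
import pandas as pd

data = pd.DataFrame({'area': [423967, 695662],
                     'pop': [38332521, 26448193]},
                    index=['California', 'Texas'])

# Attribute access finds the DataFrame.pop method, not the 'pop' column
print(callable(data.pop))

# Dictionary-style access is unambiguous and returns the column
print(data['pop'].tolist())

# For a column name with no conflict, both styles give the same values
print(data.area.equals(data['area']))
```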
Like with the Series objects discussed earlier, this dictionary-style syntax can also be
used to modify the object, in this case to add a new column:
In[23]: data['density'] = data['pop'] / data['area']
data
Out[23]:
area pop density
California 423967 38332521 90.413926
Florida 170312 19552860 114.806121
Illinois 149995 12882135 85.883763
New York 141297 19651127 139.076746
Texas 695662 26448193 38.018740
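Assigning to data['density'] changes the DataFrame in place. As an alternative sketch, the assign() method returns a new DataFrame with the extra column and leaves the original untouched; the name with_density is illustrative.

```python
import pandas as pd

data = pd.DataFrame({'area': [423967, 695662],
                     'pop': [38332521, 26448193]},
                    index=['California', 'Texas'])

# assign() returns a copy with the new column; data itself is not modified
with_density = data.assign(density=data['pop'] / data['area'])

print(list(data.columns))
print(list(with_density.columns))
```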
DataFrame as two-dimensional array
As mentioned previously, we can also view the DataFrame as an enhanced
two-dimensional array. We can examine the raw underlying data array using the values
attribute:
In[24]: data.values
Out[24]: array([[ 4.23967000e+05, 3.83325210e+07, 9.04139261e+01],
[ 1.70312000e+05, 1.95528600e+07, 1.14806121e+02],
[ 1.49995000e+05, 1.28821350e+07, 8.58837628e+01],
[ 1.41297000e+05, 1.96511270e+07, 1.39076746e+02],
[ 6.95662000e+05, 2.64481930e+07, 3.80187404e+01]])
With this picture in mind, we can do many familiar array-like observations on the
DataFrame itself. For example, we can transpose the full DataFrame to swap rows and
columns:
In[25]: data.T
Out[25]:
California Florida Illinois New York Texas
area 4.239670e+05 1.703120e+05 1.499950e+05 1.412970e+05 6.956620e+05
pop 3.833252e+07 1.955286e+07 1.288214e+07 1.965113e+07 2.644819e+07
density 9.041393e+01 1.148061e+02 8.588376e+01 1.390767e+02 3.801874e+01
In particular, passing a single index to an array accesses a row:
In[26]: data.values[0] #we can access the single row
Out[26]:
array([ 4.23967000e+05, 3.83325210e+07, 9.04139261e+01])
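The scientific notation in the arrays above appears because the integer and float columns are combined into a single float array. The following sketch, on a two-state subset, shows the resulting dtype and shape along with the transpose; to_numpy() returns the same data as the values attribute.

```python
import pandas as pd

data = pd.DataFrame({'area': [423967, 695662],
                     'pop': [38332521, 26448193]},
                    index=['California', 'Texas'])
data['density'] = data['pop'] / data['area']

# Mixed integer and float columns are upcast to one float array
arr = data.to_numpy()
print(arr.dtype, arr.shape)

# A single index into the array selects a row
first_row = arr[0]
print(first_row)

# Transposing swaps the row and column labels
print(data.T.shape)
```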
Passing a single index to the DataFrame itself, by contrast, accesses a column (data['area'], as seen earlier).
Thus for array-style indexing, we need another convention. Here Pandas again uses the loc, iloc, and
(in older releases) ix indexers mentioned earlier. Using the iloc indexer, we can index the underlying array as if it is a
simple NumPy array (using the implicit Python-style index), but the DataFrame index and column
labels are maintained in the result:
In[28]: data.iloc[:3, :2]
Out[28]: area pop
California 423967 38332521
Florida 170312 19552860
Illinois 149995 12882135
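Similarly, using the loc indexer we can index the underlying data in an array-like style, but using the
explicit index and column names:
In[29]: data.loc[:'Illinois', :'pop']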
Out[29]: area pop
California 423967 38332521
Florida 170312 19552860
Illinois 149995 12882135
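The ix indexer allowed a hybrid of these two approaches. It was deprecated in pandas 0.20 and removed in pandas 1.0, so the following example runs only on older releases: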
In[30]: data.ix[:3, :'pop']
Out[30]: area pop
California 423967 38332521
Florida 170312 19552860
Illinois 149995 12882135
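On current pandas releases the same hybrid selection can be written with loc and iloc alone. The sketch below shows two possible spellings; the DataFrame is rebuilt with an explicitly sorted index, and the names hybrid and hybrid_alt are illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    {'area': [423967, 170312, 149995, 141297, 695662],
     'pop': [38332521, 19552860, 12882135, 19651127, 26448193]},
    index=['California', 'Florida', 'Illinois', 'New York', 'Texas'])
data['density'] = data['pop'] / data['area']

# Option 1: select rows by position first, then columns by label
hybrid = data.iloc[:3].loc[:, :'pop']

# Option 2: turn the positions into labels and use loc once
hybrid_alt = data.loc[data.index[:3], :'pop']

print(hybrid)
```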
