Introducing Pandas String Operations & Plots
# In[5]
monte.str.lower()
# Out[5]
0 graham chapman
1 john cleese
2 terry gilliam
3 eric idle
4 terry jones
5 michael palin
dtype: object
# In[6]
monte.str.len()
# Out[6]
0 14
1 11
2 13
3 9
4 11
5 13
dtype: int64
# In[7]
monte.str.startswith('T')
# Out[7]
0 False
1 False
2 True
3 False
4 True
5 False
dtype: bool
# In[8]
monte.str.split()
# Out[8]
0 [Graham, Chapman]
1 [John, Cleese]
2 [Terry, Gilliam]
3 [Eric, Idle]
4 [Terry, Jones]
5 [Michael, Palin]
dtype: object
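One practical property of these str methods, not visible in the outputs above, is that missing values pass through instead of raising an error. A minimal sketch, assuming a small hypothetical Series named names (it is not part of the monte data) that contains a None entry:

```python
import pandas as pd

# Hypothetical data with a missing entry (not part of the `monte` Series)
names = pd.Series(['peter', 'Paul', None, 'MARY'])

# The missing entry stays missing; every other element is capitalized
capitalized = names.str.capitalize()
print(capitalized)
```

The same call on a plain Python list comprehension would fail on the None entry, which is one reason to prefer the vectorized form when cleaning real data.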
Methods Using Regular Expressions
Several methods accept regular expressions (regexps) to examine the content of each
string element, and follow some of the API conventions of Python's built-in re module.
Mapping between Pandas methods and functions in Python's re module
Method Description
match Calls re.match on each element, returning a Boolean
extract Extracts capture groups from the first regex match in each element, returning them as strings
findall Calls re.findall on each element
replace Replaces occurrences of pattern with some other string
contains Calls re.search on each element, returning a Boolean
count Counts occurrences of pattern
split Equivalent to str.split, but accepts regexps
rsplit Equivalent to str.rsplit, but accepts regexps
With these, we can do a wide range of operations.
# In[9]
monte.str.extract('([A-Za-z]+)',expand=False)
# Out[9]
0 Graham
1 John
2 Terry
3 Eric
4 Terry
5 Michael
dtype: object
# In[10]
monte.str.findall(r'^[^AEIOU].*[^aeiou]$')
# Out[10]
0 [Graham Chapman]
1 []
2 [Terry Gilliam]
3 []
4 [Terry Jones]
5 [Michael Palin]
dtype: object
Here, the start-of-string (^) and end-of-string ($) anchors force the pattern to cover the
whole name, so only names that begin and end with a consonant are returned.
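The remaining regex-aware methods from the table can be tried in the same way. The following is a minimal sketch; it assumes monte holds the six names seen in the outputs above (its definition is reconstructed here, not taken from the text), and the patterns are illustrative choices:

```python
import pandas as pd

# Assumed definition of `monte`, reconstructed from the outputs shown above
monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jones', 'Michael Palin'])

# contains: True where the pattern is found anywhere in the string
has_double_l = monte.str.contains(r'll')

# match: True only where the pattern matches at the start of the string
starts_with_t = monte.str.match(r'T')

# count: number of non-overlapping occurrences of the pattern
vowel_counts = monte.str.count(r'[aeiou]')

# replace: regex=True is passed explicitly so the pattern is not read literally
initials = monte.str.replace(r'^(\w)\w*', r'\1.', regex=True)

print(initials)
```

Passing regex=True explicitly is a deliberate choice: the default for str.replace has changed between pandas versions, so spelling it out keeps the behaviour the same everywhere.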
To learn more about Pandas string methods and regular expressions, see these references:
1. About string methods
2. About regular expressions
Miscellaneous Methods
Other Pandas string methods
Method Description
get Indexes each element
slice Slices each element
slice_replace Replaces slice in each element with the passed value
cat Concatenates strings
repeat Repeats values
normalize Returns Unicode form of strings
pad Adds whitespace to left, right, or both sides of strings
wrap Splits long strings into lines with length less than a given width
join Joins strings in each element of the Series with the passed separator
get_dummies Extracts dummy variables as a DataFrame
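A few of the methods in this table can be sketched on the same data. This is only an illustration: monte is assumed to hold the six names seen in the earlier outputs, and the widths, fill characters and separators are arbitrary choices.

```python
import pandas as pd

# Assumed definition of `monte`, reconstructed from the outputs shown earlier
monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jones', 'Michael Palin'])

# pad: left-pad every name to 15 characters with dots
padded = monte.str.pad(15, side='left', fillchar='.')

# slice_replace: overwrite characters 0..2 of every name
masked = monte.str.slice_replace(0, 3, 'XXX')

# join: glue the lists produced by split() back together with an underscore
joined = monte.str.split().str.join('_')

# cat: with only a separator, concatenates the whole Series into one string
all_names = monte.str.cat(sep=', ')

# get + repeat: take the first character of each name and double it
doubled = monte.str.get(0).str.repeat(2)

print(padded)
```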
Vectorized item access and slicing
The get and slice operations, in particular, enable vectorized element access from each
element of the Series.
For example, we can get a slice of the first three characters of each string using
str.slice(0, 3). This behavior is also available through Python's normal indexing
syntax; df.str.slice(0, 3) is equivalent to df.str[0:3].
# In[11]
monte.str[0:3]
# Out[11]
0 Gra
1 Joh
2 Ter
3 Eri
4 Ter
5 Mic
dtype: object
Indexing via df.str.get(i) and df.str[i] works in the same way.
These indexing methods also let you access elements of the lists returned by split. For
example, to extract the last name of each entry:
# In[12]
monte.str.split().str[-1]
# Out[12]
0 Chapman
1 Cleese
2 Gilliam
3 Idle
4 Jones
5 Palin
dtype: object
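The equivalence between the method form and the indexing form can be checked directly. A minimal sketch, again assuming monte holds the six names from the outputs above:

```python
import pandas as pd

# Assumed definition of `monte`, reconstructed from the outputs shown above
monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jones', 'Michael Palin'])

# slice(0, 3) and [0:3] return the same first three characters
first_three_slice = monte.str.slice(0, 3)
first_three_index = monte.str[0:3]

# get(-1) and [-1] return the same last character
last_char_get = monte.str.get(-1)
last_char_index = monte.str[-1]

# get() also indexes into the lists produced by split();
# a position that does not exist yields a missing value instead of an error
third_word = monte.str.split().str.get(2)

print(first_three_slice.equals(first_three_index))
print(third_word)
```

Because none of the names has a third word, third_word contains only missing values, which is convenient when rows have a varying number of tokens.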
Indicator variables
The get_dummies method is useful when your data has a column containing some sort of
coded indicator.
# In[13]
full_monte = pd.DataFrame({'name': monte,
                           'info': ['B|C|D', 'B|D', 'A|C',
                                    'B|D', 'B|C', 'B|C|D']})
full_monte
# Out[13]
name info
0 Graham Chapman B|C|D
1 John Cleese B|D
2 Terry Gilliam A|C
3 Eric Idle B|D
4 Terry Jones B|C
5 Michael Palin B|C|D
The get_dummies routine lets us split out these indicator variables into a DataFrame.
# In[14]
full_monte['info'].str.get_dummies('|')
# Out[14]
A B C D
0 0 1 1 1
1 0 1 0 1
2 1 0 1 0
3 0 1 0 1
4 0 1 1 0
5 0 1 1 1
With these operations as building blocks, you can construct an endless range of string
processing procedures when cleaning your data.
Plotting with pandas and matplotlib: Line Plots, Bar Plots, Histograms and Density
Plots, Scatter or Point Plots.
The matplotlib library offers many plot types, so we can choose the one that best suits the
data at hand. With pandas, we can first collect the data in a DataFrame and then plot it
directly. Let's go through the different plot types in matplotlib by way of pandas.
Use these commands to install matplotlib, pandas and numpy:
pip install matplotlib
pip install pandas
pip install numpy
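Before going through the individual plot types, it helps to know that every pandas plot call returns a matplotlib Axes object, and that each plot type can be requested either as df.plot(kind=...) or as df.plot.<kind>(). A minimal sketch, assuming random illustrative data and the non-interactive Agg backend so that it also runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line to see windows
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative random data, three columns
df = pd.DataFrame(np.random.rand(10, 3), columns=['A', 'B', 'C'])

# The default kind is a line plot; the return value is a matplotlib Axes
ax_line = df.plot()
ax_line.set_title('Line plot')   # customise it with ordinary matplotlib calls
ax_line.set_xlabel('row index')

# These two calls request the same plot type in two different spellings
ax_area_kind = df.plot(kind='area')
ax_area_accessor = df.plot.area()

print(type(ax_line).__name__)
```

The accessor spelling (df.plot.area()) is the one used in the rest of this section.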
Types of Plots:
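Basic plotting: In this basic plot we use randomly generated data to plot a graph from a
Series with matplotlib.
Python3
# import libraries
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# random-walk Series; the date index is an illustrative choice
ts = pd.Series(np.random.randn(1000),
               index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()
ts.plot()
plt.show()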
Plot of different data: Using more than one column of data in a plot.
Python3
# importing libraries
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# four random-walk columns sharing one date index (illustrative choice)
df = pd.DataFrame(np.random.randn(1000, 4),
                  index=pd.date_range('1/1/2000', periods=1000),
                  columns=list('ABCD'))
df = df.cumsum()
# df.plot() creates its own figure and draws one line per column
df.plot()
plt.show()
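Plot on given axis: We can explicitly choose which columns supply the x and y axes and
plot the data accordingly.
Python3
# importing libraries
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# illustrative data: two random-walk columns, 'B' and 'C'
df3 = pd.DataFrame(np.random.randn(1000, 2), columns=['B', 'C']).cumsum()
# column 'A' holds the row position and serves as the x axis
df3['A'] = pd.Series(list(range(len(df3))))
df3.plot(x='A', y='B')
plt.show()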
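Bar plot using matplotlib: Bar plots come in several variants that help show the
behaviour of the given data.
Python3
# importing libraries
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# illustrative data: two random-walk columns plus a row-position column
df3 = pd.DataFrame(np.random.randn(1000, 2), columns=['B', 'C']).cumsum()
df3['A'] = pd.Series(list(range(len(df3))))
# one bar per column for the row at position 5
df3.iloc[5].plot.bar()
# horizontal reference line at zero
plt.axhline(0, color='k')
plt.show()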
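The other common bar plot variants are available through the same accessor. A minimal sketch, assuming a small random DataFrame and the non-interactive Agg backend so that it also runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line to see windows
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative data: 5 rows, 3 columns of values in [0, 1)
df = pd.DataFrame(np.random.rand(5, 3), columns=['A', 'B', 'C'])

# Grouped bars: one group per row, one bar per column
ax_grouped = df.plot.bar()

# Stacked bars: the columns of each row are stacked on top of each other
ax_stacked = df.plot.bar(stacked=True)

# Horizontal bars: same data, bars drawn sideways
ax_horizontal = df.plot.barh()

print(len(ax_grouped.patches))
```

Each variant draws one rectangle per cell of the DataFrame, so only the arrangement of the bars differs.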
Histograms:
Python3
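# importing libraries
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# illustrative data: three normal samples with shifted means
df4 = pd.DataFrame({'a': np.random.randn(1000) + 1,
                    'b': np.random.randn(1000),
                    'c': np.random.randn(1000) - 1})
# alpha makes the overlapping histograms semi-transparent
df4.plot.hist(alpha=0.5)
plt.show()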
Box plot using a DataFrame and matplotlib: Use plot.box() to draw one box per column of the DataFrame.
Python3
# importing libraries
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(10, 5),
columns =['A', 'B', 'C', 'D', 'E'])
df.plot.box()
plt.show()
Density plot:
Python3
# importing libraries
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# kernel density estimate of 1000 normal samples (plot.kde() requires SciPy)
ser = pd.Series(np.random.randn(1000))
ser.plot.kde()
plt.show()
Area plot using matplotlib:
Python3
# importing libraries
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(10, 5),
columns =['A', 'B', 'C', 'D', 'E'])
df.plot.area()
plt.show()
Scatter plot:
Python3
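# importing libraries
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(500, 4),
                  columns=['a', 'b', 'c', 'd'])
# plot column 'a' against column 'b' (the column choice is illustrative)
df.plot.scatter(x='a', y='b')
plt.show()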
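A scatter plot can also encode further columns through colour and marker size. A minimal sketch, assuming the same kind of random four-column DataFrame, an arbitrary choice of which column drives which property, and the non-interactive Agg backend so that it also runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line to see windows
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative random data in [0, 1)
df = pd.DataFrame(np.random.rand(500, 4), columns=['a', 'b', 'c', 'd'])

# x and y positions come from 'a' and 'b', colour from 'c', marker size from 'd'
ax = df.plot.scatter(x='a', y='b', c='c', s=df['d'] * 100, colormap='viridis')

print(ax.collections[0].get_offsets().shape)
```

Passing a column name to c makes pandas add a colour bar for that column next to the plot.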
Hexagonal Bin Plot:
Python3
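# importing libraries
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# illustrative data: two columns of normal samples; gridsize is an arbitrary choice
df = pd.DataFrame(np.random.randn(1000, 2), columns=['a', 'b'])
df.plot.hexbin(x='a', y='b', gridsize=25)
plt.show()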
Pie plot:
Python3
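# importing libraries
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# illustrative data: four random positive values labelled 'a' to 'd'
series = pd.Series(3 * np.random.rand(4), index=['a', 'b', 'c', 'd'], name='series')
series.plot.pie(figsize=(4, 4))
plt.show()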