UNIT-5 NOTES

Uploaded by

riddhijain1003
Copyright
© © All Rights Reserved

Notes on Python Programming

UNIT 5: PYTHON PACKAGES

Python Packages:
● A module is a file containing Python definitions (i.e. functions, classes
and variables) and statements. A package is a collection of related modules.
● The standard library of Python is made available to a programmer as a
set of modules and packages.
● Definitions from the Package/module can be used within the code of
a program. To use these modules in the program, a programmer
needs to import the Package/module.
● Once you import a module, you can reference (use) any of its
functions or variables in your code.
● There are several ways to import a module in your program. The ones
which you should know are:
i. import
ii. from

Import Statement:
● It is the simplest and most common way to use modules in our code.
Its syntax is:
import modulename1 [, modulename2, ...]

Example:
>>> import random
● On execution of this statement, Python will
(i) search for the file 'random.py',
(ii) create a space (namespace) where the module's definitions and
variables will be stored,
(iii) then execute the statements in the module.
● The definitions of the module now become available to the code in
which the module was imported.
● To use/access/invoke a function, you specify the module name
and the name of the function, separated by a dot (.).
● This format is also known as dot notation.
Example:
>>> random.random()
From Statement:
● It is used to get specific function(s) into the code instead of the
complete module.
● If we know beforehand which function(s) we will be needing, then we
may use from.
● For modules having a large number of functions, it is recommended to use
from instead of import, so that only the required names are brought in.
● Its syntax is:
>>> from modulename import functionname [, functionname, ...]

Example:
>>> from random import randint
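
The two import styles described above can be compared side by side. This is a minimal sketch using the standard random module; the variable names value and roll are chosen only for illustration, and the call to seed() is there just to make repeated runs give the same numbers.

```python
import random                 # import the whole module
from random import randint    # import a single name from the module

random.seed(0)                # only here so that repeated runs behave the same

value = random.random()       # dot notation: module name, dot, function name
roll = randint(1, 6)          # imported with "from", so no module prefix is needed

print(value)                  # a float in the range [0.0, 1.0)
print(roll)                   # an integer from 1 to 6, both ends included
```

With import, every use carries the module name, which makes it clear where a function comes from. With from, the call is shorter, but the imported name can clash with a name already defined in the program.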

NUMPY:
● NumPy stands for ‘Numerical Python’. It is a package for data
analysis and scientific computing with Python.
● NumPy uses a multidimensional array object, and has functions and
tools for working with these arrays.
● The powerful n-dimensional array in NumPy speeds-up data
processing.
● NumPy can be easily interfaced with other Python packages and
provides tools for integrating with other programming languages like
C, C++ etc.

Installing NumPy:
● NumPy can be installed by typing following command:
pip install numpy

Array:
● An array is a data type used to store multiple values using a single
identifier (variable name).
● An array contains an ordered collection of data elements where each
element is of the same type and can be referenced by its index
(position).
● The important characteristics of an array are:
• Each element of the array is of the same data type, though the
values stored in them may be different.
• The entire array is stored contiguously in memory. This makes
operations on arrays fast.
• Each element of the array is identified or referred using the name of
the Array along with the index of that element, which is unique for
each element.
• The index of an element is an integral value associated with the
element, based on the element’s position in the array.
For example, consider an array with 5 numbers:
[ 10, 9, 99, 71, 90 ]
Here the element 10 is at index 0 and the element 90 is at index 4.

NumPy Array:
● NumPy arrays are used to store lists of numerical data, vectors and
matrices.
● The NumPy library has a large set of built-in functions for creating,
manipulating, and transforming NumPy arrays.
● The Python language also has an array data structure, but it is not
as versatile, efficient and useful as the NumPy array. The NumPy
array is officially called ndarray but commonly known as array.

Difference Between List and Array:

● List: can have elements of different data types, for example,
[1, 3.4, 'hello', 'a@'].
Array: all elements of an array are of the same data type, for example, an
array of floats may be: [1.2, 5.4, 2.7].

● List: elements of a list are not stored contiguously in memory.
Array: array elements are stored in contiguous memory locations. This
makes operations on arrays faster than on lists.

● List: lists do not support element-wise operations, for example, addition,
multiplication, etc., because elements may not be of the same type.
Array: arrays support element-wise operations. For example, if A1 is an
array, it is possible to say A1/3 to divide each element of the array by 3.

● List: lists can contain objects of different data types, so Python must
store the type information for every element along with its value. Thus
lists take more space in memory and are less efficient.
Array: a NumPy array takes up less space in memory as compared to a list
because arrays do not need to store the data type of each element
separately.

● List: a list is a part of core Python.
Array: an array (ndarray) is a part of the NumPy library.
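
The element-wise difference between lists and arrays can be seen in a short experiment. This is only an illustrative sketch; the names marks_list and marks_array and the values in them are made up for the example.

```python
import numpy as np

marks_list = [30, 60, 90]
marks_array = np.array(marks_list)

# The / operator is not defined between a list and a number.
try:
    marks_list / 3
    list_division_works = True
except TypeError:
    list_division_works = False

# With a list, each element has to be handled explicitly.
divided_list = [m / 3 for m in marks_list]

# With an array, the operation is applied to every element at once.
divided_array = marks_array / 3

print(list_division_works)        # False
print(divided_list)               # [10.0, 20.0, 30.0]
print(divided_array.tolist())     # [10.0, 20.0, 30.0]
```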
Creation of NumPy Arrays from List:
● There are several ways to create arrays.
● To create an array and to use its methods, first we need to import the
NumPy library.

# NumPy is loaded as np; numpy must be written in lowercase
>>> import numpy as np

● NumPy's array() function converts a given list into an array.
● For example,
# Create an array called array1 from the given list.
>>> array1 = np.array([10,20,30])
# Display the contents of the array
>>> array1
array([10, 20, 30])

Creating a 1-D Array:
● An array with only a single row of elements is called a 1-D array.
● Let us try to create a 1-D array from a list which contains numbers as
well as strings.
>>> array2 = np.array([5,-7.4,'a',7.2])
>>> array2
array(['5', '-7.4', 'a', '7.2'],
      dtype='<U32') # U32 means a Unicode string of up to 32 characters
● Observe that since there is a string value in the list, all integer and
float values have been promoted to string, while converting the list to
array.

Creating a 2-D Array:
● We can create two dimensional (2-D) arrays by passing nested lists to
the array() function.
Example:
>>> array3 = np.array([[2.4,3], [4.91,7],[0,-1]])
>>> array3
array([[ 2.4 ,  3.  ],
       [ 4.91,  7.  ],
       [ 0.  , -1.  ]])
● Observe that the integers 3, 7, 0 and -1 have been promoted to floats.
Attributes of NumPy Array:
1. ndarray.ndim: It gives the number of dimensions of the array as an
integer value.
Example:
>>> array1.ndim
1
2. ndarray.shape: It gives the sequence of integers indicating the size of
the array for each dimension.
Example:
>>> array3.shape
(3, 2)
3. ndarray.size: It gives the total number of elements of the array. This is
equal to the product of the elements of shape.
Example:
>>> array1.size
3
4. ndarray.dtype: It is the data type of the elements of the array. All the
elements of an array are of the same data type. Common data types
are int32, int64, float32, float64, U32, etc.
Example:
>>> array1.dtype
dtype('int32') # the default integer type depends on the platform (often int64)
>>> array2.dtype
dtype('<U32')
>>> array3.dtype
dtype('float64')
5. ndarray.itemsize: It specifies the size in bytes of each element of the
array.
Example:
>>> array1.itemsize
4 # memory allocated to an int32 integer
>>> array2.itemsize
128 # memory allocated to a 32-character Unicode string
>>> array3.itemsize
8 # memory allocated to a float64 value

Other Ways of Creating NumPy Arrays:
● We can specify the data type (integer, float, etc.) while creating an array
by passing dtype as an argument to array().
● This converts the data automatically to the specified type.
>>> array4 = np.array([[1,2], [3,4]], dtype=float)
>>> array4
array([[1., 2.],
       [3., 4.]])
● We can create an array with numbers in a given range and sequence
using the arange() function.
>>> array7 = np.arange(6)
>>> array7
array([0, 1, 2, 3, 4, 5])
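
Besides a single stop value, arange() also accepts a start, a stop and a step. The sketch below is illustrative only; the variable names are made up, and zeros() and ones() are included as two further NumPy creation functions that follow the same pattern.

```python
import numpy as np

# arange(start, stop, step): start is included, stop is excluded.
evens = np.arange(-2, 24, 4)
print(evens.tolist())            # [-2, 2, 6, 10, 14, 18, 22]

# A float step produces a float array.
halves = np.arange(0, 2, 0.5)
print(halves.tolist())           # [0.0, 0.5, 1.0, 1.5]

# zeros() and ones() build arrays of a given shape filled with 0s or 1s.
blank = np.zeros((2, 3))
unit = np.ones(4, dtype=int)
print(blank.shape)               # (2, 3)
print(unit.tolist())             # [1, 1, 1, 1]
```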

INDEXING AND SLICING: NumPy arrays can be indexed, sliced and
iterated over.

Indexing:
● For 2-D arrays indexing for both dimensions starts from 0, and each
element is referenced through two indexes i and j, where i represents
the row number and j represents the column number.

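● Consider a 2-D array named marks that stores the marks of four
students in three subjects (only the numbers are stored in the array):

Name     Maths   English   Science
Ramesh   78      67        56
Ramesh   76      75        47
Harun    84      59        60
Prasad   67      72        54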

● Here, marks[i,j] refers to the element at (i+1)th row and (j+1)th column
because the index values start at 0.
● Thus marks[3,1] is the element in the 4th row and second column
which is 72 (marks of Prasad in English).
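
The indexing rule above can be tried out on the marks table. This is a sketch that assumes the table is stored as a 2-D array named marks containing only the numeric columns (Maths, English, Science), since the student names are not part of the array.

```python
import numpy as np

# Rows are students; columns are Maths, English, Science.
marks = np.array([[78, 67, 56],
                  [76, 75, 47],
                  [84, 59, 60],
                  [67, 72, 54]])

print(marks[3, 1])           # 4th row, 2nd column -> 72
print(marks[0, 2])           # 1st row, 3rd column -> 56
print(marks[2].tolist())     # a single index selects a whole row -> [84, 59, 60]
```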

Slicing:
● Sometimes we need to extract part of an array. This is done through
slicing.
● We can define which part of the array to be sliced by specifying the
start and end index values using [start : end] along with the array
name.
E.g. 1
>>> array8
array([-2, 2, 6, 10, 14, 18, 22])
>>> array8[3:5] # excludes the value at the end index
array([10, 14])

E.g. 2
>>> array8[::-1] # reverse the array
array([22, 18, 14, 10, 6, 2, -2])

>>> array9 = np.array([[-7, 0, 10, 20],
                       [-5, 1, 40, 200],
                       [-1, 1, 4, 30]])
# access all the elements in the 3rd column
>>> array9[0:3,2]
array([10, 40, 4])
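
A slice can also take a step, and in a 2-D array each dimension can be sliced separately. The sketch below rebuilds array8 and array9 with the same values as above (how array8 was originally created is an assumption); the result names are made up for illustration.

```python
import numpy as np

array8 = np.arange(-2, 24, 4)          # same values as array8 above
array9 = np.array([[-7, 0, 10, 20],
                   [-5, 1, 40, 200],
                   [-1, 1, 4, 30]])

every_other = array8[1:6:2]            # [start:end:step] -> indexes 1, 3, 5
third_column = array9[:, 2]            # ":" alone means "all rows"
lower_left = array9[1:3, 0:2]          # rows 1 and 2, columns 0 and 1

print(every_other.tolist())            # [2, 10, 18]
print(third_column.tolist())           # [10, 40, 4]
print(lower_left.tolist())             # [[-5, 1], [-1, 1]]
```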

Operations on Arrays:
1. Arithmetic Operations:

#Element-wise addition of two matrices.


>>> array1 = np.array([[3,6],[4,2]])
>>> array2 = np.array([[10,20],[15,12]])
>>> array1 + array2
array([[13, 26],
[19, 14]])

#Subtraction
>>> array1 - array2
array([[ -7, -14],
[-11, -10]])

#Multiplication
>>> array1 * array2
array([[ 30, 120],
[ 60, 24]])

#Matrix Multiplication
>>> array1 @ array2
array([[120, 132],
[ 70, 104]])

#Exponentiation
>>> array1 ** 3
array([[ 27, 216],
[ 64, 8]], dtype=int32)
#Division
>>> array2 / array1
array([[3.33333333, 3.33333333],
[3.75 , 6. ]])

#Element wise Remainder of Division (Modulo)


>>> array2 % array1
array([[1, 2],
[3, 0]], dtype=int32)

2. Transpose: Transposing an array turns its rows into columns
and columns into rows, just like matrices in mathematics.

>>> array3 = np.array([[10,-7,0, 20],
                       [-5,1,200,40],
                       [30,1,-1,4]])
>>> array3
array([[ 10,  -7,   0,  20],
       [ -5,   1, 200,  40],
       [ 30,   1,  -1,   4]])

# the original array does not change
>>> array3.transpose()
array([[ 10,  -5,  30],
       [ -7,   1,   1],
       [  0, 200,  -1],
       [ 20,  40,   4]])

3. Sorting: Sorting arranges the elements of an array in order,
either ascending or descending. By default, NumPy sorts in
ascending order.
>>> array4 = np.array([1,0,2,-3,6,8,4,7])
>>> array4.sort()
>>> array4
array([-3, 0, 1, 2, 4, 6, 7, 8])
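
The sort() method shown above changes the array in place. A sketch of two related cases follows: getting a descending order, and sorting a 2-D array along an axis. It uses np.sort(), which returns a sorted copy; the array named table and the result names are made up for the example.

```python
import numpy as np

array4 = np.array([1, 0, 2, -3, 6, 8, 4, 7])

ascending = np.sort(array4)        # returns a sorted copy; array4 is unchanged
descending = ascending[::-1]       # reverse the sorted copy for descending order

table = np.array([[10, -7, 0, 20],
                  [-5, 1, 200, 40]])
rows_sorted = np.sort(table, axis=1)   # sort the values within each row
cols_sorted = np.sort(table, axis=0)   # sort the values within each column

print(descending.tolist())     # [8, 7, 6, 4, 2, 1, 0, -3]
print(rows_sorted.tolist())    # [[-7, 0, 10, 20], [-5, 1, 40, 200]]
print(cols_sorted.tolist())    # [[-5, -7, 0, 20], [10, 1, 200, 40]]
```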

Concatenating Arrays:
● Concatenation means joining two or more arrays.
● Concatenating 1-D arrays means appending the sequences one after
another.
● The numpy.concatenate() function can be used to concatenate two or
more 2-D arrays either row-wise or column-wise.
● All the dimensions of the arrays to be concatenated must match
exactly except for the dimension or axis along which they need to be
joined.
● Any mismatch in the dimensions results in an error. By default, the
concatenation of the arrays happens along axis=0.

>>> array1 = np.array([[10, 20], [-30,40]])
>>> array2 = np.zeros((2, 3), dtype=array1.dtype)
>>> np.concatenate((array1,array2), axis=1)
array([[ 10,  20,   0,   0,   0],
       [-30,  40,   0,   0,   0]])
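
The example above joins column-wise (axis=1). The sketch below shows the default row-wise join (axis=0) and the error raised when the shapes do not match; the arrays extra_row and wide are made up for the illustration.

```python
import numpy as np

array1 = np.array([[10, 20], [-30, 40]])      # shape (2, 2)
extra_row = np.array([[5, 6]])                # shape (1, 2)
wide = np.zeros((2, 3), dtype=array1.dtype)   # shape (2, 3)

# Default axis=0: the rows of extra_row are placed below the rows of array1.
stacked = np.concatenate((array1, extra_row))
print(stacked.tolist())                       # [[10, 20], [-30, 40], [5, 6]]

# Joining along axis=0 needs the same number of columns, so this fails.
try:
    np.concatenate((array1, wide), axis=0)
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
print(type(mismatch_error).__name__)          # ValueError
```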

Reshaping Arrays:
● We can modify the shape of an array using the reshape() function.
● Reshaping an array cannot be used to change the total number of
elements in the array.
● Attempting to change the number of elements in the array using
reshape() results in an error.

>>> array3 = np.arange(10,22)


>>> array3
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21])
>>> array3.reshape(3,4)
array([[10, 11, 12, 13],
[14, 15, 16, 17],
[18, 19, 20, 21]])
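
The rule that reshape() cannot change the number of elements can be checked directly. This is a small sketch; the names reshaped and reshape_error are made up for the example.

```python
import numpy as np

array3 = np.arange(10, 22)            # 12 elements

reshaped = array3.reshape(2, 6)       # 2 * 6 == 12, so this is allowed
print(reshaped.shape)                 # (2, 6)
print(array3.shape)                   # (12,)  array3 itself keeps its shape

try:
    array3.reshape(5, 3)              # 5 * 3 == 15, which is not 12
    reshape_error = None
except ValueError as err:
    reshape_error = err
print(type(reshape_error).__name__)   # ValueError
```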

Splitting Arrays:
● We can split an array into two or more subarrays.
● numpy.split() splits an array along the specified axis.
● We can either specify a sequence of index values where an array is to
be split, or we can specify an integer N that indicates the number of
equal parts into which the array is to be split, as parameter(s) to the
numpy.split() function.
● By default, numpy.split() splits along axis = 0. Consider the array given
below:

>>> array4
array([[ 10,  -7,   0,  20],
       [ -5,   1, 200,  40],
       [ 30,   1,  -1,   4],
       [  1,   2,   0,   4],
       [  0,   1,   0,   2]])

# [1,3] indicates the row indices at which to split the array
>>> first, second, third = np.split(array4, [1, 3])

# the rows before index 1 (the first row) are stored in the sub-array first
>>> first
array([[10, -7,  0, 20]])

# the rows from index 1 up to, but not including, index 3 are stored in
# the sub-array second
>>> second
array([[ -5,   1, 200,  40],
       [ 30,   1,  -1,   4]])

# the remaining rows of array4 are stored in the sub-array third
>>> third
array([[1, 2, 0, 4],
       [0, 1, 0, 2]])
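
The other form mentioned above, passing an integer N, can be sketched as follows. The example reuses the values of array4; the names left, right and split_error are made up for the illustration.

```python
import numpy as np

array4 = np.array([[10, -7, 0, 20],
                   [-5, 1, 200, 40],
                   [30, 1, -1, 4],
                   [1, 2, 0, 4],
                   [0, 1, 0, 2]])

# An integer N asks for N equal parts; axis=1 splits the 4 columns into 2 + 2.
left, right = np.split(array4, 2, axis=1)
print(left.shape, right.shape)        # (5, 2) (5, 2)

# 5 rows cannot be divided into 2 equal parts, so this raises ValueError.
try:
    np.split(array4, 2)
    split_error = None
except ValueError as err:
    split_error = err
print(type(split_error).__name__)     # ValueError
```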

Statistical Operations on Arrays: NumPy provides functions to perform
many useful statistical operations on arrays.

Let us consider two arrays:


>>> arrayA = np.array([1,0,2,-3,6,8,4,7])
>>> arrayB = np.array([[3,6],[4,2]])

1. The max() function finds the maximum element from an array.

# max element from the whole 1-D array
>>> arrayA.max()
8
# max element from the whole 2-D array
>>> arrayB.max()
6

2. The min() function finds the minimum element from an array.


>>> arrayA.min()
-3
>>> arrayB.min()
2
>>> arrayB.min(axis=0)
array([3, 2])

3. The sum() function finds the sum of all elements of an array.


>>> arrayA.sum()
25
>>> arrayB.sum()
15
# axis specifies the dimension along which the sum is taken.
# Here axis=1 gives the sum of the elements in each row.
>>> arrayB.sum(axis=1)
array([9, 6])

4. The mean() function finds the average of elements of the array.

>>> arrayA.mean()
3.125
>>> arrayB.mean()
3.75
>>> arrayB.mean(axis=0)
array([3.5, 4. ])
>>> arrayB.mean(axis=1)
array([4.5, 3. ])

5. The std() function is used to find the standard deviation of the
elements of an array.

>>> arrayA.std()
3.550968177835448
>>> arrayB.std()
1.479019945774904
>>> arrayB.std(axis=0)
array([0.5, 2. ])
>>> arrayB.std(axis=1)
array([1.5, 1. ])

Loading Arrays from Files:


● Sometimes, we may have data in files and we may need to load
that data in an array for processing.
● numpy.loadtxt() and numpy.genfromtxt() are the two functions that can
be used to load data from text files.
● The most commonly used file type to handle large amounts of
data is called CSV (Comma Separated Values).
● Each row in the text file must have the same number of values in
order to load data from a text file into a numpy array.
● Let us say we have the following data in a text file named
data.txt stored in the folder C:/PYTHON, with the values in each row
separated by commas:

RollNo   Marks1   Marks2   Marks3
1        36       18       57
2        22       23       45
3        43       51       37
4        41       40       60
5        13       18       37
● We can load the data from the data.txt file into an array say,
studentdata in the following manner:

>>> studentdata = np.loadtxt('C:/PYTHON/data.txt', skiprows=1,
        delimiter=',', dtype=int)

● The parameter skiprows=1 indicates that the first row is the header
row and therefore we need to skip it as we do not want to load it in
the array.
● The delimiter specifies whether the values are separated by comma,
semicolon, tab or space (tab and space are together called whitespace), or
any other character. By default, values are split on whitespace.
● We can also specify the data type of the array to be created by
specifying through the dtype argument. By default, dtype is float.
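
The loadtxt() call can be tried without creating a file on disk. The sketch below assumes the file contents are comma-separated, as the delimiter=',' argument suggests, and uses io.StringIO as a stand-in for data.txt; the names csv_text and marks1_mean are made up for the example.

```python
import io

import numpy as np

# Stand-in for the contents of data.txt (assumed to be comma-separated).
csv_text = """RollNo,Marks1,Marks2,Marks3
1,36,18,57
2,22,23,45
3,43,51,37
4,41,40,60
5,13,18,37
"""

# loadtxt() accepts a file name or a file-like object.
studentdata = np.loadtxt(io.StringIO(csv_text), skiprows=1,
                         delimiter=',', dtype=int)

print(studentdata.shape)             # (5, 4): five rows, four columns
print(studentdata[0].tolist())       # [1, 36, 18, 57]

marks1_mean = studentdata[:, 1].mean()   # average of the Marks1 column
print(marks1_mean)                   # 31.0
```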

PANDAS:
● PANDAS (PANel DAta) is a high-level data manipulation tool used for
analyzing data.
● It is very easy to import and export data using Pandas library which
has a very rich set of functions.
● It is built on packages like NumPy and Matplotlib and gives us a
single, convenient place to do most of our data analysis and
visualization work.
● Pandas has two important data structures, namely Series and
DataFrame, to make the process of analyzing data organized, effective
and efficient. (Older versions also had a third structure, Panel, which
has since been removed from the library.)

Why do we need Pandas when NumPy can be used for data analysis?
Following are some of the differences between Pandas and NumPy:
1. A NumPy array requires homogeneous data, while a Pandas
DataFrame can have different data types (float, int, string, datetime,
etc.).
2. Pandas has a simpler interface for operations like file loading,
plotting, selection, joining, GROUP BY, which come in very handy in
data-processing applications.
3. Pandas DataFrames (with column names) make it very easy to keep
track of data.
4. Pandas is used when data is in Tabular Format, whereas Numpy is
used for numeric array based data manipulation.

Installing Pandas
● Installing Pandas is very similar to installing NumPy. To install Pandas
from command line, we need to type in:
pip install pandas

Data Structure in Pandas


● A data structure is a collection of data values and operations that
can be applied to that data.
● It enables efficient storage, retrieval and modification to the data.
● Two commonly used data structures in Pandas are:
• Series
• DataFrame

Series:
● A Series is a one-dimensional array containing a sequence of values
of any data type (int, float, list, string, etc) which by default have
numeric data labels starting from zero.
● The data label associated with a particular value is called its index.
● We can also assign values of other data types as indexes.
● We can imagine a Pandas Series as a column in a spreadsheet.
● Example of a series containing names of students is given below:
Index Value
0 Arnab
1 Samridhi
2 Ramit
3 Divyam
4 Kritika
Creation of Series: There are different ways in which a series can be
created in Pandas. To create or use series, we first need to import the
Pandas library.
(A) Creation of Series from Scalar Values
● A Series can be created using scalar values as shown in the
example below:
>>> import pandas as pd
>>> series1 = pd.Series([10,20,30])
>>> print(series1)
Output:
0 10
1 20
2 30
dtype: int64

User defined indexes:
>>> series2 = pd.Series(["Kavi","Shyam","Ravi"], index=[3,5,1])
>>> print(series2)
Output:
3 Kavi
5 Shyam
1 Ravi
dtype: object

(B) Creation of Series from NumPy Arrays


● We can create a series from a one-dimensional (1D) NumPy
array, as shown below:
>>> import numpy as np
>>> import pandas as pd
>>> array1 = np.array([1,2,3,4])
>>> series3 = pd.Series(array1)
>>> print(series3)
Output:
0 1
1 2
2 3
3 4
dtype: int32

(C) Creation of Series from Dictionary


● Python dictionary has key: value pairs and a value can be
quickly retrieved when its key is known.
● Dictionary keys can be used to construct an index for a Series,
as shown in the following example. Here, keys of the dictionary
dict1 become indices in the series.
>>> dict1 = {'India': 'NewDelhi', 'UK': 'London', 'Japan': 'Tokyo'}
>>> print(dict1)
{'India': 'NewDelhi', 'UK': 'London', 'Japan': 'Tokyo'}
>>> series8 = pd.Series(dict1)
>>> print(series8)
India NewDelhi
UK London
Japan Tokyo
dtype: object

Accessing Elements of a Series: There are two common ways for
accessing the elements of a series: Indexing and Slicing.

(A) Indexing
● Indexing in Series is similar to that for NumPy arrays, and is
used to access elements in a series.
● Indexes are of two types: positional index and labelled index.
● Positional index takes an integer value that corresponds to its
position in the series starting from 0, whereas labelled index
takes any user-defined label as index.
● Following example shows usage of the positional index for
accessing a value from a Series.
>>> seriesNum = pd.Series([10,20,30])
>>> seriesNum[2]
30
● When labels are specified, we can use labels as indices while
selecting values from a Series, as shown below. Here, the value 3
is displayed for the labelled index Mar.
>>> seriesMnths = pd.Series([2,3,4], index=["Feb","Mar","Apr"])
>>> seriesMnths["Mar"]
3
(B) Slicing
● Sometimes, we may need to extract a part of a series. This can
be done through slicing.
● We can define which part of the series is to be sliced by
specifying the start and end parameters [start : end] with the
series name.
● When we use positional indices for slicing, the value at the
end index position is excluded, i.e., only (end - start) number of
data values of the series are extracted.
● Consider the following series seriesCapCntry:

>>> seriesCapCntry = pd.Series(['NewDelhi',
        'WashingtonDC', 'London', 'Paris'], index=['India',
        'USA', 'UK', 'France'])
>>> seriesCapCntry[1:3]
USA WashingtonDC
UK London
dtype: object

● If labelled indexes are used for slicing, then value at the end index
label is also included in the output, for example:
>>> seriesCapCntry['USA' : 'France']
USA WashingtonDC
UK London
France Paris
dtype: object
● We can also get the series in reverse order, for example:
>>> seriesCapCntry[ : : -1]
France Paris
UK London
USA WashingtonDC
India NewDelhi
dtype: object
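
The two slicing behaviours above can be made explicit with the iloc[] (position based) and loc[] (label based) indexers of a Series. This is a sketch using the same series; the result names by_position and by_label are made up for the example.

```python
import pandas as pd

seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
                           index=['India', 'USA', 'UK', 'France'])

by_position = seriesCapCntry.iloc[1:3]            # end position is excluded
by_label = seriesCapCntry.loc['USA':'France']     # end label is included

print(list(by_position.index))      # ['USA', 'UK']
print(list(by_label.index))         # ['USA', 'UK', 'France']
print(seriesCapCntry.iloc[0])       # NewDelhi
print(seriesCapCntry.loc['UK'])     # London
```

Writing iloc or loc states the intent in the code itself, which helps when the labels of a series are themselves integers.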

Attributes of Series
● We can access certain properties called attributes of a series by
using that property with the series name.

Example:
>>> seriesCapCntry
India     NewDelhi
USA       WashingtonDC
UK        London
France    Paris

1. name: assigns a name to the Series.
>>> seriesCapCntry.name = 'Capitals'
>>> print(seriesCapCntry)
India     NewDelhi
USA       WashingtonDC
UK        London
France    Paris

2. index.name: assigns a name to the index of the series.
>>> seriesCapCntry.index.name = 'Countries'
>>> print(seriesCapCntry)
Countries
India     NewDelhi
USA       WashingtonDC
UK        London
France    Paris

3. values: gives the values in the series as an array.
>>> print(seriesCapCntry.values)
['NewDelhi' 'WashingtonDC' 'London' 'Paris']

4. size: gives the number of values in the Series object.
>>> print(seriesCapCntry.size)
4

5. empty: gives True if the series is empty, and False otherwise.
>>> seriesCapCntry.empty
False

Methods of Series
● We now discuss some of the methods that are available
for Pandas Series, using the following series:

>>> seriesTenTwenty = pd.Series(np.arange(10, 20, 1))
>>> print(seriesTenTwenty)
0    10
1    11
2    12
3    13
4    14
5    15
6    16
7    17
8    18
9    19

1. head(n): Returns the first n members of the series. If the value for n is
not passed, then by default n takes 5 and the first five members are
displayed.
>>> seriesTenTwenty.head(2)
0    10
1    11
>>> seriesTenTwenty.head()
0    10
1    11
2    12
3    13
4    14

2. count(): Returns the number of non-NaN values in the Series.
>>> seriesTenTwenty.count()
10

3. tail(n): Returns the last n members of the series. If the value for n is
not passed, then by default n takes 5 and the last five members are
displayed.
>>> seriesTenTwenty.tail(2)
8    18
9    19
>>> seriesTenTwenty.tail()
5    15
6    16
7    17
8    18
9    19

Mathematical Operations on Series


● Addition of two Series
● Subtraction of two Series
● Multiplication of two Series
● Division of two Series
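
A short sketch of these four operations follows. The series seriesX and seriesY and their values are hypothetical, chosen so that only some index labels are common to both. Pandas matches values by index label, and a label missing from either series gives NaN unless a fill_value is supplied through the add() or sub() methods.

```python
import pandas as pd

seriesX = pd.Series([1, 2, 3], index=['a', 'b', 'c'])      # hypothetical data
seriesY = pd.Series([10, 30, 40], index=['b', 'c', 'd'])   # hypothetical data

total = seriesX + seriesY                      # 'a' and 'd' have no partner -> NaN
filled = seriesX.add(seriesY, fill_value=0)    # a missing value is treated as 0
difference = seriesX.sub(seriesY, fill_value=0)
product = seriesX * seriesY
quotient = seriesY / seriesX

print(total)
print(filled.to_dict())        # {'a': 1.0, 'b': 12.0, 'c': 33.0, 'd': 40.0}
print(difference.to_dict())    # {'a': 1.0, 'b': -8.0, 'c': -27.0, 'd': -40.0}
print(product['c'])            # 90.0
print(quotient['b'])           # 5.0
```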
DataFrame
● Sometimes we need to work on multiple columns at a time, i.e., we
need to process the tabular data. For example, the result of a class,
items in a restaurant’s menu, reservation chart of a train, etc.
● Pandas stores such tabular data using a DataFrame.
● A DataFrame is a two-dimensional labelled data structure like a table
of MySQL.
● It contains rows and columns, and therefore has both a row and
column index.
● Each column can have a different type of value such as numeric,
string, boolean, etc., as in tables of a database.

   State    Geographical Area (sq km)   Area under Very Dense Forests (sq km)
1  Assam    78438                       2797
2  Delhi    1483                        6.72
3  Kerala   38852                       1663

Creation of DataFrame
● There are a number of ways to create a DataFrame.

(A) Creation of an empty DataFrame
● An empty DataFrame can be created as follows:
>>> import pandas as pd
>>> dFrameEmt = pd.DataFrame()
>>> dFrameEmt
Empty DataFrame
Columns: []
Index: []
(B) Creation of DataFrame from NumPy ndarrays
● Consider the following three NumPy ndarrays
>>> import numpy as np
>>> array1 = np.array([10,20,30])
>>> array2 = np.array([100,200,300])
>>> array3 = np.array([-10,-20,-30, -40])

>>> dFrame4 = pd.DataFrame(array1)
>>> dFrame4
    0
0  10
1  20
2  30
● We can create a DataFrame using more than one ndarray, as
shown in the following example:
>>> dFrame5 = pd.DataFrame([array1, array3, array2],
        columns=['A', 'B', 'C', 'D'])
>>> dFrame5
     A    B    C     D
0   10   20   30   NaN
1  -10  -20  -30 -40.0
2  100  200  300   NaN
(C) Creation of DataFrame from List of Dictionaries
● We can create DataFrame from a list of Dictionaries, for
example:
>>> listDict = [{'a':10, 'b':20}, {'a':5, 'b':10, 'c':20}]
>>> dFrameListDict = pd.DataFrame(listDict)
>>> dFrameListDict
a b c
0 10 20 NaN
1 5 10 20.0
(D) Creation of DataFrame from Dictionary of Lists
● DataFrames can also be created from a dictionary of lists.
● Consider the following dictionary consisting of the keys ‘State’,
‘GArea’ (geographical area) and ‘VDF’ (very dense forest) and the
corresponding values as list.
>>> dictForest = {'State': ['Assam', 'Delhi',
'Kerala'], 'GArea': [78438, 1483, 38852] , 'VDF' :
[2797, 6.72,1663]}
>>> dFrameForest= pd.DataFrame(dictForest)
>>> dFrameForest
State GArea VDF
0 Assam 78438 2797.00
1 Delhi 1483 6.72
2 Kerala 38852 1663.00
(E) Creation of DataFrame from Series
● Consider the following three Series:
>>> seriesA = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'e'])
>>> seriesB = pd.Series([1000,2000,-1000,-5000,1000],
        index = ['a', 'b', 'c', 'd', 'e'])
>>> seriesC = pd.Series([10,20,-10,-50,100], index = ['z', 'y', 'a', 'c', 'e'])
● We can create a DataFrame using a single series as shown
below:
>>> dFrame6 = pd.DataFrame(seriesA)
>>> dFrame6
0
a 1
b 2
c 3
d 4
e 5
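
More than one Series can also be combined into a DataFrame. The sketch below shows two ways of doing so with seriesA and seriesC; the names by_rows and by_columns and the column keys 'A' and 'C' are made up for the example. Labels present in only one series produce NaN in the other.

```python
import pandas as pd

seriesA = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
seriesC = pd.Series([10, 20, -10, -50, 100], index=['z', 'y', 'a', 'c', 'e'])

# A list of Series: each Series becomes one row, index labels become columns.
by_rows = pd.DataFrame([seriesA, seriesC])

# A dictionary of Series: each Series becomes one column under its key.
by_columns = pd.DataFrame({'A': seriesA, 'C': seriesC})

print(by_rows.shape)                 # (2, 7)
print(by_columns.shape)              # (7, 2)
print(by_columns.loc['a', 'C'])      # -10.0
print(by_columns.loc['z', 'A'])      # nan, because seriesA has no label 'z'
```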
Operations on rows and columns in DataFrames
● We can perform some basic operations on rows and columns of a
DataFrame, like selection, deletion, addition, and renaming.

(A) Adding a New Column to a DataFrame


● Consider the DataFrame named students shown below. A new column can
be added to it by assigning a list of values to a new column label:
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99

>>> students['Preeti']=[89,78,76]
>>> students
Arnab Ramit Samridhi Riya Mallika Preeti
Maths 90 92 89 81 94 89
Science 91 81 91 71 95 78
Hindi 97 96 88 67 99 76

(B) Adding a New Row to a DataFrame


>>> students.loc['English'] = [85, 86, 83, 80, 90, 89]
>>> students
Arnab Ramit Samridhi Riya Mallika Preeti
Maths 90 92 89 81 94 89
Science 91 81 91 71 95 78
Hindi 97 96 88 67 99 76
English 85 86 83 80 90 89

(C) Deleting Rows or Columns from a DataFrame


>>> students = students.drop('Science', axis=0)
>>> students
Arnab Ramit Samridhi Riya Mallika Preeti
Maths 90 92 89 81 94 89
Hindi 97 96 88 67 99 76
English 85 86 83 80 90 89

● If the DataFrame has more than one row with the same label, the
DataFrame.drop() method will delete all the matching rows from it.

(D) Renaming Row Labels of a DataFrame


>>> students = students.rename({'Maths':'Sub1', 'Science':'Sub2',
        'Hindi':'Sub3'}, axis='index')
>>> print(students)
Arnab Ramit Samridhi Riya Mallika
Sub1 90 92 89 81 94
Sub2 91 81 91 71 95
Sub3 97 96 88 67 99
(E) Renaming Column Labels of a DataFrame
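
No example for this case appears in the notes. The following is a sketch of one way to do it, assuming the same rename() method is used with axis='columns'; the new labels Student1 and Student2 are hypothetical.

```python
import pandas as pd

students = pd.DataFrame({'Arnab': [90, 91, 97],
                         'Ramit': [92, 81, 96],
                         'Samridhi': [89, 91, 88],
                         'Riya': [81, 71, 67],
                         'Mallika': [94, 95, 99]},
                        index=['Maths', 'Science', 'Hindi'])

# rename() returns a new DataFrame; labels not listed in the mapping are kept.
renamed = students.rename({'Arnab': 'Student1', 'Ramit': 'Student2'},
                          axis='columns')

print(list(renamed.columns))    # ['Student1', 'Student2', 'Samridhi', 'Riya', 'Mallika']
print(list(students.columns))   # the original DataFrame is unchanged
```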

Accessing DataFrames Element through Indexing


● Data elements in a DataFrame can be accessed using indexing.
● DataFrame.loc[ ] is an important indexer that is used for label based
indexing with DataFrames.
● Let's continue with the students table:
>>> students
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99

>>> students.loc['Science']
Arnab 91
Ramit 81
Samridhi 91
Riya 71
Mallika 95

Accessing DataFrames Element through Slicing


● We can use slicing to select a subset of rows and/or columns from a
DataFrame.
● To retrieve a set of rows, slicing can be used with row labels.

>>> students.loc['Maths': 'Science']
         Arnab  Ramit  Samridhi  Riya  Mallika
Maths       90     92        89    81       94
Science     91     81        91    71       95

● We may use a slice of labels with a column name to access values of
those rows in that column only.

>>> students.loc['Maths': 'Science', 'Arnab']
Maths      90
Science    91
Name: Arnab, dtype: int64

Joining, Merging and Concatenation of DataFrames


(A) Joining:
● We can use the pandas.DataFrame.append() method to merge two
DataFrames. (This method was removed in pandas 2.0, where
pandas.concat() is used instead.)
● It appends rows of the second DataFrame at the end of the first
DataFrame.
● Columns not present in the first DataFrame are added as new
columns.
Example: we have two data frames dFrame1 and dFrame2
>>> dFrame1
C1 C2 C3
R1 1 2.0 3.0
R2 4 5.0 NaN
R3 6 NaN NaN
>>> dFrame2
C2 C5
R4 10 20.0
R2 30 NaN
R5 40 50.0

>>> dFrame1=dFrame1.append(dFrame2)
>>> dFrame1
C1 C2 C3 C5
R1 1.0 2.0 3.0 NaN
R2 4.0 5.0 NaN NaN
R3 6.0 NaN NaN NaN
R4 NaN 10.0 NaN 20.0
R2 NaN 30.0 NaN NaN
R5 NaN 40.0 NaN 50.0
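
DataFrame.append() is no longer available from pandas 2.0 onwards. The sketch below shows how the same result could be obtained with pandas.concat(), which works in both older and newer versions. The two DataFrames are rebuilt from the values shown above; the name combined is made up for the example.

```python
import pandas as pd

nan = float('nan')
dFrame1 = pd.DataFrame({'C1': [1, 4, 6],
                        'C2': [2.0, 5.0, nan],
                        'C3': [3.0, nan, nan]},
                       index=['R1', 'R2', 'R3'])
dFrame2 = pd.DataFrame({'C2': [10, 30, 40],
                        'C5': [20.0, nan, 50.0]},
                       index=['R4', 'R2', 'R5'])

# Rows of dFrame2 are placed after the rows of dFrame1;
# a column missing from one DataFrame is filled with NaN.
combined = pd.concat([dFrame1, dFrame2])

print(combined.shape)             # (6, 4)
print(list(combined.columns))     # ['C1', 'C2', 'C3', 'C5']
print(list(combined.index))       # ['R1', 'R2', 'R3', 'R4', 'R2', 'R5']
```

As in the append() output above, the row label R2 appears twice, because rows are stacked without checking for duplicate labels.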
Attributes of Pandas DataFrame

Attribute Name       Purpose
DataFrame.index      to display row labels
DataFrame.columns    to display column labels
DataFrame.dtypes     to display the data type of each column in the DataFrame
DataFrame.values     to display a NumPy ndarray having all the values in the
                     DataFrame, without the axes labels
DataFrame.shape      to display a tuple representing the dimensionality of the
                     DataFrame
DataFrame.size       to display the total number of elements (rows x columns)
                     in the DataFrame
DataFrame.T          to transpose the DataFrame, i.e., row indices and column
                     labels of the DataFrame swap positions
DataFrame.head(n)    to display the first n rows in the DataFrame
DataFrame.tail(n)    to display the last n rows in the DataFrame
DataFrame.empty      to return the value True if the DataFrame is empty and
                     False otherwise

Importing and Exporting Data between CSV Files and DataFrames
● We can create a DataFrame by importing data from CSV files where
values are separated by commas.
● Similarly, we can also store or export data in a DataFrame as a .csv
file.

Importing a CSV file to a DataFrame


● Let us assume that we have the following data in a csv file named
ResultData.csv stored in the folder C:/PYTHON
● We can load the data from the ResultData.csv file into a DataFrame,
say marks using Pandas read_csv() function as shown below:

>>> marks = pd.read_csv("C:/PYTHON/ResultData.csv", sep=",",
        header=0)
● The first parameter to read_csv() is the name of the comma
separated data file along with its path.
● The parameter sep specifies whether the values are separated by
comma, semicolon, tab, or any other character. The default value for
sep is a comma.
● The parameter header specifies the number of the row whose values
are to be used as the column names. It also marks the start of the
data to be fetched. header=0 implies that column names are taken
from the first line of the file. By default, the header is inferred, which
behaves like header=0 when no column names are passed explicitly.

Exporting a DataFrame to a CSV file


● We can use the to_csv() function to save a DataFrame to a text or csv
file.
● For example, to save the DataFrame students created before; we can
use the following statement:

>>> students.to_csv(path_or_buf='C:/PYTHON/resultout.csv', sep=',')


● This creates a file by the name resultout.csv in the folder C:/PYTHON
on the hard disk.
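
The export and import steps can be tried together without touching the disk. This sketch uses a smaller, made-up version of the students DataFrame; it relies on to_csv() returning the CSV text when no path is given and on io.StringIO acting as a stand-in for a file. The index_col=0 argument is an addition not used in the notes; it tells read_csv() to treat the first column as row labels.

```python
import io

import pandas as pd

students = pd.DataFrame({'Arnab': [90, 91, 97], 'Riya': [81, 71, 67]},
                        index=['Maths', 'Science', 'Hindi'])

csv_text = students.to_csv(sep=',')     # no path given, so the CSV text is returned
print(csv_text)

# header=0: the first line holds the column names.
# index_col=0: the first column holds the row labels.
restored = pd.read_csv(io.StringIO(csv_text), sep=',', header=0, index_col=0)

print(list(restored.columns))           # ['Arnab', 'Riya']
print(list(restored.index))             # ['Maths', 'Science', 'Hindi']
print(restored.loc['Hindi', 'Riya'])    # 67
```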
MATPLOTLIB:
● We have learned how to organize and analyze data and perform
various statistical operations on Pandas DataFrames, and how to analyze
numerical data using NumPy.
● The results obtained after analysis are used to make inferences or
draw conclusions about data as well as to make important business
decisions.
● Sometimes, it is not easy to infer by merely looking at the results. In
such cases, visualization helps in better understanding of results of
the analysis.
● Data visualization means graphical or pictorial representation of the
data using graphs, charts, etc.
● The purpose of plotting data is to visualize variation or show
relationships between variables.

Plotting using Matplotlib


● The Matplotlib library is used for creating static, animated, and
interactive 2-D plots or figures in Python.
● It can be installed using the following pip command from the
command prompt:
pip install matplotlib
● For plotting using Matplotlib, we need to import its Pyplot module
using the following command:
import matplotlib.pyplot as plt
● The pyplot module of matplotlib contains a collection of functions
that can be used to work on a plot.
● The plot() function of the pyplot module is used to create a figure.
● A figure is the overall window where the outputs of pyplot functions
are plotted.
● A figure contains a plotting area, legend, axis labels, ticks, title, etc.
● Each function makes some change to a figure: for example, it creates
a figure, creates a plotting area in a figure, plots some lines in a
plotting area, or decorates the plot with labels.
● Data presented through charts should be easy to understand. Hence,
while presenting data we should always give a chart title, label the
axes of the chart and provide a legend in case we have more than one
plotted data series.
● To plot x versus y, we can write plt.plot(x,y). The show() function is used
to display the figure created using the plot() function.

List of Pyplot functions to plot different charts

Function                                         Description
plot(*args[, scalex, scaley, data])              Plot x versus y as lines and/or markers.
bar(x, height[, width, bottom, align, data])     Make a bar plot.
boxplot(x[, notch, sym, vert, whis, ...])        Make a box and whisker plot.
hist(x[, bins, range, density, weights, ...])    Plot a histogram.
pie(x[, explode, labels, colors, autopct, ...])  Plot a pie chart.
scatter(x, y[, s, c, marker, cmap, norm, ...])   A scatter plot of x versus y.
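
The other chart functions are called in the same way as plot(). As a minimal sketch, the code below draws a bar chart with bar(). The subjects and marks are hypothetical values, and the chart is saved with savefig() to a file name chosen only for this example.

```python
import matplotlib.pyplot as plt

# Hypothetical data, used only for illustration.
subjects = ["Maths", "Science", "English"]
marks = [82, 75, 90]

bars = plt.bar(subjects, marks)    # one bar per subject
plt.xlabel("Subject")
plt.ylabel("Marks")
plt.title("Marks by subject")
plt.savefig("bar_demo.png")        # use plt.show() instead to open a window
```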

Customisation of Plots
● Pyplot library gives us numerous functions, which can be used to
customize charts such as adding titles or legends.
Function                               Description
grid([b, which, axis])                 Configure the grid lines.
legend(*args, **kwargs)                Place a legend on the axes.
savefig(*args, **kwargs)               Save the current figure.
show(*args, **kw)                      Display all figures.
title(label[, fontdict, loc, pad])     Set a title for the axes.
xlabel(xlabel[, fontdict, labelpad])   Set the label for the x-axis.
xticks([ticks, labels])                Get or set the current tick locations and labels of the x-axis.
ylabel(ylabel[, fontdict, labelpad])   Set the label for the y-axis.
yticks([ticks, labels])                Get or set the current tick locations and labels of the y-axis.

Example: Plotting a line chart of date versus temperature by adding labels
on the X and Y axes, and adding a title and grid lines to the chart.

import matplotlib.pyplot as plt


date=["25/12","26/12","27/12"]
temp=[8.5,10.5,6.8]
plt.plot(date, temp)
plt.xlabel("Date") #add the Label on x-axis
plt.ylabel("Temperature") #add the Label on y-axis
plt.title("Date wise Temperature") #add the title to the chart
plt.grid(True) #add gridlines to the background
plt.yticks(temp)
plt.show()
Marker
● We can make certain other changes to plots by passing various
parameters to the plot() function.
● A marker is any symbol that represents a data value in a line chart or
a scatter plot.
Example:
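A minimal sketch, reusing the date and temperature lists from the earlier line chart. The marker parameter of plot() is given the code "o" (a circle), and the output file name is chosen only for this example.

```python
import matplotlib.pyplot as plt

date = ["25/12", "26/12", "27/12"]
temp = [8.5, 10.5, 6.8]

# marker="o" draws a circle at every data point on the line.
(line,) = plt.plot(date, temp, marker="o")
plt.xlabel("Date")
plt.ylabel("Temperature")
plt.savefig("marker_demo.png")     # use plt.show() instead to open a window
```

Other common marker codes include "*" (star), "s" (square) and "^" (triangle).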

Colour
● It is also possible to format the plot further by changing the color of
the plotted data.
● We can use either character codes or colour names as values of
the color parameter of the plot() function.
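
A minimal sketch of both ways of giving a colour. The first line uses the character code "r" and the second uses the colour name "green". The night temperatures are hypothetical values added only so that two lines can be compared, and to_hex() is used just to print the colours Matplotlib resolved.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

date = ["25/12", "26/12", "27/12"]
temp = [8.5, 10.5, 6.8]
night_temp = [4.0, 5.5, 3.2]       # hypothetical values for a second line

(line_code,) = plt.plot(date, temp, color="r")            # character code
(line_name,) = plt.plot(date, night_temp, color="green")  # colour name
plt.savefig("colour_demo.png")     # use plt.show() instead to open a window

# Print the colours as hexadecimal strings.
print(to_hex(line_code.get_color()), to_hex(line_name.get_color()))
```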
Linewidth and Line Style
● The linewidth and linestyle property can be used to change the width
and the style of the line chart.
● Linewidth is specified in points.
● The default line width is 1.5 points in Matplotlib 2.0 and later
(1 point in older versions), which shows a thin line.
● Thus, a larger number will output a thicker line depending on
the value provided.
● We can also set the line style of a line chart using the linestyle
parameter.
● It can take a string such as "solid", "dotted", "dashed" or "dashdot".
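
A minimal sketch of both parameters together, again reusing the date and temperature lists from the earlier example. The values 3 and "dotted" are arbitrary choices.

```python
import matplotlib.pyplot as plt

date = ["25/12", "26/12", "27/12"]
temp = [8.5, 10.5, 6.8]

# A thicker, dotted line instead of the default thin solid one.
(line,) = plt.plot(date, temp, linewidth=3, linestyle="dotted")
plt.savefig("linestyle_demo.png")  # use plt.show() instead to open a window
```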

Problem: Consider the average heights and weights of persons aged 8 to 16
stored in the following two lists:
height = [121.9, 124.5, 129.5, 134.6, 139.7, 147.3, 152.4, 157.5, 162.6]
weight = [19.7, 21.3, 23.5, 25.9, 28.5, 32.1, 35.7, 39.6, 43.2]
Let us plot a line chart where:
i. x axis will represent weight
ii. y axis will represent height
iii. x axis label should be “Weight in kg”
iv. y axis label should be “Height in cm”
v. colour of the line should be green
vi. use * as marker
vii. marker size should be 10
viii. the title of the chart should be “Average weight with respect to
average height”
ix. line style should be dashed
x. linewidth should be 2
Solution:
import matplotlib.pyplot as plt
import pandas as pd
height=[121.9,124.5,129.5,134.6,139.7,147.3,152.4,157.5,162.6]
weight=[19.7,21.3,23.5,25.9,28.5,32.1,35.7,39.6,43.2]
df=pd.DataFrame({"height":height,"weight":weight})
#Set xlabel for the plot
plt.xlabel('Weight in kg')
#Set ylabel for the plot
plt.ylabel('Height in cm')
#Set chart title:
plt.title('Average weight with respect to average height')
#plot using marker '*' and a green dashed line of width 2
plt.plot(df.weight,df.height,marker='*',markersize=10,color='green',
         linewidth=2,linestyle='dashed')
plt.show()
PROBLEM:
Smile NGO has participated in a three week cultural mela. Using Pandas,
they have stored the sales (in Rs) made day wise for every week in a CSV
file named “MelaSales.csv”, as shown in Table

Depict the sales for the three weeks using a Line chart. It should have the
following:
i. Chart title as “Mela Sales Report”.
ii. x axis label as “Days”.
iii. y axis label as “Sales in Rs”.
iv. Line colours are red for week 1, blue for week 2 and brown
for week 3.

SOLUTION:
import pandas as pd
import matplotlib.pyplot as plt
# reads "MelaSales.csv" to df by giving path to the file
df=pd.read_csv("MelaSales.csv")
#create a line plot of different color for each week
df.plot(kind='line', color=['red','blue','brown'])
# Set title to "Mela Sales Report"
plt.title('Mela Sales Report')
# Label x axis as "Days"
plt.xlabel('Days')
# Label y axis as "Sales in Rs"
plt.ylabel('Sales in Rs')
#Display the figure
plt.show()
Tkinter Introduction
● Tkinter is a module in Python's standard library used for creating
Graphical User Interfaces (GUIs) for desktop applications.
● With the help of Tkinter, developing desktop applications is not a
tough task.
● The Tkinter module in Python is a good way to start creating simple
projects in Python.
● The Tkinter library provides us with a lot of built-in widgets (also
called Tk widgets or Tk interface) that can be used to create different
desktop applications.
● Among the various GUI frameworks, Tkinter is the only one that is
built into Python's standard library.
● An important feature in favor of Tkinter is that it is cross-platform, so
the same code can easily work on Windows, macOS, and Linux.
● Tkinter is a lightweight module.
● It comes as part of the standard Python installation, so you don't
have to install it separately.
● It supports a lot of built-in widgets that can be used directly to create
desktop applications.

What are Tcl, Tk, and Tkinter?

● Tkinter is based upon the Tk toolkit, which was originally designed for
the Tool Command Language (Tcl). Because Tk is very popular, it has
been ported to a variety of other scripting languages, including Perl
(Perl/Tk), Ruby (Ruby/Tk), and Python (Tkinter).
● The wide variety of widgets, portability, and flexibility of Tk makes it
the right tool which can be used to design and implement a wide
variety of simple and complex projects.
● Python with Tkinter provides a faster and more efficient way to build
useful desktop applications that would have taken much time if you
had to program directly in C/C++ with the help of native OS system
libraries.

Using Tkinter to Create Desktop Applications

The basic steps of creating a simple desktop application using the Tkinter
module in Python are as follows:

1. First of all, import the Tkinter module.


2. The second step would be to create a basic window for the desktop
application.
3. Then you can add different GUI components to the window and
functionality to these components or widgets.
4. Then enter the main event loop using mainloop() function to run the
Desktop application.

Hello World tkinter Example

● When you create a desktop application, the first thing that you will
have to do is create a new window for the desktop application.
● The main window object is created by the Tk class in Tkinter.
● Once you have a window, you can add text, input fields, buttons, etc.
to it.

Here's a code example,

import tkinter as tk

win = tk.Tk()
win.title('Hello World!')
# you can add widgets here

win.mainloop()

Tkinter Methods used above:

The two main methods that are used while creating desktop applications in
Python are:

1. Tk( )

The syntax for the Tk() method is:

Tk(screenName=None, baseName=None, className='Tk', useTk=1)

This method is used to create the main window.

This is how you can use it, just like in the Hello World code example,

win = tk.Tk() ## where win is the name of the main window object

2. The mainloop() Function

This method is used to start the application. The mainloop() function is an
infinite loop that is used to run the application.

It will wait for events to occur and process the events as long as the window
is not closed.

Common Tkinter Widgets

Here are some of the commonly used Tkinter widgets:


1. Button: A simple button that can be clicked to perform an action.
2. Label: This is used to display simple text.
3. Entry: Simple input field to take input from the user. This is used for
single-line input.
4. Text: This can be used for multi-line text fields, to show text, or to take
input from the user.
5. Canvas: This widget can be used for drawing shapes, images, and
custom graphics.
6. Frame: This acts as a container widget that can be used to group
other widgets together.
7. Checkbutton: A checkbox widget used for toggling an option on or
off.
8. Radiobutton: This is used to create Radio buttons.
9. Listbox: This is used to create a List widget used for displaying a list
of values.
10. Scrollbar: This widget is used for scrolling content in other widgets
like Text and Listbox.

Here's a simple example where we have used some of the widgets,

import tkinter as tk

root = tk.Tk()
root.title("Tkinter World")

label = tk.Label(root, text="Hello, Studytonight!")


label.pack()

entry = tk.Entry(root)
entry.pack()

button = tk.Button(root, text="Click Me!")


button.pack()

root.mainloop()
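
In the example above, the button does nothing when it is clicked. A minimal sketch of adding functionality is given below: the command option of Button is given a function to call on each click. The names make_greeting, build_app and on_click are made up for this illustration and are not part of Tkinter.

```python
import tkinter as tk


def make_greeting(name):
    """Build the label text; kept separate from the GUI so it is easy to check."""
    name = name.strip()
    return f"Hello, {name}!" if name else "Hello, World!"


def build_app():
    """Create the window and its widgets; the caller starts the event loop."""
    root = tk.Tk()
    root.title("Tkinter World")

    label = tk.Label(root, text="Type your name and click the button")
    label.pack()

    entry = tk.Entry(root)
    entry.pack()

    def on_click():
        # Read the text typed in the Entry and show the greeting in the Label.
        label.config(text=make_greeting(entry.get()))

    button = tk.Button(root, text="Click Me!", command=on_click)
    button.pack()
    return root
```

Calling build_app().mainloop() opens the window and starts the event loop. Each click on the button then runs on_click(), which updates the label.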
