LAB 2 DWM
Objective: To provide basic knowledge of Python and its core data libraries: Pandas, NumPy, and Matplotlib.
Python
Python is a popular programming language. It was created by Guido van Rossum, and released in 1991.
It is used for:
software development,
mathematics,
system scripting.
Python can connect to database systems. It can also read and modify files.
Python can be used to handle big data and perform complex mathematics.
Python can be used for rapid prototyping, or for production-ready software development.
Pandas
The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created
by Wes McKinney in 2008.
Pandas allows us to analyze big data and make conclusions based on statistical theories.
Pandas can clean messy data sets, and make them readable and relevant.
Relevant data is very important in data science.
Installation of Pandas
If you have Python and PIP already installed on a system, then installation of Pandas is very easy.
Install it from the command line with:
pip install pandas
Import Pandas
Once Pandas is installed, import it in your applications by adding the import keyword:
import pandas as pd
Pandas Series
A Pandas Series is like a column in a table: a one-dimensional array holding data of any type.
With the index argument, you can name your own labels.
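To make the index argument concrete, here is a minimal sketch. The values and the labels "x", "y", "z" are illustrative assumptions, not part of the lab data.

```python
import pandas as pd

# Illustrative values; the labels "x", "y", "z" are arbitrary.
values = [1, 7, 2]

# Without the index argument, the labels default to 0, 1, 2.
default_series = pd.Series(values)

# With the index argument, each value gets the label we choose.
labelled_series = pd.Series(values, index=["x", "y", "z"])

print(default_series[0])     # value stored under the default label 0
print(labelled_series["y"])  # value stored under the label "y"
```

Once labels are set, an item can be accessed by referring to its label instead of its position.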
Pandas DataFrames
A Pandas DataFrame is a 2 dimensional data structure, like a 2 dimensional array, or a table with rows
and columns.
import pandas as pd
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}
# Load the dictionary into a DataFrame object.
df = pd.DataFrame(data)
print(df)
Named Indexes
With the index argument, you can name your own indexes.
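import pandas as pd
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}
# The index labels below are illustrative; any list of three labels works.
df = pd.DataFrame(data, index=["day1", "day2", "day3"])
print(df)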
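Named indexes become useful when selecting rows. The sketch below assumes the hypothetical labels "day1" to "day3" and uses the loc attribute to return the row stored under one of them.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}

# Hypothetical row labels, chosen only for illustration.
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# loc selects a row by its label and returns it as a Series.
row = df.loc["day2"]
print(row)
```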
Read CSV Files
A simple way to store big data sets is to use CSV files (comma-separated values files).
CSV files contain plain text in a well-known format that can be read by almost any tool, including Pandas.
import pandas as pd
df = pd.read_csv('data.csv')
print(df.to_string())
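If no data.csv file is at hand, the same call can be tried on CSV text held in memory. This is a sketch only: the column names and values below are made up for illustration, and io.StringIO stands in for the file.

```python
import io
import pandas as pd

# Made-up CSV content; the first line holds the column names.
csv_text = "Duration,Pulse,Calories\n60,110,409.1\n45,117,282.4\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.to_string())  # to_string() prints the entire DataFrame
print(df.shape)        # (number of rows, number of columns)
```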
Read JSON
import pandas as pd
df = pd.read_json('data.json')
print(df.to_string())
NumPy
NumPy is a Python library used for working with arrays. The array object in NumPy is called ndarray.
It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.
NumPy was created in 2005 by Travis Oliphant. It is an open source project and you can use it freely.
import numpy as np
# Create an ndarray from a Python list (illustrative values).
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
Dimensions in Arrays
When the array is created, you can define the number of dimensions by using the ndmin argument.
import numpy as np
# ndmin=5 forces the array to have at least 5 dimensions.
arr = np.array([1, 2, 3, 4], ndmin=5)
print(arr)
print('number of dimensions :', arr.ndim)
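For comparison, the following sketch checks the ndim and shape attributes of arrays with different numbers of dimensions. The array contents are illustrative.

```python
import numpy as np

a = np.array(42)                        # 0-D array (a single scalar)
b = np.array([1, 2, 3])                 # 1-D array
c = np.array([[1, 2, 3], [4, 5, 6]])    # 2-D array
d = np.array([1, 2, 3], ndmin=5)        # padded up to 5 dimensions

for arr in (a, b, c, d):
    # ndim is the number of dimensions; shape is the size along each one.
    print(arr.ndim, arr.shape)
```

With ndmin, the extra dimensions are added in front, each with length 1.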
Iterating Arrays
Iterating means going through the elements one by one. Because NumPy arrays can be multi-dimensional,
we can iterate over them with Python's basic for loop.
The function nditer() is a helper function that supports iteration from very basic to very advanced. It
solves some basic issues we face in iteration; let's go through it with an example.
With basic for loops, iterating through each scalar of an n-dimensional array requires n nested for loops,
which can be difficult to write for arrays with very high dimensionality.
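import numpy as np
arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
# nditer() visits every scalar element of the 3-D array.
for x in np.nditer(arr):
    print(x)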
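To see what nditer() saves, the sketch below walks the same kind of 3-D array twice: once with three nested for loops and once with a single nditer() loop. The array values are illustrative.

```python
import numpy as np

arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# Basic approach: one for loop per dimension.
nested = []
for plane in arr:
    for row in plane:
        for value in row:
            nested.append(int(value))

# nditer() reaches every scalar with a single loop.
flat = [int(x) for x in np.nditer(arr)]

print(nested)
print(flat)
```

Both lists hold the same eight values in the same order.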
Joining NumPy Arrays
In SQL we join tables based on a key, whereas in NumPy we join arrays by axes.
We pass a sequence of arrays that we want to join to the concatenate() function, along with the axis. If
axis is not explicitly passed, it is taken as 0.
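import numpy as np
# Join two 1-D arrays end to end (axis defaults to 0).
arr = np.concatenate((np.array([1, 2, 3]), np.array([4, 5, 6])))
print(arr)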
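The effect of the axis argument is easier to see on 2-D arrays. A minimal sketch with illustrative values:

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

# axis=0 (the default) appends arr2 below arr1 as extra rows.
rows = np.concatenate((arr1, arr2))

# axis=1 appends arr2 to the right of arr1 as extra columns.
cols = np.concatenate((arr1, arr2), axis=1)

print(rows.shape)
print(cols)
```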
Joining Arrays Using Stack Functions
Stacking is the same as concatenation; the only difference is that stacking is done along a new axis.
Joining two 1-D arrays along a new first axis (axis=0) puts them one over the other as rows, while
joining them along a new second axis (axis=1) pairs their elements as columns.
We pass a sequence of arrays that we want to join to the stack() method along with the axis. If axis is not
explicitly passed it is taken as 0.
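import numpy as np
# Stack two 1-D arrays along a new second axis.
arr = np.stack((np.array([1, 2, 3]), np.array([4, 5, 6])), axis=1)
print(arr)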
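The difference between stacking and concatenation shows up in the shapes of the results. The sketch below compares them on two illustrative 1-D arrays.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# concatenate keeps the number of dimensions: the result is still 1-D.
joined = np.concatenate((arr1, arr2))

# stack adds a new axis: the result is 2-D.
stacked_rows = np.stack((arr1, arr2))          # axis=0, one array per row
stacked_cols = np.stack((arr1, arr2), axis=1)  # one array per column

print(joined.shape, stacked_rows.shape, stacked_cols.shape)
```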
Splitting NumPy Arrays
Joining merges multiple arrays into one, and splitting breaks one array into multiple.
We use array_split() for splitting arrays; we pass it the array we want to split and the number of splits.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
# Split the array into 3 parts; the result is a list of arrays.
newarr = np.array_split(arr, 3)
print(newarr)
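array_split() also works when the array cannot be divided evenly. In the sketch below (illustrative values), seven elements are split into three parts, and the earlier parts receive the extra elements.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

# 7 elements into 3 parts: the sizes come out as 3, 2 and 2.
parts = np.array_split(arr, 3)

for part in parts:
    print(part)
```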
Searching Arrays
You can search an array for a certain value, and return the indexes that get a match.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)
print(x)
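where() accepts any condition, not only equality, and it returns a tuple with one index array per dimension. A short sketch with illustrative values that finds the indexes of the even numbers:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# The result is a tuple; for a 1-D array it holds a single index array.
x = np.where(arr % 2 == 0)
even_indexes = x[0]

print(even_indexes)
```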
Search Sorted
There is a method called searchsorted() which performs a binary search in a sorted array, and returns the
index where the specified value would be inserted to maintain the sort order.
import numpy as np
# searchsorted() assumes the array is already sorted.
arr = np.array([6, 7, 8, 9])
x = np.searchsorted(arr, 7)
print(x)
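By default searchsorted() returns the left-most suitable index. The side argument can ask for the right-most one instead, as in this sketch with illustrative values:

```python
import numpy as np

arr = np.array([6, 7, 8, 9])

# Default (side='left'): insert before the existing 7.
left = np.searchsorted(arr, 7)

# side='right': insert after the existing 7.
right = np.searchsorted(arr, 7, side='right')

print(left, right)  # 1 2
```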
Multiple Values
To search for more than one value, use an array with the specified values.
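import numpy as np
arr = np.array([1, 3, 5, 7])
# Find the insertion indexes for 2, 4 and 6.
x = np.searchsorted(arr, [2, 4, 6])
print(x)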
Sorting Arrays
Sorting means putting elements in an ordered sequence.
An ordered sequence is any sequence that has an order corresponding to its elements, like numeric or
alphabetical, ascending or descending.
NumPy has a function called sort() that returns a sorted copy of a specified array.
import numpy as np
print(np.sort(np.array([3, 2, 0, 1])))
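The following sketch (illustrative values) shows a few behaviours of sorting worth knowing: np.sort() returns a copy, it works on strings, it sorts a 2-D array row by row, and the ndarray method sort() sorts in place.

```python
import numpy as np

arr = np.array([3, 2, 0, 1])

sorted_copy = np.sort(arr)   # new array; arr itself is left unchanged
print(arr)                   # still in the original order
print(sorted_copy)

# Strings are sorted alphabetically.
words = np.sort(np.array(["banana", "cherry", "apple"]))

# On a 2-D array, each row is sorted separately (along the last axis).
grid = np.sort(np.array([[3, 2, 4], [5, 0, 1]]))

# The ndarray method sort() changes the array in place and returns None.
arr.sort()
print(arr)
```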
Filtering Arrays
Getting some elements out of an existing array and creating a new array out of them is called filtering.
Create a filter array that will return only values higher than 42:
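import numpy as np
arr = np.array([41, 42, 43, 44])
# The comparison produces a boolean array: True where the value is above 42.
filter_arr = arr > 42
newarr = arr[filter_arr]
print(filter_arr)
print(newarr)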
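The filter array does not need its own variable; the condition can be written directly inside the square brackets. A minimal sketch with illustrative values:

```python
import numpy as np

arr = np.array([41, 42, 43, 44])

# The condition inside the brackets is evaluated to a boolean array first.
above_42 = arr[arr > 42]
even_values = arr[arr % 2 == 0]

print(above_42)
print(even_values)
```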
Matplotlib
Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.
Matplotlib is mostly written in Python; a few segments are written in C, Objective-C, and JavaScript for
platform compatibility.
Matplotlib Pyplot
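Most of the Matplotlib utilities lie under the pyplot submodule, and are usually imported under
the plt alias:
import matplotlib.pyplot as plt
Example
xpoints = [0, 6]    # illustrative x values
ypoints = [0, 250]  # illustrative y values
plt.plot(xpoints, ypoints)
plt.show()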
To plot only the markers, without a connecting line, you can use the shortcut string notation parameter 'o',
which means 'rings'.
Draw two points in the diagram, one at position (1, 3) and one at position (8, 10):
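A minimal sketch of the two-point plot described above. Two things here are assumptions made so the code also runs on a machine without a display: the "Agg" backend and the output file name two_points.png. In an interactive session, drop the matplotlib.use line and call plt.show() instead of plt.savefig().

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([1, 8])
ypoints = np.array([3, 10])

# 'o' draws a ring at each point and no line between them.
lines = plt.plot(xpoints, ypoints, 'o')

# Hypothetical file name; plt.show() would display the figure instead.
plt.savefig("two_points.png")
```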
Matplotlib Markers
You can use the keyword argument marker to emphasize each point with a specified marker.
Marker Reference
Marker Description
'o' Circle
'*' Star
'.' Point
',' Pixel
'x' X
'X' X (filled)
'+' Plus
's' Square
'D' Diamond
'p' Pentagon
'H' Hexagon
'h' Hexagon
'^' Triangle Up
'2' Tri Up
'|' Vline
'_' Hline
Line Reference
Line Syntax Description
'-' Solid line
':' Dotted line
'--' Dashed line
'-.' Dashed/dotted line
Color Reference
'r' Red
'g' Green
'b' Blue
'c' Cyan
'm' Magenta
'y' Yellow
'k' Black
'w' White
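Marker and color codes from the tables above can be combined with a line style, either in one short format string (marker, line, color) or through separate keyword arguments. The sketch below uses Matplotlib's ':' (dotted) and '--' (dashed) line styles. The data values, the "Agg" backend, and the file name styles.png are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

# Format string: circle markers, dotted line, red.
first = plt.plot(ypoints, 'o:r')

# The same kind of styling with keyword arguments: stars, dashed line, green.
second = plt.plot(ypoints * 2, marker='*', linestyle='--', color='g')

# Hypothetical file name; plt.show() would display the figure instead.
plt.savefig("styles.png")
```

When only y values are passed, the x values default to 0, 1, 2, 3.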
Matplotlib Labels and Title
With Pyplot, you can use the xlabel() and ylabel() functions to set a label for the x- and y-axis.
With Pyplot, you can use the title() function to set a title for the plot.
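Example
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
# The title and label texts below are illustrative.
plt.title("Sports Watch Data")
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")
plt.show()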
Matplotlib Subplot
With the subplot() function you can draw multiple plots in one figure.
The subplot() function takes three arguments that describe the layout of the figure.
The layout is organized in rows and columns, which are represented by the first and second arguments.
The third argument is the index of the current plot.
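Example
Draw 2 plots:
import matplotlib.pyplot as plt
import numpy as np
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
# The figure has 1 row and 2 columns; this is the first plot.
plt.subplot(1, 2, 1)
plt.plot(x,y)
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
# Same 1 x 2 layout; this is the second plot.
plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.show()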
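Swapping the first two arguments changes the layout. The sketch below places the two plots on top of each other (2 rows, 1 column) and gives each its own title. The titles, the "Agg" backend, and the file name stacked_subplots.png are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 1, 2, 3])

# 2 rows, 1 column, first plot (top).
top = plt.subplot(2, 1, 1)
plt.plot(x, np.array([3, 8, 1, 10]))
plt.title("Plot 1")

# 2 rows, 1 column, second plot (bottom).
bottom = plt.subplot(2, 1, 2)
plt.plot(x, np.array([10, 20, 30, 40]))
plt.title("Plot 2")

plt.tight_layout()  # keep the titles from overlapping the neighbouring plot
plt.savefig("stacked_subplots.png")  # hypothetical file name
```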
Exercise:
Q1. Write code to find the indexes where the values are negative in the array [5, -3, 7, -9, 2, -1].
Q2. Create a Pandas Series with a custom index for the sales data:
Days: ["Monday", "Tuesday", "Wednesday", "Thursday"]
Perform a query to display sales greater than 400.
Q3. Given the array [12, 24, 7, 35, 50, 9, 21], use NumPy to filter and display only the elements greater
than 15.
Q4. Given the array [15, 22, 8, 19, 30, 42], filter and display only elements divisible by 3 using NumPy.
Q5. Write code to generate a plot with a dashed red line from point (0, 0) to point (5, 10).
References:
https://www.w3schools.com/python/default.asp