
CS229 Section: Python Tutorial: Maya Srikanth

This document provides an overview of Python and related tools for data science. It discusses Python versions and the high-level object-oriented nature of Python. It recommends text editors and IDEs like PyCharm, Visual Studio Code, and Sublime Text. It covers basic Python concepts like strings, lists, dictionaries, NumPy for scientific computing, and tools for plotting like Matplotlib. It provides examples of string manipulation, list operations, NumPy array creation and math functions, broadcasting, and creating simple line plots.
CS229 Section: Python Tutorial
Maya Srikanth
Content adapted from past CS229 iterations
Python
Python 2.0 released in 2000 (Python 2.7 “end-of-life” in 2020)
Python 3.0 released in 2008 (Python 3.6+ for CS 229)
- High-level, object-oriented, interpreted language

https://www.researchgate.net/figure/Genealogy-of-Programming-Languages-doi101371-journalpone0088941g001_fig1_260447599
Text editor/IDE options (don’t settle for Notepad)

• PyCharm (IDE)

• Visual Studio Code (IDE)

• Sublime Text (text editor)

• Atom

• Notepad ++/gedit

• Vim (for Linux)


PyCharm IDE
PyCharm
• Good debugger
• Project management

FYI, professional version free for students: https://www.jetbrains.com/student/


Visual Studio Code IDE
Visual Studio Code
• Lightweight
• Wide variety of plugins to enable support for many languages
Basic Python: Strings, Lists, Dictionaries
String manipulation
Formatting
print('I love CS229. (upper)'.upper())
print('I love CS229. (rjust 20)'.rjust(20))
print('we love CS229. (capitalize)'.capitalize())
print(' I love CS229. (strip) '.strip())

Concatenation
cs_class_code = 'CS229'  # assumed value; the slide does not define it
print('I like ' + str(cs_class_code) + ' a lot!')

f-strings
print(f'{print} (print a function)')
print(f'{type(229)} (print a type)')

Formatting
print('Old school formatting: {:.2F}'.format(1.358))
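To make the padding and precision behavior above concrete, here is a minimal sketch. The names `course` and `score` are illustrative and not from the slides. It shows that `rjust` only pads when the requested width exceeds the string length, and that `str.format` and f-strings accept the same `:.2f` format specification.

```python
# Illustrative values; `course` and `score` are hypothetical names.
course = 'CS229'
score = 1.358

# rjust pads on the left only when the width exceeds the string length.
padded = course.rjust(10)
unchanged = 'I love CS229. (rjust 20)'.rjust(20)  # already longer than 20 characters

# Two equivalent ways to format a float with two decimals.
with_format = 'Score: {:.2f}'.format(score)
with_fstring = f'Score: {score:.2f}'

print(repr(padded))       # '     CS229'
print(unchanged)          # I love CS229. (rjust 20)
print(with_format)        # Score: 1.36
print(with_fstring)       # Score: 1.36
```

Note that the format specification needs the colon: `{:.2f}` formats the argument, while `{.2f}` is parsed as an attribute lookup and fails.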


List
List creation
list_1 = ['one', 'two', 'three']

Insertion/extension
list_1.append(4)
list_1.insert(0, 'ZERO')
list_2 = [1, 2, 3]
list_1.extend(list_2)

List comprehension
long_list = [i for i in range(9)]
long_long_list = [(i, j) for i in range(3)
                  for j in range(5)]
long_list_list = [[i for i in range(3)]
                  for _ in range(5)]

Sorting
random_list = [3, 12, 5, 6]  # same example list as in the supplementary slides
sorted(random_list)
random_list_2 = [(3, 'z'), (12, 'r'), (6, 'e'),
                 (8, 'c'), (2, 'g')]
sorted(random_list_2, key=lambda x: x[1])
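As a rough illustration of the list operations above, the sketch below checks that a comprehension with two `for` clauses matches the equivalent nested loops, and shows what the `key` argument changes when sorting. The variable names `pairs_loop`, `by_letter`, and `by_number` are introduced here for illustration only.

```python
# A comprehension with two `for` clauses is equivalent to nested loops.
pairs_comprehension = [(i, j) for i in range(3) for j in range(5)]
pairs_loop = []
for i in range(3):
    for j in range(5):
        pairs_loop.append((i, j))

# sorted() returns a new list; the key function picks what to compare.
random_list_2 = [(3, 'z'), (12, 'r'), (6, 'e'), (8, 'c'), (2, 'g')]
by_letter = sorted(random_list_2, key=lambda x: x[1])
by_number = sorted(random_list_2)  # tuples compare element by element

print(pairs_comprehension == pairs_loop)  # True
print(by_letter)  # [(8, 'c'), (6, 'e'), (2, 'g'), (12, 'r'), (3, 'z')]
print(by_number)  # [(2, 'g'), (3, 'z'), (6, 'e'), (8, 'c'), (12, 'r')]
```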
Dictionary and Set
Set (unordered, unique)
my_set = {i ** 2 for i in range(10)}
# {0, 1, 64, 4, 36, 9, 16, 49, 81, 25}

Dictionary (mapping)
my_dict = {(5-i): i ** 2 for i in range(10)}
# {5: 0, 4: 1, 3: 4, 2: 9, 1: 16, 0: 25, -1: 36, -2: 49, -3: 64, -4: 81}
my_dict.keys()
# dict_keys([5, 4, 3, 2, 1, 0, -1, -2, -3, -4])

Dictionary update
second_dict = {'a': 10, 'b': 11}
my_dict.update(second_dict)

Iterate through items
for k, it in my_dict.items():
    print(k, it)
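A small sketch of two behaviors that the dictionary and set examples rely on: `update` overwrites existing keys as well as adding new ones, and a set keeps only unique values. The keys and values passed to `update` here are made up for illustration.

```python
my_dict = {(5 - i): i ** 2 for i in range(10)}

# update() adds the new key 'a' and overwrites the existing key 5.
my_dict.update({'a': 10, 5: -1})

has_a = 'a' in my_dict               # membership tests look at keys
missing = my_dict.get('zzz', None)   # get() avoids a KeyError for absent keys

# Duplicates collapse in a set: only the distinct remainders survive.
my_set = {i % 3 for i in range(10)}

print(my_dict[5], has_a, missing)    # -1 True None
print(len(my_dict), my_set)          # 11 {0, 1, 2}
```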
NumPy
What is NumPy and why?
• Package for scientific computing in Python
• Vector and matrix manipulation
• Broadcasting and vectorization (matrix operations) save time & clean up code
Convenient math functions, read before use!
Python Command    Description
np.linalg.inv     Inverse of matrix (numpy as equivalent)
np.linalg.eig     Get eigenvalues & eigenvectors of arr
np.matmul         Matrix multiply
np.zeros          Create a matrix filled with zeros (Read on np.ones)
np.arange         Start, stop, step size (Read on np.linspace)
np.identity       Create an identity matrix
np.vstack         Vertically stack 2 arrays (Read on np.hstack)
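The functions in the table can be tried on a tiny matrix. This is only a sketch with a made-up 2x2 diagonal matrix `A`, chosen so the inverse and eigenvalues are easy to check by hand. It also shows one difference worth reading about before use: `np.arange` excludes the stop value while `np.linspace` includes it by default.

```python
import numpy as np

# Hypothetical example matrix (diagonal, so results are easy to verify).
A = np.array([[2.0, 0.0],
              [0.0, 4.0]])

A_inv = np.linalg.inv(A)
product = np.matmul(A, A_inv)          # should be close to the identity
eigvals, eigvecs = np.linalg.eig(A)    # eigenvalues and eigenvectors

grid = np.arange(0.0, 1.0, 0.25)       # start, stop (excluded), step
line = np.linspace(0.0, 1.0, 5)        # start, stop (included), count

stacked = np.vstack([np.zeros((1, 2)), np.ones((1, 2))])  # rows on top of each other

print(np.allclose(product, np.identity(2)))  # True
print(grid)
print(line)
print(stacked.shape)                         # (2, 2)
```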


Debugging tools…
Python Command                 Description
array.shape                    Get shape of numpy array
array.dtype                    Check data type of array (for precision, for weird behavior)
type(stuff)                    Get type of a variable
import pdb; pdb.set_trace()    Set a breakpoint (https://docs.python.org/3/library/pdb.html)
print(f'My name is {name}')    Easy way to construct a message
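One example of the "weird behavior" that checking `shape` and `dtype` can catch, sketched with made-up arrays: a 1-D array and a 1x4 array look alike when printed but have different shapes, and writing a float into an integer array silently drops the fraction.

```python
import numpy as np

array_1d = np.array([1, 2, 3, 4])
array_1by4 = np.array([[1, 2, 3, 4]])
print(array_1d.shape, array_1by4.shape)  # (4,) (1, 4)

counts = np.array([1, 2, 3])          # integer dtype inferred from the list
counts_float = counts.astype(float)   # explicit floating-point copy
counts[0] = 2.9                       # stored as 2: the fraction is dropped
counts_float[0] = 2.9                 # stored as 2.9

print(counts.dtype, counts[0], counts_float.dtype, counts_float[0])
print(type(counts), type(counts[0]))
```

The exact integer dtype printed (for example `int64` or `int32`) depends on the platform.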


Basic NumPy Usage
Initialization from Python lists
array_1d = np.array([1, 2, 3, 4])
array_1by4 = np.array([[1, 2, 3, 4]])
large_array = np.array([i for i in range(400)])
large_array = large_array.reshape((20, 20))

Lists with different types (NumPy auto-casts to higher precision, but it should be reasonably consistent)
from_list = np.array([1, 2, 3])
from_list_2d = np.array([[1, 2, 3.0], [4, 5, 6]])
from_list_bad_type = np.array([1, 2, 3, 'a'])
print(f'Data type of integer is {from_list.dtype}')
print(f'Data type of float is {from_list_2d.dtype}')

NumPy supports many types of algebra on an entire array
array_1 + 5
array_1 * 5
np.sqrt(array_1)
np.power(array_1, 2)
np.exp(array_1)
np.log(array_1)
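A short sketch of the auto-casting described above. The slide does not define `array_1`, so the values used for it here are an assumption picked to give round numbers. Mixing an integer list with one float yields a float array, and mixing numbers with a string turns every element into a string.

```python
import numpy as np

from_list = np.array([1, 2, 3])                       # all integers
from_list_2d = np.array([[1, 2, 3.0], [4, 5, 6]])     # one float promotes everything
from_list_bad_type = np.array([1, 2, 3, 'a'])         # everything becomes a string

print(from_list.dtype.kind)           # i (integer)
print(from_list_2d.dtype)             # float64
print(from_list_bad_type.dtype.kind)  # U (unicode string)

# Hypothetical values for array_1; operations apply to every element at once.
array_1 = np.array([1.0, 4.0, 9.0])
roots = np.sqrt(array_1)
round_trip = np.log(np.exp(array_1))
print(roots)
print(np.allclose(round_trip, array_1))  # True
```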
Dot product and matrix multiplication
A few ways to write dot product
array_1 @ array_2
array_1.dot(array_2)
np.dot(array_1, array_2)

Matrix multiplication like Ax
weight_matrix = np.array([1, 2, 3, 4]).reshape(2, 2)
sample = np.array([[50, 60]]).T
np.matmul(weight_matrix, sample)

2D matrix multiplication
mat1 = np.array([[1, 2], [3, 4]])
mat2 = np.array([[5, 6], [7, 8]])
np.matmul(mat1, mat2)

Element-wise multiplication
a = np.array([i for i in range(10)]).reshape(2, 5)
a * a
np.multiply(a, a)
np.multiply(a, 10)
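The following sketch runs the operations above on small inputs so the difference between matrix and element-wise multiplication is visible. The values of `array_1` and `array_2` are assumptions, since the slide does not define them; the matrices are the ones from the slide.

```python
import numpy as np

# Hypothetical vectors for the dot product.
array_1 = np.array([1, 2, 3])
array_2 = np.array([4, 5, 6])
dots = (array_1 @ array_2, array_1.dot(array_2), np.dot(array_1, array_2))
print(dots)  # all three equal 32

# Ax with a (2, 2) matrix and a (2, 1) column vector.
weight_matrix = np.array([1, 2, 3, 4]).reshape(2, 2)
sample = np.array([[50, 60]]).T
Ax = np.matmul(weight_matrix, sample)
print(Ax.shape)  # (2, 1)

mat1 = np.array([[1, 2], [3, 4]])
mat2 = np.array([[5, 6], [7, 8]])
matrix_product = np.matmul(mat1, mat2)  # rows times columns
elementwise = mat1 * mat2               # entry by entry
print(matrix_product)
print(elementwise)
```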
Broadcasting
NumPy compares the shapes of the operands starting from the last dimension. A dimension that is missing or has size 1 is stretched to match the other operand, so the operation is still valid. Be careful with dimensions!
op1 = np.array([i for i in range(9)]).reshape(3, 3)
op2 = np.array([[1, 2, 3]])
op3 = np.array([1, 2, 3])

# Notice that the results here are DIFFERENT!
print(op1 + op2)
# array([[ 1, 3, 5],
#        [ 4, 6, 8],
#        [ 7, 9, 11]])
print(op1 + op2.T)
# array([[ 1, 2, 3],
#        [ 5, 6, 7],
#        [ 9, 10, 11]])

# Notice that the results here are THE SAME!
print(op1 + op3)
# array([[ 1, 3, 5],
#        [ 4, 6, 8],
#        [ 7, 9, 11]])
print(op1 + op3.T)
# array([[ 1, 3, 5],
#        [ 4, 6, 8],
#        [ 7, 9, 11]])
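Printing the shapes explains why the results above differ or match. This is a minimal sketch using the same operands as the slide; the `mismatch_raised` flag is added here only to show that shapes which cannot be aligned raise an error rather than being silently guessed.

```python
import numpy as np

op1 = np.arange(9).reshape(3, 3)   # shape (3, 3)
op2 = np.array([[1, 2, 3]])        # shape (1, 3): a row
op3 = np.array([1, 2, 3])          # shape (3,): 1-D, so .T changes nothing

print(op2.shape, op2.T.shape)      # (1, 3) (3, 1)
print(op3.shape, op3.T.shape)      # (3,) (3,)

row_added = op1 + op2      # the row is repeated for every row of op1
col_added = op1 + op2.T    # the column is repeated for every column of op1

# A length-2 vector cannot be aligned with a (3, 3) array.
try:
    op1 + np.array([1, 2])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)             # True
```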
Broadcasting for pairwise distance
samples = np.random.random((15, 5))

# Without broadcasting
# (slide annotation beside the two tiling steps, truncated in extraction: "Both achieve the effect of")
expanded1 = np.expand_dims(samples, axis=1)
tile1 = np.tile(expanded1, (1, samples.shape[0], 1))
expanded2 = np.expand_dims(samples, axis=0)
tile2 = np.tile(expanded2, (samples.shape[0], 1, 1))
diff = tile2 - tile1
distances = np.linalg.norm(diff, axis=-1)

# With broadcasting
diff = (samples[:, np.newaxis, :]
        - samples[np.newaxis, :, :])
distances = np.linalg.norm(diff, axis=-1)

# With scipy (another math toolbox)
from scipy.spatial.distance import cdist
distances = cdist(samples, samples)
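As a sanity check on the approaches above, this sketch compares the tiled and broadcast versions using NumPy only. The seeded generator is an assumption added for reproducibility. The two difference tensors have opposite signs, but the norms agree, and each sample is at distance zero from itself.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, an assumption for reproducibility
samples = rng.random((15, 5))

# Explicit tiling: build two (15, 15, 5) arrays and subtract them.
tile1 = np.tile(np.expand_dims(samples, axis=1), (1, samples.shape[0], 1))
tile2 = np.tile(np.expand_dims(samples, axis=0), (samples.shape[0], 1, 1))
dist_tiled = np.linalg.norm(tile2 - tile1, axis=-1)

# Broadcasting: shapes (15, 1, 5) and (1, 15, 5) stretch to (15, 15, 5).
diff = samples[:, np.newaxis, :] - samples[np.newaxis, :, :]
dist_broadcast = np.linalg.norm(diff, axis=-1)

print(dist_broadcast.shape)                        # (15, 15)
print(np.allclose(dist_tiled, dist_broadcast))     # True
print(np.allclose(np.diag(dist_broadcast), 0.0))   # True
```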
Why should I vectorize my code? (dot product)
Shorter code, faster execution
a = np.random.random(500000)
b = np.random.random(500000)

With loop (Wall time: 345ms)
dot = 0.0
for i in range(len(a)):
    dot += a[i] * b[i]
print(dot)

Numpy dot product (Wall time: 2.9ms)
print(np.array(a).dot(np.array(b)))
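The wall times on the slide come from the presenter's machine. To reproduce the comparison locally, one option is `time.perf_counter`, as in this sketch. The smaller array size and the seeded generator are assumptions made to keep the run short and repeatable, and the measured times will vary by machine.

```python
import time
import numpy as np

rng = np.random.default_rng(0)   # seeded generator, an assumption for reproducibility
a = rng.random(50_000)           # smaller than the slide's 500000 to keep the run short
b = rng.random(50_000)

# Python loop: one multiply-add per iteration.
start = time.perf_counter()
dot_loop = 0.0
for i in range(len(a)):
    dot_loop += a[i] * b[i]
loop_seconds = time.perf_counter() - start

# Vectorized: a single call into NumPy's compiled code.
start = time.perf_counter()
dot_vectorized = a.dot(b)
vectorized_seconds = time.perf_counter() - start

print(np.isclose(dot_loop, dot_vectorized))  # True: same result up to rounding
print(f'loop: {loop_seconds:.4f}s, vectorized: {vectorized_seconds:.4f}s')
```

Since `a` and `b` are already NumPy arrays, wrapping them in `np.array(...)` before calling `dot` is not needed.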
An example with pairwise distance
Speed up depends on setup and nature of computation

samples = np.random.random((100, 5))

With loop (Wall time: 162ms; even worse without NumPy norm)
total_dist = []
for s1 in samples:
    for s2 in samples:
        d = np.linalg.norm(s1 - s2)
        total_dist.append(d)
avg_dist = np.mean(total_dist)

Numpy with broadcasting (Wall time: 3.5ms)
diff = (samples[:, np.newaxis, :] -
        samples[np.newaxis, :, :])
distances = np.linalg.norm(diff, axis=-1)
avg_dist = np.mean(distances)
Tools for Plotting
Other Python packages/tools
Jupyter Notebook
• Interactive, re-execution, result storage

Matplotlib / Seaborn
• Visualization (line, scatter, bar, images and even interactive 3D)

Pandas (https://pandas.pydata.org/)
• DataFrame (database/Excel-like)
• Easy filtering, aggregation (also plotting, but fewer features than dedicated data visualization packages)
Example plots
https://matplotlib.org/3.1.1/gallery/index.html

Import
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

Create data
# Data for plotting
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)

Plotting
fig, ax = plt.subplots()
ax.plot(t, s)

Format plot
ax.set(xlabel='time (s)', ylabel='voltage (mV)',
       title='About as simple as it gets, folks')
ax.grid()

Save/show
fig.savefig("test.png")
plt.show()
Plot with dashed lines and legend
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 500)
y = np.sin(x)

fig, ax = plt.subplots()

line1, = ax.plot(x, y, label='Using set_dashes()')
# 2pt line, 2pt break, 10pt line, 2pt break
line1.set_dashes([2, 2, 10, 2])

line2, = ax.plot(x, y - 0.2, dashes=[6, 2],
                 label='Using the dashes parameter')

ax.legend()
plt.show()
Using subplot

x = np.arange(0, 3 * np.pi, 0.1)
y_sin = np.sin(x)
y_cos = np.cos(x)

# Setup grid with height 2 and col 1.
# Plot the 1st subplot
plt.subplot(2, 1, 1)
plt.grid()
plt.plot(x, y_sin)
plt.title('Sine Wave')

# Now plot on the 2nd subplot
plt.subplot(2, 1, 2)
plt.plot(x, y_cos)
plt.title('Cosine Wave')
plt.grid()
plt.tight_layout()
Plot area under curve
Confusion matrix
https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html
fig, ax = plt.subplots()
im = ax.imshow(cm, interpolation='nearest', cmap=cmap)
ax.figure.colorbar(im, ax=ax)
# We want to show all ticks...
ax.set(xticks=np.arange(cm.shape[1]),
       yticks=np.arange(cm.shape[0]),
       xticklabels=classes, yticklabels=classes,
       ylabel='True label', xlabel='Predicted label',
       title=title)

# Rotate the tick labels and set their alignment.
plt.setp(ax.get_xticklabels(), rotation=45, ha='right',
         rotation_mode='anchor')

# Loop over data dimensions and create text annotations.
fmt = '.2f' if normalize else 'd'
thresh = cm.max() / 2.
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        ax.text(j, i, format(cm[i, j], fmt),
                ha='center', va='center',
                color="white" if cm[i, j] > thresh else "black")
fig.tight_layout()
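The plotting code above uses `cm`, `classes`, and `normalize` without defining them; presumably they are set up elsewhere, for example in the linked scikit-learn example. The sketch below is one assumed way to build such inputs with NumPy only. The labels, the class names, and the use of `np.add.at` are illustrative choices, not the slide's method.

```python
import numpy as np

# Hypothetical true and predicted class indices for 7 samples.
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2])
classes = ['a', 'b', 'c']   # hypothetical class names

# Rows are true labels, columns are predicted labels.
n = len(classes)
cm = np.zeros((n, n), dtype=int)
np.add.at(cm, (y_true, y_pred), 1)   # add 1 at each (true, predicted) pair

# Optional row normalization, so each row sums to 1.
normalize = True
cm_display = cm / cm.sum(axis=1, keepdims=True) if normalize else cm

# Same formatting choices as in the annotation loop above.
fmt = '.2f' if normalize else 'd'
thresh = cm_display.max() / 2.
print(cm)
print(format(cm_display[2, 2], fmt))  # 0.67
```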
● DEMO…
Good luck on HW and projects!

Questions?
Supplementary Slides

Questions?
Where does my program start?
It just works
A function

Properly
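The code for this slide was not recovered, only its annotations. As an assumed reconstruction of the common pattern those annotations seem to point at, the sketch below puts the work in a function and calls it under an `if __name__ == '__main__':` guard, so the code runs when the file is executed as a script but not when it is imported. The function name `main` and the message are illustrative.

```python
def main():
    """Entry point; the name `main` is a convention, not a requirement."""
    message = 'Hello, CS229!'
    print(message)
    return message


# True only when this file is run directly (e.g. `python my_script.py`),
# not when another module imports it.
if __name__ == '__main__':
    main()
```

Top-level statements without the guard also "just work" when the file is run, but they execute on import as well, which is usually not wanted.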
What is a class?
Initialize the class to get an instance using some parameters
Instance variable
Does something with the instance
To use a class
Instantiate a class, get an instance
Call an instance method
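The class shown on these slides was not recovered, so the following is a hypothetical stand-in that matches the annotations: an initializer that takes parameters, an instance variable, a method that does something with the instance, and then instantiation and a method call. The class name `Counter` and its method are made up for illustration.

```python
class Counter:
    """Hypothetical example class; not the one from the slides."""

    def __init__(self, start=0):
        # Initialize the class to get an instance using some parameters.
        self.count = start          # instance variable

    def increment(self, step=1):
        # Does something with the instance: updates and returns its state.
        self.count += step
        return self.count


# Instantiate a class, get an instance.
counter = Counter(start=5)

# Call an instance method.
result = counter.increment(2)
print(result, counter.count)  # 7 7
```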


String manipulation
Formatting
stripped = ' I love CS229! '.strip()
upper_case = 'i love cs 229! '.upper()
capitalized = 'i love cs 229! '.capitalize()
Concatenation
joined = 'string 1' + ' ' + 'string 2'
Formatting
formatted = 'Formatted number {:.2F}'.format(1.2345)
Basic data structures
List
example_list = [1, 2, '3', 'four']
Set (unordered, unique)
example_set = set([1, 2, '3', 'four'])
Dictionary (mapping)
example_dictionary = {
    '1': 'one',
    '2': 'two',
    '3': 'three'
}
More on List

2D list
list_of_list = [[1,2,3], [4,5,6], [7,8,9]]
List comprehension
initialize_a_list = [i for i in range(9)]
initialize_a_list = [i ** 2 for i in range(9)]
initialize_2d_list = [[i + j for i in range(5)] for j in range(9)]
Insert/Pop
my_list = [1, 2, 3]  # example list; the slide does not define my_list
my_list.insert(0, 'stuff')
print(my_list.pop(0))
More on List

Sort a list
random_list = [3,12,5,6]
sorted_list = sorted(random_list)

random_list = [(3, 'A'),(12, 'D'),(5, 'M'),(6, 'B')]
sorted_list = sorted(random_list, key=lambda x: x[1])
More on Dict/Set

Comprehension
my_dict = {i: i ** 2 for i in range(10)}
my_set = {i ** 2 for i in range(10)}

Get dictionary keys
my_dict.keys()
Another way for legend
Scatter plot
