PythonTraining MD Saiful Azad UMP

This document provides an introduction to basic Python concepts including its history, why it is popular, different ways to run Python code, basic instructions like formatting, variables, objects, assignment, arithmetic operations, lists, range function, control flow, functions, and the math module. It is intended as an overview for someone new to Python.

Uploaded by

Madam Azimah
Copyright
© © All Rights Reserved

Basic Training on Python

Saiful Azad, PhD


FKom, UMP, Gambang

Contact Details
Cell phone: +601124153527
Email: [email protected], [email protected]
Website: https://sites.google.com/view/saifulazad (new website)
http://saifulazad.weebly.com (old website)
Skype: sazad_m684
Introduction to Python

• Invented in the Netherlands in the early 90s by Guido van Rossum
• Open sourced from the beginning
• Considered a scripting language, but is much more
– No compilation needed
– Scripts are evaluated by the interpreter, line by line
– Functions need to be defined before they are called
Why Python???

Demand in 2019 [1]
Reason for increasing demand [2]

[1] https://hackernoon.com/how-big-the-demand-for-python-in-2019-is-or-why-python-has-suddenly-become-so-popular-0va3n7m
[2] https://data-flair.training/blogs/why-is-python-in-demand/
Why Python???
The unreasonable effectiveness of Deep Learning (CNNs)
[Chart: performance of deep learning systems over time, with human performance (5.1% error) as the reference level; the chart labels the year 2015.]
Source: Krizhevsky, Sutskever, and Hinton, NIPS 2012
Different ways to run Python
• Call a Python program via the Python interpreter from a Unix/Windows command line
– $ python testScript.py
– Or make the script directly executable, with additional header lines in the script
• Using the Python console
– Typing in Python statements. Limited functionality
>>> 3 + 3
6
>>> exit()
• Using the IPython console
– Typing in Python statements. Very interactive.
In [167]: 3+3
Out [167]: 6
– Typing in %run testScript.py
– Many convenient “magic functions”
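On Unix-like systems, the "additional header lines" are typically a shebang line at the top of the file. The sketch below shows what testScript.py might contain; the file contents and the add function are hypothetical and only illustrate the idea.

```python
#!/usr/bin/env python3
# The shebang line above lets a Unix shell run this file directly
# after `chmod +x testScript.py`, as in `./testScript.py`.
# Running `python testScript.py` works with or without it.


def add(a, b):
    """Return the sum of two numbers (hypothetical example function)."""
    return a + b


if __name__ == "__main__":
    # This part runs only when the file is executed as a script,
    # not when it is imported as a module.
    print(add(3, 3))  # prints 6
```

The `if __name__ == "__main__":` guard is a common convention that keeps the script usable both from the command line and via import.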
Anaconda for Python 3
• We’ll be using Anaconda, which includes a Python environment and an IDE (Spyder) as well as
many additional features
Basic Instructions of Python
Formatting
• Many languages use curly braces to delimit blocks of code. Python uses
indentation. Incorrect indentation causes an error.
• Comments start with #
• Colons start a new block in many constructs, e.g. function definitions, if-then
clauses, for, while

for i in [1, 2, 3, 4, 5]:
    # first line in "for i" block
    print(i)
    for j in [1, 2, 3, 4, 5]:
        # first line in "for j" block
        print(j)
        # last line in "for j" block
        print(i + j)
    # last line in "for i" block
    print(i)
print("done looping")
Modules

• Certain features of Python are not loaded by default


• In order to use these features, you’ll need to import the
modules that contain them.
• E.g.
import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
from scipy import io as sio
Variables and objects
• A variable is created the first time it is assigned a value
– No need to declare a type
– Types are associated with objects, not variables
• X = 5
• X = [1, 3, 5]
• X = 'python'
– Assignment creates references, not copies
X = [1, 3, 5]
Y = X
X[0] = 2
print(Y) # Y is [2, 3, 5]
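To make the "references, not copies" point concrete, here is a minimal sketch that contrasts a second name for the same list with a shallow copy made by slicing. The variable names are illustrative only.

```python
x = [1, 3, 5]
y = x      # y is a second name for the same list object
z = x[:]   # z is a new list with the same elements (a shallow copy)

x[0] = 2   # modify the list through the name x

print(y)       # [2, 3, 5]  (y sees the change)
print(z)       # [1, 3, 5]  (the copy is unaffected)
print(y is x)  # True
print(z is x)  # False
```

When an independent list is needed, copy it explicitly (for example with `x[:]` or `list(x)`) rather than assigning it to a new name.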
Assignment & Arithmetic
• You can assign to multiple names at the same time
x, y = 2, 3
• To swap values
x, y = y, x
• Assignments can be chained
x = y = z = 3
• Accessing a name before it’s been created (by assignment) raises an error
• a = 5 + 2 # a is 7
• b = 9 - 3. # b is 6.0
• c = 5 * 2 # c is 10
• d = 5**2 # d is 25
• e = 5 % 2 # e is 1
Built-in numerical types: int, float, complex
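The assignment rules on this slide can be tried out in a few lines. This is a small sketch; the name undefined_name is deliberately never assigned so that the error can be shown.

```python
# Multiple assignment, then a swap
x, y = 2, 3
x, y = y, x
print(x, y)  # 3 2

# Chained assignment binds all three names to the same value
a = b = c = 3
print(a, b, c)  # 3 3 3

# Using a name that was never assigned raises NameError
try:
    print(undefined_name)
except NameError as err:
    message = str(err)
    print("NameError:", message)
```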
List - 1
integer_list = [1, 2, 3]
heterogeneous_list = ["string", 0.1, True]
list_of_lists = [ integer_list, heterogeneous_list, [] ]
list_length = len(integer_list) # equals 3
list_sum = sum(integer_list) # equals 6
• Get the i-th element of a list
x = [i for i in range(10)] # is the list [0, 1, ..., 9]
zero = x[0] # equals 0, lists are 0-indexed
one = x[1] # equals 1
nine = x[-1] # equals 9, 'Pythonic' for last element
eight = x[-2] # equals 8, 'Pythonic' for next-to-last element
• Get a slice of a list
one_to_four = x[1:5] # [1, 2, 3, 4]
first_three = x[:3] # [0, 1, 2]
last_three = x[-3:] # [7, 8, 9]
three_to_end = x[3:] # [3, 4, ..., 9]
without_first_and_last = x[1:-1] # [1, 2, ..., 8]
copy_of_x = x[:] # [0, 1, 2, ..., 9]
another_copy_of_x = x[:3] + x[3:] # [0, 1, 2, ..., 9]
List - 2
• Check for membership
1 in [1, 2, 3] # True
0 in [1, 2, 3] # False
• Concatenate lists
x = [1, 2, 3]
y = [4, 5, 6]
x.extend(y) # x is now [1,2,3,4,5,6]

x = [1, 2, 3]
y = [4, 5, 6]
z = x + y # z is [1,2,3,4,5,6]; x is unchanged.
• List unpacking (multiple assignment)
x, y = [1, 2] # x is 1 and y is 2
[x, y] = 1, 2 # same as above
x, y = [1, 2] # same as above
x, y = 1, 2 # same as above
_, y = [1, 2] # y is 2, didn't care about the first element
List - 3
• Modify content of list
x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
x[2] = x[2] * 2 # x is [0, 1, 4, 3, 4, 5, 6, 7, 8]
x[-1] = 0 # x is [0, 1, 4, 3, 4, 5, 6, 7, 0]
x[3:5] = [v * 3 for v in x[3:5]] # x is [0, 1, 4, 9, 12, 5, 6, 7, 0] (x[3:5] * 3 would repeat the slice instead)
x[5:6] = [] # x is [0, 1, 4, 9, 12, 6, 7, 0]
del x[:2] # x is [4, 9, 12, 6, 7, 0]
del x[:] # x is []
del x # referencing to x hereafter is a NameError

• Strings can also be sliced. But they cannot be modified (they are immutable)
s = 'abcdefg'
a = s[0] # 'a'
x = s[:2] # 'ab'
y = s[-3:] # 'efg'
s[:2] = 'AB' # this will cause an error (TypeError)
s = 'AB' + s[2:] # s is now 'ABcdefg'
The range() function

for i in range(5):
    print(i) # will print 0, 1, 2, 3, 4 (on separate lines)
for i in range(2, 5):
    print(i) # will print 2, 3, 4
for i in range(0, 10, 2):
    print(i) # will print 0, 2, 4, 6, 8
for i in range(10, 2, -2):
    print(i) # will print 10, 8, 6, 4
>>> a = ['Mary', 'had', 'a', 'little', 'lamb']
>>> for i in range(len(a)):
...     print(i, a[i])
...
0 Mary
1 had
2 a
3 little
4 lamb
Control flow - 1

• if-else
if 1 > 2:
    message = "if only 1 were greater than two..."
elif 1 > 3:
    message = "elif stands for 'else if'"
else:
    message = "when all else fails use else (if you want to)"
print(message)
parity = "even" if x % 2 == 0 else "odd"
Truthiness

• True
• False
• None
• and
• or
• not
• any
• all
All keywords are case sensitive (any and all are built-in functions rather than keywords).
0, 0.0, [], (), '', None are considered False. Most other values are True.
In [137]: print("True") if '' else print('False')
False
a = [0, 0, 0, 1]
any(a)
Out[135]: True
all(a)
Out[136]: False
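The truthiness rules can be checked directly with bool(). The sketch below uses the falsy values listed on the slide plus a few illustrative truthy ones chosen for this example.

```python
# Values the slide lists as falsy
falsy_values = [0, 0.0, [], (), '', None]
falsy_results = [bool(value) for value in falsy_values]
print(falsy_results)  # [False, False, False, False, False, False]

# Most other values are truthy, including non-empty containers
# and the non-empty string 'False'
truthy_values = [1, -0.5, [0], (None,), 'False']
truthy_results = [bool(value) for value in truthy_values]
print(truthy_results)  # [True, True, True, True, True]

a = [0, 0, 0, 1]
print(any(a))  # True: at least one element is truthy
print(all(a))  # False: not every element is truthy

# `and` / `or` return one of their operands, not necessarily a bool
print(0 or 'default')   # default
print([] and 'unused')  # []
```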
Comparison

Operation    Meaning
<            strictly less than
<=           less than or equal
>            strictly greater than
>=           greater than or equal
==           equal
!=           not equal
is           object identity
is not       negated object identity

a = [0, 1, 2, 3, 4]
b = a
c = a[:]
a == b
Out[129]: True
a is b
Out[130]: True
a == c
Out[132]: True
a is c
Out[133]: False

Bitwise operators: & (AND), | (OR), ^ (XOR), ~ (NOT), << (Left Shift), >> (Right Shift)
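The bitwise operators are listed without an example, so here is a short sketch on two small integers. The values 12 and 10 are arbitrary choices for illustration.

```python
a = 0b1100  # 12
b = 0b1010  # 10

print(a & b)   # 8   (0b1000: bits set in both)
print(a | b)   # 14  (0b1110: bits set in either)
print(a ^ b)   # 6   (0b0110: bits set in exactly one)
print(~a)      # -13 (for Python ints, ~n equals -n - 1)
print(a << 2)  # 48  (shift left by two bits: multiply by 4)
print(a >> 2)  # 3   (shift right by two bits: floor-divide by 4)
```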
Control flow - 2

• loops
x = 0
while x < 10:
    print(x, "is less than 10")
    x += 1

What happens if we forget to indent?

Keyword pass in loops: does nothing, empty statement placeholder
for x in range(10):
    pass

for x in range(10):
    if x == 3:
        continue # go immediately to the next iteration
    if x == 5:
        break # quit the loop entirely
    print(x) # prints 0, 1, 2, 4
Functions - 1
• Functions are defined using def
def double(x):
    """this is where you put an optional docstring
    that explains what the function does.
    for example, this function multiplies its
    input by 2"""
    return x * 2
• You can call a function after it is defined
z = double(10) # z is 20
• You can give default values to parameters
def my_print(message="my default message"):
    print(message)

my_print("hello") # prints 'hello'
my_print() # prints 'my default message'
Functions - 2

• Sometimes it is useful to specify arguments by name

def subtract(a=0, b=0):
    return a - b

subtract(10, 5) # returns 5
subtract(0, 5) # returns -5
subtract(b = 5) # same as above
subtract(b = 5, a = 0) # same as above
Sorting list

• sorted(list): keeps the original list intact and returns a new sorted list
• list.sort(): sorts the original list in place
x = [4,1,2,3]
y = sorted(x) # is [1,2,3,4], x is unchanged
x.sort() # now x is [1,2,3,4]

• Change the default behavior of sorted
# sort the list by absolute value from largest to smallest
x = [-4,1,-2,3]
y = sorted(x, key=abs, reverse=True) # is [-4,3,-2,1]
Module math
Command name           Description
abs(value)             absolute value (built-in function; the math module offers fabs)
ceil(value)            rounds up
cos(value)             cosine, in radians
floor(value)           rounds down
log(value)             logarithm, base e
log10(value)           logarithm, base 10
max(value1, value2)    larger of two values (built-in function)
min(value1, value2)    smaller of two values (built-in function)
round(value)           nearest whole number (built-in function)
sin(value)             sine, in radians
sqrt(value)            square root

Constant    Description
e           2.7182818...
pi          3.1415926...

# preferred.
import math
math.fabs(-0.5)

# This is fine
from math import fabs
fabs(-0.5)

# bad style. Many unknown names in name space.
from math import *
fabs(-0.5)
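A short sketch of the preferred `import math` style follows. Note that abs, max, min, and round are Python built-ins and need no import, while the other functions in the table live in the math module.

```python
import math

# Functions and constants provided by the math module
print(math.sqrt(16))      # 4.0
print(math.floor(2.7))    # 2
print(math.ceil(2.1))     # 3
print(math.fabs(-0.5))    # 0.5
print(math.cos(math.pi))  # -1.0

# Built-in functions: available without importing anything
print(abs(-0.5))   # 0.5
print(max(3, 7))   # 7
print(min(3, 7))   # 3
print(round(2.6))  # 3

# The math module has no attribute named abs
print(hasattr(math, "abs"))  # False
```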
Module random

• Generating random numbers is important in statistics
In [75]: import random
    ...: four_uniform_randoms = [random.random() for _ in range(4)]
    ...: four_uniform_randoms
    ...:
Out[75]:
[0.5687302894847388,
0.6562738117250464,
0.3396960191199996,
0.016968446644451407]
• Other useful functions: seed(), randint, randrange, shuffle, etc.
• Type in “random” and then use tab completion to see available
functions and use “?” to see docstring of function.
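The functions named above can be sketched as follows. The seed value 42 and the ranges are arbitrary choices for this example; no specific random outputs are shown because they depend on the Python version.

```python
import random

# Seeding makes the pseudo-random sequence reproducible
random.seed(42)
first_run = [random.random() for _ in range(4)]
random.seed(42)
second_run = [random.random() for _ in range(4)]
print(first_run == second_run)  # True

# randint includes both endpoints; randrange excludes the stop value
die_roll = random.randint(1, 6)          # one of 1, 2, ..., 6
even_digit = random.randrange(0, 10, 2)  # one of 0, 2, 4, 6, 8

# shuffle reorders a list in place
deck = list(range(10))
random.shuffle(deck)
print(sorted(deck) == list(range(10)))  # True: same elements, new order
```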
Finding Mean, Median, and Mode: without libraries and with libraries
Mean
• The sample mean, also called the sample arithmetic mean or
simply the average, is the arithmetic average of all the items in a
dataset. The mean of a dataset 𝑥 is mathematically expressed as
Σᵢ𝑥ᵢ/ 𝑛, where 𝑖 = 1, 2, …, 𝑛. In other words, it’s the sum of all the
elements 𝑥ᵢ divided by the number of items in the dataset 𝑥.
• mean_ = sum(x) / len(x)
• Although this is clean and elegant, you can also apply built-in
Python statistics functions:
• mean_ = statistics.mean(x)
• mean_ = statistics.fmean(x)
Mean (Contd.)
• However, if there are nan values among your data, then
statistics.mean() and statistics.fmean() will return nan as the
output:
• mean_ = statistics.mean(x_with_nan)
• This result is consistent with the behavior of sum(), because
sum(x_with_nan) also returns nan.
• If you use NumPy, then you can get the mean with np.mean():
• mean_ = np.mean(y)
• mean_ = y.mean()
• np.mean(y_with_nan)
Mean (contd.)
• You often don’t need to get a nan value as a result. If you prefer to
ignore nan values, then you can use np.nanmean():
• np.nanmean(y_with_nan)
• pd.Series objects also have the method .mean():
• mean_ = z.mean()
• As you can see, it’s used similarly as in the case of NumPy.
However, .mean() from Pandas ignores nan values by default:
• z_with_nan.mean()
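The slides do not define x or x_with_nan, so the sketch below assumes small sample lists to show the pure-Python and statistics versions side by side. It uses only the standard library; the last step filters out nan by hand, which is a plain-Python stand-in for the idea behind np.nanmean(), not NumPy's implementation.

```python
import math
import statistics

# Hypothetical sample data (not taken from the slides)
x = [8.0, 1, 2.5, 4, 28.0]
x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]

mean_pure = sum(x) / len(x)
mean_stats = statistics.mean(x)
mean_fast = statistics.fmean(x)  # Python 3.8+, always returns a float
print(mean_pure, mean_stats, mean_fast)

# A single nan propagates through sum() and statistics.mean()
print(math.isnan(sum(x_with_nan)))              # True
print(math.isnan(statistics.mean(x_with_nan)))  # True

# Ignoring nan values: drop them before averaging
clean = [value for value in x_with_nan if not math.isnan(value)]
mean_ignoring_nan = sum(clean) / len(clean)
print(mean_ignoring_nan)
```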
Median
• The sample median is the middle element of a sorted dataset.
• Here is one of many possible pure Python implementations of the
median:
• n = len(x)
• if n % 2:
•     median_ = sorted(x)[round(0.5*(n-1))]
• else:
•     x_ord, index = sorted(x), round(0.5 * n)
•     median_ = 0.5 * (x_ord[index-1] + x_ord[index])
• You can get the median with statistics.median():
• median_ = statistics.median(x)
Median (Contd.)
• median_low() and median_high() are two more functions related
to the median in the Python statistics library.
– If the number of elements is odd, then there’s a single middle value, so
these functions behave just like median().
– If the number of elements is even, then there are two middle values. In
this case, median_low() returns the lower and median_high() the higher
middle value.
• statistics.median_low(x) ------ statistics.median_high(x)
• You can also get the median with np.median():
• median_ = np.median(y) ------- np.nanmedian(y_with_nan)
• Pandas Series objects have the method .median() that ignores
nan values by default: z.median()
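As a sketch, the pure-Python median can be wrapped in a function and compared with the statistics module. The function name median and the sample list are assumptions made for this example.

```python
import statistics


def median(values):
    """Return the middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2:  # odd length: a single middle element
        return ordered[mid]
    return 0.5 * (ordered[mid - 1] + ordered[mid])


x = [8.0, 1, 2.5, 4, 28.0]  # hypothetical sample data

print(median(x))       # 4     (odd number of elements)
print(median(x[:-1]))  # 3.25  (even number: mean of 2.5 and 4)

# With an even number of elements, there are two middle values
print(statistics.median_low(x[:-1]))   # 2.5
print(statistics.median_high(x[:-1]))  # 4
```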
Mode
• The sample mode is the value in the dataset that occurs most
frequently. If there isn’t a single such value, then the set is
multimodal since it has multiple modal values. This is how you can
get the mode with pure Python:
• mode_ = max((u.count(item), item) for item in set(u))[1]
• You can obtain the mode with statistics.mode() and
statistics.multimode():
• mode_ = statistics.mode(u)
• mode_ = statistics.multimode(u)
• As you can see, mode() returned a single value, while
multimode() returned the list that contains the result.
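The difference between mode() and multimode() shows up when there is a tie. The lists u and v below are assumed sample data, and mode_pure is a hypothetical wrapper around the one-line expression from the slide.

```python
import statistics

# Hypothetical sample data: u has one mode, v has two
u = [2, 3, 2, 8, 12]
v = [12, 15, 12, 15, 21, 15, 12]

print(statistics.mode(u))       # 2
print(statistics.multimode(u))  # [2]
print(statistics.multimode(v))  # [12, 15]
print(statistics.mode(v))       # 12 (first mode encountered, Python 3.8+)


def mode_pure(values):
    """The pure-Python expression from the slide, wrapped in a function."""
    return max((values.count(item), item) for item in set(values))[1]


print(mode_pure(u))  # 2
print(mode_pure(v))  # 15: on a tie, tuple comparison favors the larger value
```

For multimodal data, the one-line expression silently picks one of the tied values, so multimode() is the safer choice there.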
Mode (contd.)

• You can also get the mode with scipy.stats.mode():
• u, v = np.array(u), np.array(v)
• mode_ = scipy.stats.mode(u)
• Pandas Series objects have the method .mode() that
handles multimodal values well and ignores nan values
by default:
• pd.Series(u).mode()
Finding Variance & Standard Deviation, Quartiles,
Quantiles, and Interquartile Range
Variance
• The sample variance quantifies the spread of the data.
• Here’s how you can calculate the sample variance with pure
Python:
• n = len(x)
• mean_ = sum(x) / n
• var_ = sum((item - mean_)**2 for item in x) / (n - 1)
• Find variance using statistics package
• var_ = statistics.variance(x)
• You can also calculate the sample variance with NumPy. You
should use the function np.var() or the corresponding
method .var():
Variance (contd.)
• var_ = np.var(y, ddof=1)
• var_ = y.var(ddof=1)
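• It’s very important to specify the parameter ddof=1. That’s how
you set the delta degrees of freedom to 1. With NumPy's default of ddof=0,
the sum of squared deviations is divided by n (population variance)
instead of n - 1 (sample variance).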
• np.nanvar(y_with_nan, ddof=1)
• np.nanvar() ignores nan values. It also needs you to specify
ddof=1.
• pd.Series objects have the method .var() that skips nan values by
default:
• z.var(ddof=1)
• z_with_nan.var(ddof=1)
Standard Deviation
• The sample standard deviation is another measure of data
spread. Once you get the variance, you can calculate the
standard deviation with pure Python:
• std_ = var_ ** 0.5
• Although this solution works, you can also use statistics.stdev():
• std_ = statistics.stdev(x)
• You can get the standard deviation with NumPy in almost the
same way.
• np.std(y, ddof=1)
• y.std(ddof=1)
Standard Deviation (contd.)
• np.nanstd(y_with_nan, ddof=1)
• Don’t forget to set the delta degrees of freedom to 1!
• pd.Series objects also have the method .std() that skips nan by
default:
• z.std(ddof=1)
• z_with_nan.std(ddof=1)
• For pd.Series, the parameter ddof defaults to 1, so you can omit it. If you
want to treat nan values differently, then apply the parameter
skipna.
Percentile
• You can use np.percentile() to determine any sample percentile in
your dataset. For example, this is how you can find the 5th and
95th percentiles:
• np.percentile(y, 5)
• np.percentile(y, 95)
• percentile() takes several arguments. You have to provide the
dataset as the first argument and the percentile value (a number between
0 and 100, or a sequence of such numbers) as the second.
• np.percentile(y, [25, 50, 75])
• If you want to ignore nan values, then use np.nanpercentile()
instead:
• np.nanpercentile(y_with_nan, [25, 50, 75])
Quantile
• NumPy also offers you very similar functionality in
quantile() and nanquantile(). They take values between 0 and 1
instead of percentages.
• np.quantile(y, 0.05)
• np.quantile(y, 0.95)
• np.quantile(y, [0.25, 0.5, 0.75])
• np.nanquantile(y_with_nan, [0.25, 0.5, 0.75])
• pd.Series objects have the method .quantile():
• z.quantile(0.05)
• z.quantile(0.95)
• z.quantile([0.25, 0.5, 0.75])
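The relationship between percentiles and quantiles can be sketched in pure Python. The functions below are hypothetical helpers written for this example; they use linear interpolation between the two closest ranks, which is assumed here to correspond to NumPy's default method, and they are not NumPy's actual code.

```python
import math


def percentile(values, q):
    """q-th percentile (0 <= q <= 100) using linear interpolation."""
    if not values:
        raise ValueError("percentile requires at least one value")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100  # fractional index
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def quantile(values, q):
    """Same as percentile, but q is given between 0 and 1."""
    return percentile(values, 100 * q)


y = [1, 2, 3, 4, 5]  # hypothetical sample data

print([percentile(y, p) for p in (25, 50, 75)])    # [2.0, 3.0, 4.0]
print([quantile(y, p) for p in (0.25, 0.5, 0.75)]) # [2.0, 3.0, 4.0]
print(quantile(y, 0.95))  # close to 4.8
```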
Interquartile Range (IQR)
• Interquartile range using numpy.percentile
• import numpy as np
• data = [32, 36, 46, 47, 56, 69, 75, 79, 79, 88, 89, 91, 92, 93, 96,
97, 101, 105, 112, 116]
• # NumPy 1.22 and later call this keyword method; older versions call it interpolation
• # First quartile (Q1)
• Q1 = np.percentile(data, 25, method = 'midpoint')
• # Third quartile (Q3)
• Q3 = np.percentile(data, 75, method = 'midpoint')
• # Interquartile range (IQR)
• IQR = Q3 - Q1
• print(IQR)
Interquartile Range (IQR) and Quartile Deviation
• Interquartile range using scipy.stats.iqr
• # Import stats from scipy library
• from scipy import stats
• data = [32, 36, 46, 47, 56, 69, 75, 79, 79, 88, 89, 91, 92, 93, 96,
97, 101, 105, 112, 116]
• # Interquartile range (IQR)
• IQR = stats.iqr(data, interpolation = 'midpoint')
• print(IQR)
Quartile Deviation
• qd = IQR / 2
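The IQR and quartile deviation for this dataset can be checked by hand with a small pure-Python sketch. The helper percentile_midpoint is hypothetical; it assumes that the 'midpoint' rule averages the two data points surrounding the fractional position, and it is not the NumPy or SciPy implementation.

```python
import math


def percentile_midpoint(values, q):
    """q-th percentile (0..100), averaging the two neighboring data points."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100  # fractional index
    lower = math.floor(position)
    upper = math.ceil(position)
    return (ordered[lower] + ordered[upper]) / 2


data = [32, 36, 46, 47, 56, 69, 75, 79, 79, 88, 89, 91, 92, 93, 96,
        97, 101, 105, 112, 116]

q1 = percentile_midpoint(data, 25)  # between 56 and 69
q3 = percentile_midpoint(data, 75)  # between 96 and 97
iqr = q3 - q1
qd = iqr / 2  # quartile deviation: half of the IQR

print(q1, q3)  # 62.5 96.5
print(iqr)     # 34.0
print(qd)      # 17.0
```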
Plotting
import matplotlib.pyplot as plt

years = list(range(1950, 2011, 10))


gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]

# create a line chart, years on x-axis, gdp on y-axis


plt.plot(years, gdp, color='green', marker='o', linestyle='solid')

# add a title
plt.title("Nominal GDP")

# add a label to the y-axis


plt.ylabel("Billions of $")

# add a label to the x-axis


plt.xlabel("Year")
plt.show()

Line graph.
• Good for showing trend.
• Type plt.plot? to see more options, such as
different marker and line styles, colors, etc.
import matplotlib.pyplot as plt

years = list(range(1950, 2011, 10))


gdp1 = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]
gdp2 = [226.0, 362.0, 928.0, 1992.0, 4931.0, 7488.0, 12147.0]
gdp3 = [1206.0, 1057.0, 1081.0, 2940.0, 8813.0, 13502.0, 19218.0]

# create a line chart, years on x-axis, gdp on y-axis


# use format string to specify color, marker, and line style
# e.g. 'bo-': color='blue', marker='o', linestyle='solid'
plt.plot(years, gdp1, 'bo-',
         years, gdp2, 'r*:',
         years, gdp3, 'gd-.')
# add a title
plt.title("Nominal GDP")
# add a label to the y-axis
plt.ylabel("Billions of $")
# add a label to the x-axis
plt.xlabel("Year")
# add legend
plt.legend(['countryA', 'countryB', 'countryC'])
plt.show()
Plotting in logarithmic scale

Logarithmic-scale plotting is often preferred for visualizing changes over time.

# gdp4 is a fourth data series; the slides do not show its values
plt.semilogy(years, gdp1, 'bo-',
             years, gdp2, 'r*:',
             years, gdp3, 'gd-.',
             years, gdp4, 'c--<')
# add a title
plt.title("Nominal GDP")
# add a label to the y-axis
plt.ylabel("Billions of $")
# add a label to the x-axis
plt.xlabel("Year")
# add legend
plt.legend(['countryA', 'countryB',
'countryC', 'countryD'])
plt.show()
Bar charts
• Good for presenting/comparing numbers in a discrete set
of items
movies = ["Annie Hall", "Ben-Hur", "Casablanca", "Gandhi", "West Side Story"]
num_oscars = [5, 11, 3, 8, 10]

xs = range(len(movies)) # xs is range(5)
# plot bars at x-coordinates [xs] (bars are centered there by default),
# heights [num_oscars]
plt.bar(xs, num_oscars)
# label x-axis with movie names at bar centers
plt.xticks(xs, movies)
# alternatively, use the following to replace
# the two lines above
#plt.bar(xs, num_oscars, tick_label=movies)

plt.ylabel("# of Academy Awards")


plt.title("My Favorite Movies")
plt.show()
Barh vs bar
plt.barh(xs, num_oscars, tick_label=movies)
plt.xlabel("# of Academy Awards")
plt.title("My Favorite Movies")

• By default, the y-axis in a bar chart (or x-axis in barh) starts from 0, in contrast to a line chart.
Boxplot

• import pandas as pd
• import numpy as np
• df = pd.DataFrame(np.random.rand(10, 5), columns=['A', 'B', 'C', 'D', 'E'])
• df.plot.box(grid=True)
Download a dataset
• import requests
• download_url = "https://raw.githubusercontent.com/fivethirtyeight/data/master/nba-elo/nbaallelo.csv"
• target_csv_path = "nba_all_elo.csv"

• response = requests.get(download_url)
• response.raise_for_status() # Check that the request was successful
• with open(target_csv_path, "wb") as f:
•     f.write(response.content)
• print("Download ready.")
Dataset
• When you execute the script, it will save the file nba_all_elo.csv in
your current working directory.
• Now you can use the Pandas Python library to take a look at your
data:
• import pandas as pd
• nba = pd.read_csv("nba_all_elo.csv")
• type(nba)
• Here, you follow the convention of importing Pandas in Python
with the pd alias. Then, you use .read_csv() to read in your
dataset and store it as a DataFrame object in the variable nba.
Dataset
• You can see how much data nba contains:
• len(nba)
• nba.shape
• Now you know that there are 126,314 rows and 23 columns in
your dataset. But how can you be sure the dataset really contains
basketball stats? You can have a look at the first five rows
with .head() and the last five rows with .tail():
• nba.head()
• nba.tail()
Dataset: Displaying Data Types
• The first step in getting to know your data is to discover the
different data types it contains. While you can put anything into a
list, the columns of a DataFrame contain values of a specific data
type. You can display all columns and their data types with .info():
• nba.info()
• Showing Basic Statistics
• Now that you’ve seen what data types are in your dataset, it’s time
to get an overview of the values each column contains. You can
do this with .describe():
• nba.describe()
Exploring the Dataset
• Exploratory data analysis can help you answer questions about
your dataset. For example, you can examine how often specific
values occur in a column:
• nba["team_id"].value_counts()
• nba["fran_id"].value_counts()
• It seems that a team named "Lakers" played 6024 games, but
only 5078 of those were played by the Los Angeles Lakers. Find
out who the other "Lakers" team is:
• nba.loc[nba["fran_id"] == "Lakers", "team_id"].value_counts()
• nba.loc[nba["team_id"] == "MNL", "date_game"].min()
• nba.loc[nba["team_id"] == "MNL", "date_game"].max()
• nba.loc[nba["team_id"] == "MNL", "date_game"].agg(("min", "max"))
