Python PPT (merged version)
2024-IS60516
MSc BA
Lecturer: Dr Selja Seppälä
Variables, expressions & statements
• Explain the difference between constants, reserved words and variables, with examples
• Correctly use the Python naming conventions and assign meaningful names
• Describe assignment statements with examples and write them in code
• Define and use the operators for writing numeric expressions, and list & use the operator precedence rules
• Describe and give examples of the different types in Python
• Write code to check for types and convert one type to another
• Use the Python function to get user input
• Write meaningful comments when appropriate
• Vocabulary
– Constants
– Reserved Words
– Variables, namespace, assignment
– Python naming conventions
• Sentences or Lines
• Expressions
• Operators & Precedence Rules
• Type & Type Conversions
• Comments
IS1110 | Python Language Overview | S. Seppälä Source: Python for Everybody, www.py4e.com
Constants
• Fixed values such as numbers, letters, and strings are called “constants” because their value does not change
• Numeric constants are as you expect
• String constants use single quotes (') or double quotes (")
>>> print(123)
123
>>> print(98.6)
98.6
>>> print('Hello world')
Hello world
Reserved Words
• You cannot use reserved words as variable names / identifiers as they have a special meaning for Python
False     class     return    is        finally
None      if        for       lambda    continue
True      def       from      while     nonlocal
and       del       global    not       with
as        elif      try       or        yield
assert    else      import    pass
break     except    in        raise
Variables
• A variable is a named place in the memory where a programmer can store
data and later retrieve the data using the variable “name”
x = 12.2
y = 14
[Diagram: the name x refers to the value 12.2; the name y refers to the value 14]
Python Variable Name Rules
• Must start with a letter or underscore _
• Case Sensitive
See naming conventions and other writing rules in the PEP 8 – Style Guide for Python Code:
https://realpython.com/python-pep8/
Naming variables
https://forms.office.com/e/wk9fTdpA3h
• Case Sensitive
• this_is_a_var
• my_list
• square_root_function
Based on: "The Practice of Computing Using Python, 3rd/ E, GE", Punch & Enbody, Copyright © 2017
Pearson Education, Ltd.
Mnemonic Variable Names
• Since we programmers are given a choice in how we choose our
variable names, there is a bit of “best practice”
• We name variables to help us remember what we intend to store
in them (“mnemonic” = “memory aid”)
• This can confuse beginning students because well-named
variables often “sound” so good that they must be keywords
http://en.wikipedia.org/wiki/Mnemonic
Sentences or Lines
x = 2          # Assignment statement
x = x + 2      # Assignment with expression
print(x)       # Print statement
• my_int = my_int + 7
• lhs = rhs
• evaluate the expression on the rhs first
• take the resulting value and associate it with the name on the lhs
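A minimal sketch of how an assignment is carried out. The starting value 5 is an assumption chosen for illustration:

```python
my_int = 5            # bind the name my_int to the value 5
my_int = my_int + 7   # rhs is evaluated with the old value (5 + 7), then rebound to my_int
print(my_int)         # 12
```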
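A variable is a memory location used to store a value (0.6)
x = 0.6
x = 3.9 * x * ( 1 - x )
[Diagram: the right-hand side is evaluated first using the old value of x (0.6); 1 - x gives 0.4, and the result of the whole expression is then stored back in x]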
Operator Precedence Rules
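Highest precedence rule to lowest precedence rule:
• Parentheses are always respected
• Exponentiation (raise to a power)
• Multiplication, division, and remainder
• Addition and subtraction
• Left to right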
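As a hedged illustration of the precedence rules, the sketch below evaluates one expression step by step. The numbers and variable names are arbitrary assumptions:

```python
result = 1 + 2 ** 3 / 4 * 5
# 2 ** 3   -> 8      (exponentiation first)
# 8 / 4    -> 2.0    (division and multiplication, left to right)
# 2.0 * 5  -> 10.0
# 1 + 10.0 -> 11.0   (addition last)
print(result)    # 11.0

grouped = (1 + 2) ** 3 / 4 * 5   # parentheses are evaluated before anything else
print(grouped)   # 33.75
```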
What Does “Type” Mean?
• In Python variables, literals, and constants have a “type”
• Python knows the difference between an integer number and a string
• For example “+” means “addition” if something is a number and “concatenate” if something is a string
>>> ddd = 1 + 4
>>> print(ddd)
5
>>> eee = 'hello ' + 'there'
>>> print(eee)
hello there
concatenate = put together
Python “types”
• integers: 5
• floats: 1.2
• booleans: True
• strings: "anything" or 'something'
• lists: [,] ['a',1,1.3]
• others (not seen yet)
Type Matters
• Python knows what “type” everything is
• Some operations are prohibited
• You cannot “add 1” to a string
• We can ask Python what type something is by using the type() function
>>> eee = 'hello ' + 'there'
>>> eee = eee + 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't convert 'int' object to str implicitly
>>> type(eee)
<class 'str'>
>>> type('hello')
<class 'str'>
>>> type(1)
<class 'int'>
Type Conversions
• When you put an integer and floating point in an expression, the integer is implicitly converted to a float
• You can control this with the built-in functions int() and float()
>>> print(float(99) + 100)
199.0
>>> i = 42
>>> type(i)
<class 'int'>
>>> f = float(i)
>>> print(f)
42.0
>>> type(f)
<class 'float'>
User Input
• We can instruct Python to pause and read data from the user using the input() function
• The input() function returns a string
nam = input('Who are you? ')
print('Welcome', nam)

Who are you? Chuck
Welcome Chuck
• If we want to read a number from the user, we must convert it from a string to a number using a type conversion function
inp = input('Europe floor?')
usf = int(inp) + 1
print('US floor', usf)

Europe floor? 0
US floor 1
Comments in Python
• Anything after a # is ignored
by Python
• Why comment?
- Describe what is going to
happen in a sequence of
code
- Document who wrote the
code or other ancillary
information
- Turn off a line of code -
perhaps temporarily
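A small sketch of the three uses of comments listed above. The temperature conversion is an assumed example, not taken from the slides:

```python
# Convert a temperature from Celsius to Fahrenheit (describes what is going to happen)
# Written for illustration only (ancillary information)
celsius = 20
fahrenheit = celsius * 9 / 5 + 32   # a comment can also follow code on the same line
# print(celsius)                    # this line is turned off, perhaps temporarily
print(fahrenheit)                   # 68.0
```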
Acknowledgements / Contributions
Conditional execution
• Use comparison operators in boolean expressions
• Describe what ASCII values are and compare strings based on these
• Conditional Steps
– Structures that Control Flow
– Conditional/Decision Structures
http://en.wikipedia.org/wiki/George_Boole
ASCII Values
• ASCII values determine the order used to compare strings with relational operators.
• They are associated with keyboard letters, characters, and numerals.
• ASCII values of the printable characters are numbers ranging from 32 to 126.
• A few ASCII values.
• The ASCII standard also assigns characters to some numbers above 126.
• Functions chr(int) and ord(str) access ASCII values.
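A short sketch of chr() and ord(), and of how ASCII values drive string comparison. The sample strings are assumptions chosen for illustration:

```python
code_upper = ord('A')          # 65
code_lower = ord('a')          # 97
letter = chr(66)               # 'B'
print(code_upper, code_lower, letter)

# Uppercase letters have smaller ASCII values than lowercase letters,
# so 'Zebra' sorts before 'apple' when compared with <
comes_first = 'Zebra' < 'apple'
print(comes_first)             # True
```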
Based on: David I. Schneider, An Introduction to Programming Using Python, Chapter 1, © 2016 Pearson
Education, Inc., Hoboken, NJ. All rights reserved.
Relational/Comparison Operators
Boolean Expressions
A Boolean expression is a logical statement that is either True or False.
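A minimal sketch of Boolean expressions built with Python's comparison operators. The variable age and its value are assumptions for illustration:

```python
age = 20
print(age == 20)   # True   equal to
print(age != 20)   # False  not equal to
print(age < 18)    # False  less than
print(age >= 18)   # True   greater than or equal to

is_adult = age >= 18   # the result of a comparison is itself a value of type bool
print(type(is_adult))
```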
Based on: Murach's Python Programming, C3, Slide 5, © 2016, Mike Murach & Associates, Inc.
Logical Operators
• Logical operators are the reserved words and, or, and not
• Enables combining multiple relational operators
• Conditions that use these operators are called compound
conditions
Logical Operators
Operator   Name
and        AND
or         OR
not        NOT
Order of precedence:
1. NOT operator
2. AND operator
3. OR operator
Based on: Murach's Python Programming, C3, Slide 6, © 2016, Mike Murach & Associates, Inc.
Logical Operators
• Given: cond1 and cond2 are conditions
– cond1 and cond2 is true only if both conditions are true
– cond1 or cond2 is true if either or both conditions are true
– not cond1 is false if the condition is true, true if the condition is false
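A hedged sketch of compound conditions. The names age and has_ticket are illustrative assumptions:

```python
age = 25
has_ticket = True

both = age >= 18 and has_ticket     # True: both conditions are true
either = age < 18 or has_ticket     # True: the second condition is true
negated = not has_ticket            # False

# not is applied first, then and, then or:
# True or False and not True  ==  True or (False and (not True))
mixed = True or False and not True
print(both, either, negated, mixed)
```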
Short-Circuit Evaluation
• Consider the condition cond1 and cond2
• If Python evaluates cond1 as false, it does not bother to
check cond2
• Similarly with cond1 or cond2
• If Python finds cond1 true, it does not bother to check
further
• Think why this feature helps for
(number != 0) and (m == (n / number))
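A sketch of why short-circuit evaluation helps here. The values of m, n and number are assumptions chosen so that the division would fail if it were attempted:

```python
number = 0
m = 2
n = 10

# number != 0 is False, so Python never evaluates n / number
# and no ZeroDivisionError is raised
safe = (number != 0) and (m == (n / number))
print(safe)   # False

number = 5
safe_again = (number != 0) and (m == (n / number))   # now the division runs: 10 / 5 == 2
print(safe_again)   # True
```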
Methods that Return Boolean Values
Methods that return either True or False.
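The table of methods from the original slide is not reproduced here. As a hedged illustration, a few standard string methods that return True or False are:

```python
all_digits = '123'.isdigit()            # True: every character is a digit
all_letters = 'abc1'.isalpha()          # False: '1' is not a letter
starts = 'Hello'.startswith('He')       # True
ends = 'Hello'.endswith('x')            # False
lower = 'hello'.islower()               # True
print(all_digits, all_letters, starts, ends, lower)
```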
Conditional/Decision Structures
One-Way Decisions
x = 5
print('Before 5')
if x == 5 :
    print('Is 5')
    print('Is Still 5')
    print('Third 5')
print('Afterwards 5')
print('Before 6')
if x == 6 :
    print('Is 6')
    print('Is Still 6')
    print('Third 6')
print('Afterwards 6')

Output:
Before 5
Is 5
Is Still 5
Third 5
Afterwards 5
Before 6
Afterwards 6
[Flowchart: x == 5 ? Yes: the three indented print statements run; No: they are skipped]
Indentation
• Increase indent after an if statement or for statement (after : )
• Maintain indent to indicate the scope of the block (which lines are affected
by the if/for)
• Reduce indent back to the level of the if statement or for statement to
indicate the end of the block
• Blank lines are ignored - they do not affect indentation
• Comments on a line by themselves are ignored with regard to indentation
increase / maintain indent after if or for; decrease to indicate end of block
x = 5
if x > 2 :
    print('Bigger than 2')
    print('Still bigger')
print('Done with 2')

for i in range(5) :
    print(i)
    if i > 2 :
        print('Bigger than 2')
    print('Done with i', i)
print('All Done')
Think About begin/end Blocks
x = 5
if x > 2 :
    print('Bigger than 2')
    print('Still bigger')
print('Done with 2')

for i in range(5) :
    print(i)
    if i > 2 :
        print('Bigger than 2')
    print('Done with i', i)
print('All Done')
[Annotations: the if statement is a compound statement (it stretches across more than one line); the line "if x > 2 :" is its header and the indented lines below it are its indented block]
The syntax of the if statement
if boolean_expression:
    statements...
[elif boolean_expression:
    statements...]
...
[else:
    statements...]
Based on: Murach's Python Programming, C3, Slide 11, © 2016, Mike Murach & Associates, Inc.
Nested
x = 42
if x > 1 :
    print('More than one')
    if x < 100 :
        print('Less than 100')
print('All done')
[Flowchart: x > 1 ? yes: print('More than one'), then x < 100 ? yes: print('Less than 100'); in every case finish with print('All Done')]
Two-way Decisions
• Sometimes we want to do one thing if a logical expression is true and something else if the expression is false
x = 4
if x > 2 :
    print('Bigger')
else :
    print('Smaller')
print('All done')
[Flowchart: x > 2 ? yes: print('Bigger'); no: print('Not bigger'); then print('All Done')]
Multi-way
if x < 2 :
    print('small')
elif x < 10 :
    print('Medium')
else :
    print('LARGE')
print('All done')
[Flowchart: x < 2 ? yes: print('small'); no: x < 10 ? yes: print('Medium'); no: print('LARGE'); then print('All Done')]
The try / except Structure
astr = 'Bob'
try:
    print('Hello')        # Safe line
    istr = int(astr)      # Dangerous line!
    print('There')        # Will not print if the previous line fails to execute
except:
    istr = -1             # Safe line
print('Done', istr)
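A hedged variation on the same idea: catching only ValueError (the error int() raises for text that is not a number) instead of every possible error. The helper name to_int is an assumption for illustration:

```python
def to_int(text):
    """Return int(text), or -1 if text cannot be converted to an integer."""
    try:
        return int(text)      # the "dangerous" line
    except ValueError:        # only conversion failures are handled here
        return -1

print(to_int('42'))    # 42
print(to_int('Bob'))   # -1
```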
2024-IS6061
MSc BA
Lecturer: Dr Selja Seppälä
Functions
• Explain what a function is and describe its composition
• Use Python built-in functions in your code
• Define your own functions when appropriate
• Explain the difference between parameters and arguments and how they are used
• Create functions with one or more parameters and call them using the adequate arguments
• Pass multiple arguments to functions by position, by name and with default values
• Return values from functions
• Describe the difference between local scope and global scope of a variable
• Write code with correct variable scope
Python Functions
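• There are two kinds of functions in Python: built-in functions, which are provided as part of Python, and user-defined functions, which we define ourselves.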
Based on: David I. Schneider, An Introduction to Programming Using Python, Chapter 4, © 2016
Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Built-in Functions
• Output of functions is a single value
• Function is said to return its output
• Items inside parentheses called arguments
• Examples:
[Diagram labels: 'Hello world' is the argument; the value assigned to big is the result]
>>> big = max('Hello world')
>>> print(big)
w
>>> tiny = min('Hello world')
>>> print(tiny)

>>>
User-defined Functions
Building our Own Functions
• We create a new function using the def keyword followed by
optional parameters in parentheses
• This defines the function but does not execute the body of the
function
def print_lyrics():
    print("I'm a lumberjack, and I'm okay.")
    print('I sleep all night and I work all day.')
Figure 5.1 Function Parts
Definitions and Uses
• Once we have defined a function, we can call (or invoke) it
as many times as we like
x = 5
print('Hello')

def print_lyrics():
    print("I'm a lumberjack, and I'm okay.")
    print('I sleep all night and I work all day.')

print('Yo')
print_lyrics()
x = x + 2
print(x)

Output:
Hello
Yo
I'm a lumberjack, and I'm okay.
I sleep all night and I work all day.
7
Arguments
• An argument is a value we pass into the function as its input
when we call the function
User-defined Functions
• Parameters and return statements are optional in function
definitions
• Function names should describe the role performed
Return Values
Often a function will take its arguments, do some computation, and
return a value to be used as the value of the function call in the
calling expression. The return keyword is used for this.
def greet():
    return "Hello"

print(greet(), "Glenn")
print(greet(), "Sally")

Output:
Hello Glenn
Hello Sally
Arguments, Parameters, and Results
>>> big = max('Hello world')
>>> print(big)
w

def max(inp):
    blah
    blah
    for x in inp:
        blah
        blah
    return 'w'
[Diagram: the argument 'Hello world' is passed to the parameter inp; the function returns the result 'w']
Passing argument to parameter
• For each argument in the function invocation, the
argument’s associated object is passed to the
corresponding parameter in the function
Passing a Value to a Function
• Example 3: Program shows there is no change in the value of
the argument
• The argument in a function call is a variable
• The object pointed to by the argument variable (not the argument variable itself) is passed to a parameter variable
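The program from the original slide is not reproduced here. A minimal sketch of the same point, using hypothetical names triple, num and amount, could look like this:

```python
def triple(num):
    num = 3 * num      # rebinds only the local parameter name num
    return num

amount = 2
result = triple(amount)
print(result)   # 6
print(amount)   # 2: the argument variable still refers to its original value
```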
Multiple Parameters / Arguments
• We can define more than one parameter in the function definition
• We simply add more arguments when we call the function
• We match the number and order of arguments and parameters
def addtwo(a, b):
    added = a + b
    return added

x = addtwo(3, 5)
print(x)

Output:
8
Functions Having Several Parameters
• FIGURE 4.3 Passing arguments to a function.
[Figure labels: function call, function]
Passing by Parameter Name
• Arguments can be passed to functions by using names of
the corresponding parameters
• Instead of relying on position
• Given
• Could use
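The "Given" and "Could use" code from the slide is not available, so the following is only a sketch with a hypothetical function describe_pet:

```python
def describe_pet(name, species):
    return name + ' is a ' + species

# Passing by position: the order of the arguments matters
by_position = describe_pet('Rex', 'dog')

# Passing by parameter name: the order no longer matters
by_name = describe_pet(species='dog', name='Rex')

print(by_position)   # Rex is a dog
print(by_name)       # Rex is a dog
```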
How to use default values
in your function definitions
You can specify a default value for any parameter in a
function definition by assigning a value to the parameter.
However, the parameters with default values must be coded
last in the function definition.
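A minimal sketch of default values, assuming a hypothetical function total_price. Note that the parameters with defaults come last:

```python
def total_price(price, quantity=1, tax_rate=0.0):
    return price * quantity * (1 + tax_rate)

print(total_price(10))                  # 10.0  (both defaults used)
print(total_price(10, 3))               # 30.0  (quantity given by position)
print(total_price(10, tax_rate=0.5))    # 15.0  (tax_rate given by name)
```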
Based on: Python Namespace and Scope: https://www.programiz.com/python-programming/namespace
Named Constants
• Program sometimes employs a special constant used
several times in program
• Convention programmers use
• Create a global variable
• Name written in uppercase letters with words separated by
underscore
• In Python, programmer is responsible for not changing
value of the variable
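A sketch of the naming convention, with an assumed constant INTEREST_RATE and a hypothetical function yearly_interest:

```python
INTEREST_RATE = 0.04   # named constant: global variable, uppercase, words joined by underscores

def yearly_interest(balance):
    # The function reads the global name; by convention nobody reassigns it
    return balance * INTEREST_RATE

interest = yearly_interest(1000)
print(interest)
```

Python does not stop a later line from assigning a new value to INTEREST_RATE; the uppercase name only signals that it should not be changed.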
Library Modules
• A library module is a file with extension .py
• Contains functions and variables
• Can be used (imported) by any program
• can be created in IDLE or any text editor
• Looks like an ordinary Python program
• To gain access to the functions and variables
• place a statement of the form import moduleName
at the beginning of the program
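As a stand-in for a module you write yourself, the sketch below imports the standard library module math. The import statement has the same form for your own .py files:

```python
import math   # placed at the beginning of the program

root = math.sqrt(16)            # a function from the module, accessed as moduleName.functionName
circle_area = math.pi * 2 ** 2  # a variable from the module, accessed as moduleName.variableName
print(root)                     # 4.0
print(circle_area)
```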
Library Modules
• Create a Module:
• Use a Module:
More on functions
• Complete. A function should check for all the cases
where it might be invoked. Check for potential errors.
Acknowledgements / Contributions
Loops and iteration
• List the different types of repeating statements and explain the difference
• Choose between a while and a for loop depending on the context
• Write while and for loops using the correct syntax
• Adequately use loop control (iteration) variables
• Use the break and continue statements, and explain how they work
• Describe what the code in given while and for loops does
• Use None constants and variables
• Write code with the is and is not operators
• Repetition
• Indefinite loops: iteration with a while loop
• Breaking out of a loop with break
• Finishing an iteration with continue
• Definite loops: iteration with a for loop
• Loop patterns
• The None value
• The is and is not operators
• The for statement is useful for iteration, moving through all the
elements of data structure, one at a time.
Indefinite Loops
The while Loop
• The while loop repeatedly executes an indented block of
statements as long as a certain condition is met.
• A while loop has the form:
while condition:
    indented block of statements
– header: the line "while condition:", where condition is the continuation condition
– body of the loop: the indented block of statements
• The continuation condition is a boolean expression that
evaluates to True or False
Based on: David I. Schneider, An Introduction to Programming Using Python, Chapter 3, © 2016
Pearson Education, Inc., Hoboken, NJ. All rights reserved.
General approach to a while
• Outside the loop, initialize the Boolean with a loop control
(iteration) variable.
• Initialize the variable, typically outside of the loop and before
the loop begins.
• Somewhere inside the loop you perform some operation which
changes the state of the program, eventually leading to a False
Boolean and exiting the loop.
• Modify the value of the control variable during the course of
the loop
• Have to have both!
Repeated Steps
Program:
n = 5
while n > 0 :
    print(n)
    n = n - 1
print('Blastoff!')
print(n)

Output:
5
4
3
2
1
Blastoff!
0
Loops (repeated steps) have iteration variables that change each time through a loop. Often these iteration variables go through a sequence of numbers.
[Flowchart: n = 5, then n > 0 ? Yes: print(n), n = n - 1, and test again; No: print('Blastoff')]
Breaking Out of a Loop
• The break statement ends the current loop and jumps to the
statement immediately following the loop
• It is like a loop test that can happen anywhere in the body of the
loop
while True:
    line = input('> ')
    if line == 'done' :
        break
    print(line)
print('Done!')

> hello there
hello there
> finished
finished
> done
Done!
Finishing an Iteration with continue
The continue statement ends the current iteration and jumps to the top of the loop and starts the next iteration
while True:
    line = input('> ')
    if line[0] == '#' :
        continue
    if line == 'done' :
        break
    print(line)
print('Done!')

> hello there
hello there
> # don't print this
> print this!
print this!
> done
Done!
Definite Loops
• We can write a loop to run the loop once for each of the items in
a set using the Python for construct
• Examples:
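The slide's own examples are not reproduced here. A minimal sketch of the for construct, with assumed sample data, could be:

```python
# Run the loop body once for each item in a list of numbers
for i in [5, 4, 3, 2, 1]:
    print(i)
print('Blastoff!')

# The same construct works for a list of strings
total_chars = 0
for name in ['Joseph', 'Glenn', 'Sally']:
    total_chars = total_chars + len(name)
print(total_chars)   # 16
```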
Making “smart” loops
Example: sum values in data
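A sketch of the summing pattern, assuming the data is a list of numbers: set a running total before the loop, update it inside the loop, and read it after the loop.

```python
data = [3, 41, 12, 9, 74, 15]

total = 0                    # initialise before the loop starts
for value in data:
    total = total + value    # update the running total on each iteration
print('Total:', total)       # Total: 154
```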
Based on: David I. Schneider, An Introduction to Programming Using Python, Chapter 3, © 2016
Pearson Education, Inc., Hoboken, NJ. All rights reserved. & Python None Keyword, W3Schools (https://www.w3schools.com/python/ref_keyword_none.asp)
The is and is not Operators
• Python has an is operator that can be used in logical expressions
• Implies “is the same as” (the very same object, not just the same type and value)
• Similar to, but stronger than ==
• is not also is a logical operator
smallest = None
print('Before')
for value in [3, 41, 12, 9, 74, 15] :
    if smallest is None :
        smallest = value
    elif value < smallest :
        smallest = value
    print(smallest, value)
print('After', smallest)
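A hedged sketch of the difference between == and is, using two assumed lists with equal contents:

```python
a = [1, 2]
b = [1, 2]   # a second, separate list with the same contents
c = a        # a second name for the same list object

equal_values = a == b       # True: the contents are equal
same_object = a is b        # False: two different objects
same_name_target = a is c   # True: both names refer to one object

result = None
print(equal_values, same_object, same_name_target)
print(result is None)       # True: the usual way to test for None
print(a is not None)        # True
```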
Acknowledgements / Contributions
Strings
• Write strings in different scenarios
• Read and convert strings
• Use string indexing and slicing to access characters within strings
• Get the length of a string
• String type
• Non-printing characters
• Read & convert
• Indexing
• String length
• Looping through strings & counting
• More string operations: slicing, concatenation,
repetition, comparison
• String library: introduction & common functions
• UTF-8
And then there is """ """
• triple quotes preserve both the vertical and horizontal formatting
of the string
• allows you to type tables, paragraphs, whatever and preserve
the formatting
"""this is
a test
today"""
Non-printing Characters
• If inserted directly, are preceded by a backslash (the escape
character \)
• new line '\n'
• tab '\t'
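A small sketch of the two escape sequences above, with an assumed two-row table as sample text:

```python
table = 'name\tscore\nAda\t90'   # \t inserts a tab, \n starts a new line
print(table)

# Each escape sequence is a single character inside the string
length = len('a\nb')
print(length)                    # 3

# A triple-quoted string stores its line breaks as \n characters
poem = """this is
a test
today"""
newlines = poem.count('\n')
print(newlines)                  # 2
```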
Reading and Converting
• We prefer to read data in using strings and then parse and convert the data as we need
• This gives us more control over error situations and/or bad user input
• Input numbers must be converted from strings
>>> name = input('Enter:')
Enter:Chuck
>>> print(name)
Chuck
>>> apple = input('Enter:')
Enter:100
>>> x = apple - 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for -: 'str' and 'int'
>>> x = int(apple) - 10
>>> print(x)
90
The Index
• Because the elements of a string are a sequence, we can
associate each element with an index, a location in the
sequence:
• positive values count up from the left, beginning with index 0
• negative values count down from the right, starting with -1
Looking Inside Strings
b a n a n a
0 1 2 3 4 5
• We can get at any single character in a string using the index specified in square brackets
• The index value must be an integer and starts at zero
• The index value can be an expression that is computed
>>> fruit = 'banana'
>>> letter = fruit[1]
>>> print(letter)
a
>>> x = 3
>>> w = fruit[x - 1]
>>> print(w)
n
Accessing an element: Summary
A particular element of the string is accessed by the index of the
element surrounded by square brackets [ ]
hello_str = 'Hello World'
print(hello_str[1]) => prints e
print(hello_str[-1]) => prints d
print(hello_str[11]) => ERROR
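A sketch of the same accesses in runnable form. The try/except around the last access is an addition for illustration, so that the out-of-range index does not stop the program:

```python
hello_str = 'Hello World'

first = hello_str[0]                        # 'H'
last = hello_str[-1]                        # 'd'
same_last = hello_str[len(hello_str) - 1]   # 'd': the last valid positive index is len - 1

try:
    hello_str[11]                           # the string has 11 characters, indexes 0 to 10
    out_of_range = False
except IndexError:
    out_of_range = True

print(first, last, same_last, out_of_range)
```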
Strings Have Length
b a n a n a
0 1 2 3 4 5
The built-in function len gives us the length of a string
>>> fruit = 'banana'
>>> print(len(fruit))
6
Looping Through Strings
• A definite loop using a for statement is much more elegant
• The iteration variable is completely taken care of by the for loop
fruit = 'banana'
for letter in fruit :
    print(letter)

index = 0
while index < len(fruit) :
    letter = fruit[index]
    print(letter)
    index = index + 1

Output (the same for both loops):
b
a
n
a
n
a
Looping and Counting
This is a simple loop that loops through each letter in a string and counts the number of times the loop encounters the 'a' character
word = 'banana'
count = 0
for letter in word :
    if letter == 'a' :
        count = count + 1
print(count)
Looking Deeper into in
• The iteration variable “iterates” through the sequence (ordered set)
• The block (body) of code is executed once for each value in the sequence
• The iteration variable moves through all of the values in the sequence
for letter in 'banana' :
    print(letter)
[Annotations: letter is the iteration variable; 'banana' is a six-character string]
Slicing, the rules
• slicing is the ability to select a subsequence of the overall
sequence
• uses the syntax [start:finish], where:
• start is the index of where we start the subsequence
• : is the colon operator
• finish is the index of one after where we end the subsequence (“up
to but not including”)
• if either start or finish is not provided, it defaults to the
beginning of the sequence for start and the end of the sequence
for finish
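A minimal sketch of the slicing rules, using an assumed sample string:

```python
my_str = 'hello world'

middle = my_str[6:10]      # 'worl': indexes 6, 7, 8, 9 (up to but not including 10)
head = my_str[:5]          # 'hello': start defaults to the beginning
tail = my_str[6:]          # 'world': finish defaults to the end
negative = my_str[-5:-1]   # 'worl': negative indexes count from the right

print(middle, head, tail, negative)
```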
Extended Slicing
• also takes three arguments:
• [start:finish:countBy]
• defaults are:
• start is beginning, finish is end, countBy is 1
my_str = 'hello world'
my_str[0:11:2] 'hlowrd'
• every other letter
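A short sketch of extended slicing with the defaults left out. The reversed slice is an extra illustration that is not on the slide:

```python
my_str = 'hello world'

every_other = my_str[::2]     # 'hlowrd': same as my_str[0:11:2]
odd_positions = my_str[1::2]  # 'el ol': start at index 1, count by 2
backwards = my_str[::-1]      # 'dlrow olleh': a countBy of -1 walks from the end

print(every_other, odd_positions, backwards)
```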
Basic String Operations
s = 'spam'
• length operator len()
len(s) 4
• + is concatenate
new_str = 'spam' + '-' + 'spam-'
print(new_str) spam-spam-
• * is repeat, the number is how many times
new_str * 3
'spam-spam-spam-spam-spam-spam-'
Some Details
• Both + and * on strings make a new string and do not modify the
arguments
• Order of operation is important for concatenation, irrelevant for
repetition
• The types required are specific. For concatenation you need two
strings, for repetition a string and an integer
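The three details above can be checked with a short sketch. The sample strings are assumptions:

```python
s = 'spam'
t = s + '-eggs'      # makes a new string; s itself is unchanged
print(s, t)          # spam spam-eggs

order_matters = 'ab' + 'cd' == 'cd' + 'ab'   # False: concatenation depends on order
order_free = 'ab' * 3 == 3 * 'ab'            # True: repetition does not

try:
    bad = 'spam' + 3     # concatenation needs two strings
except TypeError:
    bad = None
    print('Cannot concatenate a string and an integer')
```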
Using in as a Logical Operator
• The in keyword can also be used to check to see if one string is “in” another string
• The in expression is a logical expression that returns True or False and can be used in an if statement
>>> fruit = 'banana'
>>> 'n' in fruit
True
>>> 'm' in fruit
False
>>> 'nan' in fruit
True
>>> if 'a' in fruit :
...     print('Found it!')
...
Found it!
String Comparison
if word == 'banana':
print('All right, bananas.')
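A hedged extension of the comparison above that also uses < and >. The value of word is an assumption, and the ordering follows the character codes, so uppercase letters come before lowercase ones:

```python
word = 'Banana'

if word == 'banana':
    message = 'All right, bananas.'
elif word < 'banana':
    message = 'Your word, ' + word + ', comes before banana.'
else:
    message = 'Your word, ' + word + ', comes after banana.'

print(message)   # Your word, Banana, comes before banana.
```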
https://docs.python.org/3/library/stdtypes.html#string-methods
STRINGS
2024-IS6061
MSc BA
Lecturer: Dr Selja Seppälä
WRITE STRINGS IN DIFFERENT SCENARIOS Read and convert strings
Strings
Use string indexing and slicing to Get the length of a string
access characters within strings
• String type
• Non-printing characters
• Read & convert
• Indexing
• String length
• Looping through strings & counting
• More string operations: slicing, concatenation,
repetition, comparison
• String library: introduction & common functions
• UTF-8
Based on: "The Practice of Computing Using Python, 3rd/ E, GE", Punch & Enbody, Copyright © 2017
Pearson Education, Ltd.
And then there is """ """
• triple quotes preserve both the vertical and horizontal formatting
of the string
• allows you to type tables, paragraphs, whatever and preserve
the formatting
"""this is
a test
today"""
Based on: "The Practice of Computing Using Python, 3rd/ E, GE", Punch & Enbody, Copyright © 2017
Pearson Education, Ltd.
Non-printing Characters
• If inserted directly, are preceded by a backslash (the escape
character \)
• new line '\n'
• tab '\t'
Based on: "The Practice of Computing Using Python, 3rd/ E, GE", Punch & Enbody, Copyright © 2017
Pearson Education, Ltd.
Reading and >>> name = input('Enter:')
Enter:Chuck
Converting >>> print(name)
Chuck
>>> apple = input('Enter:')
• We prefer to read data in using Enter:100
strings and then parse and >>> x = apple – 10
convert the data as we need Traceback (most recent call
• This gives us more control over last): File "<stdin>", line
1, in <module>
error situations and/or bad user
TypeError: unsupported operand
input
type(s) for -: 'str' and 'int'
• Input numbers must be >>> x = int(apple) – 10
converted from strings >>> print(x)
90
The Index
• Because the elements of a string are a sequence, we can
associate each element with an index, a location in the
sequence:
• positive values count up from the left, beginning with index 0
• negative values count down from the right, starting with -1
Based on: "The Practice of Computing Using Python, 3rd/ E, GE", Punch & Enbody, Copyright © 2017
Pearson Education, Ltd.
Based on: "The Practice of Computing Using Python, 3rd/ E, GE", Punch & Enbody, Copyright © 2017
Pearson Education, Ltd.
Looking Inside Strings
b a n a n a
• We can get at any single character in a
string using the index specified in 0 1 2 3 4 5
square brackets >>> fruit = 'banana'
>>> letter = fruit[1]
• The index value must be an integer
>>> print(letter)
and starts at zero a
• The index value can be an expression >>> x = 3
that is computed >>> w = fruit[x - 1]
>>> print(w)
n
Accessing an element: Summary
A particular element of the string is accessed by the index of the
element surrounded by square brackets [ ]
hello_str = 'Hello World'
print(hello_str[1]) => prints e
print(hello_str[-1]) => prints d
print(hello_str[11]) => ERROR
Based on: "The Practice of Computing Using Python, 3rd/ E, GE", Punch & Enbody, Copyright © 2017
Pearson Education, Ltd.
Strings Have Length
b a n a n a
The built-in function len gives 0 1 2 3 4 5
us the length of a string
>>> fruit = 'banana'
>>> print(len(fruit))
6
Looping Through Strings
fruit = 'banana'
• A definite loop using a b
for letter in fruit :
for statement is much print(letter) a
more elegant n
• The iteration variable is a
index = 0 n
completely taken care of while index < len(fruit) :
by the for loop letter = fruit[index]
a
print(letter)
index = index + 1
Looping and Counting
word = 'banana'
This is a simple loop that count = 0
loops through each letter in a for letter in word :
string and counts the number if letter == 'a' :
of times the loop encounters count = count + 1
the 'a' character print(count)
Looking Deeper into in
• The iteration variable “iterates”
Iteration Six-character
through the sequence
(ordered set) variable string
• The block (body) of code is
executed once for each value for letter in 'banana' :
in the sequence
print(letter)
• The iteration variable moves
through all of the values in the
sequence
Slicing, the rules
• slicing is the ability to select a subsequence of the overall
sequence
• uses the syntax [start:finish], where:
• start is the index of where we start the subsequence
• : is the colon operator
• finish is the index of one after where we end the subsequence (“up
to but not including”)
• if either start or finish are not provided, it defaults to the
beginning of the sequence for start and the end of the sequence
for finish
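The slicing rules above can be checked with a few slices of one string. This is a small sketch; the variable names are chosen for illustration only.

```python
my_str = 'hello world'

first_word = my_str[0:5]    # characters 0..4, index 5 is not included
second_word = my_str[6:]    # finish omitted: runs to the end
prefix = my_str[:3]         # start omitted: begins at index 0
whole = my_str[:]           # both omitted: a copy of the whole string

print(first_word, second_word, prefix, whole, sep=' | ')
```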
Extended Slicing
• slicing also accepts a third argument:
• [start:finish:countBy]
• defaults are:
• start is beginning, finish is end, countBy is 1
my_str = 'hello world'
my_str[0:11:2]    # 'hlowrd'
• every other letter
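A few more uses of the third slice argument, as a sketch. The negative step is not mentioned on the slide; it is included here as a commonly used variation.

```python
my_str = 'hello world'

every_other = my_str[::2]      # same as my_str[0:11:2]
odd_positions = my_str[1::2]   # start at index 1, then every second character
reversed_str = my_str[::-1]    # a negative step walks backwards

print(every_other)
print(odd_positions)
print(reversed_str)
```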
Basic String Operations
s = 'spam'
• length function len()
len(s)    # 4
• + is concatenate
new_str = 'spam' + '-' + 'spam-'
print(new_str)    # spam-spam-
• * is repeat, the number is how many times
new_str * 3    # 'spam-spam-spam-spam-spam-spam-'
Some Details
• Both + and * on strings make a new string; they do not modify their
arguments
• The order of the operands matters for concatenation but is irrelevant for
repetition
• The types required are specific. For concatenation you need two
strings, for repetition a string and an integer
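These three details can be seen in a short script. This is an illustrative sketch; the strings and variable names are made up for the example, and the str() conversion in the except branch is one possible way to repair the type mismatch.

```python
s = 'spam'
t = 'eggs'

# Concatenation depends on operand order; repetition does not.
left_first = s + t
right_first = t + s
repeated_a = s * 2
repeated_b = 2 * s

# The originals are unchanged: + and * build new strings.
print(s, t, left_first, right_first, repeated_a, repeated_b)

# Mixing a string and an integer in a concatenation raises TypeError;
# converting the integer explicitly avoids it.
try:
    label = s + 3
except TypeError:
    label = s + str(3)
print(label)
```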
Using in as a Logical Operator
• The in keyword can also be used to check to see if one string is “in” another
string
• The in expression is a logical expression that returns True or False and can be
used in an if statement
>>> fruit = 'banana'
>>> 'n' in fruit
True
>>> 'm' in fruit
False
>>> 'nan' in fruit
True
>>> if 'a' in fruit :
...     print('Found it!')
...
Found it!
>>>
String Comparison
if word == 'banana':
    print('All right, bananas.')
https://docs.python.org/3/library/stdtypes.html#string-methods
FILES
2024-IS6061
MSc BA
Lecturer: Dr Selja Seppälä
Files
• Explain what a file and a file handle are
• Explain how files are accessed (input-output stream)
• Use the open() & close() functions and the with statement to open & close files
• Use try/except to avoid issues when opening files
• Use the different file modes for reading and writing
• Use loops and file methods to read files
• What is a file?
• Opening a file
• Reading a file
• Searching in a file
• Writing to a file
Based on: PY4E, Chapter 7 & "The Practice of Computing Using Python, 3rd/ E, GE", Punch & Enbody, Chapter 6,
IS6061 | Files | S. Seppälä
Copyright © 2017 Pearson Education, Ltd.
Accessing a file
Making a file object
Based on: "The Practice of Computing Using Python, 3rd/ E, GE", Punch & Enbody, Chapter 6, Copyright © 2017
IS6061 | Files | S. Seppälä
Pearson Education, Ltd.
Different modes
Summary of open()
Based on: Murach's Python Programming, C7, Slide 10, © 2016, Mike Murach & Associates, Inc. & David I.
IS6061 | Files | S. Seppälä Schneider, An Introduction to Programming Using Python, Chapter 5, © 2016 Pearson Education, Inc.,
Hoboken, NJ. All rights reserved & https://docs.python.org/3.7/library/io.html#io.IOBase.readlines
Reading a file
• How to use a loop to read each line of the file
with open("members.txt") as file:
    for line in file:
        print(line, end="")
    print()
• How to read the entire file as a string
with open("members.txt") as file:
    contents = file.read()
    print(contents)
• How to read each line of the file
with open("members.txt") as file:
    member1 = file.readline()
    print(member1, end="")
    member2 = file.readline()
    print(member2)
• How to read the entire file as a list
with open("members.txt") as file:
    members = file.readlines()
    print(members[0], end="")
    print(members[1])
• The result that is printed to the console
John Cleese
Eric Idle
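The reading techniques above can be run end to end with a throwaway file. This is a sketch under stated assumptions: members.txt is created in a temporary folder so the script is self-contained, and missing.txt is a hypothetical file name used only to show try/except around open().

```python
import os
import tempfile

# Create a small members.txt in a temporary folder so the example is self-contained.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "members.txt")
with open(path, "w") as file:
    file.write("John Cleese\nEric Idle\n")

# read() returns the whole file as one string.
with open(path) as file:
    contents = file.read()

# readlines() returns a list with one string per line (newlines included).
with open(path) as file:
    members = file.readlines()

# Looping over the file handle yields one line at a time.
names = []
with open(path) as file:
    for line in file:
        names.append(line.rstrip("\n"))

# try/except keeps a missing file from stopping the program.
try:
    with open(os.path.join(folder, "missing.txt")) as file:
        file.read()
    found = True
except FileNotFoundError:
    found = False

print(contents, members, names, found)
```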
IS6061 | Files | S. Seppälä Based on: Murach's Python Programming, C7, Slide 11, © 2016, Mike Murach & Associates, Inc.
Opening a file in write mode and closing the file
manually
IS6061 | Files | S. Seppälä Based on: Murach's Python Programming, C7, Slide 6, © 2016, Mike Murach & Associates, Inc.
Opening and closing files using with statements
• The syntax of the with statement for file I/O 💡Preferred way to avoid issues
IS6061 | Files | S. Seppälä Based on: Murach's Python Programming, C7, Slide 7, © 2016, Mike Murach & Associates, Inc.
The write method
• The write() method of a file object
write(str)
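The three slides above (write mode with a manual close, the with statement, and write()) can be combined into one sketch. The file path is hypothetical and placed in a temporary folder; append mode "a" is included as an assumption about which modes are worth contrasting with "w".

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "members.txt")  # hypothetical file

# Write mode ("w") creates the file or overwrites it; close() must be called manually.
outfile = open(path, "w")
outfile.write("John Cleese\n")   # write() does not add a newline for us
outfile.close()

# Append mode ("a") adds to the end; the with statement closes the file automatically.
with open(path, "a") as outfile:
    outfile.write("Eric Idle\n")

# Read mode ("r") is the default.
with open(path, "r") as infile:
    lines = infile.readlines()

print(lines)
print(outfile.closed)
```

The with form is usually preferred because the file is closed even if an error occurs inside the block.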
IS6061 | Files | S. Seppälä Based on: Murach's Python Programming, C7, Slide 8, © 2016, Mike Murach & Associates, Inc.
Searching Through a File
2024-IS6061
MSc BA
Lecturer: Dr Selja Seppälä
Lists
• Describe what data structures are & give examples
• Describe the properties of lists
• Create and populate lists using (empty) brackets and list methods
• Explain what a mutable object is & give examples of mutable and immutable objects
• Access and modify lists through indexing, slicing and list methods
• Use the split and join methods to convert to/from strings
• Use list functions & Python built-in functions with lists
• Search in lists with "in"
• Construct lists with list comprehension
• Data Structures
• What is a list?
• Accessing and modifying lists
• List functions & methods
• List comprehension
Based on: PY4E, Chapter 8 & "The Practice of Computing Using Python, 3rd/ E, GE", Punch & Enbody, Chapter 7,
IS6061 | Lists | S. Seppälä
Copyright © 2017 Pearson Education, Ltd.
The list Object
• A list is a kind of collection: it allows us to store many values in a
single variable.
• A list is an ordered sequence of Python objects
– Objects can be of any type
– Objects do not have to all be the same type
– Constructed by writing items enclosed in square brackets and
separated by commas
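A short sketch of the properties listed above. The list contents are illustrative values, not taken from the slides.

```python
empty = []                              # empty brackets
also_empty = list()                     # the list() constructor
mixed = ["Seahawks", 2014, 48.0, True]  # items of different types
nested = [1, [2, 3], "four"]            # a list can contain another list

empty.append("first")                   # populate with a list method
print(empty, also_empty, mixed, nested)
print(len(mixed), mixed[0], nested[1][0])
```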
Based on: PY4E, Chapter 8 & David I. Schneider, An Introduction to Programming Using Python, Chapter 2, ©
IS6061 | Lists | S. Seppälä
2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Looping through items in a list
IS6061 | Lists | S. Seppälä Based on: Murach's Python Programming, C6, Slide 16, © 2016, Mike Murach & Associates, Inc.
Looking inside lists: indexing
Lists are Mutable
• Strings are “immutable” - we cannot change the contents of a string - we must
make a new string to make any change
• Lists are “mutable” - we can change an element of a list using the index operator
>>> fruit = 'Banana'
>>> fruit[0] = 'b'
Traceback
TypeError: 'str' object does not support item assignment
>>> x = fruit.lower()
>>> print(x)
banana
>>> lotto = [2, 14, 26, 41, 63]
>>> print(lotto)
[2, 14, 26, 41, 63]
>>> lotto[2] = 28
>>> print(lotto)
[2, 14, 28, 41, 63]
Based on: "The Practice of Computing Using Python, 3rd/ E, GE", Punch & Enbody, Chapter 7, Copyright © 2017
IS6061 | Lists | S. Seppälä
Pearson Education, Ltd.
List functions & methods
team = ["Seahawks", 2014, "CenturyLink Field"]
nums = [5, 10, 4, 5]
words = ["spam", "ni"]
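As a sketch, here are a few built-in functions and list methods applied to the three lists above. The selection of calls is an assumption about what is worth demonstrating; the added item "ekke" is made up for the example.

```python
team = ["Seahawks", 2014, "CenturyLink Field"]
nums = [5, 10, 4, 5]
words = ["spam", "ni"]

print(len(team))                         # number of items
print(max(nums), min(nums), sum(nums))   # built-in functions on a numeric list
print(nums.count(5))                     # how many times 5 occurs
print(nums.index(4))                     # position of the first 4
print(2014 in team)                      # membership test

words.append("ekke")                     # add an item to the end
nums.sort()                              # sort in place
combined = nums + words                  # + builds a new list
print(words, nums, combined)
```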
Based on: David I. Schneider, An Introduction to Programming Using Python, Chapter 2, © 2016 Pearson
IS6061 | Lists | S. Seppälä
Education, Inc., Hoboken, NJ. All rights reserved.
Built-in Functions and Lists
List methods for modifying a list
stats = [48.0, 30.5, 20.2, 100.0]
The split and join methods
• Each of these statements displays the list ['a', 'b', 'c'].
print("a,b,c".split(','))
print("a**b**c".split('**'))
print("a\nb\nc".split())
print("a b c".split())
• This program shows how the join method is used to display the items of a list
of strings.
line = ["To", "be", "or", "not",
        "to", "be."]
print(" ".join(line))
krispies = ["Snap", "Crackle", "Pop"]
print(", ".join(krispies))
[Run]
To be or not to be.
Snap, Crackle, Pop
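split and join are inverse operations, which the following sketch makes explicit by splitting a string and joining it back together. The variable names and the comma-separated input string are illustrative.

```python
line = "To be or not to be."

words = line.split()            # split on whitespace
rebuilt = " ".join(words)       # join with a single space

csv_row = "Snap,Crackle,Pop"
krispies = csv_row.split(",")   # split on a chosen separator
pretty = ", ".join(krispies)

print(words)
print(rebuilt == line)
print(pretty)
```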
List operators
[n for n in range(1,5)]
• what we collect: n
• what we iterate through: range(1,5)
• Note that we iterate over a set of values and collect some (in this case all)
of them
• returns [1,2,3,4]
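The "collect some of them" case needs a condition at the end of the comprehension. The sketch below shows that form next to the equivalent loop; the variable names are illustrative.

```python
# Collect every value.
all_values = [n for n in range(1, 5)]

# Collect a transformed value.
squares = [n ** 2 for n in range(1, 5)]

# Collect only some values by adding a condition.
evens = [n for n in range(1, 5) if n % 2 == 0]

# The equivalent loop for the last comprehension.
evens_loop = []
for n in range(1, 5):
    if n % 2 == 0:
        evens_loop.append(n)

print(all_values, squares, evens, evens_loop)
```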
DICTIONARIES
2024-IS6061
MSc BA
Lecturer: Dr Selja Seppälä
• Describe the properties of dictionaries and keys & values
• Differentiate between a list and a dictionary
• Create and populate dicts using (empty) curly braces, dict() and indexing
• Access dictionary keys & values using different approaches
• Explain what a dictionary traceback is and how to avoid it
• Use dictionaries to count items
• Use Python built-in dictionary methods
• Search in dictionaries with "in" and "not in"
• Create dictionaries of complex objects
IS6061 | Dictionaries |
S. Seppälä
Summary
• Types of collections
• Dictionaries
• Example: most common name
• The get() method
• Example: counting words
• Accessing keys & values
• Deleting items
• Dictionaries and lists as values of dict
Python dictionaries
• Dictionaries are Python’s most powerful data collection.
• Dictionaries allow us to do fast database-like operations in
Python.
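• A dictionary is a collection of items that are looked up by key rather than by
position (since Python 3.7 a dictionary also keeps its items in insertion order).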
• Each item is composed of a key and value pair:
dictionary_name = {key1:value1, key2:value2, ...}
– Key must be immutable
• strings, integers, tuples are fine
• lists are NOT
– Value can be anything
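The rule about keys can be tested directly. In this sketch the dictionary name grid and its contents are made up for illustration; the try/except is there so that the failing assignment does not stop the script.

```python
grid = {}
grid[(0, 1)] = "tuple key"                # tuples are immutable, so they work as keys
grid["name"] = "string key"
grid[42] = ["a", "list", "as", "value"]   # values can be anything, including lists

try:
    grid[[0, 1]] = "list key"             # lists are mutable and cannot be used as keys
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(grid)
print(list_key_ok)
```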
IS6061 | Dictionaries | Based on: PY4E, Chapter 9 & "The Practice of Computing Using Python, 3rd/ E, GE", Punch & Enbody, Chapter 9,
S. Seppälä Copyright © 2017 Pearson Education, Ltd.
Code that creates dictionaries
# an empty dictionary
book_catalog = {}
OR
purse = dict()
# strings as keys and values
countries = {"CA": "Canada",
             "US": "United States",
             "MX": "Mexico"}
# numbers as keys, strings as values
numbers = {1: "One", 2: "Two", 3: "Three",
           4: "Four", 5: "Five"}
IS6061 | Dictionaries |
Based on: Murach's Python Programming, C12, Slide 4, © 2016, Mike Murach & Associates, Inc.
S. Seppälä
Indexing in dictionaries
• Lists index their entries based on the position in the list.
• Dictionary entries have no positional index, so we can't use the position to
look them up.
• The key acts as an index to find the associated value, i.e., dictionaries are
indexed by keys.
• A dictionary can be searched to locate the value associated with a key.
>>> purse = dict()
>>> purse['money'] = 12
>>> purse['candy'] = 3
>>> purse['tissues'] = 75
>>> print(purse)
{'money': 12, 'candy': 3, 'tissues': 75}
>>> print(purse['candy'])
3
>>> purse['candy'] = purse['candy'] + 2
>>> print(purse)
{'money': 12, 'candy': 5, 'tissues': 75}
The syntax for accessing a value
dictionary_name[key]
IS6061 | Dictionaries | Based on: "The Practice of Computing Using Python, 3rd/ E, GE", Punch & Enbody, Chapter 9, Copyright © 2017
S. Seppälä Pearson Education, Ltd.
Comparing lists and dictionaries
Dictionaries are like lists except that they use keys instead of
numbers to look up values
>>> lst = list()
>>> lst.append(21)
>>> lst.append(183)
>>> print(lst)
[21, 183]
>>> lst[0] = 23
>>> print(lst)
[23, 183]

>>> ddd = dict()
>>> ddd['age'] = 21
>>> ddd['course'] = 182
>>> print(ddd)
{'age': 21, 'course': 182}
>>> ddd['age'] = 23
>>> print(ddd)
{'age': 23, 'course': 182}
IS6061 | Dictionaries |
Based on: PY4E, Chapter 9
S. Seppälä
[Figure: the list lst drawn as a table of keys and values, with the positions [0] and [1] as keys and 21 and 183 as values.]
Dictionary Tracebacks
• It is an error to reference a key which is not in the dictionary
• We can use the in operator to see if a key is in the dictionary
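Both points can be seen in a few lines. This is a minimal sketch; the purse contents and the default of 0 are illustrative choices.

```python
purse = {"money": 12, "candy": 3}

# Referencing a missing key raises KeyError (the traceback).
try:
    print(purse["tissues"])
    missing_raised = False
except KeyError:
    missing_raised = True

# Checking with in first avoids the traceback.
if "tissues" in purse:
    tissues = purse["tissues"]
else:
    tissues = 0

print(missing_raised, tissues, "candy" in purse, "tissues" not in purse)
```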
Counting with a dictionary
When we encounter a new name, we need to add a new entry in the
dictionary and if this is the second or later time we have seen the
name, we simply add one to the count in the dictionary under that
name.
counts = dict()
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']
for name in names :
    if name not in counts:
        counts[name] = 1
    else :
        counts[name] = counts[name] + 1
print(counts)

{'csev': 2,
 'cwen': 2,
 'zqian': 1}
The get() method for dictionaries
The pattern of checking to see if a key is already in a dictionary and assuming a
default value if the key is not there is so common that there is a method called
get() that does this for us.
if name in counts:
    x = counts[name]
else :
    x = 0

x = counts.get(name, 0)    # default value if key does not exist (and no Traceback)
Simplified Counting with get()
We can use get() and provide a default value of zero when
the key is not yet in the dictionary - and then just add one
(either to zero or to the current value of that key).
counts = dict()
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']
for name in names :
    counts[name] = counts.get(name, 0) + 1
print(counts)
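The same get() pattern extends from counting names to counting words in a line of text and then finding the most common one. This is a sketch: the input sentence is a made-up example, and the scan for the largest count is one simple way to do it.

```python
text = "the clown ran after the car and the car ran into the tent"  # hypothetical input

counts = dict()
for word in text.split():
    counts[word] = counts.get(word, 0) + 1

# Find the most common word by scanning the counts.
best_word = None
best_count = 0
for word, count in counts.items():
    if count > best_count:
        best_word = word
        best_count = count

print(counts)
print(best_word, best_count)
```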
Dictionary Methods
Method Description
clear() Removes all the elements from the dictionary
copy() Returns a copy of the dictionary
fromkeys() Returns a dictionary with the specified keys and value
get() Returns the value of the specified key
items() Returns a view object containing a tuple for each key-value pair
keys() Returns a view object containing the dictionary's keys
pop() Removes the element with the specified key and returns its
associated value
popitem() Removes the last inserted key-value pair and returns it as a tuple
setdefault() Returns the value of the specified key. If the key does not exist:
insert the key, with the specified value
update() Updates the dictionary with the specified key-value pairs
values() Returns a view object of all the values in the dictionary
IS6061 | Dictionaries | Based on: W3 Schools, Python Dictionary Methods, www.w3schools.com/python/python_ref_dictionary.asp &
S. Seppälä Real Python, Dictionaries in Python, https://realpython.com/python-dicts/#built-in-dictionary-methods
Retrieving Lists of Keys and Values
You can get a list of keys, values, or items (both) from a dictionary
>>> jjj = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}
>>> print(list(jjj))
['chuck', 'fred', 'jan']
>>> print(list(jjj.keys()))
['chuck', 'fred', 'jan']
>>> print(list(jjj.values()))
[1, 42, 100]
>>> print(list(jjj.items()))
[('chuck', 1), ('fred', 42), ('jan', 100)]
>>>
IS6061 | Dictionaries |
Based on: Murach's Python Programming, C12, Slide 10, © 2016, Mike Murach & Associates, Inc.
S. Seppälä
Two dictionary methods for deleting items
pop(key[, default_value])
clear()
Code that uses the pop() method to delete an item
country = countries.pop("US") # "United States"
country = countries.pop("IE") # KeyError
country = countries.pop("IE", "Unknown") # "Unknown"
Code that prevents a KeyError from occurring
code = "IE"
country = countries.pop(code, "Unknown country")
print(country + " was deleted.")
Code that uses the clear() method to delete all items
countries.clear()
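The snippets above assume an existing countries dictionary, and the line marked KeyError would stop a script. The following self-contained sketch defines the dictionary and wraps the failing call in try/except so that every step can be run in order.

```python
countries = {"CA": "Canada", "US": "United States", "MX": "Mexico"}

removed = countries.pop("US")               # removes the item and returns its value
fallback = countries.pop("IE", "Unknown")   # the default is returned, nothing is removed

try:
    countries.pop("IE")                     # no default given: raises KeyError
    raised = False
except KeyError:
    raised = True

remaining = list(countries)
countries.clear()                           # removes every item
print(removed, fallback, raised, remaining, countries)
```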
IS6061 | Dictionaries |
Based on: Murach's Python Programming, C12, Slide 11, © 2016, Mike Murach & Associates, Inc.
S. Seppälä
A dictionary that contains other dictionaries as values (1)
contacts = {
    "Joel":
        {"address": "1500 Anystreet", "city": "San Francisco",
         "state": "California", "postalCode": "94110",
         "phone": "555-555-1111"},
    "Anne":
        {"address": "1000 Somestreet", "city": "Fresno",
         "state": "California", "postalCode": "93704",
         "phone": "125-555-2222"},
    "Ben":
        {"address": "1400 Another Street", "city": "Fresno",
         "state": "California", "postalCode": "93704",
         "phone": "125-555-4444"}
}
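Values inside a nested dictionary are reached with two keys in a row: the outer key first, then the inner key. The sketch below uses a shortened version of the contacts dictionary; the change of city to "Oakland" is a hypothetical update added for illustration.

```python
contacts = {
    "Joel": {"city": "San Francisco", "phone": "555-555-1111"},
    "Anne": {"city": "Fresno", "phone": "125-555-2222"},
}

annes_phone = contacts["Anne"]["phone"]   # outer key, then inner key
contacts["Joel"]["city"] = "Oakland"      # hypothetical update of an inner value

# Loop over names and their inner dictionaries.
cities = []
for name, info in contacts.items():
    cities.append(name + ": " + info["city"])

print(annes_phone)
print(cities)
```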
IS6061 | Dictionaries |
Based on: Murach's Python Programming, C12, Slide 30, © 2016, Mike Murach & Associates, Inc.
S. Seppälä
A dictionary that contains lists as values
IS6061 | Dictionaries |
Based on: Murach's Python Programming, C12, Slide 31, © 2016, Mike Murach & Associates, Inc.
S. Seppälä
TUPLES
2024-IS60516
MSc BA
Lecturer: Dr Selja Seppälä
Tuples
• Explain what a tuple is and describe its characteristics
• Use the tuple() method
• What is a tuple?
• The tuple() method
• Tuples vs. lists
• Why tuples?
• Tuples & assignment
• Tuples & dictionaries
• Comparing tuples
• Sorting tuples
• Tuples are immutable lists, i.e., ordered sequences of items that cannot be
modified in place
• They are printed with (,)
• They can contain different data types
Based on: "The Practice of Computing Using Python, 3rd/ E, GE", Punch & Enbody, Chapter 7, Copyright © 2017
IS6061 | Tuples | S. Seppälä Pearson Education, Ltd. & David I. Schneider, An Introduction to Programming Using Python, Chapter 2, © 2016
Pearson Education, Inc., Hoboken, NJ. All rights reserved.
The tuple() method
• Creates an empty tuple object
>>> my_tpl = tuple()
>>> print(type(my_tpl))
<class 'tuple'>
>>> print(my_tpl)
()
• Converts a list into a tuple
>>> my_tpl = tuple([5,'abc', 22.7])
>>> print(my_tpl)
(5, 'abc', 22.7)
• Creates a tuple
>>> my_tpl = tuple(("apple", "banana", "cherry")) # note the double round-brackets
>>> print(my_tpl)
('apple', 'banana', 'cherry')
IS6061 | Tuples | S. Seppälä Based on: Python Tuples, W3 Schools, https://www.w3schools.com/python/python_tuples.asp
Lists and Tuples
Tuples are like lists
• Tuples are another kind of sequence that functions much like a list
– They have elements which are indexed starting at 0
>>> x = ('Glenn', 'Sally', 'Joseph')
>>> print(x[2])
Joseph
>>> y = ( 1, 9, 2 )
>>> print(y)
(1, 9, 2)
>>> print(max(y))
9
>>> for iter in y:
...     print(iter)
...
1
9
2
>>>
Based on: PY4E & David I. Schneider, An Introduction to Programming Using Python, Chapter 2, © 2016
IS6061 | Tuples | S. Seppälä
Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Tuples are “immutable” (unlike lists)
tuple_values = (1, 2, 3)
a, b, c = tuple_values # a = 1, b = 2, c = 3
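Immutability and unpacking can both be shown with the same tuple. This is a sketch: the try/except is there so the failed assignment does not stop the script, and the swap at the end is an added illustration of tuple assignment.

```python
tuple_values = (1, 2, 3)

# Item assignment is not allowed on a tuple.
try:
    tuple_values[0] = 10
    changed = True
except TypeError:
    changed = False

# Unpacking assigns each item to its own variable.
a, b, c = tuple_values

# The same mechanism swaps two variables without a temporary one.
a, b = b, a

print(changed, tuple_values, a, b, c)
```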
Based on: PY4E, Chapter 10 & Murach's Python Programming, C6, Slide 45, © 2016, Mike Murach & Associates,
IS6061 | Tuples | S. Seppälä
Inc.
A function that returns a tuple
def get_location():
    # code that computes values for x, y, and z
    return x, y, z
IS6061 | Tuples | S. Seppälä Based on: Murach's Python Programming, C6, Slide 45, © 2016, Mike Murach & Associates, Inc.
Tuples and Dictionaries
The comparison operators work with tuples and other sequences. If the first
item is equal, Python goes on to the next element, and so on, until it finds
elements that differ.
>>> (0, 1, 2) < (5, 1, 2)
True
>>> (0, 1, 2000000) < (0, 3, 4)
True
>>> ( 'Jones', 'Sally' ) < ('Jones', 'Sam')
True
>>> ( 'Jones', 'Sally') > ('Adams', 'Sam')
True
lst = []
for key, val in counts.items():
    newtup = (val, key)
    lst.append(newtup)
Even shorter version using list comprehension
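Because tuples compare item by item, a list of (value, key) tuples sorts by value first. The sketch below completes the loop above with a sort and gives a list comprehension version; the counts dictionary is hypothetical data, and sorting in descending order is an assumption about the intended result.

```python
counts = {"csev": 2, "zqian": 1, "cwen": 5}   # hypothetical counts

# Build (value, key) tuples, then sort them from largest to smallest value.
lst = []
for key, val in counts.items():
    newtup = (val, key)
    lst.append(newtup)
lst = sorted(lst, reverse=True)

# The same result using a list comprehension.
shorter = sorted([(val, key) for key, val in counts.items()], reverse=True)

print(lst)
print(shorter == lst)
```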
Lecture 2- NumPy I
Dr. Sampath Jayarathna
Old Dominion University
Credit for most of the IS6061 slide contents goes to Dr. Sampath Jayarathna.
Additions are specified on the slides.
Credit for some of the slides in this lecture goes to Jianhua Ruan UTSA
NumPy
Image: Sam Taha, Python Data Science and Machine Learning Stack, 14 June 2017,
https://grandlogic.blogspot.com/2017/06/python-data-science-and-machine.html
IS6061 | NumPy | S. Seppälä
Based on: Dr. Sampath Jayarathna, Lecture 2- NumPy I, CS 620 / DASC 600 - Introduction to Data
Science & Analytics, Old Dominion University, https://www.cs.odu.edu/~sampath/courses/f19/cs620/
NumPy
NumPy ndarray vs Python list
NumPy Arrays vs. Python Lists
NumPy Arrays
1. Homogeneous Data: NumPy arrays store elements of the same data type → more
compact and memory-efficient than lists.
2. Fixed Data Type: NumPy arrays have a fixed data type, reducing memory overhead
by eliminating the need to store type information for each element.
3. Contiguous Memory: NumPy arrays store elements in adjacent memory locations,
reducing fragmentation and allowing for efficient access.
4. Array Metadata: NumPy arrays have extra metadata like shape, strides, and data
type. However, this overhead is usually smaller than the per-element overhead in lists.
5. Performance: NumPy arrays are optimized for numerical computations, with
efficient element-wise operations and mathematical functions. These operations are
implemented in C, resulting in faster performance than equivalent operations on lists.
Python Lists
1. Datatype: Lists can hold different data types, but this can decrease memory
efficiency and slow numerical operations.
2. Element Overhead: Lists in Python store additional information about each
element, e.g. its type and reference count.
3. Memory Fragmentation: Lists may not store elements in contiguous memory
locations, causing memory fragmentation and inefficiency.
4. Performance: Lists are not optimized for numerical computations and may have
slower mathematical operations due to Python’s interpretation overhead. They are
generally used as general-purpose data structures.
5. Functionality: Lists can store any data type, but lack specialized NumPy
functions for numerical operations.
IS6061 | NumPy | S. Seppälä
Based on: Python Lists VS Numpy Arrays, GeeksforGeeks, 25 August 2023,
https://www.geeksforgeeks.org/python-lists-vs-numpy-arrays/
NumPy Arrays vs. Python Lists
Python Lists
• Heterogeneous Data
• Element Overhead
• Memory Fragmentation
NumPy Arrays
• Homogeneous Data
• Fixed Data Type
• Contiguous Memory
• Array Metadata
ndarray
Creating ndarrays
array = np.array([[0,1,2],
                  [2,3,4]])
[[0 1 2]
 [2 3 4]]
array = np.zeros((2,3))
[[0. 0. 0.]
 [0. 0. 0.]]
array = np.ones((2,3))
[[1. 1. 1.]
 [1. 1. 1.]]
array = np.eye(3)
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
array = np.arange(0, 10, 2)
[0 2 4 6 8]
array = np.random.randint(0, 10, (3,3))    # random values, output varies
[[6 4 3]
 [1 5 6]
 [9 8 5]]
arange is an array-valued version of the built-in Python range function
Arithmetic with NumPy Arrays
• Arithmetic operations with scalars propagate the scalar argument to each element in
the array:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
print(arr)
[[1. 2. 3.]
 [4. 5. 6.]]
print(arr ** 2)
[[ 1.  4.  9.]
 [16. 25. 36.]]
arr = np.arange(10)
print(arr)         # [0 1 2 3 4 5 6 7 8 9]
print(arr[5])      # 5
print(arr[5:8])    # [5 6 7]
arr[5:8] = 12
print(arr)         # [ 0  1  2  3  4 12 12 12  8  9]
Indexing and Slicing
• As you can see, if you assign a scalar value to a slice, as in arr[5:8] = 12, the value is
propagated (or broadcast) to the entire selection.
• An important first distinction from Python’s built-in lists is that array slices are views on
the original array.
– This means that the data is not copied, and any modifications to the view will be reflected in the
source array.
arr = np.arange(10)
print(arr) # [0 1 2 3 4 5 6 7 8 9]
arr_slice = arr[5:8]
print(arr_slice) # [5 6 7]
arr_slice[1] = 12345
print(arr) # [ 0 1 2 3 4 5 12345 7 8 9]
arr_slice[:] = 64
print(arr) # [ 0 1 2 3 4 64 64 64 8 9]
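When an independent slice is needed, the view can be copied explicitly. The sketch below contrasts a view, a copy made with the ndarray copy() method, and a Python list slice; the values 99 and -1 are arbitrary markers chosen for the example.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]             # a view: shares data with arr
view[0] = 99                # also changes arr[5]

copied = arr[5:8].copy()    # an explicit copy: independent data
copied[1] = -1              # arr is not affected

py_list = list(range(10))
list_slice = py_list[5:8]   # slicing a Python list always copies
list_slice[0] = 99          # py_list is not affected

print(arr)
print(copied)
print(py_list)
```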
Indexing
print(arr2d[0][2]) # 3
print(arr2d[0, 2]) # 3
Activity 3
• Write code to slice this array to display the last 2 elements of the middle array:
[5 6]
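The array used for the indexing examples and for this activity is not shown in the text. The sketch below assumes a 3x3 array holding 1 to 9, which is consistent with the outputs given (arr2d[0, 2] is 3 and the expected answer is [5 6]); with that assumption, one possible solution combines a row index with a column slice.

```python
import numpy as np

# Assumed array: the original definition is not shown in the text.
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr2d[0][2])           # index the row, then the column
print(arr2d[0, 2])           # equivalent, using a comma-separated index

middle_tail = arr2d[1, 1:]   # row 1, columns 1 to the end
print(middle_tail)
```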
NumPy – Slicing Activity
https://forms.office.com/e/ETprJEZuTS
Pandas
IS6061
MSc BA
Lecturer: Dr Selja Seppälä
CS 620 / DASC 600
Introduction to Data Science & Analytics
Lecture 3- Pandas
Dr. Sampath Jayarathna
Old Dominion University
Credit for most of the IS6061 slide contents goes to Dr. Sampath Jayarathna.
Additions are specified on the slides.
Credit for some of the slides in this lecture goes to Jianhua Ruan UTSA
Why pandas?
• One of the most popular libraries that data scientists use
• Labeled axes to avoid misalignment of data
– When merging two tables, some rows may be different
• Missing values or special values may need to be removed or replaced
[Figure: example table with the columns height, Weight, Weight2, age, Gender, salary and Credit score]
IS6061 | NumPy | S. Seppälä Based on: Dr. Sampath Jayarathna, Lecture 3 - Pandas, CS 620 / DASC 600 - Introduction to Data
Science & Analytics, Old Dominion University, https://www.cs.odu.edu/~sampath/courses/f19/cs620/
Overview
• Python library that provides data analysis features similar to those of R, MATLAB and SAS
• Rich data structures and functions that make working with structured data fast,
easy and expressive.
– It contains data structures and data manipulation tools designed to make data
cleaning and analysis fast and easy in Python.
• It is built on top of NumPy
– The biggest difference is that pandas is designed for working with tabular or
heterogeneous data.
• Often used with
– numerical computing tools like NumPy and SciPy
– analytical libraries like statsmodels and scikit-learn
– data visualization libraries like matplotlib.
IS6061 | NumPy | S. Seppälä
Based on: Dr. Sampath Jayarathna, Lecture 3 - Pandas, CS 620 / DASC 600 - Introduction to Data Science & Analytics, Old
Dominion University, https://www.cs.odu.edu/~sampath/courses/f19/cs620/ & Wes McKinney, 5 Getting Started with
pandas, Python for Data Analysis, 3rd edition, 2023, p. 123, https://wesmckinney.com/book/
Overview
• Key components provided by Pandas:
– Series
– DataFrame
import
convention
Based on: Dr. Sampath Jayarathna, Lecture 3 - Pandas, CS 620 / DASC 600 - Introduction to Data Science & Analytics, Old
IS6061 | NumPy | S. Seppälä
Dominion University, https://www.cs.odu.edu/~sampath/courses/f19/cs620/
Series
• One dimensional array-like object
• It contains a sequence of values (array of data of any NumPy data type) and
an associated array of data labels called its index. (Indexes can be strings or
integers or other data types.)
• By default, a Series is indexed from 0 to N, where N = size - 1
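A minimal sketch of the default index and of an explicit label index. The two import lines are the commonly used pandas conventions and are an assumption here, since the import shown on the earlier slide is not preserved in the text; the values are illustrative.

```python
import pandas as pd
from pandas import Series

obj = Series([4, 7, -5, 3])   # no index given: labels 0..3 are generated
print(obj)
print(list(obj.index))
print(obj.values)

# An explicit index attaches a label to each value.
labelled = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
print(labelled["a"])
```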
Series – referencing elements
Creating a Series with an index identifying each data point with a label
obj2 = Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
print(obj2)
#Output
d    4
b    7
a   -5
c    3
dtype: int64
print(obj2.index)
#Output
Index(['d', 'b', 'a', 'c'], dtype='object')
print(obj2.values)
#Output
[ 4  7 -5  3]
Using labels in the index when selecting single values or a set of values
print(obj2['a'])
#Output
-5
print(obj2.a)
#Output
-5
obj2['d']= 10
print(obj2[['d', 'c', 'a']])
#Output
d    10
c     3
a    -5
dtype: int64
print(obj2[:2])
#Output
d    10
b     7
dtype: int64
Series – array/dict operations
Can be thought of as a dict. Can be constructed from a dict directly.
obj3 = Series({'d': 4, 'b': 7, 'a': -5, 'c': 3})
print(obj3)
#output
d 4
b 7
a -5
c 3
dtype: int64
Array operations such as boolean filtering and element-wise arithmetic keep the index:
obj4 = obj3[obj3>0]
print(obj4)
#output
d 4
b 7
c 3
dtype: int64
print(obj3**2)
#output
d 16
b 49
a 25
c 9
dtype: int64
Series – auto alignment
sdata = {'Texas': 10, 'Ohio': 20, 'Oregon': 15, 'Utah': 18}
states = ['Texas', 'Ohio', 'Oregon', 'Iowa']
obj4 = Series(sdata, index=states)
print(obj4)
#Output
Texas 10.0
Ohio 20.0
Oregon 15.0
Iowa NaN
dtype: float64
# obj5 is not defined on the slide; the output below is consistent with obj5 = Series(sdata)
print(obj4.add(obj5))
#output
Iowa NaN
Ohio 40.0
Oregon 30.0
Texas 20.0
Utah NaN
dtype: float64
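The alignment behaviour can be reproduced with the sketch below. The definition of obj5 is an assumption (it is not on the slide), chosen because it yields the printed result; fill_value is an optional parameter of Series.add that the slide does not use.

```python
import pandas as pd

sdata = {'Texas': 10, 'Ohio': 20, 'Oregon': 15, 'Utah': 18}
states = ['Texas', 'Ohio', 'Oregon', 'Iowa']

obj4 = pd.Series(sdata, index=states)   # 'Iowa' has no value -> NaN, 'Utah' is dropped
obj5 = pd.Series(sdata)                 # assumption: not defined on the slide

# Values are matched by label, not by position. Labels present in only
# one of the two Series give NaN in the result.
total = obj4.add(obj5)
print(total)

# Optional: treat a missing value on one side as 0 instead of NaN.
total_filled = obj4.add(obj5, fill_value=0)
print(total_filled)
```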
Series name and index name
Both the Series object itself and its index have a name attribute, which
integrates with other key areas of pandas functionality
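A short sketch of the two name attributes. The names 'population' and 'state' are illustrative choices, not taken from the slide.

```python
import pandas as pd

sdata = {'Texas': 10, 'Ohio': 20, 'Oregon': 15, 'Utah': 18}
obj4 = pd.Series(sdata)

obj4.name = 'population'     # name of the Series itself (illustrative)
obj4.index.name = 'state'    # name of its index (illustrative)
print(obj4)

# One place where the names are used: reset_index() turns the index
# and the values into two named DataFrame columns.
as_frame = obj4.reset_index()
print(as_frame.columns.tolist())
```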
Series name and index name
print(obj4.index)
Index(['Florida', 'New York', 'Kentucky', 'Georgia'],
dtype='object')
Indexing, selection and filtering
• Series can be sliced/accessed with label-based indexes or with position-based indexes
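A minimal sketch of the two access styles, reusing obj2 from the earlier slide. It uses the explicit loc (labels) and iloc (positions) accessors, which avoid ambiguity when the labels are themselves integers.

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# Label-based: the end label of a slice is included.
by_label = obj2.loc['b':'a']

# Position-based: the end position of a slice is excluded, as in Python lists.
by_position = obj2.iloc[1:3]

print(by_label)
print(by_position)
```

Both slices select the same two elements (labels b and a) even though they are written differently.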
DataFrame
• A DataFrame is a tabular data structure comprised of rows and columns, akin to a
spreadsheet or database table.
• It can be treated as an ordered collection of columns
– Each column can be a different data type (numeric, string, boolean, etc.)
– A DataFrame has both a row index and a column index
• There are many ways to construct a DataFrame
• One of the most common is from a dict of equal-length lists or NumPy arrays
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = DataFrame(data)
print(frame)
#output
    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
DataFrame – specifying columns and indices
• Order of columns/rows can be specified.
• Columns not in data are initialized with NaN.
frame2 = DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                   index=['A', 'B', 'C', 'D', 'E'])
print(frame2)
#output
   year   state  pop debt
A  2000    Ohio  1.5  NaN
B  2001    Ohio  1.7  NaN
C  2002    Ohio  3.6  NaN
D  2001  Nevada  2.4  NaN
E  2002  Nevada  2.9  NaN
The columns appear in the same order as listed in columns; debt is not in data, so it is initialized with NaN.
DataFrame – from nested dict of dicts
• Outer dict keys as columns and inner dict keys as row indices
pop = {'Nevada': {2001: 2.9, 2002: 2.9}, 'Ohio': {2002: 3.6, 2001: 1.7, 2000: 1.5}}
frame3 = DataFrame(pop)
print(frame3)
#output
      Nevada  Ohio
2000     NaN   1.5
2001     2.9   1.7
2002     2.9   3.6
The row index is the union of the inner keys (in sorted order).
Transpose:
print(frame3.T)
#output
        2000  2001  2002
Nevada   NaN   2.9   2.9
Ohio     1.5   1.7   3.6
DataFrame – index, columns, values
frame3.index.name = 'year'
frame3.columns.name = 'state'
print(frame3)
state Nevada Ohio
year
2000 NaN 1.5
2001 2.9 1.7
2002 2.9 3.6
print(frame3.index)
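Int64Index([2000, 2001, 2002], dtype='int64', name='year')
(Int64Index was removed in pandas 2.0; newer versions print Index([2000, 2001, 2002], dtype='int64', name='year').)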
print(frame3.columns)
Index(['Nevada', 'Ohio'], dtype='object', name='state')
print(frame3.values)
[[nan 1.5]
[2.9 1.7]
[2.9 3.6]]
DataFrame – retrieving a column
A column in a DataFrame can be retrieved as a Series by dict-like notation or as an attribute
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = DataFrame(data)
print(frame['state'])
0 Ohio
1 Ohio
2 Ohio
3 Nevada
4 Nevada
Name: state, dtype: object
print(frame.state)
0 Ohio
1 Ohio
2 Ohio
3 Nevada
4 Nevada
Name: state, dtype: object
DataFrame – getting rows
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
'year': [2000, 2001, 2002, 2001, 2002],
'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame2 = DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                   index=['A', 'B', 'C', 'D', 'E'])
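The row-retrieval calls shown on this slide were not extracted. A hedged sketch of how rows of frame2 are commonly retrieved, using the standard loc (by label) and iloc (by position) accessors; the variable names are illustrative:

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                      index=['A', 'B', 'C', 'D', 'E'])

row_by_label = frame2.loc['C']          # one row, returned as a Series
row_by_position = frame2.iloc[2]        # the same row, by integer position
several_rows = frame2.loc[['A', 'E']]   # a list of labels returns a DataFrame

print(row_by_label)
print(several_rows)
```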
DataFrame – modifying columns
frame2['debt'] = 0
print(frame2)
   year   state  pop  debt
A  2000    Ohio  1.5     0
B  2001    Ohio  1.7     0
C  2002    Ohio  3.6     0
D  2001  Nevada  2.4     0
E  2002  Nevada  2.9     0
frame2['debt'] = range(5)
print(frame2)
   year   state  pop  debt
A  2000    Ohio  1.5     0
B  2001    Ohio  1.7     1
C  2002    Ohio  3.6     2
D  2001  Nevada  2.4     3
E  2002  Nevada  2.9     4
Assigning a Series aligns it on the row index; rows without a matching label get NaN:
val = Series([10, 10, 10], index=['A', 'C', 'D'])
frame2['debt'] = val
print(frame2)
   year   state  pop  debt
A  2000    Ohio  1.5  10.0
B  2001    Ohio  1.7   NaN
C  2002    Ohio  3.6  10.0
D  2001  Nevada  2.4  10.0
E  2002  Nevada  2.9   NaN
Rows or individual elements can be modified similarly, using loc or iloc.
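The slide only mentions that rows and individual elements can be modified with loc or iloc. A minimal sketch of what that can look like; the new values are made up for illustration:

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame2 = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E'])
frame2['debt'] = 0.0

# One element, addressed by row label and column label.
frame2.loc['B', 'debt'] = 7.5

# One element, addressed by position: row 0, column 3 ('debt').
frame2.iloc[0, 3] = 1.0

# A whole row: one value per column, in column order.
frame2.loc['D'] = ['Nevada', 2001, 2.5, 0.0]

print(frame2)
```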
DataFrame – removing columns
del frame2['debt']
print(frame2)
year state pop
A 2000 Ohio 1.5
B 2001 Ohio 1.7
C 2002 Ohio 3.6
D 2001 Nevada 2.4
E 2002 Nevada 2.9
More on DataFrame indexing
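The indexing slides print a DataFrame named frame whose definition is not part of the extracted text. Judging from the printed values (rows r1 to r3, columns c1 to c3, values 0 to 8), it can be rebuilt as follows. The construction mirrors the one the slides show for v and is an assumption.

```python
import numpy as np
import pandas as pd

# Assumed definition of `frame`, reconstructed from the printed output.
frame = pd.DataFrame(np.arange(9).reshape(3, 3),
                     index=['r1', 'r2', 'r3'],
                     columns=['c1', 'c2', 'c3'])

# loc selects by label, iloc by integer position.
subset = frame.loc[['r1', 'r2'], ['c1', 'c2']]
same_subset = frame.iloc[:2, :2]

print(frame)
print(subset)
```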
More on DataFrame indexing - 2
print(frame.loc[['r1', 'r2'], ['c1', 'c2']])
    c1  c2
r1   0   1
r2   3   4
print(frame.iloc[:2,:2])
    c1  c2
r1   0   1
r2   3   4
print(frame.loc['r1':'r3', 'c1':'c3'])
    c1  c2  c3
r1   0   1   2
r2   3   4   5
r3   6   7   8
A pandas Index can contain duplicate labels:
v = DataFrame(np.arange(9).reshape(3,3), index=['a', 'a', 'b'], columns=['c1','c2','c3'])
print(v)
   c1  c2  c3
a   0   1   2
a   3   4   5
b   6   7   8
print(v.loc['a'])
   c1  c2  c3
a   0   1   2
a   3   4   5
More on DataFrame indexing - 3
print(frame)
    c1  c2  c3
r1   0   1   2
r2   3   4   5
r3   6   7   8
print(frame['c1']>0)
r1 False
r2 True
r3 True
Name: c1, dtype: bool
print(frame[frame['c1']>0])
    c1  c2  c3
r2   3   4   5
r3   6   7   8
print(frame <3)
       c1     c2     c3
r1   True   True   True
r2  False  False  False
r3  False  False  False
frame[frame<3] = 3
print(frame)
    c1  c2  c3
r1   3   3   3
r2   3   4   5
r3   6   7   8
Removing rows/columns
print(frame)
c1 c2 c3
r1 0 1 2
r2 3 4 5
r3 6 7 8
drop returns a new object; frame itself is not changed.
print(frame.drop(['r1']))
c1 c2 c3
r2 3 4 5
r3 6 7 8
print(frame.drop(['r1','r3']))
c1 c2 c3
r2 3 4 5
print(frame.drop(['c1'], axis=1))
c2 c3
r1 1 2
r2 4 5
r3 7 8
Reindexing
• Alter the order of the rows/columns of a DataFrame, or the order of a Series, according to a new index
• reindex returns a new object
frame2 = frame.reindex(columns=['c2', 'c3', 'c1'])
print(frame2)
    c2  c3  c1
r1   1   2   0
r2   4   5   3
r3   7   8   6
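Reindexing can also be applied to rows. The sketch below assumes the same frame as the indexing slides and adds a label, 'r4', that does not exist in the original, to show that reindex fills unknown labels with NaN.

```python
import numpy as np
import pandas as pd

# Assumed definition of `frame`, as in the indexing slides.
frame = pd.DataFrame(np.arange(9).reshape(3, 3),
                     index=['r1', 'r2', 'r3'],
                     columns=['c1', 'c2', 'c3'])

# New row order; 'r2' is left out and 'r4' (hypothetical) is new.
reordered = frame.reindex(index=['r3', 'r1', 'r4'])
print(reordered)

# The original object is untouched.
print(frame.index.tolist())
```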
IS6061
PROGRAMMING FOR DATA AND BUSINESS ANALYTICS
MSc BIAS
Lecturer: Dr Selja Seppälä
Introduction
IS6061 | EDA | S. Seppälä Based on: Exploratory Data Analysis in Python, Simplilearn (https://youtu.be/MoM6mighOJM)
Usefulness of EDA
IS6061 | EDA | S. Seppälä Based on: CS 109a: Data Science, Effective Exploratory Data Analysis and Visualization, Pavlos Protopapas & Kevin Rader,
https://harvard-iacs.github.io/2018-CS109A/lectures/lecture-3/
Why Perform EDA?
EDA Workflow
• Build a DataFrame from the data (ideally, put all data in this object)
• Clean the DataFrame. It should have the following properties
– Each row describes a single object
– Each column describes a property of that object
– Columns are numeric whenever appropriate
– Columns contain atomic properties that cannot be further decomposed
• Explore global properties. Use e.g. histograms, scatter plots, and
aggregation functions to summarize the data.
• Explore group properties. Use groupby and small multiples to compare
subsets of the data.
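The workflow above can be sketched on the small state/year/pop table from the pandas slides. The deliberate storage of pop as text is an assumption made here to give the cleaning step something to do.

```python
import pandas as pd

# 1. Build a DataFrame from the data.
df = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002],
    'pop': ['1.5', '1.7', '3.6', '2.4', '2.9'],   # text on purpose (assumption)
})

# 2. Clean: columns should be numeric whenever appropriate.
df['pop'] = pd.to_numeric(df['pop'])

# 3. Explore global properties with an aggregation function.
overall_mean = df['pop'].mean()

# 4. Explore group properties with groupby.
mean_by_state = df.groupby('state')['pop'].mean()

print(overall_mean)
print(mean_by_state)
```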
EDA in Python vs. Excel
https://forms.office.com/e/UG0kcxuaZt
Exploratory Data Analysis
by Neha Mathur
EDA using Pandas
Descriptive statistics
Removal of nulls
Visualization
1. Packages and data import
• Step 1: Import pandas into the workspace.
• import pandas as pd (the keyword import must be lowercase; pd is the conventional alias)
Descriptive Stats (Pandas)
1. Central tendency
• Median – The middle value when all the data points are put in an ordered list: dataframe.median()
• Mode – The data point which occurs the most in the dataset: dataframe.mode()
2. Spread: a measure of how far the data points lie from the mean or median
• Variance – The variance is the mean of the squares of the individual deviations: dataframe.var()
• Standard deviation – The standard deviation is the square root of the variance: dataframe.std()
3. Skewness: a measure of asymmetry: dataframe.skew()
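A small sketch of these methods, using the animal measurements from the histogram slide later in the deck. One detail worth knowing: by default var() and std() divide by N - 1 (sample statistics); passing ddof=0 gives the "mean of the squared deviations" described above.

```python
import pandas as pd

df = pd.DataFrame({
    'length': [1.5, 0.5, 1.2, 0.9, 3.0],
    'width': [0.7, 0.2, 0.15, 0.2, 1.1],
})

print(df.median())     # middle value of each column
print(df.mode())       # most frequent value(s) of each column
print(df.var())        # sample variance (divides by N - 1)
print(df.var(ddof=0))  # population variance (divides by N)
print(df.std())        # square root of the sample variance
print(df.skew())       # asymmetry of each column's distribution
```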
Descriptive Stats (contd.)
Other methods to get a quick look at the data:
• describe(): Summarizes the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.
• Syntax: DataFrame.describe()
• info(): Prints a concise summary of the DataFrame, including the index dtype and columns, non-null values and memory usage.
• Syntax: DataFrame.info()
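Both methods are called on a DataFrame instance. A minimal sketch with made-up data that includes one missing value and one text column:

```python
import pandas as pd

# Illustrative data: one missing length and one non-numeric column.
df = pd.DataFrame({
    'animal': ['pig', 'rabbit', 'duck', 'chicken', 'horse'],
    'length': [1.5, 0.5, None, 0.9, 3.0],
})

summary = df.describe()   # numeric columns only by default; NaN is excluded
print(summary)

df.info()                 # prints dtypes, non-null counts and memory usage
```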
3. Null values
Detecting Handling
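The slide body for detecting and handling nulls was not extracted. The sketch below shows the standard pandas calls usually used for these two steps (isna for detection, dropna and fillna for handling); the data are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'length': [1.5, np.nan, 1.2],
    'width': [0.7, 0.2, np.nan],
})

# Detecting: count the missing values in each column.
missing_per_column = df.isna().sum()

# Handling, option 1: drop every row that contains a missing value.
complete_rows = df.dropna()

# Handling, option 2: replace missing values, here with the column mean.
filled = df.fillna(df.mean())

print(missing_per_column)
print(complete_rows)
print(filled)
```

Which option is appropriate depends on the data; dropping rows loses information, while filling changes the distribution.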
• Myatt, Glenn J., and Wayne P. Johnson. Making Sense of Data I : A Practical Guide to Exploratory Data
Analysis and Data Mining, John Wiley & Sons, Incorporated, 2014. ProQuest Ebook Central,
https://ebookcentral-proquest-com.ucc.idm.oclc.org/lib/uccie-ebooks/detail.action?docID=1729064.
• Martinez, Wendy L., et al. Exploratory Data Analysis with MATLAB, CRC Press LLC, 2017. ProQuest
Ebook Central,
https://ebookcentral-proquest-com.ucc.idm.oclc.org/lib/uccie-ebooks/detail.action?docID=5475665.
– Section 1.1: What is Exploratory Data Analysis
• Advanced exploratory data analysis (EDA):
• https://miykael.github.io/blog/2022/advanced_eda/
• https://github.com/miykael/miykael.github.io/blob/master/assets/nb/03_advanced_eda/nb_advanced_eda.ipynb
IS6061 | Matplotlib | S. Seppälä
Quote source: Sandro Tosi, John Hunter, Sandro Tosi, Merits of Matplotlib, in Matplotlib for Python Developers, Chapter 1, Packt, November 2009.
Introduction
• Matplotlib is a multi-platform data visualization library for Python
– Designed for creating 2D and 3D plots and figures suitable for publication
• The project was started by John Hunter in 2002 to enable a MATLAB-like plotting interface in Python
– Matplotlib was modeled on MATLAB, because graphing was something that MATLAB did very well.
• Built on NumPy arrays
• Designed to work with the broader SciPy stack (e.g., pandas has good integration with matplotlib)
• Other data visualization toolkits use matplotlib for their underlying plotting (e.g., Seaborn)
• Supports interactive plotting from the IPython shell and now, Jupyter notebook
• Supports various GUI backends on all operating systems. Plots can be rendered in UI applications.
• Can export visualizations to all the common vector and raster graphics formats (PDF, SVG, JPG,
PNG, BMP, GIF, etc.)
IS6061 | Matplotlib | S. Seppälä
Based on: Wes McKinney, 9 Plotting and Visualization, Python for Data Analysis, 3rd edition, 2023, https://wesmckinney.com/book/plotting-and-visualization & Python for Scientists, Chapter 20 matplotlib, rev 1.1, CJ Associates, 2014, p. 20-2.
Matplotlib terminology
• A Figure is one "picture".
– It has a border ("frame"), and other attributes.
– A Figure can be saved to a file.
• A Plot is one set of values graphed onto the Figure.
– A Figure can contain more than one Plot.
• Axes and Subplot are similar; the difference is how they get placed on the figure.
– Subplots allow multiple plots to be placed in a rectangular grid.
– Axes allow multiple plots to be placed at any location, including within other plots, or
overlapping.
• Matplotlib uses default objects for all of these, which are sufficient for simple plots.
– You can explicitly create any or all of these objects to fine-tune a graph.
– Most of the time, for simple plots, you can accept the defaults and get great-looking
figures.
IS6061 | Matplotlib | S. Seppälä Based on: Python for Scientists, Chapter 20 matplotlib, rev 1.1, CJ Associates, 2014, p. 20-4.
IS6061 | Matplotlib | S. Seppälä Based on: Quick start guide, Matplotlib, https://matplotlib.org/stable/users/explain/quick_start.html
Using Matplotlib
A simple line plot
In [15]: data
Out[15]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [16]: plt.plot(data)
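The session above does not show the imports or how data was created. A self-contained sketch follows; np.arange(10) is an assumption that matches the printed array, and the Agg backend line is only there so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

data = np.arange(10)    # array([0, 1, ..., 9]), as printed on the slide

# With a single argument, the values are plotted against their positions.
lines = plt.plot(data)
```

In a Jupyter notebook the backend line can be dropped; the figure is displayed below the cell.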
Figures and subplots
In [17]: fig = plt.figure()
Figures and subplots: Example
⚠️In Jupyter notebooks, plots are reset after each cell is evaluated, so
you must put all the plotting commands in a single notebook cell.
fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
Axis methods
• Plot axis objects have various methods that create different types of plots
• It is preferred to use the axis methods over the top-level plotting functions like plt.plot.
• You can find a comprehensive catalogue of plot types in the matplotlib documentation.
fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
(The call that creates axes is not in the extracted slide text; the 2 × 3 array below matches fig, axes = plt.subplots(2, 3).)
In [26]: axes
Out[26]:
array([[<Axes: >, <Axes: >, <Axes: >],
       [<Axes: >, <Axes: >, <Axes: >]], dtype=object)
Axes array
• The axes array can then be indexed like a two-dimensional array.
– E.g., axes[0, 1] refers to the subplot in the top row at the center.
• You can also indicate that subplots should have the same x- or y-
axis using sharex and sharey, respectively.
– This can be useful when you're comparing data on the same scale;
otherwise, matplotlib autoscales plot limits independently.
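A sketch of an axes array with shared axes. The 2 × 3 grid matches the array printed on the previous slide; the plotted data are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

# A 2 x 3 grid of subplots that all share the same x and y scale.
fig, axes = plt.subplots(2, 3, sharex=True, sharey=True)

# Index the array like a two-dimensional NumPy array:
# row 0 (top), column 1 (center).
axes[0, 1].plot(np.arange(10))

print(axes.shape)
```

Because the axes are shared, plotting into one subplot updates the limits of all of them.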
Adjusting the spacing around subplots
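The body of this slide was not extracted. One common way to change the spacing is the figure method subplots_adjust, whose wspace and hspace parameters control the horizontal and vertical gaps between subplots. A sketch with random data:

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for i in range(2):
    for j in range(2):
        axes[i, j].hist(rng.standard_normal(500), bins=50,
                        color="black", alpha=0.5)

# Shrink the gaps between the subplots to zero.
fig.subplots_adjust(wspace=0, hspace=0)
```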
Colors, Markers, and Line Styles
fig = plt.figure()
ax = fig.add_subplot()
ax.plot(np.random.standard_normal(30).cumsum(), color="black",
        linestyle="dashed", marker="o")
See documentation for line styles and marker options.
Axis Range, Ticks and Tick Labels
Controlled by pyplot functions and the matching axes methods:
• Plot range: xlim
– Called with no arguments, it returns the current parameter value (e.g., plt.xlim() returns the current x-axis plotting range)
– Called with parameters, it sets the parameter value (e.g., plt.xlim([0, 10]) sets the x-axis range to 0 to 10)
– On an axes object, the corresponding methods are ax.get_xlim() and ax.set_xlim([0, 10]); there is no ax.xlim()
• Modifying the y-axis consists of the same process, substituting y for x.
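A short sketch of reading and setting the x-axis range on an axes object, using the get_xlim and set_xlim methods of matplotlib axes; the plotted data are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.plot(np.arange(20))

before = ax.get_xlim()   # range chosen automatically, with a small margin
ax.set_xlim([0, 10])     # restrict the x-axis to 0..10
after = ax.get_xlim()

print(before, after)
```

The y-axis works the same way with get_ylim and set_ylim.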
Labels, Legends and Title
• Title: set_title
In [45]: ax.set_title("My first matplotlib plot")
• The axes class has a set method that allows batch setting of plot
properties.
ax.set(title="My first matplotlib plot", xlabel="Stages", ylabel="Test")
Example
fig, ax = plt.subplots()
ax.plot(np.random.standard_normal(1000).cumsum())
ticks = ax.set_xticks([0, 250, 500, 750, 1000])
labels = ax.set_xticklabels(["one", "two", "three", "four", "five"],
                            rotation=30, fontsize=8)
ax.set(title="My first matplotlib plot", xlabel="Stages", ylabel="Test")
Adding legends
• There are a couple of ways to add one.
• The easiest is to pass the label argument
when adding each piece of the plot and calling
ax.legend() to automatically create a legend.
• The legend method has several other choices
for the location loc argument.
fig, ax = plt.subplots()
ax.legend()
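The extracted snippet creates the axes and requests a legend, but the plotting calls that carry the label argument are missing. A complete sketch, with made-up random data and label names:

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
fig, ax = plt.subplots()

# Each piece of the plot gets a label (names are illustrative)...
ax.plot(rng.standard_normal(100).cumsum(), color="black", label="one")
ax.plot(rng.standard_normal(100).cumsum(), color="black",
        linestyle="dashed", label="two")
ax.plot(rng.standard_normal(100).cumsum(), color="black",
        linestyle="dotted", label="three")

# ...and legend() collects them. loc="best" lets matplotlib pick a spot.
legend = ax.legend(loc="best")
```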
Saving Plots to File
• You can save the active figure to file using the figure object’s savefig
instance method.
• The file type is inferred from the file extension.
• For example, to save an SVG version of a figure, you need only type:
fig.savefig("figpath.svg")
Matplotlib & a tiny bit of Pandas
https://forms.office.com/e/3RhG4fejm8
Plotting with Pandas
• Pandas objects come equipped with their own plotting functions, which are essentially wrappers around the matplotlib library.
– Think of matplotlib as a backend for pandas plots.
• Pandas Plot is a set of methods that can be used with a DataFrame or a Series to plot various graphs from the data it holds.
• Pandas Plot simplifies the creation of graphs and plots, so you don’t need to know the details of working with matplotlib.
IS6061 | Matplotlib | S. Seppälä
Based on: Parul Pandey, Pandas Plot: Deep Dive Into Plotting Directly With Pandas, MLOps Blog, NeptuneAI, 23 August 2023, https://neptune.ai/blog/pandas-plot-deep-dive-into-plotting-directly-with-pandas
plot method
• Series and DataFrame have a plot method for making some basic plot
types.
– By default, plot() makes line plots.
s = pd.Series(np.random.standard_normal(10).cumsum(),
              index=np.arange(0, 100, 10))
s.plot()
df.plot method
plt.style.use('grayscale')
df.plot()
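The snippet above uses a df that is not defined in the extracted text. A self-contained sketch with a made-up DataFrame of four random-walk columns; DataFrame.plot() draws one line per column on the same axes and adds a legend.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Illustrative data: four cumulative random walks.
df = pd.DataFrame(rng.standard_normal((10, 4)).cumsum(0),
                  columns=["A", "B", "C", "D"],
                  index=np.arange(0, 100, 10))

plt.style.use('grayscale')
ax = df.plot()          # returns the matplotlib axes it drew on
```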
Bar plot (1)
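The content of this slide was not extracted. As a hedged sketch of the Series case, plot.bar() and plot.barh() draw vertical and horizontal bars, with the Series index used as the tick labels; the data below are made up.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
fig, axes = plt.subplots(2, 1)

# Illustrative data: 16 random values labelled a..p.
data = pd.Series(rng.uniform(size=16), index=list("abcdefghijklmnop"))

data.plot.bar(ax=axes[0], color="black", alpha=0.7)    # vertical bars
data.plot.barh(ax=axes[1], color="black", alpha=0.7)   # horizontal bars
```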
Bar plot (2)
With a DataFrame, bar plots group the values in each row in bars, side by side,
for each value.
In [71]: df = pd.DataFrame(np.random.uniform(size=(6, 4)),
....: index=["one", "two", "three", "four", "five", "six"],
....: columns=pd.Index(["A", "B", "C", "D"], name="Genus"))
In [72]: df
Out[72]:
Genus A B C D
one 0.370670 0.602792 0.229159 0.486744
two 0.420082 0.571653 0.049024 0.880592
three 0.814568 0.277160 0.880316 0.431326
four 0.374020 0.899420 0.460304 0.100843
five 0.433270 0.125107 0.494675 0.961825
six 0.601648 0.478576 0.205690 0.560547
In [73]: df.plot.bar()
Histogram
df = pd.DataFrame({
'length': [1.5, 0.5, 1.2, 0.9, 3],
'width': [0.7, 0.2, 0.15, 0.2, 1.1]
}, index=['pig', 'rabbit', 'duck', 'chicken', 'horse'])
df['length'].hist(bins=3)
IS6061 | Matplotlib | S. Seppälä
Based on: pandas.DataFrame.hist, pandas API reference, https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.hist.html#pandas.DataFrame.hist
References
# Import seaborn and load the example dataset used below
import seaborn as sns
tips = sns.load_dataset("tips")
# Create a visualization
sns.relplot(
    data=tips,
    x="total_bill", y="tip", col="time",
    hue="smoker", style="smoker", size="size",
)
This plot shows the relationship between five variables in the tips dataset using a single call to the seaborn function relplot().
penguins = sns.load_dataset("penguins")
sns.histplot(data=penguins,
x="flipper_length_mm",
hue="species",
multiple="stack")
• Distributional representations
– Statistical analyses require knowledge about the distribution of variables in
your dataset. The seaborn function displot() supports several approaches
to visualizing distributions.
• Plots for categorical data
– Several specialized plot types in seaborn are oriented towards visualizing
categorical data. They can be accessed through catplot(). These plots
offer different levels of granularity.
sns.pairplot(data=penguins,
hue="species")
https://forms.office.com/e/KQgguqrE4D
• Plotly is an open source graphing library for Python. It makes interactive, publication-quality
2D and 3D graphs, such as line plots, scatter plots, area charts, bar charts, error bars, box
plots, histograms, heatmaps, subplots, multiple-axes, polar charts, and bubble charts.
• Plotly is typically imported with the following command: import plotly.express as px
IS6061
PROGRAMMING FOR DATA AND BUSINESS ANALYTICS
MSc BIAS
Lecturer: Dr Selja Seppälä
IS6061 | SciPy | S. Seppälä Logo: https://github.com/scipy/scipy.org/blob/main/static/images/logo.svg
SciPy
(Source: https://scipy.github.io/devdocs/tutorial/general.html)
Optimizers in SciPy
• Optimizers are a set of procedures defined in SciPy that either
find the minimum value of a function, or the root of an equation.
Optimizing Functions
• Many machine learning algorithms can be framed as a complex equation (an objective function) that needs to be minimized with the help of the given data.
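As an illustration of the root-finding side, the sketch below uses scipy.optimize.root on the equation x + cos(x) = 0. The equation and the starting guess 0 are illustrative choices.

```python
import numpy as np
from scipy.optimize import root

def eqn(x):
    # The function whose root (eqn(x) == 0) is wanted.
    return x + np.cos(x)

# The second argument is the initial guess for the root.
solution = root(eqn, 0)

print(solution.success)   # whether the solver converged
print(solution.x)         # array holding the root that was found
```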
(Source: https://www.w3schools.com/python/scipy/scipy_optimizers.php)
Optimisers in SciPy
Minimizing a Function
• A function, in this context, represents a curve, and curves have high points and low points.
– High points are called maxima.
– Low points are called minima.
• The highest point on the whole curve is called the global maximum, whereas the other high points are called local maxima.
• The lowest point on the whole curve is called the global minimum, whereas the other low points are called local minima.
Finding Minima
• We can use the scipy.optimize.minimize() function to minimize a function.
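A minimal sketch of minimize on the quadratic x² + x + 2, whose minimum is at x = -0.5 (set the derivative 2x + 1 to zero). The function, the starting point and the choice of the BFGS method are illustrative.

```python
from scipy.optimize import minimize

def eqn(x):
    # minimize passes x as a one-dimensional array, hence x[0].
    return x[0] ** 2 + x[0] + 2

# x0 is the starting point; method selects the algorithm.
result = minimize(eqn, x0=0, method='BFGS')

print(result.x)     # location of the minimum that was found
print(result.fun)   # function value at that location
```

Solvers like this return a local minimum near the starting point; for a function with several minima, the result depends on x0.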
SciPy Sparse Data
IS6061
PROGRAMMING FOR DATA AND BUSINESS ANALYTICS
MSc BIAS
Lecturer: Dr Selja Seppälä
Data Science Process: CRISP DM
• The CRoss Industry Standard Process for Data Mining (CRISP-DM) is a
process model that serves as the base for a data science process. It has
six sequential phases:
1. Business understanding – What does the business need?
2. Data understanding – What data do we have / need? Is it clean?
3. Data preparation – How do we organize the data for modeling?
4. Modeling – What modeling techniques should we apply?
5. Evaluation – Which model best meets the business objectives?
6. Deployment – How do stakeholders access the results?
• Published in 1999 to standardize data mining processes across
industries, it has since become the most common methodology for data
mining, analytics, and data science projects.
(Source: https://www.datascience-pm.com/crisp-dm-2/)
IS6061 | Introduction to Algorithms |
S. Seppälä
CRISP DM
https://www.upgrad.com/blog/data-structures-algorithm-in-python/)