Numerical and Scientific Computing in Python: v0.1 Spring 2019
Alternatives to Python
Python’s strengths
“Regular” Python code is not competitive with compiled languages (C, C++,
Fortran) for numeric computing.
The solution: specialized libraries that extend Python with data structures
and algorithms for numeric computing.
Keep the good stuff, speed up the parts that are slow!
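A rough sketch of the gap between a plain Python loop and a library call, assuming NumPy is installed. The function names and the array size are illustrative only, and timings vary by machine, so none are quoted here.

```python
import time
import numpy as np

def sum_of_squares_python(values):
    """Pure Python loop: every element is handled as a separate object."""
    total = 0.0
    for v in values:
        total += v * v
    return total

def sum_of_squares_numpy(arr):
    """Same computation carried out inside NumPy's compiled code."""
    return float(np.dot(arr, arr))

n = 200_000
data_list = [float(i % 100) for i in range(n)]
data_array = np.array(data_list)

t0 = time.perf_counter()
result_python = sum_of_squares_python(data_list)
t1 = time.perf_counter()
result_numpy = sum_of_squares_numpy(data_array)
t2 = time.perf_counter()

# Both approaches give the same answer; only the run time differs.
print("pure Python: %.4f s" % (t1 - t0))
print("NumPy:       %.4f s" % (t2 - t1))
```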
Outline
The numpy library
NumPy underlies many other numeric and algorithm libraries available for
Python, such as:
SciPy, matplotlib, pandas, OpenCV’s Python API, and more
Ndarray – the basic NumPy data type
List:
General purpose
Untyped
1 dimension
Resizable
Add/remove elements anywhere
Accessed with [ ] notation and integer indices

Ndarray:
Intended to store and process (mostly) numeric data
Typed
N-dimensions (chosen at creation time)
Fixed size (chosen at creation time)
Accessed with [ ] notation and integer indices
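The contrast between the two types can be sketched as follows. The variable names are made up for illustration; the point is that a list grows in place while an ndarray keeps the shape and dtype it was created with.

```python
import numpy as np

# A list can mix types and grow in place.
items = [1, "two", 3.0]
items.append([4, 5])

# An ndarray has one dtype and a fixed shape chosen at creation time.
grid = np.zeros((2, 3), dtype=np.float64)

# np.append does not resize grid; it builds and returns a new, flattened array.
longer = np.append(grid, [1.0, 2.0])

print(len(items))      # 4
print(grid.shape)      # (2, 3)
print(longer.shape)    # (8,)
```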
List Review
The list is the most common data structure in Python.
Lists can:
Have elements added or removed
Hold any type of thing in Python – variables, functions, objects, etc.
Be sorted or reversed
Hold duplicate members
Be accessed by an index number, starting from 0.

# Make a list
x = []
# Add something to it
x.append(1)
x.append([2,3,4])
print(x)
--> [1, [2, 3, 4]]

Indexing, for x = ['a','b',3.14]:
x[1] --> 'b'
Indexing backwards from -1:
x[-1] --> 3.14
x[-3] --> 'a'
Slicing: x[start:end:incr]
x[0:2] --> ['a','b']
x[-1:-3:-1] --> [3.14,'b']
x[:] --> ['a','b',3.14]
Slicing produces a COPY of the original list!
Sorting:
x.sort() in-place sort
sorted(x) returns a new sorted list
Depending on list contents a sorting function might be required.
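A short sketch of the slicing and sorting points, using the same three-element list. Sorting everything as text via key=str is just one possible sorting function, chosen here for illustration.

```python
x = ['a', 'b', 3.14]

# A slice is a new list, so changing it leaves x alone.
first_two = x[0:2]
first_two[0] = 'z'
print(x)            # ['a', 'b', 3.14]
print(first_two)    # ['z', 'b']

# Strings and floats cannot be compared directly in Python 3, so a key
# function is needed; here every element is compared by its text form.
as_text = sorted(x, key=str)
print(as_text)      # [3.14, 'a', 'b']

# sorted() returned a new list; x keeps its original order.
print(x)            # ['a', 'b', 3.14]
```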
[Figure: the list x holds three pointers to Python objects ('a', 'b', 3.14), each allocated anywhere in memory.]
x[1]: get the pointer at index 1 --> resolve pointer to the Python object
in memory --> get the value from the object
NumPy ndarray
import numpy as np
# Initialize a NumPy array
# from a Python list
y = np.array([1,2,3])
[Figure: the ndarray y stores the values 1, 2, 3 in a single data array.]
y[1]: check the ndarray data type --> retrieve the value at offset 1 in the
data array
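The lookup described above can be imitated by hand. This is only a sketch of the idea, not how NumPy is implemented internally; the dtype is fixed to int64 so the element size is the same on every platform.

```python
import sys
import numpy as np

y = np.array([1, 2, 3], dtype=np.int64)

print(y.itemsize)   # 8 bytes per element
print(y.strides)    # (8,): step 8 bytes to reach the next element

# Read element 1 directly from a copy of the raw data block.
raw = y.tobytes()
start = 1 * y.itemsize
value = int.from_bytes(raw[start:start + y.itemsize],
                       byteorder=sys.byteorder, signed=True)
print(value)        # 2
```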
https://docs.scipy.org/doc/numpy/reference/arrays.html
dtype
Every ndarray has a dtype, the type of data that it holds.
This is used to interpret the block of data stored in the ndarray.
a = np.array([1,2,3])
a.dtype --> dtype('int64')
Can be assigned at creation time:
c = np.array([-1,4,124],
             dtype='int8')
c.dtype --> dtype('int8')
A small amount of memory is used to store info about the ndarray (~few dozen bytes)
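A small sketch of how the dtype determines storage, reusing the int8 array from the slide. The names mixed and c_wide are illustrative.

```python
import numpy as np

c = np.array([-1, 4, 124], dtype='int8')
print(c.itemsize)    # 1 byte per element
print(c.nbytes)      # 3 bytes of data in total

# Mixing ints and floats: NumPy picks a dtype that can hold every value.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64

# astype returns a converted copy; the original keeps its dtype.
c_wide = c.astype('float32')
print(c_wide.dtype, c.dtype)   # float32 int8
```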
The numpy function array creates a new array from any data structure
with array-like behavior (other ndarrays, lists, tuples, etc.)
Read the docs!
https://en.wikipedia.org/wiki/Row-_and_column-major_order
ndarray indexing
ndarray indexing is similar to Python lists, strings, tuples, etc.
oneD = np.array([1,2,3,4])
twoD = oneD.reshape([2,2])
twoD --> array([[1, 2],
                [3, 4]])
Index with integers, starting from zero.
# index from 0
oneD[0] --> 1
oneD[3] --> 4
# -index starts from the end
oneD[-1] --> 4
oneD[-2] --> 3
Indexing N-dimensional arrays, just use commas:
array[i,j,k,l] = 42
# For multiple dimensions use a comma
# matrix[row,column]
twoD[0,0] --> 1
twoD[1,0] --> 3
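The comma notation extends to any number of dimensions. A minimal sketch with a 3-dimensional array (the name cube is illustrative), relying on NumPy's default row-major layout:

```python
import numpy as np

# 24 values reshaped to 2 x 3 x 4; the last index varies fastest (row-major).
cube = np.arange(24).reshape(2, 3, 4)

print(cube[0, 0, 0])     # 0
print(cube[0, 0, 1])     # 1
print(cube[1, 2, 3])     # 23
print(cube[-1, -1, -1])  # 23, negative indices work per dimension

# Assign to a single element with the same notation.
cube[1, 0, 2] = 42
print(cube[1, 0, 2])     # 42
```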
ndarray slicing
y = np.arange(50,300,50)
y --> array([ 50, 100, 150, 200, 250])
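One difference from lists is worth a sketch: a basic ndarray slice is a view on the same data, not a copy. The variable names below are illustrative.

```python
import numpy as np

y = np.arange(50, 300, 50)   # 50, 100, 150, 200, 250

middle = y[1:4]              # elements 100, 150, 200
every_other = y[::2]         # elements 50, 150, 250

# Writing through the slice changes the original array.
middle[0] = -1
print(y.tolist())            # [50, -1, 150, 200, 250]

# Ask for a copy explicitly when the original must stay unchanged.
safe = y[1:4].copy()
safe[0] = 0
print(y[1])                  # -1
```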
https://docs.scipy.org/doc/numpy/reference/routines.linalg.html
NumPy I/O
When reading files you can use standard Python: read the values into lists,
or allocate ndarrays and fill them.
Or use any of NumPy’s I/O routines that will directly generate ndarrays.
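Both routes can be sketched side by side. An in-memory string stands in for a file here, and its contents are made up for illustration; np.loadtxt is one of several NumPy I/O routines and is used only as an example.

```python
import io
import numpy as np

# An in-memory stand-in for a small CSV file (hypothetical contents).
text = "1.0,2.0,3.0\n4.0,5.0,6.0\n"

# Option 1: standard Python, build a list of rows, then convert.
rows = []
for line in io.StringIO(text):
    rows.append([float(field) for field in line.strip().split(",")])
from_list = np.array(rows)

# Option 2: let NumPy parse the text straight into an ndarray.
from_loadtxt = np.loadtxt(io.StringIO(text), delimiter=",")

print(from_loadtxt.shape)                        # (2, 3)
print(np.array_equal(from_list, from_loadtxt))   # True
```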
Open numpy_matplotlib_fft.py
As numpy is a large library we can only cover the basic usage here
https://en.wikipedia.org/wiki/Trapezoidal_rule
scipy.integrate
Open integrate.py and let’s look at examples of fixed samples and
function object integration.
trapz docs:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.trapz.html#scipy.integrate.trapz
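For readers without integrate.py at hand, the trapezoidal rule can be written out directly in NumPy. This is a hand-written sketch, not the SciPy routine and not the contents of integrate.py; both function names are hypothetical.

```python
import numpy as np

def trapezoid_samples(y, x):
    """Composite trapezoidal rule for samples y taken at points x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    widths = np.diff(x)
    return float(np.sum(widths * (y[1:] + y[:-1]) / 2.0))

def trapezoid_function(f, a, b, n=1000):
    """Integrate a function object f over [a, b] using n equal panels."""
    x = np.linspace(a, b, n + 1)
    return trapezoid_samples(f(x), x)

# Fixed samples: x**2 on [0, 1], exact integral 1/3.
x = np.linspace(0.0, 1.0, 11)
area_samples = trapezoid_samples(x ** 2, x)

# Function object: sin on [0, pi], exact integral 2.
area_function = trapezoid_function(np.sin, 0.0, np.pi)

print(area_samples)
print(area_function)
```

With only 11 samples the first result is slightly above 1/3; adding samples shrinks the error.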
The OpenCV Python API uses NumPy ndarrays, making OpenCV algorithms
compatible with SciPy and other libraries.
OpenCV vs SciPy
A simple benchmark: Gaussian and median filtering a 1024x671 pixel image of
the CAS building.
Gaussian: radius 5, median: radius 9. See: image_bench.py
Timing: 2.4 GHz Xeon E5-2680 (Sandy Bridge)

Filter    Function                        Time    OpenCV speedup
Gaussian  scipy.ndimage.gaussian_filter   85.7
Gaussian  cv2.GaussianBlur                23.2    3.7x
Median    scipy.ndimage.median_filter     1,780
Median    cv2.medianBlur                  79.2    22.5x
When NumPy and SciPy aren’t fast enough
Auto-compile your Python code with the numba and numexpr libraries
Combine your own C++ or Fortran code with SWIG and call from Python
numba
The numba library can translate portions of your Python code and compile
it into machine code on demand.
The @jit decorator is used to indicate which functions are compiled.
Options:
GPU code generation
Parallelization
Caching of compiled code
Can produce faster array code than pure NumPy statements.

from numba import jit, float64

# This will get compiled when it's first executed
@jit
def average(x, y, z):
    return (x + y + z) / 3.0

# With type information this one gets compiled when the file is read.
@jit(float64(float64,float64,float64))
def average_eager(x, y, z):
    return (x + y + z) / 3.0
numexpr
Another acceleration library for Python.
Useful for speeding up specific ndarray expressions.
Typically 2-4x faster than plain NumPy

import numpy as np
import numexpr as ne

a = np.arange(10)
b = np.arange(0, 20, 2)

# Plain NumPy
c = 2 * a + 3 * b
# Same expression evaluated by numexpr
c_ne = ne.evaluate("2 * a + 3 * b")
Intel now releases a customized build of Python 2.7 and 3.6 based on
their optimized libraries.
In RCS testing on various projects the Intel Python build is always at least
as fast as the regular Python and Anaconda modules on the SCC.
In one case involving processing several GB of XML it was 20x faster!
Can use the Intel Threading Building Blocks library to improve multithreaded
Python programs:
This can make mixing Python, Cython, and C code (or libraries) very
straightforward.
You can write your own compiled code and link it into Python via Cython
or the SWIG tool. Contact RCS for help!