L2. Numpy

This document provides an overview of key concepts in NumPy, including:
- NumPy allows efficient implementation of n-dimensional arrays and is built in C for speed
- NumPy arrays have standard data types and can be constructed from scratch by specifying shapes and fill values
- NumPy arrays support attributes like dtype, shape, and size, and indexing/slicing similar to Python lists
- NumPy arrays can be reshaped without copying data and flattened to a 1D view

Uploaded by

Gevinda Arulia


Lecture 2.

Numpy
2-1. NumPy basics
NumPy : efficient implementation of n-dimensional arrays
built in C : fast
1-d array, 2-d array, ..., n-d array

In [414]:
import numpy as np

In [415]:
a = [1, 2, 3, 4, 5, 6, 7, 8]
print("a : List =", a)
b = np.array(a)
print("b : np array =", b)

a : List = [1, 2, 3, 4, 5, 6, 7, 8]
b : np array = [1 2 3 4 5 6 7 8]

NumPy standard data types

NumPy arrays contain values of a single type
data type can be specified when constructing an array
np.zeros(10, dtype=int)

np.zeros(10, dtype=float)

np.zeros(10, dtype='int16')

np.zeros(10, dtype=np.float32)

Data type    Description
bool_        Boolean (True or False) stored as a byte
int_         Default integer type (same as C long; normally either int64 or int32)
intc         Identical to C int (normally int32 or int64)
intp         Integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8         Byte (-128 to 127)
int16        Integer (-32768 to 32767)
int32        Integer (-2147483648 to 2147483647)
int64        Integer (-9223372036854775808 to 9223372036854775807)
uint8        Unsigned integer (0 to 255)
uint16       Unsigned integer (0 to 65535)
uint32       Unsigned integer (0 to 4294967295)
uint64       Unsigned integer (0 to 18446744073709551615)
float_       Shorthand for float64 (alias removed in NumPy 2.0; use float64)
float16      Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
float32      Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64      Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
complex_     Shorthand for complex128 (alias removed in NumPy 2.0; use complex128)
complex64    Complex number, represented by two 32-bit floats
complex128   Complex number, represented by two 64-bit floats

In [416]:
np.array([1, 2, 3, 4], dtype='float32')

Out[416]: array([1., 2., 3., 4.], dtype=float32)
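
The single-type rule above can be checked directly. The following is a minimal sketch (the variable names are illustrative, not from the lecture) showing how NumPy picks a common type for mixed input, how an explicit dtype overrides that choice, and how converting floats to integers truncates toward zero.

```python
import numpy as np

# Mixed ints and floats are upcast to one common type
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)                  # float64

# An explicit dtype overrides the inferred one
small = np.array([1, 2, 3], dtype=np.int16)
print(small.dtype, small.itemsize)  # int16 2

# astype returns a converted copy; the fractional part is dropped
truncated = np.array([1.8, -1.8]).astype(np.int32)
print(truncated)                    # [ 1 -1]
```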

Creating arrays from scratch

shape :
(d1) : 1-d array of size d1
(d1,d2) : 2-d array of size d1xd2
(d1,d2,d3) : 3-d array of size d1xd2xd3
...

In [417]:
# Create a length-10 integer array filled with zeros
a0 = np.zeros(10, dtype=int)
print(a0.shape)
print(a0)

# Create a 2x5 floating-point array filled with ones
a1 = np.ones((2, 5), dtype=float)
print(a1.shape)
print(a1)

# Create a 2x5 array filled with 3.14
af = np.full((2, 5), 3.14)
print(af)

(10,)
[0 0 0 0 0 0 0 0 0 0]
(2, 5)
[[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]]
[[3.14 3.14 3.14 3.14 3.14]
[3.14 3.14 3.14 3.14 3.14]]

In [418]:
# Create a 3x3 identity matrix

np.eye(3)

Out[418]: array([[1., 0., 0.],

[0., 1., 0.],
[0., 0., 1.]])

In [419]:
# Create an array filled with a linear sequence
# Starting at 0, stopping before 20 (the stop value is excluded), stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

Out[419]: array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])

Creating array with random numbers

In [9]:
# set the seeds of the random number generators
# for reproducibility
import random
import numpy as np
random.seed(0)
np.random.seed(0)

In [10]:
# Create a 3x3 array of uniform random numbers in [0, 1)
np.random.random(size=(3, 3))

Out[10]: array([[0.5488135 , 0.71518937, 0.60276338],

[0.54488318, 0.4236548 , 0.64589411],
[0.43758721, 0.891773 , 0.96366276]])

In [421]:
# Create a 3x3 array of random integers in the interval [0, 10)

np.random.randint(0, 10, size=(3, 3))

Out[421]: array([[7, 1, 6],

[9, 9, 8],
[6, 3, 4]])

In [422]:
# Create a 3x3 array of standard normal N(0, 1) samples
np.random.randn(3, 3)

Out[422]: array([[ 0.33786932, 1.39970946, 1.1298669 ],

[-0.07111281, -0.80368313, -1.11158007],
[ 1.01861985, 0.36387617, -0.30621626]])

In [423]:
# Create a 3x3 array of normal samples with mean 50 and standard deviation 10
np.random.normal(50, 10, size=(3, 3))

Out[423]: array([[52.7827885 , 50.90993972, 35.35756522],

[66.58160611, 53.7680413 , 36.89402385],
[53.89007158, 60.90909425, 61.22526522]])
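
The cells above use the legacy global-state functions (np.random.seed, np.random.randint, and so on). As a hedged alternative, recent NumPy versions (1.17 and later) also offer a Generator object created with np.random.default_rng. The sketch below draws the same kinds of samples; it is not what the lecture used, and the numbers it produces differ from the outputs shown above.

```python
import numpy as np

rng = np.random.default_rng(0)           # seeded generator, local state

u = rng.random((3, 3))                   # uniform in [0, 1)
k = rng.integers(0, 10, size=(3, 3))     # integers in [0, 10)
z = rng.standard_normal((3, 3))          # N(0, 1)
g = rng.normal(50, 10, size=(3, 3))      # mean 50, standard deviation 10

# Re-seeding with the same value reproduces the same stream
u_again = np.random.default_rng(0).random((3, 3))
print(np.array_equal(u, u_again))        # True
```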

NumPy array attributes

dtype : the data type of the array
ndim : the number of axes
shape : the size of each axis
size : the total number of elements in the array
itemsize : the size of each array element (in bytes)
nbytes : the total size of the array (in bytes)

In [424]:
np.random.seed(0)  # seed for reproducibility
x3 = np.random.randint(0, 10, size=(3, 4, 5))  # three-dimensional array
print("dtype:", x3.dtype)
print("ndim: ", x3.ndim)
print("shape:", x3.shape)
print("size: ", x3.size)
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

dtype: int32
ndim: 3
shape: (3, 4, 5)
size: 60
itemsize: 4 bytes
nbytes: 240 bytes
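
The attributes listed above are related to each other. A small sketch of those relations follows; the dtype is fixed to int64 here as an assumption, because the default integer size depends on the platform (the output above shows int32).

```python
import numpy as np

x = np.zeros((3, 4, 5), dtype=np.int64)

print(x.ndim == len(x.shape))           # True: one shape entry per axis
print(x.size == np.prod(x.shape))       # True: 3 * 4 * 5 = 60
print(x.nbytes == x.size * x.itemsize)  # True
print(x.itemsize, x.nbytes)             # 8 480
```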

Array indexing and slicing: similar to Python lists

In [425]:
x1 = np.arange(10)
print(x1)

[0 1 2 3 4 5 6 7 8 9]

In [426]:
x1[1] = 1.8  # the float is truncated to the integer 1
print(x1[0], x1[1], x1[-1], x1[-2])
print(x1[:4])
print(x1[4:7])
print(x1[7:])
print(x1[7:-1])
print(x1[1:8:2])

0 1 9 8
[0 1 2 3]
[4 5 6]
[7 8 9]
[7 8]
[1 3 5 7]

Slicing

- a slice is a view (not a copy) of the base array: writing to the slice changes the base array

- to make an independent copy, use copy()

In [ ]:
x1 = np.arange(10)
y = x1[4:7]
print(x1, y)
y[0] = 0
print(x1, y)  # TAQ
z = x1[7:].copy()
z[0] = 0
print(x1, z)  # TAQ
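
One way to check whether two arrays share data is np.shares_memory. The sketch below uses a smaller array and different values than the cell above, so it illustrates the view/copy distinction without repeating that cell.

```python
import numpy as np

x = np.arange(5)
view = x[1:3]          # slice: shares data with x
copy = x[3:].copy()    # explicit copy: independent data

print(np.shares_memory(x, view))   # True
print(np.shares_memory(x, copy))   # False

view[:] = -1           # visible through x
copy[:] = -2           # not visible through x
print(x)               # [ 0 -1 -1  3  4]
```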

In [428]:
x2 = np.random.randint(0, 100, size=(3, 4))
print(x2)
print(x2[0])  # first row
print(x2[0, 1:3])
print(x2[:, 1])  # second col

[[42 58 31 1]
[65 41 57 35]
[11 46 82 91]]
[42 58 31 1]
[58 31]
[58 41 46]

Array reshaping
y = x.reshape(new_shape) : returns the data of x with a new shape (x itself keeps its shape)
no-copy view : y references the original array's data whenever possible
x.reshape(-1) : flattens to a 1-d array
numpy n-d array : 1-d array storage + n-d view

In [ ]:
x1 = np.arange(9)
print(x1, '\n')
x2 = x1.reshape((3, 3))
print(x2, '\n')
x2[1] = 0  # x2[1] = x2[1,:]
print(x1)  # TAQ
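
reshape returns a view only when the requested layout is compatible with how the data is stored. The following sketch, with illustrative names, shows one case where a view is possible and one case (flattening a transposed array) where NumPy has to copy.

```python
import numpy as np

x = np.arange(6)
a = x.reshape((2, 3))
print(np.shares_memory(x, a))      # True: reshape of contiguous data is a view

t = a.T                            # transposed view, no longer C-contiguous
flat = t.reshape(-1)               # elements must be reordered, so this copies
print(np.shares_memory(x, flat))   # False
print(flat)                        # [0 3 1 4 2 5]
```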

In [16]:
x2 = np.arange(12).reshape((3, 4))
print(x2, '\n')
print(x2[0], '\n')
print(x2[1].reshape((1, 4)), '\n')
print(x2[2].reshape((4, 1)), '\n')

[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]

[0 1 2 3]

[[4 5 6 7]]

[[ 8]
[ 9]
[10]
[11]]

In [17]:
print(x2.reshape(-1), '\n')
print(x2.reshape((4, -1)), '\n')
print(x2.reshape((-1, 6)), '\n')
print(x2.reshape((2, 2, -1)))

[ 0 1 2 3 4 5 6 7 8 9 10 11]

[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]

[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]]

[[[ 0 1 2]
[ 3 4 5]]

[[ 6 7 8]
[ 9 10 11]]]

1-d index of n-d array

x.shape = $(d_0, d_1, d_2)$

y = x.reshape(-1) : flattened (1-d) view of x

x[a, b, c] ≡ y[k], where $k = a\,(d_1 d_2) + b\,d_2 + c = (a\,d_1 + b)\,d_2 + c$

In [6]:
import numpy as np
x3 = np.arange(2 * 3 * 4)
x4 = x3.reshape((2, 3, 4))
a, b, c = 1, 1, 2
print(x4[a, b, c])
k = (a * 3 + b) * 4 + c
print(x3[k])
x3[k] = 0
print(x4[a, b, c])  # TAQ

18
18
0
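
NumPy provides helpers for this index arithmetic: np.ravel_multi_index maps an n-d index to the flat index, and np.unravel_index goes the other way. The sketch below compares the manual formula with these helpers for a different index than the cell above.

```python
import numpy as np

shape = (2, 3, 4)       # (d0, d1, d2)
a, b, c = 1, 2, 3

# Manual formula: k = (a*d1 + b)*d2 + c
k_manual = (a * shape[1] + b) * shape[2] + c
k_numpy = np.ravel_multi_index((a, b, c), shape)
print(k_manual, k_numpy)           # both are 23

# Inverse mapping: flat index back to (a, b, c)
back = tuple(int(i) for i in np.unravel_index(k_manual, shape))
print(back)                        # (1, 2, 3)
```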

Concatenating arrays
In [18]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
print(np.concatenate([x, y]), '\n')
z = [99, 99, 99]
print(np.concatenate([x, y, z]))

[1 2 3 3 2 1]
[ 1 2 3 3 2 1 99 99 99]

In [20]:
x = np.arange(0, 8).reshape((2, 4))
y = np.arange(8, 16).reshape((2, 4))
print(x, '\n')
print(y, '\n')
print(np.concatenate([x, y], axis=0), '\n')
print(np.vstack([x, y]), '\n')
print(np.concatenate([x, y], axis=1), '\n')
print(np.hstack([x, y]))

[[0 1 2 3]
[4 5 6 7]]

[[ 8 9 10 11]
[12 13 14 15]]

[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]

[[ 0 1 2 3 8 9 10 11]
[ 4 5 6 7 12 13 14 15]]

2-2. Computation on NumPy arrays: Universal Functions
Loops are slow
In [22]:
import numpy as np
a = np.random.random(size=1000000)
print(a.sum())  # TAQ : any guess?

500387.3135894248

Why is the following code slow?

because it uses a list?
because it uses a for loop?

In [23]:
# using list and for-loop
def reciprocal_1(x):
    n = len(x)
    y = []
    s = 0.0
    for i in range(n):
        z = 1.0 / x[i]
        s += z
        y.append(z)
    return y, s

%timeit b, s = reciprocal_1(a)

381 ms ± 13.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [24]:
# using numpy array and for-loop
def reciprocal_2(x):
    n = len(x)
    y = np.zeros(n)
    for i in range(n):
        y[i] = 1.0 / x[i]
    return y, y.sum()

%timeit b, s = reciprocal_2(a)

373 ms ± 37 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [25]:
# using numpy array and no-loop
def reciprocal_3(x):
    y = 1 / x
    return y, y.sum()

%timeit b, s = reciprocal_3(a)

4.38 ms ± 121 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
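
The %timeit magic only works inside IPython or Jupyter. As a rough sketch of the same comparison in a plain Python script, the standard timeit module can be used. The function names and the array size below are illustrative assumptions, and the measured times depend on the machine, so no timings are quoted.

```python
import timeit
import numpy as np

# Shift away from 0 so the reciprocals stay moderate in size
a = np.random.random(100_000) + 0.5

def reciprocal_loop(x):
    """Element-by-element reciprocal using a Python for loop."""
    y = np.empty(len(x))
    for i in range(len(x)):
        y[i] = 1.0 / x[i]
    return y

def reciprocal_vec(x):
    """Vectorized reciprocal: one ufunc call over the whole array."""
    return 1.0 / x

t_loop = timeit.timeit(lambda: reciprocal_loop(a), number=3)
t_vec = timeit.timeit(lambda: reciprocal_vec(a), number=3)
print(f"loop: {t_loop:.4f} s, vectorized: {t_vec:.4f} s")

# Both versions compute the same values
same = np.allclose(reciprocal_loop(a), reciprocal_vec(a))
print(same)
```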
UFuncs : vectorized operations
ufunc : universal function
very fast
element-wise operations on numpy array
unary
binary: scalar ⊙ np array
binary: np array ⊙ np array

In [438]:
x = np.arange(0, 11, 2)
y = np.arange(1, 12, 2)
print(x)
print(y)

[ 0 2 4 6 8 10]
[ 1 3 5 7 9 11]

In [439]:
print("x =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2)  # floor division

x = [ 0 2 4 6 8 10]
x + 5 = [ 5 7 9 11 13 15]
x - 5 = [-5 -3 -1 1 3 5]
x * 2 = [ 0 4 8 12 16 20]
x / 2 = [0. 1. 2. 3. 4. 5.]
x // 2 = [0 1 2 3 4 5]

In [440]:
print("-x = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2 = ", x % 2)
print("-(x/2+1)**2 = ", -(0.5 * x + 1) ** 2)

-x = [ 0 -2 -4 -6 -8 -10]
x ** 2 = [ 0 4 16 36 64 100]
x % 2 = [0 0 0 0 0 0]
-(x/2+1)**2 = [ -1. -4. -9. -16. -25. -36.]

In [441]:
print(x + 2)
print(np.add(x, 2))

[ 2 4 6 8 10 12]
[ 2 4 6 8 10 12]
The following table lists the arithmetic operators implemented in NumPy:

Operator  Equivalent ufunc  Description
+         np.add            Addition (e.g., 1 + 1 = 2)
-         np.subtract       Subtraction (e.g., 3 - 2 = 1)
-         np.negative       Unary negation (e.g., -2)
*         np.multiply       Multiplication (e.g., 2 * 3 = 6)
/         np.divide         Division (e.g., 3 / 2 = 1.5)
//        np.floor_divide   Floor division (e.g., 3 // 2 = 1)
**        np.power          Exponentiation (e.g., 2 ** 3 = 8)
%         np.mod            Modulus/remainder (e.g., 9 % 4 = 1)
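
The correspondence in the table can be verified directly. The sketch below evaluates each operator and the ufunc listed next to it on the same array and checks that the results agree; the dictionary layout is only an illustration.

```python
import numpy as np

x = np.arange(1, 7)

# Each entry holds (operator form, explicit ufunc form)
pairs = {
    "+": (x + 2, np.add(x, 2)),
    "-": (x - 2, np.subtract(x, 2)),
    "neg": (-x, np.negative(x)),
    "*": (x * 2, np.multiply(x, 2)),
    "/": (x / 2, np.divide(x, 2)),
    "//": (x // 2, np.floor_divide(x, 2)),
    "**": (x ** 2, np.power(x, 2)),
    "%": (x % 2, np.mod(x, 2)),
}

results = {op: bool(np.array_equal(lhs, rhs)) for op, (lhs, rhs) in pairs.items()}
print(results)   # every value is True
```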

Math functions
In [442]:
x = [-1, 2, -3]
print("x =", x)  # x: python list
y = np.abs(x)  # x is converted to np array, so is y
print("y=|x| =", y)

x = [-1, 2, -3]
y=|x| = [1 2 3]

In [443]:
x = [1, 2, 3]
# note: the expressions below use y = [1 2 3] from the previous cell
print("e^y =", np.exp(y))
print("2^y =", np.exp2(y))
print("3^y =", np.power(3, y))

e^y = [ 2.71828183 7.3890561 20.08553692]

2^y = [2. 4. 8.]
3^y = [ 3 9 27]

In [444]:
x = [1, 2, 4, 10]
print("x =", x)
print("ln(x) =", np.log(x))
print("log2(x) =", np.log2(x))
print("log10(x) =", np.log10(x))

x = [1, 2, 4, 10]
ln(x) = [0. 0.69314718 1.38629436 2.30258509]
log2(x) = [0. 1. 2. 3.32192809]
log10(x) = [0. 0.30103 0.60205999 1. ]

Special functions : np.expm1(x), np.log1p(x)

for computing more precisely when x is small
np.expm1(x) : high precision function for np.exp(x)-1
np.log1p(x) : high precision function for np.log(1+x)

In [445]:
x = np.array([0.001, 0.0001, 0.00001], dtype=np.float32)
y = np.array(x, dtype=np.float64)
print("exp(x) - 1 =", np.exp(x) - 1)
print("exp(y) - 1 =", np.exp(y) - 1)
print("expm1(x) =", np.expm1(x))
print("log(1 + x) =", np.log(1 + x))
print("log(1 + y) =", np.log(1 + y))
print("log1p(x) =", np.log1p(x))

exp(x) - 1 = [1.00052357e-03 1.00016594e-04 1.00135803e-05]

exp(y) - 1 = [1.00050021e-03 1.00004998e-04 1.00000497e-05]
expm1(x) = [1.0005003e-03 1.0000499e-04 1.0000050e-05]
log(1 + x) = [9.99546959e-04 1.00011595e-04 1.00135303e-05]
log(1 + y) = [9.99500381e-04 9.99949978e-05 9.99994975e-06]
log1p(x) = [9.995003e-04 9.999500e-05 9.999950e-06]
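
The gain in precision can be quantified as a relative error. The sketch below treats the float64 result of expm1 as the reference (an assumption made for illustration) and compares the float32 results of the naive formula and of np.expm1 against it.

```python
import numpy as np

x32 = np.array([1e-3, 1e-4, 1e-5], dtype=np.float32)

# Reference values computed in float64 for the same inputs
reference = np.expm1(x32.astype(np.float64))

naive = (np.exp(x32) - 1).astype(np.float64)    # float32 cancellation error
stable = np.expm1(x32).astype(np.float64)       # float32, no cancellation

err_naive = np.abs(naive - reference) / reference
err_stable = np.abs(stable - reference) / reference
print("relative error, exp(x) - 1:", err_naive)
print("relative error, expm1(x)  :", err_stable)
```

The naive version loses digits because exp(x) is rounded to a float32 close to 1 before the subtraction; the smaller x is, the fewer significant digits remain.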

Trigonometric functions
In [446]:
theta = np.linspace(0, np.pi, 4)
print("theta = ", theta)
print("sin(theta) = ", np.sin(theta))
print("cos(theta) = ", np.cos(theta))
print("tan(theta) = ", np.tan(theta))

theta = [0. 1.04719755 2.0943951 3.14159265]

sin(theta) = [0.00000000e+00 8.66025404e-01 8.66025404e-01 1.22464680e-16]
cos(theta) = [ 1. 0.5 -0.5 -1. ]
tan(theta) = [ 0.00000000e+00 1.73205081e+00 -1.73205081e+00 -1.22464680e-16]

In [447]:
x = [-1, 0, 1]
print("x = ", x)
print("arcsin(x) = ", np.arcsin(x))
print("arccos(x) = ", np.arccos(x))
print("arctan(x) = ", np.arctan(x))

x = [-1, 0, 1]
arcsin(x) = [-1.57079633 0. 1.57079633]
arccos(x) = [3.14159265 1.57079633 0. ]
arctan(x) = [-0.78539816 0. 0.78539816]

Specialized functions : gamma, beta, erf, ...

scipy.special : provides many special functions

In [448]:
from scipy import special

In [449]:
# Gamma functions (generalized factorials) and related functions
x = [1, 5, 10]
print("gamma(x) =", special.gamma(x))
print("ln|gamma(x)| =", special.gammaln(x))
print("beta(x, 2) =", special.beta(x, 2))

gamma(x) = [1.0000e+00 2.4000e+01 3.6288e+05]

ln|gamma(x)| = [ 0. 3.17805383 12.80182748]
beta(x, 2) = [0.5 0.03333333 0.00909091]

In [450]:
# Error function (integral of Gaussian),
# its complement, and its inverse
x = np.array([0, 0.3, 0.7, 1.0])
print("erf(x) =", special.erf(x))
print("erfc(x) =", special.erfc(x))
print("erfinv(x) =", special.erfinv(x))

erf(x) = [0. 0.32862676 0.67780119 0.84270079]

erfc(x) = [1. 0.67137324 0.32219881 0.15729921]
erfinv(x) = [0. 0.27246271 0.73286908 inf]

Specifying output
In [27]:
x = np.arange(5)
y = np.arange(10)
np.multiply(x, 10, out=y[3:8])  # store x*10 directly into y[3:8]
print(y)
y[3:8] = np.multiply(x, 10)  # same result, but via a temporary array
print(y)

[ 0 1 2 0 10 20 30 40 8 9]
[ 0 1 2 0 10 20 30 40 8 9]
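
The out argument accepts any writable array view of the right shape, not only contiguous slices. A short sketch, with illustrative values, that writes the powers of two into every other element of an array:

```python
import numpy as np

x = np.arange(5)
y = np.zeros(10, dtype=x.dtype)

# Write 2**x into positions 0, 2, 4, 6, 8 of y without a temporary array
np.power(2, x, out=y[::2])
print(y)   # [ 1  0  2  0  4  0  8  0 16  0]
```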
2-3. Aggregations : sum, min, max, and so on
numpy aggregation functions are much faster than standard python aggregation

In [452]:
L = np.random.random(100000)
%timeit sum(L)
%timeit np.sum(L)

15.4 ms ± 243 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
45.2 µs ± 1.52 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [453]:
%timeit max(L)
%timeit L.max()

10 ms ± 248 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
36.3 µs ± 2.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [28]:
M = np.random.random((3, 4))
print(M)

[[0.59753225 0.07463842 0.43803321 0.21812627]

[0.53159247 0.10793176 0.8448757 0.03346237]
[0.32140902 0.65124221 0.61473394 0.2195387 ]]

In [29]:
print(M.sum())
print(M.sum(axis=0), '\n')
print(M.cumsum(axis=1), '\n')
print(M.prod(axis=0), '\n')
print(M.cumprod(axis=1), '\n')

4.653116331422877
[1.45053374 0.83381239 1.89764286 0.47112734]

[[0.59753225 0.67217067 1.11020389 1.32833016]

[0.53159247 0.63952423 1.48439993 1.5178623 ]
[0.32140902 0.97265124 1.58738518 1.80692388]]

[0.10209353 0.00524631 0.22750296 0.00160242]

[[0.59753225 0.04459886 0.01953578 0.00426127]

[0.53159247 0.05737571 0.04847534 0.0016221 ]
[0.32140902 0.20931512 0.12867311 0.02824873]]

In [30]:
print('min =', M.min(axis=0))
print('max =', M.max(axis=0))
print('mean=', M.mean(axis=0))
print('var =', M.var(axis=0))
print('std =', M.std(axis=0))
print('med =', np.median(M, axis=0))  # M.median(axis=0) does not work
print('p75%=', np.percentile(M, 75, axis=0))

min = [0.32140902 0.07463842 0.43803321 0.03346237]

max = [0.59753225 0.65124221 0.8448757 0.2195387 ]
mean= [0.48351125 0.27793746 0.63254762 0.15704245]
var = [0.01386324 0.06986296 0.02774547 0.00763635]
std = [0.11774227 0.26431602 0.1665697 0.08738621]
med = [0.53159247 0.10793176 0.61473394 0.21812627]
p75%= [0.56456236 0.37958699 0.72980482 0.21883248]

In [31]:
x = np.arange(9, -1, -1)
print(np.argmin(x))
print(x.argmax())

9
0
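
The axis argument names the axis that is collapsed by the aggregation. The sketch below uses a small integer matrix (chosen for illustration) so the results are easy to check by hand, and shows how a flat argmax can be turned back into a row/column position.

```python
import numpy as np

M = np.arange(12).reshape((3, 4))

col_sums = M.sum(axis=0)   # collapses axis 0 (rows): one value per column
row_sums = M.sum(axis=1)   # collapses axis 1 (columns): one value per row
print(col_sums)            # [12 15 18 21]
print(row_sums)            # [ 6 22 38]

# argmax without axis returns an index into the flattened array
flat_pos = M.argmax()
pos = tuple(int(i) for i in np.unravel_index(flat_pos, M.shape))
print(flat_pos, pos)       # 11 (2, 3)
```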

2-4. Example: What is the Average Height of US Presidents?
Aggregates available in NumPy can be extremely useful for summarizing a set of values. As a simple example, let's consider the heights of all US presidents. This data is available in the file president_heights.csv, which is a simple comma-separated list of labels and values:

president_heights.csv
order,name,height(cm)

1,George Washington,189

2,John Adams,170

3,Thomas Jefferson,189
...

we use pandas to read the file

pandas will be explored more fully later

In [458]:
import pandas as pd
data = pd.read_csv('data/president_heights.csv')
heights = np.array(data['height(cm)'])
print(heights)

[189 170 189 163 183 171 185 168 173 183 173 173 175 178 183 193 178 173
174 183 183 168 170 178 182 180 183 178 182 188 175 179 183 193 182 183
177 185 188 188 182 185]

summary statistics:
In [459]:
print("Mean height: ", heights.mean())
print("Standard deviation:", heights.std())
print("Minimum height: ", heights.min())
print("Maximum height: ", heights.max())

Mean height: 179.73809523809524

Standard deviation: 6.931843442745892
Minimum height: 163
Maximum height: 193

quantiles:
In [460]:
print("25th percentile: ", np.percentile(heights, 25))
print("Median: ", np.median(heights))
print("75th percentile: ", np.percentile(heights, 75))

25th percentile: 174.25

Median: 182.0
75th percentile: 183.0

In [461]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set()  # set plot style

In [462]:
plt.hist(heights)
plt.title('Height Distribution of US Presidents')
plt.xlabel('height (cm)')
plt.ylabel('number');

2-5. Broadcasting
Motivation
ufunc : element-wise operation
A⊙B
what if A and B have different shapes?

we want to match their shapes

as long as there is a natural way to do so

broadcasting : rules for applying binary ufuncs when shapes differ

In [463]:
import numpy as np

In [464]:
a = np.array([0, 1, 2])
b = np.array([5, 5, 5])
print(a + b)

[5 6 7]

In [465]:
print(a + 5)

[5 6 7]

we can view a + 5 as :
duplicate the value 5 into the array [5, 5, 5]
then add element-wise
this is only a mental model (a simple way of thinking about broadcasting)
NumPy does not actually create the duplicated array; it does this in a more efficient way
We can similarly extend this to arrays of higher dimension

In [466]:
a = np.array([0, 1, 2])
M = np.ones((3, 3))
print(M + a)

[[1. 2. 3.]
[1. 2. 3.]
[1. 2. 3.]]

M + a
a has shape (3,), which is treated as (1, 3)
a is then duplicated, or broadcast, along the first axis (vertically, down the rows)
in order to match the shape of M, which is (3, 3).

In [32]:
a = np.arange(3)
b = np.arange(3).reshape((3, 1))
print(a, '\n')
print(b, '\n')
print(a + b)

[0 1 2]

[[0]
[1]
[2]]

[[0 1 2]
[1 2 3]
[2 3 4]]

[Figure: visualization of broadcasting in a + 5, M + a, and a + b]

The light boxes represent the broadcasted values: again, this extra memory is not actually
allocated in the course of the operation, but it can be useful conceptually to imagine that it is.

Rules of Broadcasting
Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two
arrays:

Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with
fewer dimensions is padded with ones on its leading (left) side.
Rule 2: If the shape of the two arrays does not match in any dimension, the array with
shape equal to 1 in that dimension is stretched to match the other shape.
Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
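
The three rules can be written out as a short function. The sketch below is a plain-Python illustration of the rules, not NumPy's actual implementation; broadcast_shape is a hypothetical helper name, and np.broadcast_shapes (available in NumPy 1.20 and later) is used only to cross-check the results.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape of two shapes, or raise ValueError."""
    # Rule 1: pad the shorter shape with ones on its left side
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)      # sizes agree, or b is stretched (Rule 2)
        elif da == 1:
            result.append(db)      # a is stretched (Rule 2)
        else:
            # Rule 3: sizes disagree and neither is 1
            raise ValueError(f"shapes {shape_a} and {shape_b} cannot be broadcast")
    return tuple(result)

print(broadcast_shape((3, 3), (3,)))     # (3, 3)
print(broadcast_shape((3,), (3, 1)))     # (3, 3)
print(np.broadcast_shapes((3,), (3, 1))) # (3, 3), NumPy's own answer

try:
    broadcast_shape((3, 2), (3,))
except ValueError as err:
    print("error:", err)
```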

Non-compatible example
In [468]:
M = np.ones((3, 2))
a = np.arange(3)
M + a  # TAQ : result?

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-468-d4adfa68cd62> in <module>
      1 M = np.ones((3, 2))
      2 a = np.arange(3)
----> 3 M + a

ValueError: operands could not be broadcast together with shapes (3,2) (3,)

Broadcasting rules apply to any binary ufunc .

e.g. logaddexp(a, b) = log(exp(a) + exp(b))

In [33]:
a = np.array([0, 1, 2])
M = np.ones((3, 3))
print(np.logaddexp(M, a), '\n')
print(np.logaddexp(M, a.reshape((3, 1))))

[[1.31326169 1.69314718 2.31326169]

[1.31326169 1.69314718 2.31326169]
[1.31326169 1.69314718 2.31326169]]

[[1.31326169 1.31326169 1.31326169]

[1.69314718 1.69314718 1.69314718]
[2.31326169 2.31326169 2.31326169]]

Broadcasting Example : Centering an array (i.e. zero mean)

Some data analysis algorithms assume zero-mean data for simplicity
e.g., PCA
How to do centering?

In [34]:
X = np.random.random((10, 3))
print(X, '\n')
Xmean = X.mean(axis=0)
print(Xmean)

[[0.20020551 0.71990961 0.04386778]

[0.24253241 0.58362154 0.19365921]
[0.73131058 0.54673692 0.34738314]
[0.4634587 0.36476761 0.48515853]
[0.83694219 0.67311311 0.08619293]
[0.08002749 0.40544108 0.42883816]
[0.34285624 0.21887864 0.82597284]
[0.91164433 0.76492665 0.4030136 ]
[0.2637624 0.37390141 0.97775962]
[0.40045545 0.65564646 0.33258765]]

[0.44731953 0.5306943 0.41244335]

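Xmean holds the mean of each feature (column), computed with the mean aggregate across the first
dimension. Subtracting it from X broadcasts Xmean over the rows; the centered columns then have zero mean up to floating-point error: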

In [471]:
X_centered = X - Xmean
print(X_centered.mean(axis=0))

[ 9.43689571e-17 -5.55111512e-17 4.44089210e-17]
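
Centering along the other axis needs one extra step, because a vector of row means has shape (10,) and does not broadcast against shape (10, 3). The sketch below, which uses its own random data rather than the X above, keeps the reduced axis with keepdims=True so that the means have shape (10, 1).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10, 3))

# Column centering: (10, 3) - (3,) broadcasts over the rows
col_centered = X - X.mean(axis=0)

# Row centering: keepdims=True gives shape (10, 1), which broadcasts over the columns
row_means = X.mean(axis=1, keepdims=True)
row_centered = X - row_means

print(row_means.shape)                              # (10, 1)
print(np.allclose(col_centered.mean(axis=0), 0.0))  # True
print(np.allclose(row_centered.mean(axis=1), 0.0))  # True
```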

Broadcasting Example : Plotting a two-dimensional function

plot a function z = f (x, y)

need to evaluate f (x, y) at 50x50 grid points

use broadcasting to compute z = f (x, y)

then plot z using matplotlib, which will be covered later

In [472]:
# x and y each have 50 steps from 0 to 5
x = np.linspace(0, 5, 50)  # shape=(50,)
y = np.linspace(0, 5, 50).reshape((-1, 1))  # shape=(50,1)
z = np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
print('shape of z = ', z.shape)
print(z)

shape of z = (50, 50)

[[-0.83907153 -0.83470697 -0.8216586 ... 0.8956708 0.68617261
0.41940746]
[-0.83907153 -0.82902677 -0.8103873 ... 0.92522407 0.75321348
0.52508175]
[-0.83907153 -0.82325668 -0.79876457 ... 0.96427357 0.84172689
0.66446403]
...
[-0.83907153 -0.48233077 -0.01646558 ... 0.96449925 0.75196531
0.41982581]
[-0.83907153 -0.47324558 0.00392612 ... 0.92542163 0.68540362
0.37440839]
[-0.83907153 -0.46410908 0.02431613 ... 0.89579384 0.65690314
0.40107702]]

In [473]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.imshow(z, origin='lower', extent=[0, 5, 0, 5], cmap='viridis')
plt.colorbar();

2-6. Comparisons, Masks, and Boolean Logic

In [ ]:
x = np.array([1, 2, 3, 4, 5])
b = (x <= 3)
print("x <= 3 : ", b)
print(np.sum(x <= 3))  # TAQ
print(np.count_nonzero(x <= 3))  # TAQ

In [475]:
print(x * b)  # masking
print(np.sum(x * b))
print(x[x <= 3])
print(x[b])
print(np.sum(x[x <= 3]))

[1 2 3 0 0]
6
[1 2 3]
[1 2 3]
6

In [476]:
print((3 <= x) & (x <= 4))
print(np.any((3 <= x) & (x <= 4)))
print((x < 3) | (x > 4))
print(np.all((x < 3) | (x > 4)))

[False False True True False]

True
[ True True False False True]
False

In [477]:
b = (x <= 3)
print(x * b)  # masking
print(np.sum(x * b))

[1 2 3 0 0]
6
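
The cells above combine masks with & and |, not with the Python keywords and / or. The following sketch shows why: the keywords ask for the truth value of a whole array, which NumPy refuses to guess, while the operators work element by element. The parentheses are needed because & binds more tightly than the comparisons.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Element-wise AND of two Boolean masks
mask = (x >= 3) & (x <= 4)
print(mask)   # [False False  True  True False]

# The keyword version fails for arrays with more than one element
message = None
try:
    bad = (x >= 3) and (x <= 4)
except ValueError as err:
    message = str(err)
    print("ValueError:", message)
```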

Motivating Example: Sleepless in Seattle

Is Seattle really a rainy city?
Let's get data first!
daily rainfall data from January 1 to December 31, 2014.

In [37]:
import numpy as np
import pandas as pd

# use pandas to extract the daily precipitation column (PRCP) as a NumPy array
data = pd.read_csv('data/Seattle2014.csv')
rainfall = data['PRCP'].values
rainfall.shape

Out[37]: (365,)

In [43]:
%matplotlib inline
import matplotlib.pyplot as plt

# you may need to install seaborn to set nice plot styles
# >>> conda install seaborn
import seaborn; seaborn.set()
plt.hist(rainfall, 40);

Questions (on Seattle rainfall data in 2014)

number of rainy days
number of rainy days in non-summer
precipitation in summer
precipitation in non-summer
...

In [480]:
print("Number days without rain: ", np.sum(rainfall == 0))
print("Number days with rain: ", np.sum(rainfall > 0))
print("Days with more than 10 mm: ", np.sum(rainfall > 10))
print("Rainy days with < 5 mm: ", np.sum((rainfall > 0) & (rainfall < 5)))

Number days without rain: 215

Number days with rain: 150
Days with more than 10 mm: 120
Rainy days with < 5 mm: 10

In [481]:
# construct a mask of all rainy days
rainy = (rainfall > 0)

# construct a mask of all summer days (June 21st is the 172nd day)
days = np.arange(365)
summer = (days > 172) & (days < 262)

print("Median precip on rainy days in 2014 (mm): ",
      np.median(rainfall[rainy]))
print("Median precip on summer days in 2014 (mm): ",
      np.median(rainfall[summer]))
print("Maximum precip on summer days in 2014 (mm): ",
      np.max(rainfall[summer]))
print("Median precip on non-summer rainy days (mm):",
      np.median(rainfall[rainy & ~summer]))

Median precip on rainy days in 2014 (mm): 49.5

Median precip on summer days in 2014 (mm): 0.0
Maximum precip on summer days in 2014 (mm): 216
Median precip on non-summer rainy days (mm): 51.0

2-7. Fancy Indexing

Indexing np array
simple index: arr[0]
slice: arr[:5]
Boolean mask: arr[arr > 0]
fancy indexing: arr[[1,3,7]]

In [482]:
x = np.random.randint(100, size=10)
print(x)

[ 7 8 89 16 52 87 72 34 4 0]

In [ ]:
ind = [3, 7, 4]
print(x[ind])  # TAQ : result?

In [ ]:
ind = np.array([[3, 7],
                [4, 5]])
x[ind]  # TAQ : result?

In [485]:
X = np.arange(12).reshape((3, 4))
X

Out[485]: array([[ 0, 1, 2, 3],

[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])

In [486]:
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
X[row, col]  # TAQ : result?

Out[486]: array([ 2, 5, 11])

In [487]:
row = np.array([0, 2]).reshape((2, 1))
col = np.array([2, 1, 3])
X[row, col]  # TAQ : result? Hint : broadcasting is applied

Out[487]: array([[ 2, 1, 3],

[10, 9, 11]])
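
Selecting a block of rows and columns this way is common enough that NumPy offers np.ix_, which builds the reshaped index arrays automatically. The sketch below compares it with the manual reshape used above; the variable names are illustrative.

```python
import numpy as np

X = np.arange(12).reshape((3, 4))
rows = [0, 2]
cols = [2, 1, 3]

# Manual version: a (2, 1) row index broadcast against a (3,) column index
block_manual = X[np.array(rows).reshape((2, 1)), np.array(cols)]

# np.ix_ builds index arrays of shapes (2, 1) and (1, 3)
block_ix = X[np.ix_(rows, cols)]

print(block_ix)
print(np.array_equal(block_manual, block_ix))   # True
```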
Combined Indexing
combining simple, slice, mask, and fancy index

In [488]:
print(X)
col = [2, 0, 1]
print(X[2, col])
print(X[1:, col])

[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[10 8 9]
[[ 6 4 5]
[10 8 9]]

In [489]:
row = np.array([0, 2]).reshape((2, 1))
col_mask = np.array([True, False, True, False])
X[row, col_mask]

Out[489]: array([[ 0, 2],

[ 8, 10]])

Example: Selecting Random Points

consider a set of N points in D dimensions
we generate N = 100 points in 2D
bi-variate normal
plot them using matplotlib
randomly select 20 points from them
mark selected points in different shape

In [45]:
mean = [0, 0]
cov = [[1, 2],
       [2, 5]]
X = np.random.multivariate_normal(mean, cov, 100)
X.shape

Out[45]: (100, 2)

In [46]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set()  # for plot styling
plt.scatter(X[:, 0], X[:, 1]);

In [47]:
indices = np.random.choice(X.shape[0], 20, replace=False)
print(indices)
selection = X[indices]  # fancy indexing here
print(selection.shape)

[ 2 81 12 6 94 68 82 30 1 23 37 3 64 21 11 45 83 67 92 71]
(20, 2)

In [48]:
plt.scatter(X[:, 0], X[:, 1], alpha=0.3)
plt.scatter(selection[:, 0], selection[:, 1],
            facecolor='red', s=7);

Modifying Values with Fancy Indexing

In [494]:
x = np.arange(10)
idx = np.array([2, 1, 8, 4])
x[idx] = 99
print(x)
x[idx] -= 10
print(x)

[ 0 99 99 3 99 5 6 7 99 9]
[ 0 89 89 3 89 5 6 7 89 9]
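Any assignment-type operator can be used this way, with a fancy index or with a single integer index. For example:

In [495]:
i = 9  # single integer index (assumed; it matches the output below)
x[i] -= 10
print(x)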

[ 0 89 89 3 89 5 6 7 89 -1]

Avoid duplicated indices in a fancy index

they may cause unexpected results
use the ufunc's at() method (e.g., np.add.at) if duplication is unavoidable

In [496]:
# duplication in fancy index: the last assignment to index 0 wins
x = np.zeros(10)
x[[0, 0]] = [4, 6]
print(x)

[6. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

In [497]:
# duplication in fancy index: each repeated index is incremented only once
idx = [2, 3, 3, 4, 4, 4]
x[idx] += 1
print(x)

[6. 0. 1. 1. 1. 0. 0. 0. 0. 0.]

In [498]:
x = np.zeros(10)
np.add.at(x, idx, 1)  # unbuffered: repeated indices accumulate
print(x)

[0. 0. 1. 2. 3. 0. 0. 0. 0. 0.]
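
Counting how often each index occurs is the typical use of this pattern. The sketch below contrasts the buffered += with np.add.at, and cross-checks the counts with np.bincount, which is a separate NumPy function for counting non-negative integers and was not part of the lecture.

```python
import numpy as np

idx = np.array([2, 3, 3, 4, 4, 4])

# Buffered: x[idx] += 1 evaluates x[idx] + 1 once, then assigns it back
buffered = np.zeros(10, dtype=int)
buffered[idx] += 1
print(buffered)      # [0 0 1 1 1 0 0 0 0 0]

# Unbuffered: every occurrence of an index contributes
counts_at = np.zeros(10, dtype=int)
np.add.at(counts_at, idx, 1)
print(counts_at)     # [0 0 1 2 3 0 0 0 0 0]

# Same counts from bincount
counts_bin = np.bincount(idx, minlength=10)
print(np.array_equal(counts_at, counts_bin))   # True
```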

Sorting
np.sort
np.argsort

In [499]:
x = np.random.random(5)
y = np.sort(x)
print(x)  # x is not changed
print(y)
x.sort()  # in-place sort
print(x)

[0.59488531 0.19874637 0.33881144 0.23509604 0.80192003]

[0.19874637 0.23509604 0.33881144 0.59488531 0.80192003]
[0.19874637 0.23509604 0.33881144 0.59488531 0.80192003]

In [500]:
height = 150 + 40 * np.random.random(5)
print('height=', height)
money = 100 * np.random.random(5)
print('money =', money)
idx = np.argsort(height)  # idx in order of height
print('index =', idx)
print('h[idx]=', height[idx])  # fancy index
print('m[idx]=', money[idx])  # fancy index

height= [189.40070616 161.84410721 163.2017855 167.6655901 184.75984166]
money = [ 3.3759033 12.44325455 84.22738284 5.61546723 45.12811488]
index = [1 2 3 4 0]
h[idx]= [161.84410721 163.2017855 167.6655901 184.75984166 189.40070616]
m[idx]= [12.44325455 84.22738284 5.61546723 45.12811488 3.3759033 ]

Partitioning
sometimes a complete sort is not needed
e.g., we only want to find the k smallest values in the array
np.partition(x, k) :
puts the smallest k values to the left of the partition
and the remaining values to the right, each side in arbitrary order:

In [501]:
x = np.array([7, 2, 3, 1, 6, 5, 4])
y = np.partition(x, 3)
print(y)
idx = np.argpartition(x, 3)
print(idx)
print(x[idx])

[2 1 3 4 6 5 7]
[1 3 2 6 4 5 0]
[2 1 3 4 6 5 7]

In [502]:
X = np.random.randint(0, 10, (4, 6))
print(X)
print(np.partition(X, 2, axis=1))

[[5 8 4 2 0 0]
[6 5 1 9 6 8]
[8 4 4 1 2 1]
[0 4 1 0 6 7]]
[[0 0 2 4 5 8]
[1 5 6 9 6 8]
[1 1 2 8 4 4]
[0 0 1 4 6 7]]
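
Because each side of the partition is unordered, a common follow-up is to sort only the k values that were selected. A minimal sketch, reusing the small array from above with k = 3 (an illustrative choice):

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])
k = 3

# The first k entries are the k smallest, in arbitrary order
smallest = np.sort(np.partition(x, k - 1)[:k])
print(smallest)   # [1 2 3]

# A negative kth selects from the other end: the k largest values
largest = np.sort(np.partition(x, -k)[-k:])
print(largest)    # [5 6 7]
```

Only the k selected values are sorted here, which is cheaper than sorting the whole array when k is much smaller than the array length.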

Example: k-Nearest Neighbors

Randomly create 10 points in 2D

In [509]:
N = 10
X = np.random.rand(N, 2)

In [510]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set()  # Plot styling
plt.scatter(X[:, 0], X[:, 1], s=100);

In [511]:
# squared distance matrix (NxN), built by broadcasting (N,1,2) against (1,N,2)
dist_sq = np.sum((X.reshape(N, 1, 2) - X.reshape(1, N, -1)) ** 2, axis=-1)
dist = np.sqrt(dist_sq)

In [512]:
# the above can be done using scipy.spatial.distance
from scipy.spatial.distance import pdist, squareform

# pdist(.) : pairwise distance, metric='euclidean' by default
# squareform(.) : nxn matrix form
dist = squareform(pdist(X))

In [513]:
K = 2
knn0 = np.argpartition(dist, K + 1, axis=1)

In [514]:
plt.scatter(X[:, 0], X[:, 1], s=100)

# draw lines from each point to its two nearest neighbors
# columns 0..K of knn0 hold the point itself and its K nearest neighbors, in arbitrary order
knn = knn0[:, :K + 1]
for i in range(N):
    for j in knn[i]:
        if j == i:
            continue  # skip the point itself (distance 0)
        # plot a line from X[i] to X[j]
        # use some zip magic to make it happen:
        plt.plot(*zip(X[j], X[i]), color='black')
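
The neighbor search can also be checked without plotting. The sketch below is a self-contained variant under a few assumptions: it generates its own points, works on squared distances (which preserve the ordering), and excludes each point from its own neighbor list by setting the diagonal to infinity. An argsort-based result is used only as a cross-check.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 10, 2
X = rng.random((N, 2))

# Pairwise squared distances via broadcasting: (N,1,2) - (1,N,2) -> (N,N,2)
diff = X.reshape(N, 1, 2) - X.reshape(1, N, 2)
dist_sq = np.sum(diff ** 2, axis=-1)

# A point is not its own neighbor: make the self-distance infinitely large
np.fill_diagonal(dist_sq, np.inf)

# The first K columns hold the K nearest neighbors, in arbitrary order
knn = np.argpartition(dist_sq, K, axis=1)[:, :K]

# Cross-check against a full sort of each row
knn_sorted = np.argsort(dist_sq, axis=1)[:, :K]
agree = all(set(knn[i].tolist()) == set(knn_sorted[i].tolist()) for i in range(N))
print(knn)
print(agree)
```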

60 pages
Chapter 3 - Exception Handling in VB
No ratings yet
Chapter 3 - Exception Handling in VB
21 pages
Numpy Merged (1)
No ratings yet
Numpy Merged (1)
93 pages
Numpy
No ratings yet
Numpy
11 pages
Numpy
No ratings yet
Numpy
24 pages
Numpy
No ratings yet
Numpy
14 pages
En Tekstenuitleg Net Articles Software Access Validation Rule Tutorial List of Access Validation Rules
No ratings yet
En Tekstenuitleg Net Articles Software Access Validation Rule Tutorial List of Access Validation Rules
3 pages
Efficient Computing with NumPy
No ratings yet
Efficient Computing with NumPy
73 pages
Sheet 3 Numpy
No ratings yet
Sheet 3 Numpy
10 pages
NumPy 2
No ratings yet
NumPy 2
11 pages
v6.2.0e_ReleaseNotes_v1.0
No ratings yet
v6.2.0e_ReleaseNotes_v1.0
57 pages
Arrays
No ratings yet
Arrays
28 pages
Num Py Notes
No ratings yet
Num Py Notes
13 pages
SIH Problem Statements
No ratings yet
SIH Problem Statements
39 pages
Ids 6 Experiments
No ratings yet
Ids 6 Experiments
27 pages
Numpy Library Basics
No ratings yet
Numpy Library Basics
16 pages
Unit 4 Numpy
No ratings yet
Unit 4 Numpy
14 pages
Numpy - Basics - Jupyter Notebook
No ratings yet
Numpy - Basics - Jupyter Notebook
9 pages
NUMPY, PANDAS
No ratings yet
NUMPY, PANDAS
19 pages
En Norm
No ratings yet
En Norm
14 pages
Numpy (Numerical Python)
No ratings yet
Numpy (Numerical Python)
80 pages
Numpy
No ratings yet
Numpy
20 pages
Unit Iii Using Numpy
No ratings yet
Unit Iii Using Numpy
23 pages
Num Py
No ratings yet
Num Py
31 pages
Ruckus R310: Benefits
No ratings yet
Ruckus R310: Benefits
6 pages
INTRODUCTION TO ORACLE 10G
No ratings yet
INTRODUCTION TO ORACLE 10G
4 pages
PYTHON UNIT-5 Part-B
No ratings yet
PYTHON UNIT-5 Part-B
3 pages
Numpy Basics
No ratings yet
Numpy Basics
66 pages
Notebook 1 - Numpy
No ratings yet
Notebook 1 - Numpy
17 pages
Numpy
No ratings yet
Numpy
71 pages
1 s2.0 S0196890421011778 Main
No ratings yet
1 s2.0 S0196890421011778 Main
12 pages
NumPy Basics
No ratings yet
NumPy Basics
23 pages
TDS BM66
No ratings yet
TDS BM66
1 page
Applied Machine Learning For Engineers: Introduction To Numpy
No ratings yet
Applied Machine Learning For Engineers: Introduction To Numpy
13 pages
Numpy Part 1
No ratings yet
Numpy Part 1
33 pages
NumPy Quickstart
No ratings yet
NumPy Quickstart
26 pages
The 555 - A Versatile Timer (RE-EN - 1992 - 09-12)
No ratings yet
The 555 - A Versatile Timer (RE-EN - 1992 - 09-12)
28 pages
FALLSEM2023-24 CSI3007 ETH VL2023240104352 2023-09-27 Reference-Material-I
No ratings yet
FALLSEM2023-24 CSI3007 ETH VL2023240104352 2023-09-27 Reference-Material-I
47 pages
DesignBoat Curriculum
0% (1)
DesignBoat Curriculum
12 pages
Tentative NumPy Tutorial
No ratings yet
Tentative NumPy Tutorial
30 pages
Profound Python Data Science
From Everand
Profound Python Data Science
Onder Teker
No ratings yet