L2. Numpy
2-1. NumPy basics
NumPy : an efficient implementation of n-dimensional arrays
implemented in C : fast
supports 1-d array, 2-d array, ..., n-d array
In [414]:
import numpy as np
In [415]:
a = [1,2,3,4,5,6,7,8]
b = np.array(a)
a : List = [1, 2, 3, 4, 5, 6, 7, 8]
b : np array = [1 2 3 4 5 6 7 8]
np.zeros(10, dtype=float)
np.zeros(10, dtype='int16')
np.zeros(10, dtype=np.float32)
Data type   Description
int_        Default integer type (same as C long; normally either int64 or int32)
intp        Integer used for indexing (same as C ssize_t; normally either int32 or int64)
float16     Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
float32     Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64     Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
In [416]:
np.array([1, 2, 3, 4], dtype='float32')
In [417]:
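# Create a length-10 integer array filled with zeros
a0 = np.zeros(10, dtype=int)
# Create a 2x5 floating-point array filled with ones
a1 = np.ones((2, 5))
# Create a 2x5 array filled with 3.14
af = np.full((2, 5), 3.14)
print(a0.shape)
print(a0)
print(a1.shape)
print(a1)
print(af)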
(10,)
[0 0 0 0 0 0 0 0 0 0]
(2, 5)
[[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]]
[[3.14 3.14 3.14 3.14 3.14]
[3.14 3.14 3.14 3.14 3.14]]
In [418]:
# Create a 3x3 identity matrix
np.eye(3)
In [419]:
# Create an array filled with a linear sequence
# for reproducibility
import random
import numpy as np
random.seed(0)
In [10]:
# Create a 3x3 array of uniform[0,1] random numbers
In [421]:
# Create a 3x3 array of random integers in the interval [0, 10)
In [422]:
# Create a 3x3 array of N(0,1)
In [423]:
# Create a 3x3 array of N(50,1)
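The four cells above keep only their comments. A minimal sketch of calls that would match those comments is given below; the variable names (u, r, n01, n50) are assumptions, and the legacy np.random interface is used because the rest of the notes use it.

```python
import numpy as np

np.random.seed(0)  # seed for reproducibility

# 3x3 array of uniform random numbers in [0, 1)
u = np.random.random((3, 3))

# 3x3 array of random integers in the interval [0, 10)
r = np.random.randint(0, 10, size=(3, 3))

# 3x3 array of normally distributed values, mean 0 and standard deviation 1
n01 = np.random.normal(0, 1, size=(3, 3))

# 3x3 array of normally distributed values, mean 50 and standard deviation 1
n50 = np.random.normal(50, 1, size=(3, 3))

print(u, r, n01, n50, sep='\n\n')
```

Note that np.random.random samples from the half-open interval [0, 1), so the value 1.0 itself never appears.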
In [424]:
np.random.seed(0) # seed for reproducibility
dtype: int32
ndim: 3
shape: (3, 4, 5)
size: 60
itemsize: 4 bytes
nbytes: 240 bytes
print(x1)
[0 1 2 3 4 5 6 7 8 9]
In [426]:
x1[1] = 1.8 # truncated to integer
print(x1[:4])
print(x1[4:7])
print(x1[7:])
print(x1[1:8:2])
0 1 9 8
[0 1 2 3]
[4 5 6]
[7 8 9]
[7 8]
[1 3 5 7]
Slicing
In [ ]:
x1 = np.arange(10)
y = x1[4:7]
print(x1, y)
y[0] = 0
print(x1, y) # TAQ
z = x1[7:].copy()
z[0] = 0
print(x1, z) # TAQ
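The difference between the two cases in the cell above is that a basic slice is a view while .copy() allocates new storage. As a small sketch, np.shares_memory can be used to check this directly:

```python
import numpy as np

x1 = np.arange(10)

y = x1[4:7]          # basic slice: a view onto x1's buffer
z = x1[7:].copy()    # explicit copy: independent storage

print(np.shares_memory(x1, y))  # True
print(np.shares_memory(x1, z))  # False

y[0] = 0   # writes through to x1[4]
z[0] = 0   # leaves x1[7] untouched
print(x1)  # [0 1 2 3 0 5 6 7 8 9]
```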
In [428]:
x2 = np.random.randint(0, 100, size=(3,4))
print(x2)
print(x2[0,1:3])
[[42 58 31 1]
[65 41 57 35]
[11 46 82 91]]
[42 58 31 1]
[58 31]
[58 41 46]
Array reshaping
y = x.reshape(new_shape) : returns the data of x with a new shape (x itself keeps its shape)
no-copy view : y references the original array's data whenever possible
x.reshape(-1) : flattens to a 1-d array
numpy n-d array : 1-d array storage + n-d view
In [ ]:
x1 = np.arange(9)
print(x1, '\n')
print(x2, '\n')
print(x1) # TAQ
In [16]:
x2 = np.arange(12).reshape((3, 4))
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[0 1 2 3]
[[4 5 6 7]]
[[ 8]
[ 9]
[10]
[11]]
In [17]:
print(x2.reshape(-1), '\n')
[ 0 1 2 3 4 5 6 7 8 9 10 11]
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]]
[[[ 0 1 2]
[ 3 4 5]]
[[ 6 7 8]
[ 9 10 11]]]
In [6]:
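import numpy as np
x3 = np.arange(2*3*4)
x4 = x3.reshape((2,3,4))
a,b,c = 1,1,2
print(x4[a,b,c])
k = (a*3 + b)*4 + c  # flat (row-major) index of element [a,b,c]
print(x3[k])
x3[k] = 0
print(x4[a,b,c])  # x4 is a view of x3, so the change is visible here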
18
18
0
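The index arithmetic k = (a*3 + b)*4 + c in the cell above is the row-major (C order) mapping from an n-d index to the 1-d storage. As a sketch, NumPy's np.ravel_multi_index and np.unravel_index perform the same conversion in both directions; the names shape and k_manual are introduced here for illustration.

```python
import numpy as np

shape = (2, 3, 4)
x3 = np.arange(2 * 3 * 4)
x4 = x3.reshape(shape)

a, b, c = 1, 1, 2
k_manual = (a * shape[1] + b) * shape[2] + c        # hand-written row-major formula
k_numpy = np.ravel_multi_index((a, b, c), shape)    # same mapping done by NumPy

print(k_manual, k_numpy, x4[a, b, c])     # 18 18 18
print(np.unravel_index(k_manual, shape))  # inverse mapping, back to (a, b, c)
```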
Concatenating arrays
In [18]:
x = np.array([1, 2, 3])
[1 2 3 3 2 1]
[ 1 2 3 3 2 1 99 99 99]
In [20]:
x = np.arange(0,8).reshape((2, 4))
[[0 1 2 3]
[4 5 6 7]]
[[ 8 9 10 11]
[12 13 14 15]]
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
[[ 0 1 2 3 8 9 10 11]
[ 4 5 6 7 12 13 14 15]]
[[ 0 1 2 3 8 9 10 11]
[ 4 5 6 7 12 13 14 15]]
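The printed results above are consistent with np.concatenate together with np.vstack and np.hstack. The sketch below reproduces them under that assumption; the second and third operands (y, z, q) are inferred from the outputs and are not shown in the original cells.

```python
import numpy as np

# 1-d case
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = np.array([99, 99, 99])
print(np.concatenate([x, y]))      # [1 2 3 3 2 1]
print(np.concatenate([x, y, z]))   # [ 1  2  3  3  2  1 99 99 99]

# 2-d case
p = np.arange(0, 8).reshape((2, 4))
q = np.arange(8, 16).reshape((2, 4))
rows = np.concatenate([p, q], axis=0)   # stack vertically   -> shape (4, 4)
cols = np.concatenate([p, q], axis=1)   # stack horizontally -> shape (2, 8)

# vstack / hstack are shorthands for the two 2-d cases
print(np.array_equal(rows, np.vstack([p, q])))   # True
print(np.array_equal(cols, np.hstack([p, q])))   # True
```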
500387.3135894248
In [23]:
# using list and for-loop
def reciprocal_1(x):
    n = len(x)
    y = []
    s = 0.0
    for i in range(n):
        z = 1.0 / x[i]
        s += z
        y.append(z)
    return y, s
%timeit b, s = reciprocal_1(a)
381 ms ± 13.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [24]:
# using numpy array and for-loop
def reciprocal_2(x):
    n = len(x)
    y = np.zeros(n)
    for i in range(n):
        y[i] = 1.0 / x[i]
    return y, y.sum()
%timeit b, s = reciprocal_2(a)
In [25]:
# using numpy array and no-loop
def reciprocal_3(x):
    y = 1 / x
    return y, y.sum()
%timeit b, s = reciprocal_3(a)
4.38 ms ± 121 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
UFuncs : vectorized operations
ufunc : universal function
very fast : the loop over the elements runs in compiled code
element-wise operations on numpy arrays
unary
binary: scalar ⊙ np array
binary: np array ⊙ np array
In [438]:
x = np.arange(0,11,2)
y = np.arange(1,12,2)
print(x)
print(y)
[ 0 2 4 6 8 10]
[ 1 3 5 7 9 11]
In [439]:
print("x =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2)
x = [ 0 2 4 6 8 10]
x + 5 = [ 5 7 9 11 13 15]
x - 5 = [-5 -3 -1 1 3 5]
x * 2 = [ 0 4 8 12 16 20]
x / 2 = [0. 1. 2. 3. 4. 5.]
x // 2 = [0 1 2 3 4 5]
In [440]:
print("-x = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2 = ", x % 2)
print("-(x/2+1)**2 = ", -(x/2+1)**2)
-x = [ 0 -2 -4 -6 -8 -10]
x ** 2 = [ 0 4 16 36 64 100]
x % 2 = [0 0 0 0 0 0]
-(x/2+1)**2 = [ -1. -4. -9. -16. -25. -36.]
In [441]:
print(x + 2)
print(np.add(x, 2))  # same operation, called through the ufunc
[ 2 4 6 8 10 12]
[ 2 4 6 8 10 12]
The following table lists the arithmetic operators implemented in NumPy:
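As a quick check of how the arithmetic operators map onto ufuncs, the sketch below pairs each operator with the NumPy function of the same meaning and verifies that both give the same result. The dictionary name pairs is introduced here for illustration only.

```python
import numpy as np

x = np.arange(0, 11, 2)   # [ 0  2  4  6  8 10]

# operator symbol -> (result via operator, result via the equivalent ufunc)
pairs = {
    "+":   (x + 5,  np.add(x, 5)),
    "-":   (x - 5,  np.subtract(x, 5)),
    "neg": (-x,     np.negative(x)),
    "*":   (x * 2,  np.multiply(x, 2)),
    "/":   (x / 2,  np.divide(x, 2)),
    "//":  (x // 2, np.floor_divide(x, 2)),
    "**":  (x ** 2, np.power(x, 2)),
    "%":   (x % 2,  np.mod(x, 2)),
}
for op, (via_operator, via_ufunc) in pairs.items():
    print(op, np.array_equal(via_operator, via_ufunc))   # True for every row
```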
Math functions
In [442]:
x = [-1, 2, -3]
y = np.abs(x)
print("x =", x)
print("y=|x| =", y)
x = [-1, 2, -3]
y=|x| = [1 2 3]
In [443]:
x = [1, 2, 3]
In [444]:
x = [1, 2, 4, 10]
print("x =", x)
x = [1, 2, 4, 10]
ln(x) = [0. 0.69314718 1.38629436 2.30258509]
log2(x) = [0. 1. 2. 3.32192809]
log10(x) = [0. 0.30103 0.60205999 1. ]
In [445]:
x = np.array([0.001, 0.0001, 0.00001], dtype=np.float32)
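One plausible use of such small float32 inputs is to compare np.log(1 + x) with np.log1p(x). This is an assumption, since the rest of the cell is not shown. The sketch below measures both against a float64 reference.

```python
import numpy as np

x = np.array([0.001, 0.0001, 0.00001], dtype=np.float32)

naive = np.log(1 + x)      # 1 + x is rounded to float32 before the log is taken
stable = np.log1p(x)       # computes log(1 + x) without forming 1 + x

# float64 reference values for comparison
reference = np.log1p(x.astype(np.float64))

err_naive = np.abs(naive - reference)
err_stable = np.abs(stable - reference)
print("log(1+x) error:", err_naive)
print("log1p(x) error:", err_stable)
```

For small x the sum 1 + x loses most of the digits of x in single precision, so the specialised function is noticeably more accurate. np.expm1 plays the same role for exp(x) - 1.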
Trigonometric functions
In [446]:
theta = np.linspace(0, np.pi, 4)
In [447]:
x = [-1, 0, 1]
print("x = ", x)
x = [-1, 0, 1]
arcsin(x) = [-1.57079633 0. 1.57079633]
arccos(x) = [3.14159265 1.57079633 0. ]
arctan(x) = [-0.78539816 0. 0.78539816]
In [448]:
from scipy import special
In [449]:
# Gamma functions (generalized factorials) and related functions
x = [1, 5, 10]
In [450]:
# Error function (integral of Gaussian)
Specifying output
In [27]:
x = np.arange(5)
y = np.arange(10)
print(y)
print(y)
[ 0 1 2 0 10 20 30 40 8 9]
[ 0 1 2 0 10 20 30 40 8 9]
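The printed result is consistent with writing 10 * x into the slice y[3:8] through the out argument of a ufunc. A minimal sketch under that assumption (the exact call in the original cell is not shown):

```python
import numpy as np

x = np.arange(5)
y = np.arange(10)

# Write 10 * x directly into the view y[3:8]; no temporary array is created
np.multiply(x, 10, out=y[3:8])
print(y)   # [ 0  1  2  0 10 20 30 40  8  9]
```

Writing y[3:8] = 10 * x gives the same values but first builds a temporary array holding 10 * x and then copies it into the slice. For large arrays the out argument avoids that extra allocation.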
2-3. Aggregations : sum, min, max, and so on
NumPy aggregation functions are much faster than Python's built-in aggregation functions when applied to NumPy arrays
In [452]:
L = np.random.random(100000)
%timeit sum(L)
%timeit np.sum(L)
15.4 ms ± 243 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
45.2 µs ± 1.52 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [453]:
%timeit max(L)
%timeit L.max()
10 ms ± 248 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
36.3 µs ± 2.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [28]:
M = np.random.random((3, 4))
print(M)
In [29]:
print(M.sum())
print(M.sum(axis=0))  # column sums: one value per column
4.653116331422877
[1.45053374 0.83381239 1.89764286 0.47112734]
In [30]:
print('min =', M.min(axis=0))
print('med =', np.median(M, axis=0)) # M.median(axis=0) does not work
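The axis argument names the dimension that is collapsed by the aggregation. A small deterministic sketch (the array here is chosen for illustration and is not the random M above):

```python
import numpy as np

M = np.arange(12).reshape((3, 4))
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(M.sum())                 # 66: all elements
print(M.sum(axis=0))           # [12 15 18 21]: rows collapsed, one value per column
print(M.sum(axis=1))           # [ 6 22 38]: columns collapsed, one value per row
print(M.min(axis=0))           # [0 1 2 3]
print(np.median(M, axis=0))    # [4. 5. 6. 7.]
```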
In [31]:
x = np.arange(9,-1,-1)
9
0
president_heights.csv
order,name,height(cm)
1,George Washington,189
2,John Adams,170
3,Thomas Jefferson,189
...
In [458]:
import pandas as pd
print(heights)
[189 170 189 163 183 171 185 168 173 183 173 173 175 178 183 193 178 173
174 183 183 168 170 178 182 180 183 178 182 188 175 179 183 193 182 183
177 185 188 188 182 185]
summary statistics:
In [459]:
print("Mean height: ", heights.mean())
quantiles:
In [460]:
print("25th percentile: ", np.percentile(heights, 25))
In [461]:
%matplotlib inline
import matplotlib.pyplot as plt
In [462]:
plt.hist(heights)
plt.ylabel('number');
2-5. Broadcasting
Motivation
ufunc : element-wise operation
A⊙B
what if A and B have different shapes?
In [463]:
import numpy as np
In [464]:
a = np.array([0, 1, 2])
b = np.array([5, 5, 5])
print(a + b)
[5 6 7]
In [465]:
print(a + 5)
[5 6 7]
we can view a + 5 as :
duplicate the value 5 into the array [5, 5, 5]
then add element-wise
this is only a mental model (a simple way of thinking about broadcasting)
NumPy does this more efficiently: the duplicated array is never actually allocated
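One way to see the "duplicate without allocating" idea is np.broadcast_to, which returns a read-only view that repeats a value by using a stride of 0. The sketch below is an illustration of the mental model, not a description of how a + 5 is implemented internally.

```python
import numpy as np

a = np.array([0, 1, 2])

# Mental model: materialize the scalar as a full array, then add element-wise
explicit = a + np.array([5, 5, 5])

# The same "stretched" operand as a view: three elements, zero bytes per step
virtual = np.broadcast_to(5, a.shape)
print(virtual, virtual.strides)          # [5 5 5] (0,)

print(np.array_equal(a + 5, explicit))   # True
```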
We can similarly extend this to arrays of higher dimension
In [466]:
a = np.array([0, 1, 2])
M = np.ones((3, 3))
print(M + a)
[[1. 2. 3.]
[1. 2. 3.]
[1. 2. 3.]]
M+a
a is duplicated, or broadcast,
along the first dimension (vertically, one copy per row)
in order to match the shape of M.
In [32]:
a = np.arange(3)
b = a.reshape((3, 1))  # column vector
print(a, '\n')
print(b, '\n')
print(a + b)
[0 1 2]
[[0]
[1]
[2]]
[[0 1 2]
[1 2 3]
[2 3 4]]
The light boxes represent the broadcasted values: again, this extra memory is not actually
allocated in the course of the operation, but it can be useful conceptually to imagine that it is.
Rules of Broadcasting
Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two
arrays:
Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with
fewer dimensions is padded with ones on its leading (left) side.
Rule 2: If the shape of the two arrays does not match in any dimension, the array with
shape equal to 1 in that dimension is stretched to match the other shape.
Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
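The three rules can be written out as a short function on shapes. The helper below (broadcast_shape, a hypothetical name that is not part of NumPy) is a sketch that follows the rules literally.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Apply the three broadcasting rules to two shapes (hypothetical helper)."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with ones on its left side
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)      # sizes agree, or Rule 2 stretches b
        elif da == 1:
            result.append(db)      # Rule 2: stretch a
        else:
            # Rule 3: sizes disagree and neither is 1
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(result)

print(broadcast_shape((3, 3), (3,)))     # (3, 3)
print(broadcast_shape((3, 1), (3,)))     # (3, 3)
try:
    broadcast_shape((3, 2), (3,))
except ValueError as e:
    print("error:", e)
```

The last call fails because (3,) is padded to (1, 3), and the trailing sizes 2 and 3 disagree with neither equal to 1.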
Non-compatible example
In [468]:
M = np.ones((3, 2))
a = np.arange(3)
M + a # TAQ : result?
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-468-d4adfa68cd62> in <module>
      1 M = np.ones((3, 2))
      2 a = np.arange(3)
----> 3 M + a
In [33]:
a = np.array([0, 1, 2])
In [34]:
X = np.random.random((10, 3))
print(X,'\n')
Xmean = X.mean(axis=0)
print(Xmean)
In [471]:
X_centered = X - Xmean
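Centering works because the (3,) vector of column means is broadcast over the 10 rows. A self-contained sketch that also checks the result (the seed is an assumption added for reproducibility):

```python
import numpy as np

np.random.seed(0)
X = np.random.random((10, 3))     # 10 observations, 3 features

Xmean = X.mean(axis=0)            # shape (3,): one mean per column
X_centered = X - Xmean            # (10, 3) - (3,) broadcasts over the rows

print(X_centered.mean(axis=0))    # each entry is zero up to rounding error
```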
In [472]:
# x and y have 50 steps from 0 to 5
print(z)
In [473]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.colorbar();
b = (x <= 3)
In [475]:
print(x * b) # masking
print(x[b])
[1 2 3 0 0]
6
[1 2 3]
[1 2 3]
6
In [476]:
print((3 <= x) & (x <= 4))
In [477]:
b = (x <= 3)
[1 2 3 0 0]
6
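The two uses of a boolean array seen above, multiplying by the mask and indexing with it, can be sketched side by side. The input x = [1, 2, 3, 4, 5] is an assumption chosen to be consistent with the printed results.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])   # assumed input
b = (x <= 3)                    # boolean mask: [ True  True  True False False]

print(x * b)          # [1 2 3 0 0]  masked-out entries become 0, shape is kept
print((x * b).sum())  # 6
print(x[b])           # [1 2 3]      only the selected entries are kept
print(x[b].sum())     # 6
print(np.sum(b))      # 3: True counts as 1, so this counts the matches
```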
In [37]:
import numpy as np
import pandas as pd
rainfall.shape
Out[37]: (365,)
In [43]:
%matplotlib inline
import matplotlib.pyplot as plt
In [480]:
print("Number days without rain: ", np.sum(rainfall == 0))
print("Days with more than 10 mm: ", np.sum(rainfall > 10))
print("Rainy days with < 5 mm: ", np.sum((rainfall > 0) & (rainfall < 5)))
In [481]:
# construct a mask of all rainy days
# construct a mask of all summer days (June 21st is the 172nd day)
print(np.median(rainfall[rainy]))
print(np.median(rainfall[summer]))
print(np.max(rainfall[summer]))
In [482]:
x = np.random.randint(100, size=10)
print(x)
[ 7 8 89 16 52 87 72 34 4 0]
In [ ]:
ind = [3, 7, 4]
In [ ]:
ind = np.array([[3, 7],
                [4, 5]])
In [485]:
X = np.arange(12).reshape((3, 4))
In [486]:
row = np.array([0, 1, 2])
In [487]:
row = np.array([0, 2]).reshape((2,1))
In [488]:
print(X)
col = [2,0,1]
print(X[2, col])
print(X[1:, col])
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[10 8 9]
[[ 6 4 5]
[10 8 9]]
In [489]:
row = np.array([0, 2]).reshape((2,1))
X[row, col_mask]
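The fancy-indexing cells above combine index arrays, broadcasting and masks. The sketch below collects the three patterns; the column indices and the contents of col_mask are assumptions, since their definitions are not shown in the cells.

```python
import numpy as np

X = np.arange(12).reshape((3, 4))

# Paired indices: picks X[0, 2], X[1, 1], X[2, 3]
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
print(X[row, col])                 # [ 2  5 11]

# Broadcast indices: a (2, 1) column of rows against 3 columns -> (2, 3) result
row2 = np.array([0, 2]).reshape((2, 1))
print(X[row2, [2, 0, 1]])
# [[ 2  0  1]
#  [10  8  9]]

# A boolean mask can stand in for the column index
col_mask = np.array([True, False, True, False])
print(X[row2, col_mask])
# [[ 0  2]
#  [ 8 10]]
```

With fancy indexing the shape of the result follows the broadcast shape of the index arrays, not the shape of the array being indexed.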
In [45]:
mean = [0, 0]
[2, 5]]
X.shape
Out[45]: (100, 2)
In [46]:
%matplotlib inline
import matplotlib.pyplot as plt
In [47]:
indices = np.random.choice(X.shape[0], 20, replace=False)
print(indices)
[ 2 81 12 6 94 68 82 30 1 23 37 3 64 21 11 45 83 67 92 71]
(20, 2)
In [48]:
plt.scatter(X[:, 0], X[:, 1], alpha=0.3)
x[idx] = 99
print(x)
x[idx] -= 10
print(x)
[ 0 99 99 3 99 5 6 7 99 9]
[ 0 89 89 3 89 5 6 7 89 9]
We can use any assignment-type operator for this. For example:
In [495]:
x[i] -= 10
print(x)
[ 0 89 89 3 89 5 6 7 89 -1]
In [496]:
# duplication in fancy index
x = np.zeros(10)
print(x)
[6. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
In [497]:
# duplication in fancy index
idx = [2, 3, 3, 4, 4, 4]
x[idx] += 1
print(x)
[6. 0. 1. 1. 1. 0. 0. 0. 0. 0.]
In [498]:
x = np.zeros(10)
np.add.at(x, idx, 1)  # unbuffered add: repeated indices accumulate
print(x)
[0. 0. 1. 2. 3. 0. 0. 0. 0. 0.]
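The two results above differ because x[idx] += 1 is evaluated as x[idx] = x[idx] + 1, so a repeated index is assigned several times rather than incremented several times. A sketch contrasting it with the unbuffered np.add.at (np.bincount is added here only as a cross-check):

```python
import numpy as np

idx = [2, 3, 3, 4, 4, 4]

x = np.zeros(10)
x[idx] += 1            # buffered: equivalent to x[idx] = x[idx] + 1
print(x)               # [0. 0. 1. 1. 1. 0. 0. 0. 0. 0.]

x2 = np.zeros(10)
np.add.at(x2, idx, 1)  # unbuffered: every occurrence of an index is applied
print(x2)              # [0. 0. 1. 2. 3. 0. 0. 0. 0. 0.]

# For pure counting, np.bincount gives the same totals
print(np.bincount(idx, minlength=10))   # [0 0 1 2 3 0 0 0 0 0]
```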
Sorting
np.sort
np.argsort
In [499]:
x = np.random.random(5)
y = np.sort(x)
print(y)
print(x)
In [500]:
height = 150 + 40*np.random.random(5)
print('height=', height)
Partitioning
complete sorting is not always needed
e.g. when we only want the k smallest values in the array
np.partition(x, k) :
puts the k smallest values to the left of position k
and the remaining values to the right, each side in arbitrary order:
In [501]:
x = np.array([7, 2, 3, 1, 6, 5, 4])
y = np.partition(x, 3)
print(y)
idx = np.argpartition(x, 3)  # indices that would partition x
print(idx)
print(x[idx])
[2 1 3 4 6 5 7]
[1 3 2 6 4 5 0]
[2 1 3 4 6 5 7]
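A typical use of partitioning is to extract the k smallest values, or their positions, without sorting the whole array. A minimal sketch (the names k and smallest are introduced here for illustration):

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])
k = 3

smallest = np.partition(x, k)[:k]     # the k smallest values, in no particular order
idx = np.argpartition(x, k)[:k]       # their positions in x

print(np.sort(smallest))              # [1 2 3]
print(np.sort(x[idx]))                # [1 2 3]
```

The order inside each side of the partition is not guaranteed, so the results are sorted here only to make the printout stable.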
In [502]:
X = np.random.randint(0, 10, (4, 6))
print(X)
print(np.partition(X, 2, axis=1))  # partition each row independently
[[5 8 4 2 0 0]
[6 5 1 9 6 8]
[8 4 4 1 2 1]
[0 4 1 0 6 7]]
[[0 0 2 4 5 8]
[1 5 6 9 6 8]
[1 1 2 8 4 4]
[0 0 1 4 6 7]]
In [509]:
N = 10
In [510]:
%matplotlib inline
import matplotlib.pyplot as plt
In [512]:
# the above can be done using scipy.spatial.distance
from scipy.spatial.distance import pdist, squareform
dist = squareform(pdist(X))
In [513]:
K = 2
In [514]:
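plt.scatter(X[:, 0], X[:, 1], s=100)
for i in range(N):
    for j in knn[i]:
        # draw a line from point i to its neighbor j
        plt.plot([X[i, 0], X[j, 0]], [X[i, 1], X[j, 1]], color='black')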
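The cells that compute the distances and the neighbor table are not shown, so the sketch below is one possible way to build knn with broadcasting and np.argpartition. The random points, the seed and the intermediate names (diff, dist_sq, nearest) are assumptions.

```python
import numpy as np

np.random.seed(0)
N, K = 10, 2
X = np.random.random((N, 2))                 # N points in the plane

# Pairwise squared distances via broadcasting:
# (N, 1, 2) - (1, N, 2) -> (N, N, 2), then sum over the coordinate axis
diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
dist_sq = (diff ** 2).sum(axis=-1)           # shape (N, N), zeros on the diagonal

# argpartition puts the K + 1 smallest distances first in each row;
# one of them is the point itself (distance 0), so it is dropped afterwards
nearest = np.argpartition(dist_sq, K, axis=1)[:, :K + 1]
knn = np.array([[j for j in row if j != i][:K] for i, row in enumerate(nearest)])
print(knn)                                   # row i holds the K nearest neighbors of point i
```

Squared distances are enough here because taking the square root does not change the ordering. A full sort of each row is avoided since only the K + 1 smallest entries are needed.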