本文乃Numpy Quickstart tutorial的翻译版本(并非完全翻译,翻译君为本人),英语能力足够的童鞋请阅读原文~,如果您觉得本文对您有帮助的话不要忘记点个赞哦!
一 神奇的广播操作
numpy的一些universal操作要求被操作的两个数组需要拥有相同的形状,如果两者的形状不相同的时候,则会调用广播功能,规则如下:
1.当输入数组的维数不同时,不断地给数组的较小维度上+1,即沿着这个数组的某个方向伸长出去(新位置由原来的元素替代),直到这些数组的形状相同。
2.输出数组在某个轴上的大小是输入数组在这个轴上的最大值
如下图所示的一样,两个形状不同的数组在相加时,b的垂直方向会伸长出去,直到和a形状相同为止
下面给出一些例子:
>>> x = np.arange(4) >>> xx = x.reshape(4,1) >>> y = np.ones(5) >>> z = np.ones((3,4)) >>> x.shape (4,) >>> y.shape (5,) >>> x + y <type 'exceptions.ValueError'>: shape mismatch: objects cannot be broadcast to a single shape >>> xx.shape (4, 1) >>> y.shape (5,) >>> (xx + y).shape (4, 5) >>> xx + y array([[ 1., 1., 1., 1., 1.], [ 2., 2., 2., 2., 2.], [ 3., 3., 3., 3., 3.], [ 4., 4., 4., 4., 4.]]) >>> x.shape (4,) >>> z.shape (3, 4) >>> (x + z).shape (3, 4) >>> x + z array([[ 1., 2., 3., 4.], [ 1., 2., 3., 4.], [ 1., 2., 3., 4.]])
二 神奇的索引功能
numpy允许你使用各式各样的索引,如下:
>>> a = np.arange(12)**2 # the first 12 square numbers >>> i = np.array( [ 1,1,3,8,5 ] ) # an array of indices >>> a[i] # the elements of a at the positions i array([ 1, 1, 9, 64, 25]) >>> >>> j = np.array( [ [ 3, 4], [ 9, 7 ] ] ) # a bidimensional array of indices >>> a[j] # the same shape as j array([[ 9, 16], [81, 49]])
如果引索数组是多维的,那么其中的每个元素指向被引索数组的第一个维度上的取值。
>>> palette = np.array( [ [0,0,0], # black ... [255,0,0], # red ... [0,255,0], # green ... [0,0,255], # blue ... [255,255,255] ] ) # white >>> image = np.array( [ [ 0, 1, 2, 0 ], # each value corresponds to a color in the palette ... [ 0, 3, 4, 0 ] ] ) >>> palette[image] # the (2,4,3) color image array([[[ 0, 0, 0], [255, 0, 0], [ 0, 255, 0], [ 0, 0, 0]], [[ 0, 0, 0], [ 0, 0, 255], [255, 255, 255], [ 0, 0, 0]]])
numpy甚至允许使用更高维的数组,不过需要具有相同形状
>>> a = np.arange(12).reshape(3,4) >>> a array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]]) >>> i = np.array( [ [0,1], # indices for the first dim of a ... [1,2] ] ) >>> j = np.array( [ [2,1], # indices for the second dim ... [3,3] ] ) >>> >>> a[i,j] # i and j must have equal shape array([[ 2, 5], [ 7, 11]]) >>> >>> a[i,2] array([[ 2, 6], [ 6, 10]]) >>> >>> a[:,j] # i.e., a[ : , j] array([[[ 2, 1], [ 3, 3]], [[ 6, 5], [ 7, 7]], [[10, 9], [11, 11]]])
如果把i,j放到一个list中,效果也一样
>>> l = [i,j] >>> a[l] # equivalent to a[i,j] array([[ 2, 5], [ 7, 11]])
但是应当注意的是,不能把i,j放到array中,因为这么做会被解释为索引a的第一个维度。放到元组里是可以的
>>> s = np.array( [i,j] ) >>> a[s] # not what we want Traceback (most recent call last): File "<stdin>", line 1, in ? IndexError: index (3) out of range (0<=index<=2) in dimension 0 >>> >>> a[tuple(s)] # same as a[i,j] array([[ 2, 5], [ 7, 11]])
下面是实际的例子,利用数组索引来方便查找时间相关的最大值
>>> time = np.linspace(20, 145, 5) # time scale >>> data = np.sin(np.arange(20)).reshape(5,4) # 4 time-dependent series >>> time array([ 20. , 51.25, 82.5 , 113.75, 145. ]) >>> data array([[ 0. , 0.84147098, 0.90929743, 0.14112001], [-0.7568025 , -0.95892427, -0.2794155 , 0.6569866 ], [ 0.98935825, 0.41211849, -0.54402111, -0.99999021], [-0.53657292, 0.42016704, 0.99060736, 0.65028784], [-0.28790332, -0.96139749, -0.75098725, 0.14987721]]) >>> >>> ind = data.argmax(axis=0) # index of the maxima for each series >>> ind array([2, 0, 3, 1]) >>> >>> time_max = time[ ind] # times corresponding to the maxima >>> >>> data_max = data[ind, xrange(data.shape[1])] # => data[ind[0],0], data[ind[1],1]... >>> >>> time_max array([ 82.5 , 20. , 113.75, 51.25]) >>> data_max array([ 0.98935825, 0.84147098, 0.99060736, 0.6569866 ]) >>> >>> np.all(data_max == data.max(axis=0)) True
也可以更方便地赋值
>>> a = np.arange(5) >>> a array([0, 1, 2, 3, 4]) >>> a[[1,3,4]] = 0 >>> a array([0, 0, 2, 0, 0])
如果赋值操作进行了多次,那么后面的会覆盖前面的
>>> a = np.arange(5) >>> a[[0,0,2]]=[1,2,3] >>> a array([2, 1, 3, 3, 4])
但是python中的+=操作不允许你重复地进行
>>> a = np.arange(5) >>> a[[0,0,2]]+=1 >>> a array([1, 1, 3, 3, 4])
三 布尔值索引
利用布尔值可以方便的从数组中挑出需要的元素。
首先介绍形状与原来数组相同的使用方法
>>> a = np.arange(12).reshape(3,4) >>> b = a > 4 >>> b # b is a boolean with a's shape array([[False, False, False, False], [False, True, True, True], [ True, True, True, True]], dtype=bool) >>> a[b] # 1d array with the selected elements array([ 5, 6, 7, 8, 9, 10, 11])
赋值也可以:
>>> a[b] = 0 # All elements of 'a' higher than 4 become 0 >>> a array([[0, 1, 2, 3], [4, 0, 0, 0], [0, 0, 0, 0]])
如果您给出一维布尔数组,那么可以根据这个数组方便地指定某行某列的元素,但是要注意该操作需要一维数组的大小匹配。
>>> a = np.arange(12).reshape(3,4) >>> b1 = np.array([False,True,True]) # first dim selection >>> b2 = np.array([True,False,True,False]) # second dim selection >>> >>> a[b1,:] # selecting rows array([[ 4, 5, 6, 7], [ 8, 9, 10, 11]]) >>> >>> a[b1] # same thing array([[ 4, 5, 6, 7], [ 8, 9, 10, 11]]) >>> >>> a[:,b2] # selecting columns array([[ 0, 2], [ 4, 6], [ 8, 10]]) >>> >>> a[b1,b2] # a weird thing to do array([ 4, 10])