Basic Operations

NumPy arrays support arithmetic operations, logical operations, common mathematical functions, and many other operations.

Arithmetic Operations

Operations with scalars:

>>> a = np.array([1, 2, 3, 4])
>>> a + 1
array([2, 3, 4, 5])
>>> 2**a
array([ 2,  4,  8, 16])

Element-wise arithmetic between arrays:

>>> a = np.array([1, 2, 3, 4])
>>> b = np.ones(4) + 1
>>> a - b
array([-1.,  0.,  1.,  2.])
>>> a * b
array([ 2.,  4.,  6.,  8.])

>>> j = np.arange(5)
>>> 2**(j + 1) - j
array([ 2,  3,  6, 13, 28])

These operations are much faster than the equivalent pure-Python implementation:

>>> a = np.arange(10000)
>>> %timeit a + 1  
10000 loops, best of 3: 24.3 us per loop
>>> l = range(10000)
>>> %timeit [i+1 for i in l] 
1000 loops, best of 3: 861 us per loop
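
Outside IPython, where the %timeit magic is not available, a comparable measurement can be sketched with the standard-library timeit module. The repetition count of 100 is an arbitrary choice for illustration, and the absolute timings depend on the machine:

```python
import timeit

import numpy as np

a = np.arange(10000)
l = list(range(10000))

# Time 100 repetitions of each approach (the count is arbitrary).
t_numpy = timeit.timeit(lambda: a + 1, number=100)
t_list = timeit.timeit(lambda: [i + 1 for i in l], number=100)

print(f"NumPy: {t_numpy:.4f} s, list comprehension: {t_list:.4f} s")
```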

Multiplying arrays with * is element-wise; it is not matrix multiplication:

>>> c = np.ones((3, 3))
>>> c * c  
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

Matrix multiplication is done as follows:

>>> c.dot(c)
array([[ 3.,  3.,  3.],
       [ 3.,  3.,  3.],
       [ 3.,  3.,  3.]])
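
With Python 3.5 or later and NumPy 1.10 or later, the @ operator is another way to write the same matrix product. A minimal sketch comparing the three forms:

```python
import numpy as np

c = np.ones((3, 3))

elementwise = c * c      # element-wise product: every entry stays 1
matrix_dot = c.dot(c)    # matrix product: every entry is 3
matrix_at = c @ c        # the same matrix product via the @ operator

print(np.array_equal(matrix_dot, matrix_at))
```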

Operators such as += and *= modify the existing array in place instead of creating a new one:

>>> a = np.ones((2,3), dtype=int)
>>> b = np.random.random((2,3))
>>> a *= 3
>>> a
array([[3, 3, 3],
       [3, 3, 3]])
>>> b += a
>>> b
array([[ 3.417022  ,  3.72032449,  3.00011437],
       [ 3.30233257,  3.14675589,  3.09233859]])
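
Because the array is modified in place, its dtype cannot change. The sketch below assumes a recent NumPy version, where adding a float array into an integer array in place is rejected with a TypeError; very old versions may behave differently:

```python
import numpy as np

a = np.ones((2, 3), dtype=int)
alias = a            # second name for the same array object

a *= 3               # in place: no new array is created
same_object = alias is a

# An in-place float addition would need to change the integer dtype,
# which an in-place operation cannot do.
try:
    a += np.full((2, 3), 0.5)
    raised = False
except TypeError:
    raised = True

print(same_object, raised, a.dtype)
```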

Other Operations

Comparing two arrays directly also returns an array, holding the result of each element-wise comparison:

>>> a = np.array([1, 2, 3, 4])
>>> b = np.array([4, 2, 2, 4])
>>> a == b
array([False,  True, False,  True], dtype=bool)
>>> a > b
array([False, False,  True, False], dtype=bool)

The array_equal function returns True when the two arrays have the same shape and all their elements are equal, and False otherwise:

>>> a = np.array([1, 2, 3, 4])
>>> b = np.array([4, 2, 2, 4])
>>> c = np.array([1, 2, 3, 4])
>>> np.array_equal(a, b)
False
>>> np.array_equal(a, c)
True

Logical operations:

>>> a = np.array([1, 1, 0, 0], dtype=bool)
>>> b = np.array([1, 0, 1, 0], dtype=bool)
>>> np.logical_or(a, b)
array([ True,  True,  True, False], dtype=bool)
>>> np.logical_and(a, b)
array([ True, False, False, False], dtype=bool)
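
For boolean arrays, the operators |, & and ~ give the same results as np.logical_or, np.logical_and and np.logical_not. A short sketch:

```python
import numpy as np

a = np.array([1, 1, 0, 0], dtype=bool)
b = np.array([1, 0, 1, 0], dtype=bool)

or_op = a | b     # same as np.logical_or(a, b)
and_op = a & b    # same as np.logical_and(a, b)
not_op = ~a       # same as np.logical_not(a)

print(or_op, and_op, not_op)
```

On non-boolean integer arrays these operators act bitwise, so the equivalence holds only for boolean inputs.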

Some common mathematical functions:

>>> a = np.arange(5)
>>> np.sin(a)
array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ])
>>> np.log(a)
array([       -inf,  0.        ,  0.69314718,  1.09861229,  1.38629436])
>>> np.exp(a)
array([  1.        ,   2.71828183,   7.3890561 ,  20.08553692,  54.59815003])

The shapes must be compatible when operating on arrays:

>>> a = np.arange(4)
>>> a + np.array([1, 2])  
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: operands could not be broadcast together with shapes (4) (2)

Transposition; note that this operation only creates a view of the original array:

>>> a = np.triu(np.ones((3, 3)), 1) # help(np.triu)
>>> a
array([[ 0.,  1.,  1.],
       [ 0.,  0.,  1.],
       [ 0.,  0.,  0.]])
>>> a.T
array([[ 0.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  1.,  0.]])
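
That the transpose is a view can be checked with np.shares_memory: writing through a.T also changes a. A minimal sketch:

```python
import numpy as np

a = np.triu(np.ones((3, 3)), 1)
t = a.T

shares = np.shares_memory(a, t)   # True: both use the same buffer
t[2, 0] = 5.0                     # write through the transposed view

print(shares, a[0, 2])
```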

Reductions

NumPy arrays support many basic reduction and statistics functions.

Computing Sums

One dimension:

>>> x = np.array([1, 2, 3, 4])
>>> np.sum(x)
10
>>> x.sum()
10

A two-dimensional array can be summed by rows or by columns:

>>> x = np.array([[1, 1], [2, 2]])
>>> x
array([[1, 1],
       [2, 2]])
>>> x.sum(axis=0)   # sum of each column (along the first dimension)
array([3, 3])
>>> x[:, 0].sum(), x[:, 1].sum()
(3, 3)

>>> x.sum(axis=1)   # sum of each row (along the second dimension)
array([2, 4])
>>> x[0, :].sum(), x[1, :].sum()
(2, 4)

The same idea applies in higher dimensions:

>>> x = np.random.rand(2, 2, 2)
>>> x.sum(axis=2)[0, 1]     
1.14764...
>>> x[0, 1, :].sum()     
1.14764...
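
The random values above differ from run to run. A reproducible variant can be sketched with a seeded generator (the seed 0 is an arbitrary choice); it also shows that summing over axis=2 removes that axis from the shape:

```python
import numpy as np

rng = np.random.RandomState(0)   # fixed seed, chosen arbitrarily
x = rng.rand(2, 2, 2)

s = x.sum(axis=2)                # the last axis is collapsed
print(s.shape)
print(np.isclose(s[0, 1], x[0, 1, :].sum()))
```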

Other Reductions

Extrema:

>>> x = np.array([1, 3, 2])
>>> x.min()
1
>>> x.max()
3

>>> x.argmin()  # index of minimum
0
>>> x.argmax()  # index of maximum
1
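
For a multi-dimensional array, argmax without an axis argument returns an index into the flattened array. The sketch below, using a made-up 2x3 array, converts it back to a (row, column) pair with np.unravel_index:

```python
import numpy as np

m = np.array([[1, 7, 3],
              [4, 2, 9]])

flat_index = m.argmax()                            # index in the flattened array
row, col = np.unravel_index(flat_index, m.shape)   # back to 2-D coordinates
per_column = m.argmax(axis=0)                      # row index of the max in each column

print(flat_index, (row, col), per_column)
```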

Logical reductions:

>>> np.all([True, True, False])
False
>>> np.any([True, True, False])
True

>>> a = np.zeros((100, 100))
>>> np.any(a != 0)
False
>>> np.all(a == a)
True

# Can be used for array comparisons
>>> a = np.array([1, 2, 3, 2])
>>> b = np.array([2, 2, 3, 2])
>>> c = np.array([6, 4, 4, 5])
>>> ((a <= b) & (b <= c)).all()
True

Statistics:

>>> x = np.array([1, 2, 3, 1])
>>> y = np.array([[1, 2, 3], [5, 6, 1]])
>>> x.mean()
1.75
>>> np.median(x)
1.5
>>> np.median(y, axis=-1) # last axis
array([ 2.,  5.])

>>> x.std() 
0.82915619758884995
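
By default, std computes the population standard deviation (dividing by N). A sketch of the sample version using the ddof parameter, together with a mean taken along one axis:

```python
import numpy as np

x = np.array([1, 2, 3, 1])
y = np.array([[1, 2, 3], [5, 6, 1]])

pop_std = x.std()             # divides by N (the default, ddof=0)
sample_std = x.std(ddof=1)    # divides by N - 1
col_means = y.mean(axis=0)    # mean of each column

print(pop_std, sample_std, col_means)
```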

Broadcasting

Many array operations are element-wise, and the two arrays usually have the same number of dimensions and the same size.

However, NumPy can also operate on arrays of different sizes when it is able to transform them to a common size first. This transformation is called broadcasting.

>>> a = np.tile(np.arange(0, 40, 10), (3, 1)).T
>>> a
array([[ 0,  0,  0],
       [10, 10, 10],
       [20, 20, 20],
       [30, 30, 30]])
>>> b = np.array([0, 1, 2])
>>> a + b
array([[ 0,  1,  2],
       [10, 11, 12],
       [20, 21, 22],
       [30, 31, 32]])
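
What broadcasting does here can be made explicit with np.broadcast_to (available since NumPy 1.10), which stretches b to the shape of a as a read-only view. This is only an illustration of the result; NumPy does not need to materialise the expanded array:

```python
import numpy as np

a = np.tile(np.arange(0, 40, 10), (3, 1)).T   # shape (4, 3)
b = np.array([0, 1, 2])                       # shape (3,)

b_expanded = np.broadcast_to(b, a.shape)      # b repeated along the rows

print(b_expanded.shape)
print(np.array_equal(a + b, a + b_expanded))
```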

We may already have used broadcasting without noticing it:

>>> a = np.ones((4, 5))
>>> a[0] = 2  # assign a zero-dimensional value (a scalar) to a one-dimensional row
>>> a
array([[ 2.,  2.,  2.,  2.,  2.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])

A useful trick:

>>> a = np.arange(0, 40, 10)
>>> a.shape
(4,)
>>> a = a[:, np.newaxis]  # add a new axis -> 2-D array
>>> a.shape
(4, 1)
>>> a
array([[ 0],
       [10],
       [20],
       [30]])
>>> b = np.array([0, 1, 2])
>>> a + b
array([[ 0,  1,  2],
       [10, 11, 12],
       [20, 21, 22],
       [30, 31, 32]])

A slightly more complex example:

>>> x, y = np.arange(5), np.arange(5)[:, np.newaxis]
>>> distance = np.sqrt(x ** 2 + y ** 2)
>>> distance
array([[ 0.        ,  1.        ,  2.        ,  3.        ,  4.        ],
       [ 1.        ,  1.41421356,  2.23606798,  3.16227766,  4.12310563],
       [ 2.        ,  2.23606798,  2.82842712,  3.60555128,  4.47213595],
       [ 3.        ,  3.16227766,  3.60555128,  4.24264069,  5.        ],
       [ 4.        ,  4.12310563,  4.47213595,  5.        ,  5.65685425]])

Vectors like the x and y of the previous example can also be created directly with the numpy.ogrid function:

>>> x, y = np.ogrid[0:5, 0:5]
>>> x, y
(array([[0],
       [1],
       [2],
       [3],
       [4]]), array([[0, 1, 2, 3, 4]]))
>>> x.shape, y.shape
((5, 1), (1, 5))
>>> distance = np.sqrt(x ** 2 + y ** 2)

In addition, np.mgrid creates the full index matrices directly in a similar way, so no broadcasting is needed:

>>> x, y = np.mgrid[0:4, 0:4]
>>> x
array([[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2],
       [3, 3, 3, 3]])
>>> y
array([[0, 1, 2, 3],
       [0, 1, 2, 3],
       [0, 1, 2, 3],
       [0, 1, 2, 3]])
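
The relationship between the two can be sketched with np.broadcast_arrays: broadcasting the open grids from np.ogrid against each other yields the same matrices that np.mgrid returns.

```python
import numpy as np

xo, yo = np.ogrid[0:4, 0:4]   # open grids: shapes (4, 1) and (1, 4)
xm, ym = np.mgrid[0:4, 0:4]   # full grids: both have shape (4, 4)

# Broadcasting the open grids reproduces the full grids.
xb, yb = np.broadcast_arrays(xo, yo)

print(np.array_equal(xb, xm), np.array_equal(yb, ym))
```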

Shape Manipulation

Flattening

>>> a = np.array([[1, 2, 3], [4, 5, 6]])
>>> a.ravel()
array([1, 2, 3, 4, 5, 6])
>>> a.T # transpose
array([[1, 4],
       [2, 5],
       [3, 6]])
>>> a.T.ravel()
array([1, 4, 2, 5, 3, 6])

In higher dimensions, the last dimensions are always unrolled first.
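
A related method is flatten. As a sketch of the difference: ravel returns a view when the memory layout allows it (as for the C-contiguous array below), while flatten always returns a copy.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

r = a.ravel()      # a view here, because a is C-contiguous
f = a.flatten()    # always an independent copy

r[0] = 99          # visible in a
f[1] = -1          # not visible in a

print(a)
```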

Reshaping

>>> a.shape
(2, 3)
>>> b = a.ravel()
>>> b = b.reshape((2, 3))
>>> b
array([[1, 2, 3],
       [4, 5, 6]])

The same can also be written as:

>>> b = a.reshape((2, -1))    # an unspecified (-1) dimension is inferred automatically
>>> b
array([[1, 2, 3],
       [4, 5, 6]])

Note: ndarray.reshape returns a view whenever possible (see help(np.reshape)):

>>> b[0, 0] = 99
>>> a
array([[99,  2,  3],
       [ 4,  5,  6]])

In some cases, however, reshape may return a copy:

>>> a = np.zeros((3, 2))
>>> b = a.T.reshape(3*2)
>>> b[0] = 9
>>> a
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])
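
Rather than modifying an element and inspecting the original, np.shares_memory can tell directly whether a reshape produced a view or a copy. A minimal sketch:

```python
import numpy as np

a = np.zeros((3, 2))

view = a.reshape(2, 3)        # contiguous data: a view is possible
copy = a.T.reshape(3 * 2)     # transposed layout: a copy is needed

print(np.shares_memory(a, view), np.shares_memory(a, copy))
```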

Adding a Dimension

Indexing with the np.newaxis object adds a dimension:

>>> z = np.array([1, 2, 3])
>>> z
array([1, 2, 3])

>>> z[:, np.newaxis]
array([[1],
       [2],
       [3]])

>>> z[np.newaxis, :]
array([[1, 2, 3]])
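
The same shapes can be produced in a few other ways. The sketch below compares np.newaxis with reshape, np.expand_dims, and None (np.newaxis is an alias for None):

```python
import numpy as np

z = np.array([1, 2, 3])

col_newaxis = z[:, np.newaxis]          # shape (3, 1)
col_reshape = z.reshape(-1, 1)          # same result via reshape
col_expand = np.expand_dims(z, axis=1)  # same result via expand_dims
row_none = z[None, :]                   # shape (1, 3)

print(col_newaxis.shape, row_none.shape)
```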

Dimension Shuffling

>>> a = np.arange(4*3*2).reshape(4, 3, 2)
>>> a.shape
(4, 3, 2)
>>> a[0, 2, 1]
5
>>> b = a.transpose(1, 2, 0)
>>> b.shape
(3, 2, 4)
>>> b[2, 1, 0]
5

This also creates a view:

>>> b[2, 1, 0] = -1
>>> a[0, 2, 1]
-1

Resizing

The size of an array can be changed with ndarray.resize:

>>> a = np.arange(4)
>>> a.resize((8,))
>>> a
array([0, 1, 2, 3, 0, 0, 0, 0])

However, the array must not be referenced anywhere else, otherwise an error like the following is raised:

>>> b = a
>>> a.resize((4,))   
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot resize an array that has been referenced or is
referencing another array in this way.  Use the resize function
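
As the error message suggests, the np.resize function can be used instead. It returns a new array and leaves the original untouched; note that, unlike ndarray.resize, it fills the extra space with repeated copies of the data rather than zeros. A short sketch:

```python
import numpy as np

a = np.arange(4)
b = a                       # a second reference to the same array

c = np.resize(a, (8,))      # new array; the data is repeated to fill it

print(c)                    # [0 1 2 3 0 1 2 3]
print(a.shape)              # the original is unchanged
```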

Stacking Arrays

Arrays can be stacked along different axes:

>>> a = np.floor(10*np.random.random((2,2)))
>>> a
array([[ 8.,  8.],
       [ 0.,  0.]])
>>> b = np.floor(10*np.random.random((2,2)))
>>> b
array([[ 1.,  8.],
       [ 0.,  4.]])
>>> np.vstack((a,b))
array([[ 8.,  8.],
       [ 0.,  0.],
       [ 1.,  8.],
       [ 0.,  4.]])
>>> np.hstack((a,b))
array([[ 8.,  8.,  1.,  8.],
       [ 0.,  0.,  0.,  4.]])
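
For 2-D arrays like these, vstack and hstack correspond to np.concatenate along axis 0 and axis 1. A sketch using the values printed above as fixed inputs:

```python
import numpy as np

a = np.array([[8., 8.], [0., 0.]])
b = np.array([[1., 8.], [0., 4.]])

v = np.concatenate((a, b), axis=0)   # same as np.vstack((a, b))
h = np.concatenate((a, b), axis=1)   # same as np.hstack((a, b))

print(v.shape, h.shape)
```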

Splitting Arrays

hsplit splits an array along its horizontal axis:

>>> a = np.floor(10*np.random.random((2,12)))
>>> a
array([[ 9.,  5.,  6.,  3.,  6.,  8.,  0.,  7.,  9.,  7.,  2.,  7.],
       [ 1.,  4.,  9.,  2.,  2.,  1.,  0.,  6.,  2.,  2.,  4.,  0.]])
>>> np.hsplit(a,3)   # split into 3 equal parts
[array([[ 9.,  5.,  6.,  3.],
       [ 1.,  4.,  9.,  2.]]), array([[ 6.,  8.,  0.,  7.],
       [ 2.,  1.,  0.,  6.]]), array([[ 9.,  7.,  2.,  7.],
       [ 2.,  2.,  4.,  0.]])]
>>> np.hsplit(a,(3,4))   # split after the third and the fourth column
[array([[ 9.,  5.,  6.],
       [ 1.,  4.,  9.]]), array([[ 3.],
       [ 2.]]), array([[ 6.,  8.,  0.,  7.,  9.,  7.,  2.,  7.],
       [ 2.,  1.,  0.,  6.,  2.,  2.,  4.,  0.]])]

In addition, vsplit splits an array along the vertical axis, and array_split lets you specify the axis along which to split.
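
A short sketch of both functions on a made-up 4x3 array. Unlike hsplit and vsplit, array_split also accepts a number of sections that does not divide the axis evenly:

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

top, bottom = np.vsplit(a, 2)                  # two blocks of 2 rows each
left, right = np.array_split(a, 2, axis=1)     # 3 columns -> 2 columns + 1 column

print(top.shape, bottom.shape)
print(left.shape, right.shape)
```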

Sorting

Sorting along an axis:

>>> a = np.array([[4, 3, 5], [1, 2, 1]])
>>> b = np.sort(a, axis=1) # each row is sorted independently
>>> b
array([[3, 4, 5],
       [1, 1, 2]])

In-place sort:

>>> a.sort(axis=1)
>>> a
array([[3, 4, 5],
       [1, 1, 2]])

Sorting with an index array:

>>> a = np.array([4, 3, 1, 2])
>>> j = np.argsort(a)
>>> j
array([2, 3, 1, 0])
>>> a[j]
array([1, 2, 3, 4])
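
The index array returned by argsort is mainly useful for reordering a second array in the same way, or for sorting in descending order. In the sketch below, the names array is a hypothetical set of labels paired with the values:

```python
import numpy as np

scores = np.array([4, 3, 1, 2])
names = np.array(["d", "c", "a", "b"])   # hypothetical labels, one per score

order = np.argsort(scores)               # indices that sort scores ascending
names_sorted = names[order]              # labels reordered the same way
descending = scores[order[::-1]]         # reverse the order for descending

print(names_sorted, descending)
```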

Finding the indices of the maximum and minimum:

>>> a = np.array([4, 3, 1, 2])
>>> j_max = np.argmax(a)
>>> j_min = np.argmin(a)
>>> j_max, j_min
(0, 2)

Loading Data Files

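Saving and loading text files:

data = np.ones((3, 3))  # example array, used here only so the snippet runs on its own
np.savetxt('pop2.txt', data)
data2 = np.loadtxt('pop2.txt')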

Using NumPy's own file format:

data = np.ones((3, 3))
np.save('pop.npy', data)
data3 = np.load('pop.npy')
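
The two formats differ in what they preserve. The sketch below writes a small integer array (made up for the example) to a temporary directory in both formats: np.loadtxt returns floating-point values by default, while the .npy format keeps the original dtype.

```python
import os
import tempfile

import numpy as np

data = np.arange(6).reshape(2, 3)    # integer example data

with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "pop2.txt")
    npy_path = os.path.join(tmp, "pop.npy")

    np.savetxt(txt_path, data)       # human-readable text
    from_txt = np.loadtxt(txt_path)  # read back as float64 by default

    np.save(npy_path, data)          # NumPy binary format
    from_npy = np.load(npy_path)     # dtype and shape are preserved

print(from_txt.dtype, from_npy.dtype)
```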

References

https://docs.scipy.org/doc/numpy/reference/routines.html

https://docs.scipy.org/doc/numpy/user/quickstart.html

http://www.scipy-lectures.org/intro/numpy/index.html

https://docs.scipy.org/doc/