Linear algebra is a branch of mathematics that is widely used throughout science and engineering. A good understanding of linear algebra is essential for understanding and working with many machine learning algorithms, especially deep learning algorithms.

Linear Independence

Determining whether Ax=b has a solution thus amounts to testing whether b is in the span of the columns of A.

A set of vectors is linearly independent if no vector in the set is a linear combination of the other vectors.
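As a quick sanity check, here is a small NumPy sketch (the matrix and vector are illustrative values, not from the text) that tests whether b lies in the span of the columns of A by comparing ranks, and checks whether the columns are linearly independent:

```python
import numpy as np

# Test whether b lies in the span of the columns of A by comparing
# the rank of A with the rank of the augmented matrix [A | b].
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([2.0, 3.0, 5.0])   # equals 2*col1 + 3*col2, so it is in the span

rank_A = np.linalg.matrix_rank(A)
rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))
print("b in span of columns of A:", rank_A == rank_Ab)          # True

# The columns are linearly independent exactly when rank(A) equals
# the number of columns.
print("columns linearly independent:", rank_A == A.shape[1])    # True
```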

Norms

Formally, the \(L^p\) norm is given by

$$ ||x||_{p} = \left(\sum_i |x_i|^{p}\right)^{\frac{1}{p}} $$

Sometimes we may also wish to measure the size of a matrix. The most common way to do this is with the otherwise obscure Frobenius norm

$$ ||A||_F = \sqrt{\sum_{i,j} A_{i,j}^2}$$
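For a concrete feel, here is a short NumPy sketch (with arbitrary example values) that computes the \(L^p\) norm and the Frobenius norm directly from the formulas above and compares them with np.linalg.norm:

```python
import numpy as np

x = np.array([3.0, -4.0])
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# L^p norm of a vector: (sum_i |x_i|^p)^(1/p), here with p = 2.
p = 2
lp_norm = np.sum(np.abs(x) ** p) ** (1.0 / p)
print(lp_norm, np.linalg.norm(x, ord=p))        # both 5.0

# Frobenius norm of a matrix: sqrt(sum_{i,j} A_{i,j}^2).
fro = np.sqrt(np.sum(A ** 2))
print(fro, np.linalg.norm(A, ord='fro'))        # both sqrt(30) ≈ 5.477
```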

Special Matrices

Diagonal Matrices

Diagonal matrices consist mostly of zeros and have non-zero entries only along the main diagonal.
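Because of this structure, multiplying by a diagonal matrix only rescales each component of a vector, which the following small NumPy sketch (example values assumed) illustrates:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
x = np.array([4.0, 5.0, 6.0])

# Multiplying by diag(v) only rescales each component of x,
# so the full matrix product equals an elementwise product.
full = np.diag(v) @ x
fast = v * x
print(np.allclose(full, fast))   # True
```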

Symmetric Matrices

A symmetric matrix is any matrix that is equal to its own transpose:

$$ A = A^T $$

Symmetric matrices often arise when the entries are generated by some function of two arguments that does not depend on the order of the arguments.

Orthogonal Matrices

An orthogonal matrix is a square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal:

$$ A^TA= AA^T = I $$

This implies that

$$ A^T = A^{-1} $$

Because any two of its column vectors are perpendicular to each other, an orthogonal matrix corresponds to a rotation of the coordinate system. And because each column vector is a unit vector, the matrix represents a pure rotation: the transformation it describes does not change the length of a vector, it only rotates its direction.

Algebraic analysis shows that the transpose of an orthogonal matrix corresponds exactly to the rotation of the original matrix in the opposite direction.
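A small NumPy sketch may make this concrete: a 2-D rotation matrix (the angle below is an arbitrary choice) is orthogonal, preserves vector length, and its transpose undoes the rotation:

```python
import numpy as np

theta = np.pi / 6
# A 2-D rotation matrix is orthogonal: its columns are orthonormal.
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))           # True: Q^T Q = I
print(np.allclose(Q.T, np.linalg.inv(Q)))        # True: Q^T = Q^{-1}

# Q rotates a vector without changing its length, and Q^T undoes the rotation.
x = np.array([1.0, 0.0])
y = Q @ x
print(np.isclose(np.linalg.norm(y), np.linalg.norm(x)))   # True
print(np.allclose(Q.T @ y, x))                            # True
```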

Eigendecomposition

An eigenvector of a square matrix A is a non-zero vector v such that multiplication by A alters only the scale of v:

$$ Av = \lambda v $$

Suppose that a matrix A has n linearly independent eigenvectors. We may then concatenate all of the eigenvectors to form a matrix V with one eigenvector per column:

$$ V = [v^{(1)},v^{(2)},\cdots,v^{(n)}] $$

Likewise, we can concatenate the eigenvalues to form a vector:

$$ \lambda = [\lambda_{1},\lambda_{2},\cdots,\lambda_{n}] $$

The eigendecomposition of A is then given by

$$ A = Vdiag(\lambda)V^{-1} $$

Because multiplying V on the right by \(diag(\lambda)\) scales each column \(v^{(i)}\) by \(\lambda_{i}\), we have

$$
\begin{align}
Vdiag(\lambda)V^{-1} &= [\lambda_{1}v^{(1)},\lambda_{2}v^{(2)},\cdots,\lambda_{n}v^{(n)}]V^{-1} \\
&= [Av^{(1)},Av^{(2)},\cdots,Av^{(n)}]V^{-1} \\
&= AVV^{-1} \\
&= A
\end{align}
$$
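The same reconstruction can be checked numerically with NumPy; the matrix below is just an illustrative example, and np.linalg.eig returns the eigenvalues together with a matrix whose columns are the corresponding eigenvectors:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Eigenvalues and a matrix V with one eigenvector per column.
lam, V = np.linalg.eig(A)

# Reconstruct A = V diag(lambda) V^{-1}.
A_rebuilt = V @ np.diag(lam) @ np.linalg.inv(V)
print(np.allclose(A, A_rebuilt))                    # True

# Each column satisfies A v = lambda v.
print(np.allclose(A @ V[:, 0], lam[0] * V[:, 0]))   # True
```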

For a real symmetric matrix, we always have

$$ A = Q \Lambda Q^T $$

Here Q is an orthogonal matrix composed of the eigenvectors of A (that is, the eigenvectors of A are mutually orthogonal), and the middle matrix \(\Lambda\) is a diagonal matrix whose entries are the eigenvalues corresponding to the respective eigenvectors.

This property can be used to find the extrema of functions of the form

$$ f(x) = x^TAx $$

When x is a unit eigenvector of A, the value of f(x) is the corresponding eigenvalue. Therefore, restricting x to unit vectors, f(x) attains its maximum when x is the eigenvector associated with the largest eigenvalue, and its minimum when x is the eigenvector associated with the smallest eigenvalue.
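Here is a short NumPy sketch of these two facts, using np.linalg.eigh for the symmetric case (the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])                      # real symmetric matrix

# np.linalg.eigh is intended for symmetric matrices and returns
# eigenvalues in ascending order with orthonormal eigenvectors.
lam, Q = np.linalg.eigh(A)
print(np.allclose(Q.T @ Q, np.eye(2)))          # True: Q is orthogonal
print(np.allclose(A, Q @ np.diag(lam) @ Q.T))   # True: A = Q Lambda Q^T

# For unit eigenvectors, f(x) = x^T A x equals the corresponding eigenvalue,
# so the smallest/largest eigenvalues bound f(x) on the unit sphere.
f = lambda x: x @ A @ x
print(np.isclose(f(Q[:, 0]), lam[0]))           # minimum value
print(np.isclose(f(Q[:, -1]), lam[-1]))         # maximum value
```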

Singular Value Decomposition

The singular value decomposition (SVD) provides a way to factorize a matrix into singular vectors and singular values. Every real matrix has a singular value decomposition, but the same is not true of the eigenvalue decomposition. For example, if a matrix is not square, the eigendecomposition is not defined, and we must use a singular value decomposition instead.

The singular value decomposition is similar to eigendecomposition, except this time we write A as a product of three matrices:

$$A=UDV^T$$

Suppose that A is an \(m \times n \) matrix. Then U is defined to be an \(m \times m \) matrix, D to be an \(m \times n \) matrix, and V to be an \(n \times n \) matrix.

Each of these matrices is defined to have a special structure. The matrices U and V are both defined to be orthogonal matrices. The matrix D is defined to be a diagonal matrix. Note that D is not necessarily square.

The elements along the diagonal of D are known as the singular values of the matrix A. The columns of U are known as the left-singular vectors. The columns of V are known as the right-singular vectors.

We can actually interpret the singular value decomposition of A in terms of the eigendecomposition of functions of A. The left-singular vectors of A are the eigenvectors of \( AA^T \). The right-singular vectors of A are the eigenvectors of \( A^TA \). The non-zero singular values of A are the square roots of the eigenvalues of \( AA^T \) or \( A^TA \).
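These relationships are easy to verify numerically; the following sketch uses an arbitrary random \(4 \times 3\) matrix and np.linalg.svd:

```python
import numpy as np

A = np.random.default_rng(0).normal(size=(4, 3))    # a non-square matrix

# full_matrices=True gives U (4x4), V^T (3x3) and the singular values of D.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the 4x3 matrix D with the singular values on its diagonal.
D = np.zeros_like(A)
D[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ D @ Vt))                    # True: A = U D V^T

# The squared singular values match the eigenvalues of A^T A.
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(s ** 2, eigvals))                  # True
```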

The Moore-Penrose Pseudoinverse

Matrix inversion is not defined for matrices that are not square. Suppose we want to make a left-inverse B of a matrix A, so that we can solve a linear equation

$$Ax=y$$

by left-multiplying each side to obtain

$$x=By$$

Depending on the structure of the problem, it may not be possible to design a unique mapping from A to B.

If A is taller than it is wide, then it is possible for this equation to have no solution. If A is wider than it is tall, then there could be multiple possible solutions.

The Moore-Penrose pseudoinverse allows us to make some headway in these cases. The pseudoinverse of A is defined as the matrix

$$A^+=\lim_{\alpha \rightarrow 0} (A^TA + \alpha I)^{-1}A^T$$
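In practice the pseudoinverse is available as np.linalg.pinv. The sketch below (with made-up data) shows the overdetermined case, where \(A^+y\) coincides with the least-squares solution:

```python
import numpy as np

# An overdetermined system: A is taller than it is wide, so Ax = y
# generally has no exact solution.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 2.5])

# np.linalg.pinv computes the Moore-Penrose pseudoinverse.
A_pinv = np.linalg.pinv(A)
x = A_pinv @ y

# For a tall matrix with independent columns, A^+ = (A^T A)^{-1} A^T,
# and x = A^+ y is the least-squares solution.
x_normal_eq = np.linalg.inv(A.T @ A) @ A.T @ y
print(np.allclose(x, x_normal_eq))                            # True
print(np.allclose(x, np.linalg.lstsq(A, y, rcond=None)[0]))   # True
```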

The Trace Operator

The trace operator gives the sum of all of the diagonal entries of a matrix:

$$ Tr(A) = \sum_i A_{i,i}$$

The trace operator is useful for a variety of reasons. Some operations that are difficult to specify without resorting to summation notation can be specified using matrix products and the trace operator. For example, the trace operator provides an alternative way of writing the Frobenius norm of a matrix:

$$ ||A||_F = \sqrt{Tr(AA^T)} $$

The trace of a product of matrices is invariant to moving the last matrix in the product to the first position. This holds even if the resulting product has a different shape, as long as the products are still defined, i.e.

$$ Tr(ABC) = Tr(CAB) = Tr(BCA) $$
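Both the Frobenius-norm identity and this cyclic property can be checked with a few lines of NumPy (the matrix shapes below are arbitrary choices that make all the products well defined):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 2))

# Frobenius norm written with the trace operator.
print(np.isclose(np.linalg.norm(A, 'fro'),
                 np.sqrt(np.trace(A @ A.T))))                 # True

# Cyclic property: the trace is unchanged when the last factor is moved
# to the front, even though ABC, CAB and BCA have different shapes
# (2x2, 4x4 and 3x3 here) as long as the products are defined.
print(np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B)))   # True
print(np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A)))   # True
```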

Further Reading

I strongly recommend the video series 线性代数的本质 (Essence of Linear Algebra). Starting from a visual perspective, the author introduces many concepts of linear algebra step by step, which is very helpful for building an intuitive understanding of the basic concepts of linear algebra.
