Lecture 3 bis
Feb. 5th, 2020
Ian Goodfellow, Yoshua Bengio and Aaron Courville:
Deep Learning. MIT Press, 2016.
Chapter 2 of the Goodfellow et al. textbook is available.
It is a refresher of notation and linear-algebra properties, with no examples.
It can be read as background reading for our classes.
Phase 1: read §§ 2.1–2.7, then § 2.11.
Phase 2: read §§ 2.8–2.10.
Matrix
[…]
We think of A as scaling space with a factor λ_i along the direction of each e-vector v^(i).
For unit vectors x, the max (resp. min) of x^T A x is the largest (resp. smallest) e-value, attained at the corresponding e-vector (A real symmetric).
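A quick NumPy check of this claim (my own sketch, not from the slides), assuming A is real symmetric so that x^T A x over unit vectors is bounded by the extreme e-values:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random real symmetric matrix (the quadratic-form claim assumes symmetry).
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2

eigvals = np.linalg.eigvalsh(A)          # e-values, sorted ascending

# Evaluate x^T A x on many random unit vectors x.
X = rng.standard_normal((4, 100_000))
X /= np.linalg.norm(X, axis=0)
quad = np.einsum('ij,ij->j', X, A @ X)   # x^T A x for each column x

print("empirical max/min of x^T A x:", quad.max(), quad.min())
print("largest/smallest e-value:     ", eigvals[-1], eigvals[0])
```

The empirical max/min approach (and never exceed) the largest/smallest e-value.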
Let A have n linearly-independent e-vectors {v^(1), …, v^(n)}
with corresponding e-values {λ_1, …, λ_n};
then A = V diag(λ) V^{-1},
where V = [v^(1), …, v^(n)] and λ = [λ_1, …, λ_n]^T.
Conventionally, the entries of λ are sorted in descending order.
Every real symmetric A can be written as A = Q Λ Q^T,
where Q is an orthogonal matrix of e-vectors and Λ is a diagonal matrix of e-values.
For repeated e-values the decomposition is not unique: any orthonormal basis of the shared eigenspace will do.
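A minimal NumPy sketch (not from the lecture) of both decompositions: np.linalg.eig handles the general case, np.linalg.eigh the real symmetric one.

```python
import numpy as np

rng = np.random.default_rng(1)

# General case: A = V diag(lambda) V^{-1}, assuming n independent e-vectors.
A = rng.standard_normal((3, 3))
lam, V = np.linalg.eig(A)                          # may be complex for general A
A_rec = V @ np.diag(lam) @ np.linalg.inv(V)
print(np.allclose(A, A_rec))                       # True (up to round-off)

# Real symmetric case: A = Q Lambda Q^T with Q orthogonal.
S = (A + A.T) / 2
lam_s, Q = np.linalg.eigh(S)                       # routine for symmetric matrices
print(np.allclose(S, Q @ np.diag(lam_s) @ Q.T))    # True
print(np.allclose(Q.T @ Q, np.eye(3)))             # Q is orthogonal
```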
Singular-value decomp. generalises eigen-decomp.:
any real matrix has one
(even non-square m. admit one):
A = U D V^T, where
U is an orthogonal m. of left-singular (col.) vectors,
D is a diagonal matrix of singular values,
V is an orthogonal m. of right-singular (col.) vectors.
Where does all this come from?
cols. of U are e-vectors of A A^T,
cols. of V are e-vectors of A^T A,
and the non-zero singular values are the square roots of the e-values of A^T A.
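A small NumPy illustration of the SVD and of its link to the eigen-decomposition of A^T A (my own example matrix, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))          # a non-square matrix

U, s, Vt = np.linalg.svd(A)              # A = U D V^T; s holds the singular values
D = np.zeros_like(A)
D[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ D @ Vt))        # True: the factorisation reproduces A

# Cols. of U are e-vectors of A A^T, cols. of V are e-vectors of A^T A;
# the squared singular values are the corresponding non-zero e-values.
w, _ = np.linalg.eigh(A.T @ A)           # e-values of A^T A, ascending
print(np.allclose(np.sort(s**2), w[-len(s):]))   # True
```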
The SVD also lets us solve linear systems A x = y
with non-square (n × m) matrices A:
n > m: the problem is overconstrained (no solution?)
n < m: the problem is overparametrized (many sols.?)
The Moore–Penrose pseudoinverse gives x = A^+ y: compute A^+ once, reuse it for different y's.
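A hedged NumPy sketch of this idea: build A^+ = V D^+ U^T from the SVD (assuming all singular values are non-zero), compare it to NumPy's built-in np.linalg.pinv, and reuse it for several right-hand sides y.

```python
import numpy as np

rng = np.random.default_rng(3)

# Overconstrained case: more equations (rows) than unknowns (columns).
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A)

# Pseudoinverse A^+ = V D^+ U^T; D^+ takes reciprocals of the non-zero
# singular values and transposes (here all singular values are non-zero).
D_plus = np.zeros((A.shape[1], A.shape[0]))
D_plus[:len(s), :len(s)] = np.diag(1.0 / s)
A_plus = Vt.T @ D_plus @ U.T
print(np.allclose(A_plus, np.linalg.pinv(A)))      # matches NumPy's pinv

# Compute A^+ once, then reuse it for many right-hand sides y.
for _ in range(3):
    y = rng.standard_normal(6)
    x = A_plus @ y                                 # least-squares solution of A x ≈ y
    print(np.linalg.norm(A @ x - y))               # residual (non-zero in general)
```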
Verification: does x = A^+ y solve A x = y
for the decomposition A^+ = V D^+ U^T,
where D^+ takes the reciprocal of each non-zero singular value of D and transposes the result?
Does A A^+ A = A hold?
Yes, because U and V are s. t. U^T U = I and V^T V = I (and D D^+ D = D).
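A short numerical check of this argument (assuming the property being verified is the defining identity A A^+ A = A):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A)
A_plus = np.linalg.pinv(A)

# Orthogonality of U and V is what makes the middle factors collapse.
print(np.allclose(U.T @ U, np.eye(6)), np.allclose(Vt @ Vt.T, np.eye(4)))

# Defining properties of the pseudoinverse: A A^+ A = A and A^+ A A^+ = A^+.
print(np.allclose(A @ A_plus @ A, A))
print(np.allclose(A_plus @ A @ A_plus, A_plus))
```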