Lecture 0 tris
January 15th, 2020
Ian Goodfellow, Yoshua Bengio and Aaron Courville:
Deep Learning. MIT Press, 2016.
Activity tables show how users map onto their choices or, vice versa, how available products map onto their adopters.
Essentially, a weighted binary relationship between users and films…
We are forgetting the mapping from named users, such as Joe, and films to the rows and columns of the matrix.
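As a small, purely hypothetical illustration of such an activity table in code (users, films and ratings below are invented):

    import numpy as np

    # Hypothetical activity table: rows = users, columns = films,
    # entries = ratings (0 means "not seen / not rated"); all numbers invented.
    users = ["Joe", "Jill", "Bob"]
    films = ["Film A", "Film B", "Film C", "Film D"]
    B = np.array([
        [5, 0, 3, 0],   # Joe
        [4, 2, 0, 1],   # Jill
        [0, 5, 4, 0],   # Bob
    ], dtype=float)

    # The bare matrix forgets the labels, so we keep them alongside it.
    print(films[int(B[users.index("Jill")].argmax())])   # Jill's highest-rated film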
Mathematicians see matrix B as representing a linear transformation between two linear spaces.
It is just one of the possible representations of the map: it depends on a choice of bases for the source and the target space.
Now we can apply the full machinery of Linear Algebra (and Geometry) and see what happens.
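A minimal sketch of the basis dependence mentioned above (mine, not from the lecture), assuming for simplicity a square matrix and the same change of basis on source and target:

    import numpy as np

    # The same linear map, represented in two different bases.
    B = np.array([[2.0, 1.0],
                  [0.0, 3.0]])        # representation in the standard basis
    P = np.array([[1.0, 1.0],
                  [0.0, 1.0]])        # columns = the new basis vectors (invertible)

    B_new = np.linalg.inv(P) @ B @ P  # representation of the same map in the new basis

    c = np.array([1.0, 2.0])          # coordinates of a vector in the new basis
    v = P @ c                         # the same vector in the standard basis
    # Applying the map in either representation describes the same result:
    print(np.allclose(B @ v, P @ (B_new @ c)))   # -> True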
Spectral techniques: methods that apply the theory of linear maps (in particular, eigenvalues and eigenvectors) to matrices that do not represent geometric transformations, but rather some kind of relationship between entities (e.g., users and films).
When a square matrix represents relationships between entities (endorsements among persons, teams defeating other teams, friends or followers on social networks, and so on), several different eigenvectors can be obtained from it, giving rise to different kinds of spectral rankings.
Spectral graph analysis provides bounds on several graph features using the eigenvalues of adjacency matrices.
Indeed, Google's PageRank algorithm is an instance of spectral graph analysis.
Early applications appeared in psychology, the social sciences, bibliometrics, economics, and choice theory.
Let us look at an early example.
[Seeley, 1949] created an index of likeability based on the idea of diffusion: it is important to be liked by people who, in turn, are well liked, and so on.
I.e., my likeability index should recursively be equal to the weighted sum of the indices of the people who like me.
Or, using row vectors: writing ℓ for the vector of likeability indices and A for the matrix whose entry a_ij records how much person i likes person j, the condition reads ℓ = ℓ A, i.e., ℓ must be a (left) eigenvector of A.
This equation may have no non-trivial solution, but preprocessing the matrix can ensure that one exists.
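A possible way to compute such a spectral ranking in practice (my sketch, with an invented "who likes whom" matrix): normalise the matrix so that the equation above has a non-trivial solution, then apply power iteration.

    import numpy as np

    # Hypothetical "who likes whom" matrix: entry [i, j] = weight with which
    # person i likes person j (invented numbers).
    A = np.array([
        [0.0, 1.0, 1.0],
        [1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0],
    ])

    # Preprocessing: make each row sum to 1, so that the equation
    # "scores = scores @ A" has a non-trivial solution.
    A = A / A.sum(axis=1, keepdims=True)

    # Power iteration: repeatedly apply A to an initial guess (row vector).
    scores = np.ones(A.shape[0]) / A.shape[0]
    for _ in range(100):
        scores = scores @ A
        scores = scores / scores.sum()   # keep a fixed scale

    print(scores)   # likeability indices: higher = liked by well-liked people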
Chapter 2 of the Goodfellow et al. textbook is available.
It is a refresher of notation and linear-algebra properties, with no examples.
It can be read in the background, alongside our classes.
Phase 1: read §§ 2.1–2.7, then § 2.11.
Phase 2: read §§ 2.8–2.10.
Q: given a user’s declared appreciation of science-fiction films, how could this appreciation be distributed among the films they have reviewed?
A system of linear equations:
Interpretation: how much seeing a specific film contributed to determining the user’s appreciation for the Sci-Fi genre.
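One possible way to set this up concretely (my sketch; the numbers and the names R, s, x are invented, not from the lecture): stack several users' ratings in a matrix R, put their declared Sci-Fi appreciation in a vector s, and solve R x = s for the per-film contributions x.

    import numpy as np

    # Invented data: rows = users, columns = films; entries = ratings.
    R = np.array([
        [5.0, 0.0, 3.0],
        [4.0, 2.0, 0.0],
        [0.0, 5.0, 4.0],
    ])
    # Each user's declared appreciation of the Sci-Fi genre.
    s = np.array([4.0, 3.0, 4.5])

    # Per-film contribution to Sci-Fi appreciation: solve R x = s.
    x = np.linalg.solve(R, s)                      # R is square and invertible here
    # x, *_ = np.linalg.lstsq(R, s, rcond=None)    # least-squares variant for the general case
    print(x)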
Each user experience is a vector and a point in the [hyper]space of possible film experiences.
e.g., Jill corresponds to her vector of film ratings.
Q: Can the given experiences be combined to yield a specific point in that space?
Geometry sees vectors (user experiences) as axes of a reference system that spans a space of possible ratings.
This is possible only if at least n of the given vectors are linearly independent of each other (orthogonal vectors are independent, but independence does not require orthogonality).
non-independence example: The Jason Bourne saga
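A small check of what dependence means concretely (invented ratings, not the lecture's example): if one experience vector is a combination of the others, the vectors span fewer dimensions than their number.

    import numpy as np

    # Invented example: three "user experience" vectors over three films.
    jill = np.array([5.0, 4.0, 1.0])
    joe = np.array([1.0, 0.0, 2.0])
    bob = 2 * jill + joe          # bob's vector is a combination of the other two

    M = np.vstack([jill, joe, bob])
    # Only two independent directions: these three vectors cannot span
    # the whole 3-dimensional space of possible ratings.
    print(np.linalg.matrix_rank(M))   # -> 2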
Determinant: understand the matrix as an area (or volume) scaling factor.
The determinant is zero exactly when the column vectors are not independent;
In practice, we use the square
The rank of a matrix A is the dimension of the vector space generated by its columns.
It corresponds to the maximal number of linearly independent columns and, equally, to the dimension of the space spanned by the rows.
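A quick numerical check (my example, invented matrix) tying determinant and rank together: a matrix with a dependent column has determinant 0 and rank smaller than its size.

    import numpy as np

    A = np.array([
        [1.0, 2.0, 3.0],
        [4.0, 5.0, 9.0],
        [7.0, 8.0, 15.0],
    ])
    # The third column is the sum of the first two: the columns are not independent.
    print(np.isclose(np.linalg.det(A), 0.0))   # -> True
    print(np.linalg.matrix_rank(A))            # -> 2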
Matrix inversion is a delicate process:
The inverse may not exist (only non-singular square matrices have a two-sided inverse), and for non-square matrices a left or right inverse, when it exists, need not be unique.
Computing it might have numerical issues, so in practice the explicit inverse is rarely formed: numerical software solves the system Ax = b directly instead.
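A minimal illustration of that last point (mine, not from the lecture): prefer a direct solver over forming the inverse explicitly.

    import numpy as np

    A = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
    b = np.array([9.0, 8.0])

    x_inv = np.linalg.inv(A) @ b       # works, but forms the inverse explicitly
    x_solve = np.linalg.solve(A, b)    # preferred: faster and numerically safer

    print(np.allclose(x_inv, x_solve))   # -> True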
Matrix eigenvalues and eigenvectors.
In principle, if the matrix is square (n × n), it has n eigenvalues, counted with multiplicity.
In practice, they might not be real, nor distinct, and they are always costly to find.
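A tiny check of the "might not be real" caveat (my example): a real rotation matrix has complex eigenvalues.

    import numpy as np

    # A 90-degree rotation: real entries, but no real eigenvalues.
    R = np.array([[0.0, -1.0],
                  [1.0,  0.0]])
    print(np.linalg.eigvals(R))   # -> [0.+1.j  0.-1.j]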
A square matrix A is positive semi-definite if x^T A x ≥ 0 for every vector x.
In such a case, its eigenvalues are non-negative: λ_i ≥ 0 for all i.
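A short numerical check (my sketch, not from the lecture): matrices of the form B^T B are always positive semi-definite, and their eigenvalues indeed come out non-negative.

    import numpy as np

    rng = np.random.default_rng(0)
    B = rng.normal(size=(5, 3))
    A = B.T @ B                            # symmetric, positive semi-definite

    eigenvalues = np.linalg.eigvalsh(A)    # eigvalsh: eigenvalues of a symmetric matrix (real)
    print(eigenvalues)
    print(np.all(eigenvalues >= -1e-12))   # -> True (non-negative up to rounding)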