Lecture 0 tris
January 15th, 2020
Ian Goodfellow, Yoshua Bengio and Aaron Courville:
Deep Learning. MIT Press, 2016.
Activity tables show how users map onto their choices or, vice versa, how available products map onto their adopters.
Essentially, a weighted binary relationship between users and films…
We are forgetting the mapping from named users, such as Joe, and films to the rows and columns of the matrix.
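As a small, purely hypothetical illustration of such an activity table in code (users, films and ratings below are invented):

    import numpy as np

    # Hypothetical activity table: rows = users, columns = films,
    # entries = ratings (0 means "not seen / not rated"); all numbers invented.
    users = ["Joe", "Jill", "Bob"]
    films = ["Film A", "Film B", "Film C", "Film D"]
    B = np.array([
        [5, 0, 3, 0],   # Joe
        [4, 2, 0, 1],   # Jill
        [0, 5, 4, 0],   # Bob
    ], dtype=float)

    # The bare matrix forgets the labels, so we keep them alongside it.
    print(films[int(B[users.index("Jill")].argmax())])   # Jill's highest-rated film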
Mathematicians see matrix B as representing a linear transformation between two linear spaces.
It is just one of the possible representations of the map: it depends on a choice of bases for the source and the target space.
Now we can apply the full machinery of Linear Algebra (and Geometry) and see what happens.
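A minimal sketch of the basis dependence mentioned above (mine, not from the lecture), assuming for simplicity a square matrix and the same change of basis on source and target:

    import numpy as np

    # The same linear map, represented in two different bases.
    B = np.array([[2.0, 1.0],
                  [0.0, 3.0]])        # representation in the standard basis
    P = np.array([[1.0, 1.0],
                  [0.0, 1.0]])        # columns = the new basis vectors (invertible)

    B_new = np.linalg.inv(P) @ B @ P  # representation of the same map in the new basis

    c = np.array([1.0, 2.0])          # coordinates of a vector in the new basis
    v = P @ c                         # the same vector in the standard basis
    # Applying the map in either representation describes the same result:
    print(np.allclose(B @ v, P @ (B_new @ c)))   # -> True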
Spectral techniques: methods that apply the theory of linear maps (in particular, eigenvalues and eigenvectors) to matrices that do not represent geometric transformations, but rather some kind of relationship between entities (e.g., users and films).
When a square matrix represents relationships between entities (endorsements among persons, teams defeating other teams, friends or followers on social networks, and so on), several different eigenvectors can be obtained from it, giving rise to different kinds of spectral rankings.
Spectral graph analysis provides bounds on several graph features using the eigenvalues of adjacency matrices.
Indeed, Google's PageRank algorithm is an instance of spectral graph analysis.
Early applications appeared in psychology, the social sciences, bibliometrics, economics, and choice theory.
Let us look at an early example.
[Seeley, 1949] created an index of likeability based on the idea of diffusion: it is important to be liked by people who, in turn, are well liked, and so on.
I.e., my likeability index should recursively be equal to the weighted sum of the indices of the people who like me.
Or, using row vectors: writing ℓ for the vector of likeability indices and A for the matrix whose entry a_ij records how much person i likes person j, the condition reads ℓ = ℓ A, i.e., ℓ must be a (left) eigenvector of A.
This equation may have no non-trivial solution, but preprocessing the matrix can ensure that one exists.
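A possible way to compute such a spectral ranking in practice (my sketch, with an invented "who likes whom" matrix): normalise the matrix so that the equation above has a non-trivial solution, then apply power iteration.

    import numpy as np

    # Hypothetical "who likes whom" matrix: entry [i, j] = weight with which
    # person i likes person j (invented numbers).
    A = np.array([
        [0.0, 1.0, 1.0],
        [1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0],
    ])

    # Preprocessing: make each row sum to 1, so that the equation
    # "scores = scores @ A" has a non-trivial solution.
    A = A / A.sum(axis=1, keepdims=True)

    # Power iteration: repeatedly apply A to an initial guess (row vector).
    scores = np.ones(A.shape[0]) / A.shape[0]
    for _ in range(100):
        scores = scores @ A
        scores = scores / scores.sum()   # keep a fixed scale

    print(scores)   # likeability indices: higher = liked by well-liked people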
Chapter 2 of the Goodfellow et al. textbook is available.
It is a refresher of notation and linear-algebra properties, with no examples.
It can be read in the background, alongside our classes.
Phase 1: read §§ 2.1–2.7, then § 2.11.
Phase 2: read §§ 2.8–2.10.
Q: given a user’s declared appreciation of science-fiction films, how could this appreciation be distributed among the films they have reviewed?
A system of linear equations:
Interpretation: how much seeing a specific film contributed to determining the user’s appreciation for the Sci-Fi genre.
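One possible way to set this up concretely (my sketch; the numbers and the names R, s, x are invented, not from the lecture): stack several users' ratings in a matrix R, put their declared Sci-Fi appreciation in a vector s, and solve R x = s for the per-film contributions x.

    import numpy as np

    # Invented data: rows = users, columns = films; entries = ratings.
    R = np.array([
        [5.0, 0.0, 3.0],
        [4.0, 2.0, 0.0],
        [0.0, 5.0, 4.0],
    ])
    # Each user's declared appreciation of the Sci-Fi genre.
    s = np.array([4.0, 3.0, 4.5])

    # Per-film contribution to Sci-Fi appreciation: solve R x = s.
    x = np.linalg.solve(R, s)                      # R is square and invertible here
    # x, *_ = np.linalg.lstsq(R, s, rcond=None)    # least-squares variant for the general case
    print(x)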
Each user experience is a vector and a point in the [hyper]space of possible film experiences.
e.g., Jill corresponds to her vector of film ratings.
Q: Can the given experiences be combined to yield a specific point in that space?
Geometry sees vectors (user experiences) as axes of a reference system that spans a space of possible ratings.
This is possible only if at least n of the given vectors are linearly independent of each other (orthogonal vectors are independent, but independence does not require orthogonality).
non-independence example: The Jason Bourne saga
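A small check of what dependence means concretely (invented ratings, not the lecture's example): if one experience vector is a combination of the others, the vectors span fewer dimensions than their number.

    import numpy as np

    # Invented example: three "user experience" vectors over three films.
    jill = np.array([5.0, 4.0, 1.0])
    joe = np.array([1.0, 0.0, 2.0])
    bob = 2 * jill + joe          # bob's vector is a combination of the other two

    M = np.vstack([jill, joe, bob])
    # Only two independent directions: these three vectors cannot span
    # the whole 3-dimensional space of possible ratings.
    print(np.linalg.matrix_rank(M))   # -> 2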
Determinant: understand the matrix as an area (or volume) scaling factor.
The determinant is zero exactly when the column vectors are not independent;
In practice, we use the square
The rank of a matrix A is the dimension of the vector space generated by its columns.
It corresponds to the maximal number of linearly independent columns and, equally, to the dimension of the space spanned by the rows.
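A quick numerical check (my example, invented matrix) tying determinant and rank together: a matrix with a dependent column has determinant 0 and rank smaller than its size.

    import numpy as np

    A = np.array([
        [1.0, 2.0, 3.0],
        [4.0, 5.0, 9.0],
        [7.0, 8.0, 15.0],
    ])
    # The third column is the sum of the first two: the columns are not independent.
    print(np.isclose(np.linalg.det(A), 0.0))   # -> True
    print(np.linalg.matrix_rank(A))            # -> 2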
Matrix inversion is a delicate process:
The inverse may not exist (only non-singular square matrices have a two-sided inverse), and for non-square matrices a left or right inverse, when it exists, need not be unique.
Computing it might have numerical issues, so in practice the explicit inverse is rarely formed: numerical software solves the system Ax = b directly instead.
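A minimal illustration of that last point (mine, not from the lecture): prefer a direct solver over forming the inverse explicitly.

    import numpy as np

    A = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
    b = np.array([9.0, 8.0])

    x_inv = np.linalg.inv(A) @ b       # works, but forms the inverse explicitly
    x_solve = np.linalg.solve(A, b)    # preferred: faster and numerically safer

    print(np.allclose(x_inv, x_solve))   # -> True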
Matrix eigenvalues and eigenvectors.
In principle, if the matrix is square (n × n), it has n eigenvalues, counted with multiplicity.
In practice, they might not be real, nor distinct, and they are always costly to find.
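A tiny check of the "might not be real" caveat (my example): a real rotation matrix has complex eigenvalues.

    import numpy as np

    # A 90-degree rotation: real entries, but no real eigenvalues.
    R = np.array([[0.0, -1.0],
                  [1.0,  0.0]])
    print(np.linalg.eigvals(R))   # -> [0.+1.j  0.-1.j]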
A square matrix A is positive semi-definite if x^T A x ≥ 0 for every vector x.
In such a case, its eigenvalues are non-negative: λ_i ≥ 0 for all i.
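A short numerical check (my sketch, not from the lecture): matrices of the form B^T B are always positive semi-definite, and their eigenvalues indeed come out non-negative.

    import numpy as np

    rng = np.random.default_rng(0)
    B = rng.normal(size=(5, 3))
    A = B.T @ B                            # symmetric, positive semi-definite

    eigenvalues = np.linalg.eigvalsh(A)    # eigvalsh: eigenvalues of a symmetric matrix (real)
    print(eigenvalues)
    print(np.all(eigenvalues >= -1e-12))   # -> True (non-negative up to rounding)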