Class 5
February 26th, 2020
An n-dimensional space, populated by data points
The principal component is a direction along which the points line up best
As if we had been mis-measuring the dimensions: an unknown (rigid) rotation of the axes would minimize the points' distance from the axes:
the coordinate along the new axis becomes the only value we need to keep.
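A quick illustration (not from the lecture): recovering the principal direction of a toy 2-D point cloud with plain numpy; all names and values here are made up.

import numpy as np

# toy 2-D point cloud, roughly lined up along the (1, 1) direction
rng = np.random.default_rng(0)
t = rng.normal(size=100)
points = np.column_stack([t, t + 0.1 * rng.normal(size=100)])

# centre the data; the first right singular vector of the centred matrix
# is the direction along which the points line up best
centred = points - points.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
print(vt[0])   # close to +/- [0.707, 0.707], i.e. the (1, 1) direction (sign is arbitrary)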
Please see Ch. 11 of Leskovec et al. for the complete numerical examples.
Decompose the data matrix and interpret its significantly non-zero (>> 0) singular values as concepts/topics for user and activity classification:
Conceptual hurdle of SVD: interpretation of negative values
A nonnegative decomposition of the activity matrix would be interpretable:
user/product profiling and recommender systems would be half-done already!
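For comparison, a sketch (mine, with made-up values) of what a plain SVD gives on a small activity matrix; the mixed signs in the factors are exactly the hurdle noted above.

import numpy as np

# small user x item activity matrix (illustrative values: two item groups,
# plus one user who consumes both)
V = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [2, 2, 2, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(V, full_matrices=False)
print(np.round(s, 3))       # the clearly non-zero singular values mark the concepts
print(np.round(Vt[:2], 3))  # the factors mix positive and negative entries,
                            # which resists a direct "degree of membership" reading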
Instance: a non-negative matrix V
Solution: non-negative matrix factors W and H s.t. V ≈ WH
Let A = BC:
each column of the result is a linear combination of the columns of B, with coefficients given by the corresponding column of C.
Let V ≈ WH.
If V is an activity matrix,
the consumption column v_i is given by a linear combination of the columns of W, with weights given by the components of the corresponding column h_i of H.
Each data vector is thus approximated by a linear combination of the columns of W.
W can be regarded as containing a basis that is optimized for the linear approximation of the data in V.
Since relatively few basis vectors are used to represent many data vectors, good approximation can only be achieved if the basis vectors discover structure that is latent in the data.
[Lee & Seung, "Learning the parts of objects by non-negative matrix factorization." Nature, 1999.]
Let V be a non-negative n x m matrix, W an n x r matrix and H an r x m matrix.
Minimize ||V - WH||^2 (squared reconstruction error)
subject to
W, H >= 0.
If V is n x m, choose r s.t. (n + m) r < n m, so that WH is a compressed representation of V.
For later: minimization under divergence
Let V ≈ WH as before.
Minimize the divergence D(V || WH) = Σ_ij ( V_ij log( V_ij / (WH)_ij ) - V_ij + (WH)_ij )
subject to W, H >= 0.
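A sketch of that divergence objective, under the usual convention 0·log 0 = 0 (the eps guard is mine):

import numpy as np

def nmf_divergence(V, W, H, eps=1e-12):
    # D(V || WH) = sum_ij ( V_ij * log(V_ij / (WH)_ij) - V_ij + (WH)_ij )
    WH = W @ H
    ratio = np.where(V > 0, V / (WH + eps), 1.0)       # 0 * log(0) treated as 0
    return np.sum(np.where(V > 0, V * np.log(ratio), 0.0) - V + WH)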
Iterated error balancing:
through the iterations,
the multiplicative update rules maintain non-negativity and force the reconstruction error to be non-increasing (convergence to a local minimum).
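A sketch of the multiplicative update rules for the squared-error objective, following Lee & Seung; the eps guard against division by zero, the random initialization and the fixed iteration count are my choices:

import numpy as np

def nmf_multiplicative(V, r, steps=200, eps=1e-9, seed=0):
    # minimize ||V - WH||^2 with W, H >= 0 via multiplicative updates
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r))
    H = rng.random((r, m))
    for _ in range(steps):
        # each update multiplies by a non-negative ratio, so non-negativity
        # is maintained and the squared error never increases
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H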
A probabilistic hidden-variables model:
Cols. of W are bases that are combined to form the reconstruction;
the influence of each basis is given by the corresponding hidden variable, i.e. the entry of H.
(W and H are shown in a 7x7 montage)
Eigenfaces may have negative values
N=7, M=5.
Fix K=2 and run non-negative matrix factorization:
@INPUT:
R: the matrix to be factorized, of dim. N x M
P: an initial matrix of dim. N x K
Q: an initial matrix of dim. M x K
K: the number of latent features
steps: the max number of steps to perform the optimisation
alpha: the learning rate
beta: the regularization parameter
@OUTPUT:
the final matrices P and Q
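A minimal sketch of such a routine, assuming plain gradient descent with L2 regularization over the observed (non-zero) entries; note that this simple version does not explicitly enforce non-negativity, and the stopping threshold is illustrative:

import numpy as np

def matrix_factorization(R, P, Q, K, steps=5000, alpha=0.0002, beta=0.02):
    Q = Q.T                                  # work with a K x M factor
    for step in range(steps):
        for i in range(R.shape[0]):
            for j in range(R.shape[1]):
                if R[i, j] > 0:              # update only on observed ratings
                    eij = R[i, j] - P[i, :] @ Q[:, j]
                    for k in range(K):
                        P[i, k] += alpha * (2 * eij * Q[k, j] - beta * P[i, k])
                        Q[k, j] += alpha * (2 * eij * P[i, k] - beta * Q[k, j])
        # regularized squared error over the observed entries
        e = 0.0
        for i in range(R.shape[0]):
            for j in range(R.shape[1]):
                if R[i, j] > 0:
                    e += (R[i, j] - P[i, :] @ Q[:, j]) ** 2
                    e += (beta / 2) * (np.sum(P[i, :] ** 2) + np.sum(Q[:, j] ** 2))
        if e < 0.001:                        # illustrative stopping threshold
            break
    return P, Q.T

Called, e.g., as nP, nQ = matrix_factorization(np.array(ratings, dtype=float), np.random.rand(7, 2), np.random.rand(5, 2), K=2); the exact figures below depend on the random initialization.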
np.dot(nP, nQ.T) = [[ 1.01541991 1.00333129 0.97277743 1.06287968 1.04171324]
[ 3.01303945 2.99879446 2.96279189 3.09527589 3.0083412 ]
[ 3.95992279 3.97901104 4.02726085 3.9655638 3.80912366]
[ 4.99247343 4.98812249 4.97712927 5.07657468 4.91104751]
[ 3.52001748 3.37358497 3.00347147 3.96773585 4.01098322]
[ 4.51837154 4.37142223 4.00000329 4.98195069 4.99170315]
[ 2.10390225 2.13556026 2.21557931 2.04860435 1.94148161]]
np.rint(np.dot(nP, nQ.T)) = [[ 1. 1. 1. 1. 1.]
[ 3. 3. 3. 3. 3.]
[ 4. 4. 4. 4. 4.]
[ 5. 5. 5. 5. 5.]
[ 4. 4. 4. 4. 4.]
[ 5. 5. 5. 5. 5.]
[ 2. 2. 2. 2. 2.]]
ratings = [[1, 1, 1, 0, 0],
[3, 3, 3, 0, 0],
[4, 4, 4, 0, 0],
[5, 5, 5, 0, 0],
[0, 0, 0, 4, 4],
[0, 0, 0, 5, 5],
[0, 0, 0, 2, 2]]
Try scikit-learn on the same instance.
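A minimal sketch of that run with scikit-learn's NMF (init and random_state are my choices; the exact values of W below depend on them):

import numpy as np
from sklearn.decomposition import NMF

R = np.array(ratings, dtype=float)   # the 7 x 5 ratings list defined above

model = NMF(n_components=2, init='random', random_state=0)
W = model.fit_transform(R)           # users x topics
H = model.components_                # topics x films
print(W)
print(H)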
W(user x topic) = [[ 0. 0.82037571]
[ 0. 2.46112713]
[ 0. 3.28150284]
[ 0. 4.10187855]
[ 1.62445593 0. ]
[ 2.03056992 0. ]
[ 0.81222797 0. ]]
W: users’ commitment to a topic.
H: films’ pertinence to a specific topic (binary, why?)