Class 3
February 5th, 2020
Ratings are the solution
The data that drives ratings is point difference.
Duke, Miami, U of North Carolina, U of Virginia, Virginia Tech (plus Georgia Tech and Pittsburgh now)
is point difference really such a clear reflection of the skills gap?
what if, in a short season, the final result is the only goal?
Simple
captures long-term trend (but not decay)
ignores strength of the opponents
cold-start problem:
Ratings of 0 or 1 are not reachable.
Colley ratings can be rephrased to contain opponents’ strength as one of the factors.
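For reference, a sketch of the usual rephrasing. The symbols are assumptions introduced here, not notation from these notes: w_i, l_i and t_i are the wins, losses and games played by team i, O_i is the list of its opponents, and r_i is its rating.

```latex
% Laplace-corrected winning percentage (never exactly 0 or 1):
r_i = \frac{1 + w_i}{2 + t_i}

% Split the wins into a balance term and a schedule term:
w_i = \frac{w_i - l_i}{2} + \frac{t_i}{2}
    = \frac{w_i - l_i}{2} + \sum_{j \in O_i} \frac{1}{2}

% Replace each opponent's default 1/2 by that opponent's rating r_j:
r_i = \frac{1 + \frac{w_i - l_i}{2} + \sum_{j \in O_i} r_j}{2 + t_i}

% Rearranged, one linear equation per team:
(2 + t_i)\, r_i - \sum_{j \in O_i} r_j = 1 + \frac{w_i - l_i}{2}
```

Stacking the last line over all teams gives a linear system in the ratings, with the opponents' ratings entering on the left-hand side.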
since at the start, each
As we progress
so we can interpret
Let’s compute all
C and b are defined to reflect Colley’s rating formula
Essentially
Now set
Team | Colley rating | Colley rank | Massey rank |
Miami | .79 | 1st | 1st |
VT | .65 | 2nd | 2nd |
UNC | .50 | 3rd | 4th |
UVA | .36 | 4th | 3rd |
Duke | .21 | 5th | 5th |
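A minimal sketch of how a comparison like this could be computed. The game scores below are illustrative assumptions (a single round-robin among the five teams), not data taken from these notes; the matrix definitions follow the standard Colley and Massey formulations, and the names `games`, `played` and `ranking` are hypothetical helpers.

```python
import numpy as np

teams = ["Duke", "Miami", "UNC", "UVA", "VT"]

# Assumed, illustrative round-robin scores: (team_a, score_a, team_b, score_b)
games = [
    ("Duke", 7, "Miami", 52),
    ("Duke", 21, "UNC", 24),
    ("Duke", 7, "UVA", 38),
    ("Duke", 0, "VT", 45),
    ("Miami", 34, "UNC", 16),
    ("Miami", 25, "UVA", 17),
    ("Miami", 27, "VT", 7),
    ("UNC", 7, "UVA", 5),
    ("UNC", 3, "VT", 30),
    ("UVA", 14, "VT", 52),
]

idx = {name: k for k, name in enumerate(teams)}
n = len(teams)

played = np.zeros((n, n))   # played[i, j] = number of games between i and j
wins = np.zeros(n)
losses = np.zeros(n)
point_diff = np.zeros(n)    # cumulative points scored minus points conceded

for a, score_a, b, score_b in games:
    i, j = idx[a], idx[b]
    played[i, j] += 1
    played[j, i] += 1
    point_diff[i] += score_a - score_b
    point_diff[j] += score_b - score_a
    winner, loser = (i, j) if score_a > score_b else (j, i)
    wins[winner] += 1
    losses[loser] += 1

total = played.sum(axis=1)  # games played by each team

# Colley: C = 2I + diag(t) - N,  b = 1 + (w - l) / 2
C = 2 * np.eye(n) + np.diag(total) - played
b = 1 + (wins - losses) / 2
colley = np.linalg.solve(C, b)

# Massey: M = diag(t) - N is singular, so the last equation is replaced
# by the constraint that the ratings sum to zero.
M = np.diag(total) - played
p = point_diff.copy()
M[-1, :] = 1
p[-1] = 0
massey = np.linalg.solve(M, p)

def ranking(ratings):
    """Team names ordered from highest to lowest rating."""
    return [teams[k] for k in np.argsort(-ratings)]

print("Colley:", ranking(colley), np.round(colley, 2))
print("Massey:", ranking(massey), np.round(massey, 1))
```

With these assumed scores the two methods agree on first, second and fifth place and swap UNC and UVA, because UVA's point balance is better than UNC's even though UNC has more wins.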
Laplace correction
winning-only in fact includes the strengths of the opponents
the total strength of the league tends to remain constant
latent variables represent non-measurable skills
they live in a feature space, possibly separated from the traditional data space
yet they may get a numeric estimate, and inform our predictions
Massey and Colley both regress on the latent variable strength
Colley can run with Massey’s points balances (and vice versa)
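One way to read this, sketched under assumptions: keep the Colley matrix and swap its right-hand side for the teams' point balances. The three-team schedule, the numbers and the helper `colley_matrix` are hypothetical and chosen only for illustration.

```python
import numpy as np

def colley_matrix(played):
    """C = 2I + diag(games played) - pairwise game counts."""
    n = played.shape[0]
    return 2 * np.eye(n) + np.diag(played.sum(axis=1)) - played

# Assumed example: three teams, each pair meets once.
# Team 0 beats team 1 by 7 and team 2 by 10; team 1 beats team 2 by 4.
played = np.array([[0, 1, 1],
                   [1, 0, 1],
                   [1, 1, 0]], dtype=float)
wins_minus_losses = np.array([2.0, 0.0, -2.0])
point_balance = np.array([17.0, -3.0, -14.0])  # sums to zero

C = colley_matrix(played)

# Standard Colley: right-hand side built from the win/loss balance.
colley = np.linalg.solve(C, 1 + wins_minus_losses / 2)

# Colley matrix fed with Massey's point balances instead.
colley_with_points = np.linalg.solve(C, point_balance)

print(np.round(colley, 2))
print(np.round(colley_with_points, 2))
```

The reverse swap would feed the win/loss balance to the Massey matrix in place of the point balance, keeping the sum-to-zero constraint that makes that system solvable.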
Both methods can be applied to collaborative filtering.