Lecture 0 bis
January 15th, 2020
1: Regression
Instance:
a collection (dataset) of numerical datapoints
a regressor (independent) value x
Solution: a regressand (dependent) value y that complements x
Measure: error over the collection
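The regression task can be illustrated with a minimal sketch, assuming a linear model fitted by ordinary least squares and mean squared error as the measure. Neither choice is stated in the slides; the function names and the toy dataset are hypothetical.

```python
from statistics import mean

def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b to a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    a = sxy / sxx            # slope
    b = y_bar - a * x_bar    # intercept
    return a, b

def mean_squared_error(points, a, b):
    """Average squared gap between observed y and predicted a*x + b."""
    return mean((y - (a * x + b)) ** 2 for x, y in points)

# Toy dataset (made up for illustration).
dataset = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
a, b = fit_line(dataset)
error = mean_squared_error(dataset, a, b)

# Solution for a new regressor value x: the regressand y that complements it.
y_new = a * 5.0 + b
```

Other model families and error measures fit the same Instance/Solution/Measure template; the linear fit is only the simplest case.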
2: Classification
Instance:
a collection (dataset) of datapoints from
a classification system
Solution: classification function
Measure: misclassification
[PF] “classification predicts whether something will happen, whereas regression predicts how much something will happen.”
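As a rough illustration of classification with misclassification as the measure, the sketch below learns a one-dimensional threshold rule. The rule family, the function names, and the data are assumptions made for the example, not something the slides prescribe.

```python
def misclassification_rate(classifier, labelled_points):
    """Fraction of (x, label) pairs for which the classifier is wrong."""
    errors = sum(1 for x, label in labelled_points if classifier(x) != label)
    return errors / len(labelled_points)

def best_threshold(labelled_points):
    """Pick the threshold t minimising the error of the rule 'x >= t means True'."""
    candidates = sorted({x for x, _ in labelled_points})
    return min(
        candidates,
        key=lambda t: misclassification_rate(lambda x: x >= t, labelled_points),
    )

# Toy labelled dataset (made up); the point (3.0, False) is deliberate noise.
data = [(1.0, False), (2.0, False), (2.5, True), (3.0, False),
        (3.5, True), (4.0, True), (5.0, True)]

t = best_threshold(data)
rate = misclassification_rate(lambda x: x >= t, data)
```

Because of the noisy point, no threshold separates the two classes perfectly, so the best achievable rate here is one error out of seven.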
3: Similarity matching
Identify similar individuals based on data known about them.
Instance:
a collection (dataset) of datapoints from
(distance functions for some of the dimensions)
Solution: similarity function
[Measure: error]
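A minimal sketch of similarity matching, assuming Euclidean distance over all dimensions as the distance function (the slides leave the choice open). The customer records and names are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equally long numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def most_similar(query, individuals, distance=euclidean):
    """Name of the individual whose datapoint is closest to the query."""
    return min(individuals, key=lambda name: distance(query, individuals[name]))

# Hypothetical datapoints: (age, purchases per month).
customers = {"ann": (25, 3.0), "bob": (47, 1.0), "eve": (31, 2.5)}
nearest = most_similar((30, 2.0), customers)
```

In practice the dimensions usually need rescaling first, otherwise the one with the largest numeric range (age, here) dominates the distance.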
4: Clustering
Group individuals in a population together by their similarity (but not driven by any specific purpose).
Instance:
a collection (dataset)
a relational structure on
a small integer
Solution: a partition of
Measure: network modularity Q: proportion of the relational structure that respects the clusters.
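The modularity measure can be sketched as follows, assuming the relational structure is an undirected, unweighted graph and using the standard Newman definition of Q: for each cluster, the fraction of edges that fall inside it minus the fraction expected if edges were placed at random with the same node degrees. This is a stricter reading than the plain "proportion" in the slide, and whether the lecture intends exactly this variant is an assumption.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community_of: dict mapping node -> cluster id.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the cluster
    degree_sum = defaultdict(int)  # total degree of the nodes in the cluster
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Two triangles joined by a single edge (toy graph).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in "abcdef"}

q_two = modularity(edges, two_clusters)
q_one = modularity(edges, one_cluster)
```

Putting every node in a single cluster yields Q = 0, which is why Q rewards partitions that respect the structure more than chance would.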
5: Co-occurrence grouping
Similarity of objects based on their appearing together in transactions.
Instance:
a collection (dataset)
a threshold
Solution: All frequent patterns: subsets that appear in
Detection version:
Market-basket analysis, (some) recommendation systems
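Frequent-pattern mining for market-basket analysis can be sketched by brute force, assuming the threshold is a minimum number of transactions in which a subset must appear (the slide text is cut off at that point, so this reading is an assumption). The baskets are made up.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """All item subsets contained in at least min_support transactions.

    Brute force: enumerates every subset of every basket, so it is only
    suitable for tiny examples.
    """
    counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[subset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

baskets = [
    {"bread", "milk"},
    {"bread", "beer", "milk"},
    {"beer", "milk"},
    {"bread", "milk", "eggs"},
]
patterns = frequent_itemsets(baskets, min_support=3)
```

Practical algorithms such as Apriori avoid the exponential enumeration by pruning any candidate that has an infrequent subset.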
6: Profiling
Instance:
a user description
a stimulus
a set of possible responses
Solution: a functional reaction of u to a, i.e.,
7: Link prediction
Instance:
Question
What YouTube video will you watch next?
Alternatives: predict the strength of the new link; link deletion.
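One simple way to approach link prediction is to score each missing link by the number of neighbours its two endpoints share. The sketch below uses that common-neighbours heuristic on an undirected graph; the heuristic is one option among many and is not named in the slides, and the graph is a toy example.

```python
from itertools import combinations

def common_neighbour_scores(edges):
    """Score every non-adjacent node pair by its number of shared neighbours."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(neighbours), 2):
        if v not in neighbours[u]:  # only pairs without an existing link
            scores[(u, v)] = len(neighbours[u] & neighbours[v])
    return scores

edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("b", "d"), ("c", "d"), ("d", "e")]
scores = common_neighbour_scores(edges)
predicted = max(scores, key=scores.get)  # most likely new link
```

The raw score can also serve as a crude estimate of the strength of the predicted link.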
8: Data reduction
Instance:
a collection (dataset)
[a distinct independent variable]
Solution: a projection of
Measure: error in the estimation of
Example: genre identification in consumer behaviour analysis
9: Causal modelling
Instance:
a collection (dataset)
a distinct independent variable
Solution: a variable
Measure: effectiveness of
Example: exactly what food causes you to put on weight?
Controlled clinical trials, A/B testing.
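For A/B testing, the simplest estimate of an effect is the difference between the average outcome of the two groups. The sketch below assumes binary outcomes (1 = converted) and random assignment of users to variants; the numbers are invented and no significance test is included.

```python
from statistics import mean

def average_treatment_effect(control, treatment):
    """Difference in mean outcome between the treated and control groups."""
    return mean(treatment) - mean(control)

# Invented outcomes: 1 = user converted, 0 = user did not.
control = [0, 1, 0, 0, 1, 0, 0, 0]    # variant A
treatment = [1, 1, 0, 1, 0, 1, 0, 0]  # variant B

effect = average_treatment_effect(control, treatment)
```

The difference supports a causal reading only because assignment is randomised; on purely observational data the same number could reflect confounding instead.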
obtain a dataset of examples, including the “target” dimension, called the label
split it into training and test data
run the algorithm on the training data to find a putative solution
test the quality/predictive power of the solution against the test data
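The four steps above can be sketched end to end. The learner here is a deliberately trivial majority-label baseline standing in for any real algorithm, and the split fraction, seed, and function names are all assumptions made for illustration.

```python
import random

def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the examples and split it into (training, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def train_majority_classifier(training):
    """Placeholder learner: always predict the most common training label."""
    labels = [label for _, label in training]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def accuracy(model, examples):
    """Fraction of examples whose label the model predicts correctly."""
    return sum(model(x) == label for x, label in examples) / len(examples)

# 1. Labelled dataset: the second component is the label.
examples = [(x, x >= 5) for x in range(20)]
# 2. Split into training and test data.
training, test = train_test_split(examples)
# 3. Run the algorithm on the training data only.
model = train_majority_classifier(training)
# 4. Measure predictive power on the held-out test data.
test_accuracy = accuracy(model, test)
```

Keeping the test data out of step 3 is the point of the split: accuracy measured on examples the algorithm has already seen overstates its predictive power.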
Regression involves a numeric target while classification involves a categorical/binary one
1: Regression
2: Classification
9: Causal modelling
3: Similarity matching
7: Link prediction
8: Data reduction
4: Clustering
5: Co-occurrence grouping
6: Profiling