
Exercise on MNIST Dataset

[email protected]

Data Science: Techniques and Applications (DSTA)

March 4th, 2020

1 / 7

MNIST 784

  • A set of 70,000 small images of handwritten digits.
  • Each image is labelled with the digit it represents.
  • This data set is often called the "hello world" of machine vision.

Loading the data

  • Loading the dataset with the sklearn package is as simple as this:
from sklearn.datasets import fetch_openml
# as_frame=False returns NumPy arrays, so rows can later be indexed as X[0]
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
2 / 7

Image display

  • Each image is stored as a vector of 784 pixel values, which we can display.
  • We must first reshape the vector into a 28×28 matrix.
import matplotlib as mpl
import matplotlib.pyplot as plt
X, y = mnist['data'], mnist['target']
some_digit = X[0]  # first image, a vector of 784 values
some_digit_image = some_digit.reshape(28, 28)
plt.imshow(some_digit_image, cmap="binary")
plt.axis("off")
plt.show()
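The reshape step can be tried without downloading MNIST. The sketch below is a minimal illustration, assuming a synthetic 28×28 array in place of a real digit; the names `fake_image`, `flat` and `restored` are hypothetical and not part of the slides.

```python
import numpy as np

# Hypothetical stand-in for one MNIST image: a blank 28x28 grid with a vertical stroke
fake_image = np.zeros((28, 28))
fake_image[4:24, 13:15] = 255

# Flatten to the 784-value vector format in which the images are stored
flat = fake_image.reshape(-1)

# Reshape back to a matrix, as required before plotting
restored = flat.reshape(28, 28)

# Row-major layout: pixel (row, col) sits at index 28 * row + col
print(flat.shape, restored.shape, flat[28 * 4 + 13])
```

Because NumPy reshapes in row-major order by default, flattening and reshaping back recovers the original image exactly.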
3 / 7

Binary classification: 5 or not-5 classifier

  • An example of one-vs-rest training and evaluation.

  • First, divide the dataset into a training set and a test set.

  • Then cast the labels into two classes: 5 and not 5.

  • Use sklearn.linear_model.SGDClassifier as the binary classifier.

  • With its default hinge loss, this class uses stochastic gradient descent to train a linear SVM classifier.

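import numpy as np
from sklearn.linear_model import SGDClassifier
y = y.astype(np.uint8)  # labels are loaded as strings; convert to integers
# The first 60,000 images form the training set, the last 10,000 the test set
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
y_train_5 = (y_train == 5)  # True for every 5, False for all other digits
y_test_5 = (y_test == 5)
sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_5)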
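The same steps can be rehearsed quickly on a smaller dataset. The following is a sketch only, assuming scikit-learn's bundled `load_digits` data (1,797 images of 8×8 pixels) as a stand-in for MNIST; the 1,500/297 split and the variable names ending in `_s` are arbitrary choices made for this illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier

# Assumption: the small bundled digits dataset stands in for MNIST
digits = load_digits()
X_small, y_small = digits.data, digits.target

# Arbitrary split for illustration: first 1,500 images to train, the rest to test
X_train_s, X_test_s = X_small[:1500], X_small[1500:]
y_train_s, y_test_s = y_small[:1500], y_small[1500:]

# Cast the ten digit labels into two classes: 5 (True) and not 5 (False)
y_train_s_5 = (y_train_s == 5)
y_test_s_5 = (y_test_s == 5)

clf = SGDClassifier(random_state=42)
clf.fit(X_train_s, y_train_s_5)

# Predictions are booleans: True means "this image is a 5"
pred = clf.predict(X_test_s)
print("test accuracy:", np.mean(pred == y_test_s_5))
```

The printed accuracy should be read with care: most test images are not 5s, so a high value does not by itself show that the classifier is good.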
4 / 7

Evaluation

  • The sklearn package provides functions to evaluate a classifier's accuracy, precision and F-score.

  • Let's calculate the accuracy with three-fold cross-validation.

from sklearn.model_selection import cross_val_score
cross_val_score(sgd_clf,X_train, y_train_5, cv=3, scoring="accuracy")
  • Run a dumb classifier that never predicts 5:
import numpy as np
from sklearn.base import BaseEstimator
class Never5Classifier(BaseEstimator):
    def fit(self, X, y=None):
        return self  # nothing to learn
    def predict(self, X):
        # always predict "not 5"
        return np.zeros((len(X), 1), dtype=bool)
never_5_clf = Never5Classifier()
cross_val_score(never_5_clf, X_train, y_train_5, cv=3, scoring="accuracy")
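The dumb classifier is a useful baseline because accuracy can mislead on skewed classes: if only about one image in ten is a 5, always answering "not 5" is right about nine times in ten. The sketch below illustrates this with synthetic labels (exactly 10% positives, an assumption made for the example rather than the real MNIST proportion) and computes accuracy, precision, recall and F-score by hand with NumPy.

```python
import numpy as np

# Synthetic labels: 1,000 samples, exactly 100 of them positive ("is a 5")
y_true = (np.arange(1000) % 10 == 5)

# A classifier that never predicts 5 outputs False for every sample
y_pred = np.zeros(len(y_true), dtype=bool)

tp = np.sum(y_pred & y_true)    # true positives
fp = np.sum(y_pred & ~y_true)   # false positives
fn = np.sum(~y_pred & y_true)   # false negatives

accuracy = np.mean(y_pred == y_true)
# Guard against division by zero when nothing is predicted positive
precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
f1 = (2 * precision * recall / (precision + recall)
      if (precision + recall) > 0 else 0.0)

print(accuracy, precision, recall, f1)
```

Accuracy looks respectable here while recall shows that not a single 5 was found, which is why precision, recall and F-score are the more informative measures for this task.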
5 / 7

The SoftMax logistic regression

We will use SoftMax regression as a multiclass classifier:

$$p(y=i \mid x; W) = \frac{e^{w_i^T x}}{\sum_{j=0}^{9} e^{w_j^T x}},$$
where $p(y=i \mid x; W)$ is the probability that input $x$ is the $i$-th digit, $i \in [0, 9]$. We can use this information for prediction by taking the class with the maximum probability:
$$y_{pred} = \arg\max_i \, p(y=i \mid x; W)$$
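The two formulas above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a random weight matrix `W` with one row per digit and a random input `x`; nothing here is trained, and the names `softmax`, `W` and `x` are chosen for the example. Subtracting the maximum score before exponentiating is a common numerical-stability trick that leaves the probabilities unchanged.

```python
import numpy as np

def softmax(scores):
    """Turn a vector of scores w_i^T x into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # stability trick; result is unchanged
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 784))  # hypothetical weights: one row w_i per digit 0..9
x = rng.random(784)             # hypothetical input image as a 784-vector

probs = softmax(W @ x)          # p(y=i | x; W) for i = 0..9
y_pred = int(np.argmax(probs))  # predicted digit: the most probable class
print(probs.sum(), y_pred)
```

Since the exponential is monotonic, the predicted class is also the index of the largest raw score, so the softmax is only needed when the probabilities themselves are of interest.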

6 / 7

Exercise

7 / 7
