Classification in Python: the k-NN solution

[email protected]

Data Science: Techniques and Applications (DSTA)

January 29th, 2020

Plan of the lab experience

  • Introduction (5 min.)

  • The k-NN algorithm (5 min.)

  • Splitting a test dataset in pure Python (10 min.)

  • Code Euclidean norm in pure Python (5 min.)

  • Develop a classifier for Iris (10 min.)

  • Solution (5 min.)

Development of a k-NN classifier in Python

  • Split the data set into training (75%) and test (25%) parts

  • Run the k-NN classification algorithm

  • Plot output from the classifier

  • Find the basic machinery in iris_k-NN_base.py

Task 1: Data-set split

import numpy as np

def train_and_test_split(N, test_fraction=None):
    # Hint: use an np.random function to generate an array of True and False values
    ...

# Example of a Boolean mask over the three rows of inputs
inputs = np.array([[1, 2], [3, 4], [5, 6]])
train_part = [True, False, True]
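
One possible way to complete Task 1 is sketched below. It is only a sketch under assumptions not stated in the slides: the function returns a Boolean mask in which True marks a training row, a test_fraction of None falls back to the 25% from the lab plan, and the seed parameter is a hypothetical addition for reproducibility.

```python
import numpy as np

def train_and_test_split(N, test_fraction=None, seed=None):
    """Return a Boolean mask of length N; True marks a training row.

    Assumptions (not from the slides): None means a 25% test share,
    and `seed` is a hypothetical extra parameter for reproducibility.
    """
    if test_fraction is None:
        test_fraction = 0.25
    rng = np.random.default_rng(seed)
    n_test = int(round(N * test_fraction))
    mask = np.ones(N, dtype=bool)
    # Pick n_test distinct row indices at random and mark them as test rows
    test_idx = rng.choice(N, size=n_test, replace=False)
    mask[test_idx] = False
    return mask

inputs = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
train_part = train_and_test_split(len(inputs), seed=0)
train_inputs = inputs[train_part]    # rows where the mask is True
test_inputs = inputs[~train_part]    # rows where the mask is False
```

Drawing exactly n_test indices gives an exact 75/25 split; drawing each mask entry independently would only match the fractions on average.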

k-NN classification algorithm

[Figure: the k-NN algorithm. Display from Tan et al., 2014.]
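
The neighbour-search step of k-NN could be sketched as follows, in pure Python. The function name k_nearest_neighbors and its return value (a list of row indices into the training set) are assumptions made for illustration; they are not taken from iris_k-NN_base.py.

```python
import math

def k_nearest_neighbors(train_inputs, query, k):
    """Return the indices of the k training points closest to query.

    Hypothetical helper: name and signature are illustrative only.
    """
    distances = []
    for index, point in enumerate(train_inputs):
        # Euclidean distance between the training point and the query
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(point, query)))
        distances.append((d, index))
    distances.sort()  # smallest distance first
    return [index for _, index in distances[:k]]

train_inputs = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.1, 4.9]]
neighbors = k_nearest_neighbors(train_inputs, [1.1, 1.0], k=2)
```

The predicted class is then the majority class among the returned neighbours.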

Task 2: Find distance between two vectors

def euclidean_distance(v1, v2, n):
    # distance: a scalar value, the Euclidean distance between the two vectors
    distance = 0
    # TODO: write code to find the distance between the two vectors
    return distance
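
A minimal sketch of Task 2 in pure Python follows. The slides do not say what n stands for; here it is assumed to be the number of components to compare.

```python
import math

def euclidean_distance(v1, v2, n):
    """Euclidean distance between the first n components of v1 and v2.

    Assumption: n is the number of components (dimensions) to compare.
    """
    distance = 0.0
    for i in range(n):
        distance += (v1[i] - v2[i]) ** 2   # accumulate squared differences
    return math.sqrt(distance)

d = euclidean_distance([0, 0], [3, 4], 2)
```

With the points (0, 0) and (3, 4) the result is the familiar 3-4-5 triangle, so d is 5.0.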

Task 3: Get prediction

def classify(neighbors, train_targets):
    predicted_class = None  # placeholder until the TODO is completed
    # TODO: write code to classify
    # return the majority class
    return predicted_class
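
Task 3 could be completed with a majority vote, as in the sketch below. It assumes that neighbors is a list of indices into train_targets, which the slides do not specify.

```python
from collections import Counter

def classify(neighbors, train_targets):
    """Return the majority class among the neighbours.

    Assumption: neighbors holds row indices into train_targets.
    """
    votes = Counter(train_targets[i] for i in neighbors)
    # most_common(1) returns a list with the single (class, count) pair
    predicted_class, _ = votes.most_common(1)[0]
    return predicted_class

train_targets = [0, 0, 1, 1, 2]
predicted = classify([0, 1, 3], train_targets)
```

In case of a tie, Counter.most_common keeps the class that was counted first; an odd k avoids ties when there are only two classes.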

[Figures: scatter plot of the actual classification; scatter plot of the predicted classification]

To complete...

Use the basic entropy module seen in class to compute the quality of your solution.

The file entropy.py can be downloaded from the class repository.

import entropy as ent
# ...
myeta = ent.H(myclass, target)
