
Decision trees in Python

[email protected]

Data Science: Techniques and Applications (DSTA)

February 5th, 2020

1 / 9

Decision-trees: keywords

Discretization, iterated binary segmentation, misclassification ...

FP example

2 / 9

Decision-trees, cont'd

Purity, Entropy, Information Gain, root-to-leaf ...

FP example, 2
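As a small illustration of the keywords above, entropy and information gain can be sketched in a few lines. The function names `entropy` and `information_gain` and the toy labels are assumptions made for this example; they are not taken from the course code.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = [0, 0, 1, 1]
print(entropy(parent))                             # 1.0
print(information_gain(parent, [[0, 0], [1, 1]]))  # 1.0
```

A split that separates the two classes perfectly recovers all of the parent's entropy, which is why the gain equals 1 bit here.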

3 / 9

A UCI dataset on Bank note authentication

UCI page

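Four numerical values obtained from a wavelet transform of each banknote image:

  • variance, skewness and kurtosis (spelled "curtosis" in the dataset description) of the wavelet-transformed image, and

  • entropy of the image.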

One (integer) classification value: class
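A minimal sketch of reading such data into rows of four floats plus an integer class. It assumes a headerless, comma-separated layout with the class in the last column; the sample values below are made up for illustration and are not rows from the dataset, and `load_rows` is a hypothetical helper name.

```python
import csv
import io

# Made-up rows in the assumed layout: four floats, then the integer class.
SAMPLE = """\
1.5,-0.3,2.0,0.1,0
-2.1,4.2,-1.0,-0.7,1
"""


def load_rows(handle):
    """Read rows as [variance, skewness, kurtosis, entropy, class]."""
    rows = []
    for record in csv.reader(handle):
        if not record:  # skip blank lines
            continue
        *features, label = record
        rows.append([float(x) for x in features] + [int(label)])
    return rows


rows = load_rows(io.StringIO(SAMPLE))
print(rows[0])  # [1.5, -0.3, 2.0, 0.1, 0]
```

With a downloaded copy of the data, the same function could be given an open file handle instead of the in-memory sample.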

4 / 9

Decision-tree Classification

  • As an alternative to entropy, we use the Gini index to assess the purity of segments (a further example is here).

gini equation
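For reference, the Gini index is usually written as below; this is the standard textbook form, assumed here to match the slide's equation up to notation. Here p_k is the proportion of class k in a segment S, K is the number of classes, S_1 ... S_m are the segments produced by a split, and N is the total number of rows.

```latex
G(S) = 1 - \sum_{k=1}^{K} p_k^2,
\qquad
G_{\text{split}} = \sum_{j=1}^{m} \frac{|S_j|}{N}\, G(S_j)
```

A segment containing a single class has G(S) = 0; with two classes, the maximum is 0.5 for an even mix.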

5 / 9

Decision-tree Classification, II

gini example

6 / 9

Decision-tree Classification, III

gini example
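A small worked example of the Gini index for a single segment, using toy labels chosen for this sketch (they are not the values shown in the slide images). The helper name `gini_impurity` is an assumption.

```python
from collections import Counter


def gini_impurity(labels):
    """1 minus the sum of squared class proportions in one segment."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


print(gini_impurity([0, 0, 0, 0]))  # 0.0    (pure segment)
print(gini_impurity([0, 0, 1, 1]))  # 0.5    (even mix of two classes)
print(gini_impurity([0, 0, 0, 1]))  # 0.375
```

The last case works out as 1 - (0.75² + 0.25²) = 1 - 0.625 = 0.375.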

7 / 9

Exercise

  1. Download and inspect the decision-tree baseline code: dtree-baseline.py

  2. Lay out and code functions that segment the data according to the best Gini values available.

def get_split(dataset):
    b_index, b_value, b_score, b_groups = 999, 999, 999, None
    # TODO: Find the best possible place to split the dataset
    #
    # TODO: assign datapoints to 'left' and 'right' segments
    #       using the test_split(index, value, dataset)
    #       function.
    # TODO: define a gini_index(groups, classes)
    #       func. to construct a branch of the tree
    return {'index': b_index, 'value': b_value, 'groups': b_groups}

Remember: Gini = 0 is the best case, meaning every segment contains a single class.
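One possible shape for the search, offered as a sketch and not as the reference solution. `split_rows` is a hypothetical stand-in for the `test_split(index, value, dataset)` function the skeleton refers to, whose actual behaviour in dtree-baseline.py is not shown here; `find_best_split` and its `score_fn` parameter are also assumptions. The scoring function is passed in so that any lower-is-better measure, such as a Gini index, can be plugged in; the demo uses a simple misclassification count.

```python
def split_rows(index, value, dataset):
    """Send rows with row[index] < value to 'left', the rest to 'right'."""
    left, right = [], []
    for row in dataset:
        (left if row[index] < value else right).append(row)
    return left, right


def find_best_split(dataset, score_fn):
    """Try every (feature, value) pair and keep the lowest-scoring split.

    score_fn(groups, classes) is assumed to return a score where lower
    is better, e.g. a Gini index.
    """
    classes = sorted({row[-1] for row in dataset})
    best = {'index': None, 'value': None, 'groups': None}
    best_score = float('inf')
    for index in range(len(dataset[0]) - 1):   # last column is the class
        for row in dataset:
            groups = split_rows(index, row[index], dataset)
            score = score_fn(groups, classes)
            if score < best_score:
                best_score = score
                best = {'index': index, 'value': row[index], 'groups': groups}
    return best


def misclassified(groups, classes):
    """Stand-in score: rows outside the majority class of their group."""
    errors = 0
    for group in groups:
        labels = [row[-1] for row in group]
        if labels:
            errors += len(labels) - max(labels.count(c) for c in classes)
    return errors


rows = [[1.0, 0], [2.0, 0], [3.0, 1], [4.0, 1]]
split = find_best_split(rows, misclassified)
print(split['index'], split['value'])  # 0 3.0
```

The exhaustive search evaluates every observed value of every feature as a threshold, which is quadratic in the number of rows per feature but easy to reason about for a first version.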

8 / 9

Exercise, cont'd

  • Calculate the Gini index for a split dataset:

def gini_index(groups, classes):
    total_gini = 0.0
    # TODO: for each group, calculate the Gini index.
    return total_gini
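A hedged sketch of one way to fill in the TODO, assuming each row stores its class in the last position and that each group's Gini index is weighted by its share of the rows. This is an illustration under those assumptions, not the official solution to the exercise.

```python
def gini_index(groups, classes):
    """Size-weighted Gini index of a split; 0.0 means every group is pure."""
    n_instances = sum(len(group) for group in groups)
    total_gini = 0.0
    for group in groups:
        size = len(group)
        if size == 0:          # an empty group contributes nothing
            continue
        labels = [row[-1] for row in group]   # class is assumed to be last
        sum_sq = sum((labels.count(c) / size) ** 2 for c in classes)
        total_gini += (1.0 - sum_sq) * (size / n_instances)
    return total_gini


pure = ([[1.0, 0], [2.0, 0]], [[3.0, 1], [4.0, 1]])
mixed = ([[1.0, 0], [3.0, 1]], [[2.0, 0], [4.0, 1]])
print(gini_index(pure, [0, 1]))   # 0.0
print(gini_index(mixed, [0, 1]))  # 0.5
```

Weighting by group size keeps a tiny pure group from masking a large impure one when splits are compared.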
9 / 9
