
Linear Algebra, Wine Dataset and Pandas

January 22nd, 2019

Data Science: Techniques and Applications (DSTA)


Linear Algebra

  • Consider the following symmetric matrix:
import numpy as np
matrix = np.array([[43.81, 50.01, 47.64, 36.74, 42.00],
                   [50.01, 57.22, 54.53, 41.66, 48.22],
                   [47.64, 54.53, 51.97, 39.64, 45.98],
                   [36.74, 41.66, 39.64, 31.40, 34.44],
                   [42.00, 48.22, 45.98, 34.64, 40.84]])

Exercises

  • Use NumPy to transpose the matrix (hint: m.T, or np.transpose(m)).

  • Find the eigenvalues and eigenvectors of this matrix using a function from NumPy's "linalg" module (hint: np.linalg.eig).
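
As a warm-up for the exercises above, here is a minimal sketch on a small 2x2 symmetric matrix. The matrix `m` is a hypothetical example chosen for illustration, not the matrix from the slide. The sketch shows the transpose and then checks that each eigenpair returned by np.linalg.eig satisfies m v = λ v.

```python
import numpy as np

# Hypothetical 2x2 symmetric matrix, used only for illustration
m = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Transpose: for a symmetric matrix this equals the original
m_t = m.T
print(np.allclose(m, m_t))

# eig returns the eigenvalues and a matrix whose COLUMNS are the eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(m)

# Verify the defining property m v = lambda v for every pair
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    assert np.allclose(m @ v, eigenvalues[i] * v)

print(eigenvalues)
print(eigenvectors)
```

Note that np.linalg.eig does not promise any particular ordering of the eigenvalues, so sort them if an order is needed.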


Wine Review Dataset

  • The dataset is available at: https://www.kaggle.com/zynicide/wine-reviews

  • 130k wine reviews with variety, location, winery, price, and description.

  • We are going to use the JSON file format in this part of the lab.

  • JSON is a text format in which data is stored as nested key-value pairs, much like Python dictionaries.

  • JSON data can be loaded using Python's json package.

  • We are going to use Pandas instead of the csv module's functions.
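
To illustrate the analogy between JSON and Python dictionaries, here is a minimal sketch that parses a JSON string with the json package. The record below is hypothetical: its keys echo the fields listed above, but the values and the nested "location" object are made up for illustration and are not taken from the dataset.

```python
import json

# Hypothetical JSON text; keys and values are invented for illustration
raw = '''
{
  "winery": "Example Winery",
  "variety": "Riesling",
  "price": 15.0,
  "location": {"country": "Exampleland", "province": "Example Valley"}
}
'''

# json.loads parses a JSON string into ordinary Python objects
record = json.loads(raw)

print(type(record))                   # a JSON object becomes a dict
print(record["variety"])              # top-level key
print(record["location"]["country"])  # nested key-value pair
```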


DataFrame from Pandas

  • A DataFrame object in Pandas behaves much like a dictionary that maps column names to lists of values.

A simple example makes this clear:

import pandas as pd
new_dataframe = pd.DataFrame({'column1': [1, 2, 3, 4, 5],
                              'column2': [2.3, 4.5, 6.7, 6.5, 5.5]})
print(new_dataframe)
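
The dictionary analogy also works in the other direction. The following sketch, which reuses the same column names and values, shows that a column can be looked up by its name like a dictionary key, and that the DataFrame can be converted back into a dictionary of lists.

```python
import pandas as pd

df = pd.DataFrame({'column1': [1, 2, 3, 4, 5],
                   'column2': [2.3, 4.5, 6.7, 6.5, 5.5]})

# Look up a column by name, as with a dictionary key; the result is a Series
col = df['column1']
print(col.tolist())

# Convert back to a plain dictionary mapping column names to lists
as_dict = df.to_dict(orient='list')
print(as_dict)
```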

Importing JSON objects into a DataFrame object

import json
import pandas as pd

# Load the JSON file into a Python object (a list of records)
with open('winemag-data-130k-v2.json') as f:
    d = json.load(f)

# Flatten the first record, d[0], into a one-row DataFrame
# (pd.json_normalize replaces the older pandas.io.json.json_normalize import)
first_review = pd.json_normalize(d[0])
first_review.head()

# Alternatively, read the whole file straight into a DataFrame
df = pd.read_json("https://www.dcs.bbk.ac.uk/~abulhasan/dsta/wine-reviews/winemag-data-130k-v2.json")
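
To see what flattening means without downloading the dataset, here is a small sketch using pd.json_normalize (available in pandas 1.0 and later) on a hypothetical nested record. The keys and values are made up for illustration. Nested keys become dotted column names.

```python
import pandas as pd

# Hypothetical nested record, invented for illustration
record = {
    "title": "Example Wine 2015",
    "price": 20.0,
    "taster": {"name": "A. Taster", "handle": "@example"},
}

# Each nested key is flattened into a column named parent.child
flat = pd.json_normalize(record)

print(flat.columns.tolist())
print(flat)
```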

DataFrame Illustration

  • The Pandas DataFrame object takes care of missing data points in the raw data by padding them with NaN.
[Figure: DataFrame padding of missing values. Source: DataCamp]
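
The padding behaviour can be sketched with a few hypothetical records in which one key is missing. The record contents are assumptions made for illustration. Pandas aligns the records by key and fills the gap with NaN.

```python
import pandas as pd

# Hypothetical records; the second one has no "b" entry
records = [
    {"a": 1, "b": 2},
    {"a": 3},
]

df = pd.DataFrame(records)

print(df)
# Count the missing values in each column
print(df.isna().sum())
```

Because NaN is a floating-point value, the column containing the gap is stored as floats rather than integers.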

Array and DataFrame

  • A DataFrame object can be created from a NumPy array and manipulated much like an array.
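import numpy as np
import pandas as pd
# Note: mixing strings and numbers makes NumPy store every entry as a string
data = np.array([['', 'Col1', 'Col2'],
                 ['Row1', 1, 2],
                 ['Row2', 3, 4]])
# First row holds the column labels, first column holds the row labels
print(pd.DataFrame(data=data[1:, 1:],
                   index=data[1:, 0],
                   columns=data[0, 1:]))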
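
To show the array-like manipulation, here is a minimal sketch that builds a DataFrame from a purely numeric array and applies a few NumPy-style operations. It reuses the row and column labels from the example above and passes them separately, so the values stay numeric.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2],
                [3, 4]])

df = pd.DataFrame(arr, index=['Row1', 'Row2'], columns=['Col1', 'Col2'])

doubled = df * 2          # element-wise arithmetic, as with arrays
transposed = df.T         # transpose swaps rows and columns
col_sums = df.sum(axis=0) # sum down each column
back = df.to_numpy()      # convert back to a NumPy array

print(doubled)
print(transposed)
print(col_sums)
print(back)
```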
