
Overview of Machine Learning & Feature Engineering






Slide 0

Overview of Machine Learning & Feature Engineering
Machine Learning 101 Tutorial
Strata + Hadoop World, NYC, Sep 2015
Alice Zheng, Dato


Slide 1

About us
Chris DuBois: Intro to recommenders
Alice Zheng: Overview of ML
Piotr Teterwak: Intro to image search & deep learning
Krishna Sridhar: Deploying ML as a predictive service
Danny Bickson: TA
Alon Palombo: TA


Slide 2

Why machine learning? Model data. Make predictions. Build intelligent applications.


Slide 3

Classification: predict amongst a discrete set of classes


Slide 4

Input → Output


Slide 5

Spam filtering: data → prediction (spam vs. not spam)


Slide 6

Text classification: EDUCATION, FINANCE, TECHNOLOGY


Slide 7

Regression: predict real/numeric values


Slide 8

Stock market: input → output


Slide 9

Similarity: find things like this


Slide 10

Similar products. Input: the product I’m buying. Output: other products I might be interested in.
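The slides do not specify a similarity measure; a common choice is cosine similarity between item feature vectors. A minimal sketch with hypothetical product vectors (the items and values are made up for illustration):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical feature vectors for three products.
leash  = np.array([1.0, 0.9, 0.0])
collar = np.array([0.9, 1.0, 0.1])
laptop = np.array([0.0, 0.1, 1.0])

# A dog leash scores closer to a collar than to a laptop,
# so the collar would be recommended first.
```

Ranking all items by this score against the product being bought yields the “other products I might be interested in.”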


Slide 11

Given image, find similar images http://www.tiltomo.com/


Slide 12

Recommender systems: learn what I want before I know it


Slide 13



Slide 14

Playlist recommendations: recommendations form a coherent & diverse sequence


Slide 15

Friend recommendations Users and “items” are of the same type


Slide 16

Clustering: grouping similar items


Slide 17

Clustering images (Goldberger et al.): a set of images grouped into clusters


Slide 18

Clustering web search results


Slide 19

Machine learning … how? Data → Answers. Example data: “I fell in love the instant I laid my eyes on that puppy. His big eyes and playful tail, his soft furry paws, …” Many systems, many tools, many teams, lots of methods/jargon.


Slide 20

The machine learning pipeline: Raw data → Features → Models. Example raw data: “I fell in love the instant I laid my eyes on that puppy. His big eyes and playful tail, his soft furry paws, …”


Slide 21

Three things to know about ML
Feature = numeric representation of raw data
Model = mathematical “summary” of features
Making something that works = choosing the right model and features, given data and task


Slide 22

Feature = numeric representation of raw data


Slide 23

Representing natural text. Raw text: “It is a puppy and it is extremely cute.” Task: classify puppy or not. What’s important? Phrases? Specific words? Ordering? Subject, object, verb?


Slide 24

Representing natural text. Raw text: “It is a puppy and it is extremely cute.” Task: classify puppy or not → sparse vector representation.


Slide 25

Representing images. Raw image: millions of RGB triplets, one for each pixel. Image source: “Recognizing and learning object categories,” Li Fei-Fei, Rob Fergus, Antonio Torralba, ICCV 2005–2009.


Slide 26

Representing images. Raw image → deep learning features: a dense vector representation (e.g. 3.29, −15, −5.24, 48.3, 1.36, 47.1, …).


Slide 27

Feature space in machine learning. Raw data → high-dimensional vectors. Collection of data points → point cloud in feature space. Feature engineering = creating features of the appropriate granularity for the task.


Slide 28

Crudely speaking, mathematicians fall into two categories: the algebraists, who find it easiest to reduce all problems to sets of numbers and variables, and the geometers, who understand the world through shapes. -- Masha Gessen, “Perfect Rigor”


Slide 29

Algebra vs. Geometry a b c a2 + b2 = c2 Algebra Geometry (Euclidean space)


Slide 30

Visualizing a sphere in 2D: x² + y² = 1


Slide 31

Visualizing a sphere in 3D: x² + y² + z² = 1 (axes x, y, z; radius 1)


Slide 32

Visualizing a sphere in 4D: x² + y² + z² + t² = 1 (axes x, y, z, plus a fourth coordinate t)


Slide 33

Why are we looking at spheres? Poincaré Conjecture: every physical object without holes is “equivalent” to a sphere.


Slide 34

The power of higher dimensions. A sphere in 4D can model the birth and death process of physical objects. High-dimensional features can model many things.


Slide 35

Visualizing Feature Space


Slide 36

The challenge of high-dimensional geometry. Feature space can have hundreds to millions of dimensions. In high dimensions, our geometric imagination is limited. Algebra comes to our aid.
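The point can be made concrete: we cannot picture a 1000-dimensional sphere, but the algebraic test from the 2-D case (x² + y² = 1) carries over unchanged. A small sketch:

```python
import math

def on_unit_sphere(point, tol=1e-6):
    """A point lies on the unit sphere exactly when the sum of its
    squared coordinates is 1, in any number of dimensions."""
    return abs(sum(x * x for x in point) - 1.0) < tol

# A point in 1000-D with equal coordinates 1/sqrt(1000):
# the squares sum to 1000 * (1/1000) = 1.
dim = 1000
point = [1.0 / math.sqrt(dim)] * dim
```

Algebra answers in one line a question no geometric picture can show.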


Slide 37

Visualizing bag-of-words I have a puppy and it is extremely cute


Slide 38

Visualizing bag-of-words: along the axes “puppy,” “cute,” and “extremely,” the sentence maps to the point (1, 1, 1).
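The mapping from sentence to point can be sketched in a few lines, using the three axis words from the slide as a toy vocabulary:

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Map text to a vector of per-word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

vocab = ["puppy", "cute", "extremely"]
vec = bag_of_words("I have a puppy and it is extremely cute", vocab)
# each axis word occurs once, giving the point (1, 1, 1)
```

A real vocabulary has thousands of words, so the resulting vector is mostly zeros, hence the “sparse vector representation” on the earlier slide.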


Slide 39

Document point cloud (axes: word 1, word 2)


Slide 40

Model = mathematical “summary” of features


Slide 41

What is a summary? Data → point cloud in feature space. Model = a geometric shape that best “fits” the point cloud.


Slide 42

Clustering model: group data points tightly (axes: feature 1, feature 2)
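K-means is the textbook example of such a model: alternate between assigning points to the nearest centroid and moving each centroid to the mean of its points. A naive numpy sketch (not GraphLab Create’s implementation):

```python
import numpy as np

def kmeans(points, k, iters=10):
    """Naive k-means: assign each point to its nearest centroid,
    then recenter each centroid on the mean of its assigned points."""
    centroids = points[:k].copy()  # deterministic init, for the sketch only
    for _ in range(iters):
        # distance of every point to every centroid
        dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

# Two tight groups in a 2-D feature space.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels, centers = kmeans(pts, k=2)
```

Production implementations add smarter initialization (e.g. k-means++) and handle clusters that lose all their points; the sketch skips both.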


Slide 43

Classification model: decide between two classes (axes: feature 1, feature 2)
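A perceptron is about the simplest way to learn such a decision boundary; a sketch on linearly separable toy data (not the method the slides commit to):

```python
import numpy as np

def perceptron(X, y, epochs=50, lr=0.1):
    """Learn a separating line w.x + b = 0 for labels y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified: nudge the boundary
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Two classes on either side of a line in a 2-D feature space.
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [4.0, 3.0]])
y = np.array([-1, -1, 1, 1])
w, b = perceptron(X, y)
pred = np.sign(X @ w + b)
```

Geometrically, each update tilts the line toward the misclassified point, which is exactly the “shape that fits the point cloud” picture from the previous slide.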


Slide 44

Regression model: fit the target values (axes: feature, target)
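Fitting target values with a line is ordinary least squares; a minimal numpy sketch on noiseless toy data:

```python
import numpy as np

# Fit y = a*x + b by least squares: choose a, b that minimize the
# squared vertical distance from the targets to the line.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0  # targets lying exactly on the line y = 2x + 1
A = np.column_stack([x, np.ones_like(x)])  # design matrix [x | 1]
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
# recovers a close to 2 and b close to 1
```

With noisy targets the same call returns the best-fit line rather than an exact recovery.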


Slide 45

Visualizing Feature Engineering


Slide 46

When does bag-of-words fail? Along the axes “puppy,” “cat,” and “have,” the task is to find a surface that separates documents about dogs vs. cats. Problem: the word “have” adds fluff instead of information.


Slide 47

Improving on bag-of-words. Idea: “normalize” word counts so that popular words are discounted. Term frequency (tf) = number of times a term appears in a document. Inverse document frequency (idf) = log(N / number of documents containing the word), where N = total number of documents. Tf-idf count = tf × idf.
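With idf defined as log(N / number of documents containing the term), the computation is a few lines; a sketch over a toy corpus (the documents are made up for illustration):

```python
import math

def tf_idf(term, doc, corpus):
    """tf-idf = (count of term in doc) * log(N / number of docs containing term)."""
    tf = doc.count(term)
    df = sum(1 for d in corpus if term in d)
    return tf * math.log(len(corpus) / df)

docs = [["have", "puppy"],
        ["have", "cat"],
        ["have", "puppy", "puppy"],
        ["have", "kitten"]]

# "have" appears in all four documents, so its idf, and hence its
# tf-idf, is log(4/4) = 0: the uninformative word is flattened away.
```

Libraries such as scikit-learn add smoothing terms to this formula to avoid division by zero for unseen words; the sketch uses the plain definition from the slide.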


Slide 48

From BOW to tf-idf: with four documents, idf(puppy) = log 4, idf(cat) = log 4, idf(have) = log 1 = 0.


Slide 49

From BOW to tf-idf: tf-idf(puppy) = log 4, tf-idf(cat) = log 4, tf-idf(have) = 0. Tf-idf flattens uninformative dimensions in the BOW point cloud.


Slide 50

Entry points of feature engineering
Start from data and task: what’s the best text representation for classification?
Start from modeling method: what kind of features does k-means assume? What does linear regression assume about the data?


Slide 51

Dato’s Machine Learning Platform


Slide 52

Dato’s machine learning platform: raw data → features → models, via GraphLab Create, Dato Distributed, and Dato Predictive Services.


Slide 53

Data structures for feature engineering: SFrames and SGraphs.


Slide 54

Machine learning toolkits in GraphLab Create: classification/regression, clustering, recommenders, deep learning, similarity search, data matching, sentiment analysis, churn prediction, frequent pattern mining, and so on.


Slide 55

Demo


Slide 56

Dimensionality reduction: flatten non-useful features (axes: feature 1, feature 2). PCA: find the most non-flat linear subspace.


Slide 57

PCA: Principal Component Analysis. Center data at origin.


Slide 58

PCA: Principal Component Analysis. Find a line such that the average distance of every data point to the line is minimized. This is the 1st principal component.


Slide 59

PCA: Principal Component Analysis. Find a 2nd line, at right angles to the 1st, such that the average distance of every data point to the line is minimized. This is the 2nd principal component.


Slide 60

PCA: Principal Component Analysis. Find a 3rd line, at right angles to the previous lines, such that the average distance of every data point to the line is minimized. … There can only be as many principal components as the dimensionality of the data.
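The procedure described on these slides (successive orthogonal lines minimizing average distance) is what the SVD of the centered data computes in one shot. A numpy sketch, not the GraphLab Create API:

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD: center the data, then take the leading right singular
    vectors, which are mutually orthogonal directions of maximum variance."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]          # the principal components
    return Xc @ components.T, components    # projected data, directions

# A nearly 1-D point cloud embedded in 2-D: the second coordinate is
# roughly three times the first, plus a little noise.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, 3.0 * t + 0.01 * rng.normal(size=(200, 1))])
Z, comps = pca(X, n_components=1)
```

One component captures almost all the variance here, which is the “flatten non-useful features” picture from the dimensionality-reduction slide.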


Slide 61

Demo


Slide 62

Coursera Machine Learning Specialization
Learn machine learning in depth. Build and deploy intelligent applications.
Year-long certification program. Joint project between University of Washington + Dato.
Details: https://www.coursera.org/specializations/machine-learning


Slide 63

Next up today (alicez@dato.com, @RainyData, #StrataConf)
11:30am - Intro to recommenders, Chris DuBois
1:30pm - Intro to image search & deep learning, Piotr Teterwak
3:30pm - Deploying ML as a predictive service, Krishna Sridhar

