Understanding Feature Space in Machine Learning

Slide 0

Understanding Feature Space in Machine Learning. Alice Zheng, Dato. September 9, 2015.

Slide 1

My journey so far: applied machine learning (data science) and building ML tools.

Slide 2

Why machine learning? Model data. Make predictions. Build intelligent applications.

Slide 3

The machine learning pipeline: raw data → features. Example raw data: “I fell in love the instant I laid my eyes on that puppy. His big eyes and playful tail, his soft furry paws, …”

Slide 4

Feature = numeric representation of raw data

Slide 5

Representing natural text. Raw text: “It is a puppy and it is extremely cute.” Task: classify puppy or not. What’s important? Phrases? Specific words? Ordering? Subject, object, verb?

Slide 6

Representing natural text. Raw text: “It is a puppy and it is extremely cute.” Classify: puppy or not? The raw text is turned into a sparse vector representation.
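
A minimal sketch of that sparse representation (my own illustration, assuming scikit-learn is installed; not code from the talk):

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["It is a puppy and it is extremely cute."]
    vectorizer = CountVectorizer()          # bag-of-words: one dimension per distinct word
    bow = vectorizer.fit_transform(docs)    # stored as a sparse matrix of word counts

    print(vectorizer.vocabulary_)  # maps each word to its column index ("a" is dropped by the default tokenizer)
    print(bow.toarray())           # counts per column, e.g. [[1 1 1 2 2 1]]

Once the vocabulary spans a whole corpus, most entries of any one document's vector are zero, which is why the representation is stored sparsely.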

Slide 7

Representing images. Image source: “Recognizing and Learning Object Categories,” Li Fei-Fei, Rob Fergus, Antonio Torralba, ICCV 2005-2009. Raw image: millions of RGB triplets, one for each pixel.

Slide 8

Representing images. Raw image → deep learning features → dense vector representation (a vector of real values, e.g. 3.29, -15, -5.24, 48.3, 1.36, 47.1, …).
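
A rough sketch of the raw-pixels-to-dense-features step (my own illustration, not the talk’s pipeline; it assumes Pillow, PyTorch, and torchvision are installed, and "photo.jpg" is a stand-in path):

    import numpy as np
    import torch
    from PIL import Image
    from torchvision import models, transforms

    img = Image.open("photo.jpg").convert("RGB")
    raw = np.asarray(img)                        # raw image: H x W x 3 RGB triplets

    preprocess = transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ])
    net = models.resnet18(pretrained=True)       # older-style flag; newer torchvision uses weights=...
    net.fc = torch.nn.Identity()                 # drop the classifier head, keep the penultimate features
    net.eval()
    with torch.no_grad():
        features = net(preprocess(img).unsqueeze(0)).squeeze(0)  # dense 512-dimensional vector

    print(raw.shape, features.shape)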

Slide 9

Feature space in machine learning. Raw data → high-dimensional vectors. Collection of data points → point cloud in feature space. Model = geometric summary of point cloud. Feature engineering = creating features of the appropriate granularity for the task.

Slide 10

Crudely speaking, mathematicians fall into two categories: the algebraists, who find it easiest to reduce all problems to sets of numbers and variables, and the geometers, who understand the world through shapes. -- Masha Gessen, “Perfect Rigor”

Slide 11

Algebra vs. geometry: the algebraic statement a² + b² = c² corresponds, in Euclidean space, to a right triangle with legs a, b and hypotenuse c.

Slide 12

Visualizing a sphere in 2D: x² + y² = 1

Slide 13

Visualizing a sphere in 3D: x² + y² + z² = 1 (the unit sphere, crossing each of the x, y, z axes at 1)

Slide 14

Visualizing a sphere in 4D: x² + y² + z² + t² = 1

Slide 15

Why are we looking at spheres? Poincaré Conjecture: every physical object without holes is “equivalent” to a sphere.

Slide 16

The power of higher dimensions. A sphere in 4D can model the birth and death process of physical objects. Point clouds = approximate geometric shapes. High-dimensional features can model many things.

Slide 17

Visualizing Feature Space

Slide 18

The challenge of high-dimensional geometry. Feature space can have hundreds to millions of dimensions. In high dimensions, our geometric imagination is limited. Algebra comes to our aid.

Slide 19

Visualizing bag-of-words I have a puppy and it is extremely cute

Slide 20

Visualizing bag-of-words: the sentence maps to the point (1, 1, 1) on the “puppy”, “cute”, and “extremely” axes.

Slide 21

Document point cloud (axes: word 1, word 2).

Slide 22

What is a model? Model = mathematical “summary” of data What’s a summary? A geometric shape

Slide 23

Classification model (axes: feature 1, feature 2): decide between two classes.

Slide 24

Clustering model (axes: feature 1, feature 2): group data points tightly.

Slide 25

Regression model (axes: feature, target): fit the target values.
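
All three “geometric summaries” can be seen in a few lines of code (a sketch of mine on toy 2-D data, assuming scikit-learn; not code from the talk):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))                  # 100 points in a 2-D feature space
    y_class = (X[:, 0] + X[:, 1] > 0).astype(int)  # two classes split by a line
    y_reg = 3 * X[:, 0] + rng.normal(scale=0.1, size=100)

    clf = LogisticRegression().fit(X, y_class)     # classification: a separating surface
    print(clf.coef_, clf.intercept_)               # coefficients define the decision boundary

    km = KMeans(n_clusters=2, n_init=10).fit(X)    # clustering: tight groups of points
    print(km.cluster_centers_)                     # each cluster summarized by its center

    reg = LinearRegression().fit(X[:, :1], y_reg)  # regression: a line fit to the target values
    print(reg.coef_, reg.intercept_)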

Slide 26

Visualizing Feature Engineering

Slide 27

When does bag-of-words fail? Example document with counts 2, 1, 1 along the “puppy”, “cat”, and “have” axes. Task: find a surface that separates documents about dogs vs. cats. Problem: the word “have” adds fluff instead of information.

Slide 28

Improving on bag-of-words. Idea: “normalize” word counts so that popular words are discounted. Term frequency (tf) = number of times a term appears in a document. Inverse document frequency of a word (idf) = log(N / number of documents containing the word), where N = total number of documents. Tf-idf count = tf × idf.

Slide 29

From BOW to tf-idf: for the example document with counts puppy = 2, cat = 1, have = 1, the idf weights are idf(puppy) = log 4, idf(cat) = log 4, idf(have) = log 1 = 0.
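
A small sketch of the same computation (the 4-document toy corpus below is my own guess at data consistent with the slide’s idf values; the formula idf = log(N / document frequency) is from the previous slide):

    import math
    from collections import Counter

    # Hypothetical corpus: "puppy" and "cat" each appear in 1 of 4 documents,
    # while "have" appears in all 4, matching idf(puppy) = idf(cat) = log 4 and idf(have) = 0.
    docs = [
        "i have a puppy and my puppy likes my cat",
        "we have fun",
        "you have to see this",
        "they have a garden",
    ]
    tokenized = [d.split() for d in docs]
    N = len(docs)

    def idf(word):
        df = sum(word in doc for doc in tokenized)  # number of documents containing the word
        return math.log(N / df)

    def tfidf(word, doc_tokens):
        tf = Counter(doc_tokens)[word]              # raw count of the word in the document
        return tf * idf(word)

    for w in ["puppy", "cat", "have"]:
        print(w, round(idf(w), 3), round(tfidf(w, tokenized[0]), 3))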

Slide 30

From BOW to tf-idf: tfidf(puppy) = log 4, tfidf(cat) = log 4, tfidf(have) = 0. Tf-idf flattens uninformative dimensions in the BOW point cloud.

Slide 31

Entry points of feature engineering. Start from data and task: what’s the best text representation for classification? Start from modeling method: what kind of features does k-means assume? What does linear regression assume about the data?

Slide 32

That’s not all, folks! There’s a lot more to feature engineering: feature normalization, feature transformations, “regularizing” models, learning the right features. Dato is hiring! jobs@dato.com alicez@dato.com @RainyData