Machine Learning at Netflix


Slide 0

Slide 1

Machine Learning @ Netflix (and some lessons learned)
Yves Raimond (@moustaki)
Research/Engineering Manager, Search & Recommendations Algorithm Engineering

Slide 2

Netflix evolution

Slide 3

Netflix scale
● > 69M members
● > 50 countries
● > 1000 device types
● > 3B hours/month
● 36% of peak US downstream traffic

Slide 4

Recommendations @ Netflix
● Goal: Help members find content to watch and enjoy, to maximize satisfaction and retention
● Over 80% of what people watch comes from our recommendations
● Top Picks, Because You Watched, Trending Now, Row Ordering, Evidence, Search, Search Recommendations, Personalized Genre Rows, ...

Slide 5

Models & Algorithms
▪ Regression (linear, logistic, elastic net)
▪ SVD and other matrix factorizations
▪ Factorization Machines
▪ Restricted Boltzmann Machines
▪ Deep Neural Networks
▪ Markov Models and Graph Algorithms
▪ Clustering
▪ Latent Dirichlet Allocation
▪ Gradient Boosted Decision Trees / Random Forests
▪ Gaussian Processes
▪ …
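To make one entry on this list concrete: a minimal sketch of SVD-based matrix factorization for recommendations, using a tiny hypothetical ratings matrix (not Netflix data or code).

```python
import numpy as np

# Hypothetical toy ratings matrix (members x titles); 0 = unobserved.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Rank-2 truncated SVD: factor R into member and title latent factors.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # low-rank reconstruction

# R_hat fills in unobserved entries: e.g. predicted affinity of member 0
# for title 2, which was 0 (unobserved) in R.
print(round(R_hat[0, 2], 2))
```

In practice a recommender would factorize only the observed entries (e.g. via alternating least squares) rather than treating zeros as ratings; plain SVD keeps the sketch short.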

Slide 6

Some lessons learned

Slide 7

Build the offline experimentation framework first

Slide 8

When tackling a new problem
● What offline metrics can we compute that capture the online improvements we're actually trying to achieve?
● How should the input data for that evaluation be constructed (train, validation, test)?
● How fast and easy is it to run a full cycle of offline experimentation?
  ○ Minimize time to first metric
● How replicable is the evaluation? How shareable are the results?
  ○ Provenance (see Dagobah)
  ○ Notebooks (see Jupyter, Zeppelin, Spark Notebook)
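The cycle above can be sketched end to end; all names and the popularity "model" are hypothetical stand-ins, chosen to keep time-to-first-metric short: a time-based train/test split, a trivial baseline, and one offline metric (recall@k).

```python
# Minimal offline experimentation cycle: split by time, train a trivial
# popularity baseline, compute a first offline metric (recall@k).

def time_split(events, cutoff):
    """Train on events before the cutoff, evaluate on events at or after it."""
    train = [e for e in events if e["ts"] < cutoff]
    test = [e for e in events if e["ts"] >= cutoff]
    return train, test

def train_popularity(train):
    """Rank items by how often they appear in the training window."""
    counts = {}
    for e in train:
        counts[e["item"]] = counts.get(e["item"], 0) + 1
    return sorted(counts, key=counts.get, reverse=True)

def recall_at_k(ranking, test, k=2):
    """Fraction of held-out events whose item is in the top-k ranking."""
    topk = set(ranking[:k])
    hits = sum(1 for e in test if e["item"] in topk)
    return hits / len(test) if test else 0.0

events = [{"ts": t, "item": i} for t, i in
          [(1, "A"), (2, "A"), (3, "B"), (4, "A"), (5, "B"), (6, "C")]]
train, test = time_split(events, cutoff=4)
ranking = train_popularity(train)
print(recall_at_k(ranking, test, k=2))  # → 0.666...
```

Swapping in a real model only changes `train_popularity`; the split and metric code stay fixed, which is what makes the evaluation replicable.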

Slide 9

When tackling an old problem
● Same…
  ○ Are the metrics that were designed when experimentation first ran in that space still appropriate now?

Slide 10

Think about distribution from the outermost layers

Slide 11

1. For each combination of hyper-parameters (e.g. grid search, random search, Gaussian processes…)
2. For each subset of the training data
   a. Multi-core learning (e.g. HogWild)
   b. Distributed learning (e.g. ADMM, distributed L-BFGS, …)
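The nesting above can be sketched as two loops; the hyper-parameter names and the stand-in `fit` function are hypothetical. The point is that the outermost loop (over hyper-parameter settings) is embarrassingly parallel, which is why distribution should start there.

```python
from itertools import product

def fit(lam, lr, subset):
    # Hypothetical stand-in for multi-core or distributed learning on one
    # data subset; returns a fake loss favoring small lam and mid-range lr.
    return lam + (lr - 0.1) ** 2 + 1.0 / len(subset)

data = list(range(100))
subsets = [data[:50], data[50:]]  # inner loop: subsets of the training data

best = None
for lam, lr in product([0.01, 0.1, 1.0], [0.05, 0.1, 0.5]):  # outer: grid search
    # Each (lam, lr) cell is independent of the others, so this loop can be
    # farmed out to separate machines before any inner-loop distribution.
    loss = sum(fit(lam, lr, s) for s in subsets) / len(subsets)
    if best is None or loss < best[0]:
        best = (loss, lam, lr)

print(best[1], best[2])  # → 0.01 0.1
```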

Slide 12

When to use distributed learning?
● The impact of communication overhead when building distributed ML algorithms is non-trivial
● Is your data big enough that the distribution offsets the communication overhead?
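A back-of-envelope way to ask that question, with entirely hypothetical timings: model one iteration as ideally parallelized compute plus a fixed communication (e.g. all-reduce) cost, and compare the resulting speedups.

```python
def speedup(compute_s, comm_s, workers):
    """Ideal parallel compute plus a fixed per-iteration communication cost."""
    return compute_s / (compute_s / workers + comm_s)

# Small data: 2s of compute per iteration, 1s of communication.
small = speedup(2.0, 1.0, workers=8)   # ≈ 1.6x on 8 workers
# Big data: 200s of compute per iteration, same 1s of communication.
big = speedup(200.0, 1.0, workers=8)   # ≈ 7.7x on 8 workers
print(round(small, 2), round(big, 2))
```

With small data, eight workers buy less than a 2x speedup; with big data the same cluster gets close to the ideal 8x, which is the slide's criterion in miniature.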

Slide 13

Example: Uncollapsed Gibbs sampler for LDA (more details here)
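A minimal sketch of the idea on hypothetical toy data (not the implementation the talk describes): in the uncollapsed sampler, topic assignments are conditionally independent given θ and φ, so the assignment step parallelizes across documents, which is what makes it a good fit for distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
docs = [[0, 0, 1], [1, 2, 2], [0, 1, 2]]   # toy corpus: word ids per document
K, V, alpha, beta = 2, 3, 0.1, 0.1         # topics, vocab size, Dirichlet priors

theta = rng.dirichlet([alpha] * K, size=len(docs))   # doc-topic proportions
phi = rng.dirichlet([beta] * V, size=K)              # topic-word distributions

for _ in range(50):
    nd = np.zeros((len(docs), K))   # doc-topic counts
    nw = np.zeros((K, V))           # topic-word counts
    for d, doc in enumerate(docs):          # independent given theta, phi:
        for w in doc:                       # this loop can run per-document
            p = theta[d] * phi[:, w]        # P(z | theta, phi, w), unnormalized
            z = rng.choice(K, p=p / p.sum())
            nd[d, z] += 1
            nw[z, w] += 1
    # Uncollapsed step: resample parameters from their Dirichlet posteriors.
    theta = np.vstack([rng.dirichlet(alpha + nd[d]) for d in range(len(docs))])
    phi = np.vstack([rng.dirichlet(beta + nw[k]) for k in range(K)])

print(theta.shape, phi.shape)  # → (3, 2) (2, 3)
```

A collapsed sampler integrates θ and φ out, which couples every assignment to every other and makes distribution much harder; keeping them explicit trades some statistical efficiency for parallelism.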

Slide 14

Design production code to be experimentation-friendly

Slide 15

Example development process
Idea → Data → Offline modeling (R, Python, MATLAB, …) → Iterate → Final model → Implement in production system (Java, C++, …) → Production environment (A/B test) → Actual output
Pitfalls that feed back into iteration: data discrepancies, missing postprocessing logic, code discrepancies, performance issues

Slide 16

Avoid dual implementations
Instead of maintaining separate experiment code and production code, have both the experiment and production paths call into a shared engine.
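The shared-engine shape can be sketched as follows; all names are hypothetical. One scoring routine is wrapped twice, once for offline experiments and once for production, so the two paths cannot drift apart.

```python
def score(weights, features):
    """The shared engine: a single scoring routine both paths call."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def experiment_rank(weights, logged_rows):
    """Experiment wrapper: batch-score logged rows for offline metrics."""
    return sorted(logged_rows, key=lambda r: score(weights, r), reverse=True)

def production_pick(weights, candidates):
    """Production wrapper: pick the best live candidate; same engine."""
    return max(candidates, key=lambda c: score(weights, c))

weights = {"popularity": 0.5, "recency": 1.0}
rows = [{"popularity": 1.0}, {"recency": 1.0}]
print(production_pick(weights, rows))  # → {'recency': 1.0}
```

In a real system the engine would be the performance-critical component (e.g. the JVM or C++ code from the previous slide), with thin experiment and production adapters around it.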

Slide 17

To be continued...

Slide 18

We’re hiring!
Yves Raimond (@moustaki)

Slide 19