
From Idea to Execution: Spotify's Discover Weekly






Slide 0

From Idea to Execution: Spotify’s Discover Weekly
Or: 5 lessons in building recommendation products at scale
Chris Johnson :: @MrChrisJohnson
Edward Newett :: @scaladaze
DataEngConf • NYC • Nov 2015


Slide 1

Who are we?
Chris Johnson
Edward Newett


Slide 2

Spotify in Numbers
• Started in 2006, now available in 58 markets
• 75+ million active users, 20 million paying subscribers
• 30+ million songs, 20,000 new songs added per day
• 1.5 billion user-generated playlists
• 1 TB of user data logged per day
• 1,700-node Hadoop cluster
• 10,000+ Hadoop jobs run daily


Slide 3

Challenge: 30M songs… how do we recommend music to users?


Slide 4

Discover


Slide 5

Radio


Slide 6

Related Artists


Slide 7

Discover Weekly


Slide 8

The Road to Discover Weekly


Slide 9

2013 :: Discover Page v1.0
• Personalized News Feed of recommendations
• Artists, Album Reviews, News Articles, New Releases, Upcoming Concerts, Social Recommendations, Playlists…
• Required a lot of attention and digging to engage with recommendations
• No organization of content


Slide 10

2014 :: Discover Page v2.0
• Recommendations grouped into strips (à la Netflix)
• Limited to Albums and New Releases
• More organized than the News Feed, but still required active interaction


Slide 11

Insight: users were spending more time on editorial Browse playlists than on Discover.


Slide 12

Idea: combine the personalized experience of Discover with the lean-back ease of Browse


Slide 13

Meanwhile… 2014 Year In Music


Slide 14

Play it forward: Same content as the Discover Page, but... as a playlist


Slide 15

Lesson 1: Be data driven from start to finish


Slide 16

2008 2012 2015 Slide from Dan McKinley - Etsy


Slide 17

Define success metrics BEFORE you release your test
• Reach: How many users are you reaching?
• Depth: For the users you reach, how deeply do they engage?
• Retention: For the users you reach, how many do you retain?


Slide 18

Discover Weekly Key Success Metrics
• Reach: DW WAU / Spotify WAU
• Depth: DW Time Spent / Spotify WAU
• Retention: DW week-over-week retention
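
The three ratios above can be sketched in a few lines of Python. This is only an illustration: the function names and the toy data are hypothetical, and week-over-week retention is assumed here to mean the share of last week's DW listeners who listened again this week, which the slide does not spell out.

```python
from typing import Set


def reach(dw_wau: int, spotify_wau: int) -> float:
    """Reach = DW weekly active users / Spotify weekly active users."""
    return dw_wau / spotify_wau


def depth(dw_time_spent: float, spotify_wau: int) -> float:
    """Depth = DW time spent / Spotify WAU, as written on the slide."""
    return dw_time_spent / spotify_wau


def wow_retention(last_week: Set[str], this_week: Set[str]) -> float:
    """Assumed definition: fraction of last week's DW listeners seen again this week."""
    if not last_week:
        return 0.0
    return len(last_week & this_week) / len(last_week)


# Toy numbers, invented for illustration only.
dw_last_week = {"u1", "u2", "u3", "u4"}
dw_this_week = {"u1", "u2", "u9"}
print(reach(dw_wau=25, spotify_wau=100))
print(depth(dw_time_spent=500.0, spotify_wau=100))
print(wow_retention(dw_last_week, dw_this_week))
```

Fixing these definitions in code before the test ships makes it harder to redefine success after the results come in.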


Slide 19

2008 2012 2015 Slide from Dan McKinley - Etsy


Slide 20

Step 1: Prototype (employee test)


Slide 21

Step 1: Prototype (employee test)


Slide 22

Results of Employee Test were very positive!


Slide 23

2008 2012 2015 Slide from Dan McKinley - Etsy


Slide 24

Step 2: Release AB Test to 1% of Users
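
One common way to expose a fixed percentage of users to a test is deterministic hash bucketing. The sketch below is a generic illustration of that technique, not Spotify's actual assignment logic; the function names, the experiment label, and the bucket count are all assumptions.

```python
import hashlib


def bucket(user_id: str, experiment: str, buckets: int = 10_000) -> int:
    """Map a user to a stable bucket in [0, buckets) for a given experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def in_test(user_id: str, experiment: str, percent: float = 1.0,
            buckets: int = 10_000) -> bool:
    """True if the user falls in the first `percent` percent of buckets."""
    threshold = round(percent * buckets / 100)
    return bucket(user_id, experiment, buckets) < threshold


# Hypothetical experiment name; the same user always gets the same answer.
exposed = sum(in_test(f"user-{i}", "discover-weekly") for i in range(100_000))
print(exposed)  # close to 1,000 of 100,000 users
```

Salting the hash with the experiment name keeps assignments independent across experiments, so the same 1% of users is not reused for every test.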


Slide 25

Google Form 1% Results


Slide 26

Personalized image resulted in 10% lift in WAU
• Initial 0.5% user test
• 1% Spaceman image
• 1% Personalized image
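
A lift figure like the one above is typically a relative difference between two test cells. The sketch below assumes the spaceman cell is the baseline and the personalized cell is the treatment; the counts are invented so that the result comes out near 10%, and the function names are hypothetical.

```python
def wau_rate(active_users: int, exposed_users: int) -> float:
    """Share of exposed users who were active in the week."""
    return active_users / exposed_users


def relative_lift(baseline_rate: float, treatment_rate: float) -> float:
    """Relative change of the treatment rate over the baseline rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (treatment_rate - baseline_rate) / baseline_rate


# Invented counts, for illustration only.
spaceman = wau_rate(active_users=2_000, exposed_users=10_000)
personalized = wau_rate(active_users=2_200, exposed_users=10_000)
lift = relative_lift(spaceman, personalized)
print(f"{lift:.1%}")
```

In practice the lift would be reported with a confidence interval, since two 1% cells can differ by chance alone.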


Slide 27

Lesson 2: Reuse existing infrastructure in creative ways


Slide 28

Discover Weekly Data Flow


Slide 29

Recommendation Models


Slide 30

Implicit Matrix Factorization
• Aggregate all (user, track) streams into a large matrix
• Goal: Approximate the binary preference matrix by the inner product of 2 smaller matrices (X and Y), minimizing the weighted RMSE (root mean squared error), with a function of plays, context, and recency as the weight
[Figure: binary Users × Songs matrix, approximated by the product of X and Y]
• p_ui = 1 if user u streamed track i, else 0
• x_u = user latent factor vector
• y_i = item latent factor vector
• β_u = bias for user
• β_i = bias for item
• λ = regularization parameter
[1] Hu Y. & Koren Y. & Volinsky C. (2008) Collaborative Filtering for Implicit Feedback Datasets. 8th IEEE International Conference on Data Mining
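
Written out, the objective described on this slide takes roughly the following form. This is a reconstruction in the notation of Hu, Koren and Volinsky [1] with bias terms added, not a copy of the slide's equation: p_ui is the binary preference, x_u and y_i are the user and item latent factor vectors, β_u and β_i are the biases, and λ is the regularization parameter. The weight c_ui is an assumed name for the slide's "function of plays, context, and recency", whose exact form the deck does not give.

```latex
\min_{x,\, y,\, \beta} \;
\sum_{u,i} c_{ui} \left( p_{ui} - x_u^{\top} y_i - \beta_u - \beta_i \right)^2
\; + \; \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)
```

Because every (user, track) pair contributes a term, including the unobserved ones with p_ui = 0, this objective is usually minimized with alternating least squares rather than plain stochastic gradient descent.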


Slide 31

Can also use Logistic Loss!
• Aggregate all (user, track) streams into a large matrix
• Goal: Model the probability of a user playing a song as logistic, then maximize the log likelihood of the binary preference matrix, weighting positive observations by a function of plays, context, and recency
[Figure: binary Users × Songs matrix, modeled by the factor matrices X and Y]
• x_u = user latent factor vector
• y_i = item latent factor vector
• β_u = bias for user
• β_i = bias for item
• λ = regularization parameter
[2] Johnson C. (2014) Logistic Matrix Factorization for Implicit Feedback Data. NIPS Workshop on Distributed Matrix Computations
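
A sketch of the logistic variant, following the general shape of the formulation in [2]: the play probability is a logistic function of the score z_ui, and the regularized log likelihood is maximized. Here x_u and y_i are the user and item latent factor vectors, β_u and β_i are the biases, and λ is the regularization parameter. The symbol w_ui is an assumed name for the non-negative weight on positive observations (the slide's "function of plays, context, and recency"), taken to be 0 for pairs with no streams.

```latex
z_{ui} = x_u^{\top} y_i + \beta_u + \beta_i, \qquad
p(\text{play} \mid u, i) = \frac{e^{z_{ui}}}{1 + e^{z_{ui}}}

\max_{x,\, y,\, \beta} \;
\sum_{u,i} \Big[ w_{ui}\, z_{ui} - (1 + w_{ui}) \log\!\left( 1 + e^{z_{ui}} \right) \Big]
\; - \; \frac{\lambda}{2} \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)
```

The first sum follows from counting each positive observation w_ui times and each pair once as a negative: w_ui log p + log(1 - p) simplifies to w_ui z_ui - (1 + w_ui) log(1 + e^{z_ui}).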


Slide 32

NLP Models on News and Blogs


Slide 33

NLP Models work great on Playlists!
• Playlist itself is a document
• Songs in playlist are words
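
To make the playlist-as-document analogy concrete, here is a deliberately simple stand-in: it counts which tracks co-occur in playlists and compares tracks by the cosine similarity of their co-occurrence profiles. The deck does not say which NLP model was used, so this is not that model; the track names and helper functions are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def cooccurrence(playlists):
    """Count, for each track ("word"), the tracks it shares a playlist ("document") with."""
    counts = defaultdict(Counter)
    for playlist in playlists:
        for a, b in combinations(sorted(set(playlist)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def most_similar(track, counts, top_n=3):
    """Rank other tracks by similarity of their co-occurrence profiles."""
    if track not in counts:
        return []
    target = counts[track]
    scores = {t: cosine(target, vec) for t, vec in counts.items() if t != track}
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Toy playlists with invented track names.
playlists = [
    ["rock_a", "rock_b", "rock_c"],
    ["rock_a", "rock_b", "rock_d"],
    ["jazz_a", "jazz_b"],
    ["jazz_a", "jazz_b", "jazz_c"],
]
counts = cooccurrence(playlists)
print(most_similar("rock_a", counts))
```

With 1.5 billion user-generated playlists as the corpus, the same idea scales up to embedding models that learn a dense vector per track.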


Slide 34

Deep Learning on Audio
[3] http://benanne.github.io/2014/08/05/spotify-cnns.html


Slide 35

Songs in a Latent Space representation
• normalized item-vectors


Slide 36

Songs in a Latent Space representation
• user-vector in same space
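
Once users and songs live in the same latent space, recommending can be as simple as scoring each normalized item vector against the user vector and taking the top results. The sketch below assumes the vectors already exist (for example, from a factorization model); the two-dimensional vectors, track names, and the `recommend` helper are made up for illustration.

```python
from math import sqrt
from typing import Dict, Iterable, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def recommend(user_vec: Sequence[float],
              item_vecs: Dict[str, Sequence[float]],
              exclude: Iterable[str] = (),
              top_n: int = 2) -> List[str]:
    """Rank items by the dot product of the user vector with normalized item vectors."""
    skip = set(exclude)
    scores = {t: dot(user_vec, normalize(v)) for t, v in item_vecs.items() if t not in skip}
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Toy 2-dimensional latent space with invented tracks.
items = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
user = [2.0, 0.5]
print(recommend(user, items))
print(recommend(user, items, exclude={"a"}))  # skip tracks the user already knows
```

Normalizing the item vectors keeps very popular tracks, which tend to have large norms, from dominating every user's ranking.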


Slide 37

Lesson 3: Don’t scale until you need to


Slide 38

Scaling to 100%: Rollout Challenges
‣ Create and publish 75M playlists every week
‣ Downloading and processing Facebook images
‣ Language translations


Slide 39

Scaling to 100%: Weekly refresh
‣ Time-sensitive updates
‣ Refresh 75M playlists every Sunday night
‣ Take timezones into account
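
Taking timezones into account means that each market has its own deadline in UTC. The sketch below computes the UTC instant of the next Monday 00:00 local time for a given market. It uses fixed UTC offsets to stay self-contained, which is a simplification: a production job would more likely use named zones (for example via Python's `zoneinfo`) to handle daylight saving. The function name and the offsets are assumptions.

```python
from datetime import datetime, timedelta, timezone


def next_monday_midnight_utc(now_utc: datetime, utc_offset_hours: float) -> datetime:
    """UTC instant of the next Monday 00:00 in a zone with the given fixed offset."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local_now = now_utc.astimezone(local_tz)
    # weekday(): Monday == 0. If it is already Monday locally, target next week's Monday.
    days_ahead = (7 - local_now.weekday()) % 7 or 7
    local_midnight = (local_now + timedelta(days=days_ahead)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return local_midnight.astimezone(timezone.utc)


# Friday 6 Nov 2015, 12:00 UTC; offsets are illustrative winter values.
now = datetime(2015, 11, 6, 12, 0, tzinfo=timezone.utc)
for market, offset in [("Stockholm", 1), ("New York", -5)]:
    print(market, next_monday_midnight_utc(now, offset).isoformat())
```

Sorting markets by this deadline gives a natural publishing order: the easternmost markets are refreshed first.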


Slide 40

Discover Weekly publishing flow


Slide 41


Slide 42


Slide 43


Slide 44

What’s next? Iterating on content quality and interface enhancements


Slide 45

Iterating on quality and adding a feedback loop.


Slide 46

DW feedback comes at the expense of presentation bias.


Slide 47

Lesson 4: Users know best. In the end, AB Test everything!


Slide 48

Lesson 5 (final lesson!): Empower bottom-up innovation in your org and amazing things will happen.


Slide 49

Thank You! (btw, we’re hiring Machine Learning and Data Engineers, come chat with us!)


Slide 50

