Slide 0
data science @ The New York Times
chris.wiggins@columbia.edu
chris.wiggins@nytimes.com
@chrishwiggins
references: bit.ly/brownrefs
Slide 1
data science @ The New York Times
Slide 2
data science @ The New York Times
Slide 3
“data science”
jobs, jobs, jobs
Slide 4
“data science”
jobs, jobs, jobs
Slide 5
data science: mindset & toolset
drew conway, 2010
Slide 6
modern history:
2009
Slide 7
modern history:
2009
Slide 8
“data science”
ancient history: 2001
Slide 9
“data science”
ancient history: 2001
Slide 10
data science
context
Slide 11
home schooled
Slide 12
B.A. & M.Sc. from Brown
Slide 13
PhD in topology
Slide 14
“By the end of late 1945, I was a
statistician rather than a topologist”
Slide 15
invented: “bit”
Slide 16
invented: “software”
Slide 17
invented: “FFT”
Slide 18
“the progenitor of data science.”  @mshron
Slide 19
“The Future of Data Analysis,” 1962
John W. Tukey
Slide 20
introduces:
“Exploratory data analysis”
Slide 21
Tukey 1965, via John Chambers
Slide 22
TUKEY BEGAT S WHICH BEGAT R
Slide 23
Tukey 1972
Slide 24
In 1975, while at Princeton, Tufte was asked to teach a
statistics course to a group of journalists who were visiting
the school to study economics. He developed a set of
readings and lectures on statistical graphics, which he
further developed in joint seminars he subsequently taught
with renowned statistician John Tukey (a pioneer in the field
of information design). These course materials became the
foundation for his first book on information design, The
Visual Display of Quantitative Information.
Tukey 1975
Slide 25
TUKEY BEGAT VDQI
Slide 26
Tukey 1977
Slide 27
TUKEY BEGAT EDA
Slide 28
fast forward to 2001
Slide 29
“The primary agents for change should be
university departments themselves.”
Slide 30
data science histories
1. slow burn @Bell: as heretical statistics (see also Breiman)
2. caught fire 2009-now: as job description
historical rant: bit.ly/datarant
Slide 31
biology: 1892 vs. 1995
Slide 32
biology: 1892 vs. 1995
biology changed for good.
Slide 33
biology: 1892 vs. 1995
new toolset, new mindset
Slide 34
genetics: 1837 vs. 2012
ML toolset; data science mindset
Slide 35
genetics: 1837 vs. 2012
Slide 36
genetics: 1837 vs. 2012
ML toolset; data science mindset
arxiv.org/abs/1105.5821 ; github.com/rajanil/mkboost
Slide 37
data science: mindset & toolset
Slide 38
1851
Slide 39
news: 20th century
church
state
Slide 40
church
Slide 41
church
Slide 42
church
Slide 43
news: 20th century
church
state
Slide 44
news: 21st century
church
state
engineering
Slide 45
newspapering: 1851 vs. 1996
1851
1996
Slide 46
example:
millions of views per hour
2015
Slide 47
Slide 48
“...social activities generate large quantities of potentially
valuable data... The data were not generated for the
purpose of learning; however, the potential for learning
is great”
Slide 49
“...social activities generate large quantities of potentially
valuable data... The data were not generated for the
purpose of learning; however, the potential for learning
is great” J. Chambers, Bell Labs, 1993
Slide 50
data science: the web
Slide 51
data science: the web
is your “online presence”
Slide 52
data science: the web
is a microscope
Slide 53
data science: the web
is an experimental tool
Slide 54
newspapering: 1851 vs. 1996 vs. 2008
1851
1996
2008
Slide 55
“a startup is a temporary organization in search of a
repeatable and scalable business model” —Steve Blank
Slide 56
every publisher is now a startup
Slide 57
every publisher is now a startup
Slide 58
Slide 59
news: 21st century
church
state
engineering
Slide 60
news: 21st century
church
state
engineering
Slide 61
learnings
Slide 62
learnings

predictive modeling
descriptive modeling
prescriptive modeling
Slide 63
(actually ML, shhhh…)

(supervised learning)
(unsupervised learning)
(reinforcement learning)
Slide 64
learnings

predictive modeling
descriptive modeling
prescriptive modeling
cf. modelingsocialdata.org
Slide 65
predictive modeling, e.g.,
cf. modelingsocialdata.org
Slide 66
predictive modeling, e.g.,
“the funnel”
cf. modelingsocialdata.org
Slide 67
super cool stuff
interpretable predictive modeling
cf. modelingsocialdata.org
Slide 68
super cool stuff
interpretable predictive modeling
cf. modelingsocialdata.org
arxiv.org/abs/q-bio/0701021
Slide 69
optimization & learning, e.g.,
“How The New York Times Works,” Popular Mechanics, 2015
Slide 70
(some moneys)
optimization & prediction, e.g.,
(some models)
“How The New York Times Works,” Popular Mechanics, 2015
Slide 71
recommendation as predictive modeling
Slide 72
recommendation as predictive modeling
bit.ly/AlexCTM
Slide 73
descriptive modeling, e.g.,
cf. daeilkim.com ; import bnpy
Slide 74
modeling your audience
bit.ly/HughesKimSudderthAISTATS15
Slide 75
modeling your audience
(optimization, ultimately)
Slide 76
modeling your audience
also allows insight+targeting as inference
Slide 77
prescriptive modeling
Slide 78
prescriptive modeling
cf. modelingsocialdata.org
Slide 79
prescriptive modeling
aka “A/B testing”;
RCT
cf. modelingsocialdata.org
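
The slide equates prescriptive modeling with A/B testing, i.e. a randomized controlled trial. As a rough illustration only, here is a minimal sketch of how one such test might be evaluated: a two-sided two-proportion z-test on conversion counts. The function name, the choice of a z-test, and the counts are all assumptions made for this example; nothing here is taken from the talk or from any system at The New York Times.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between arms A and B.

    Hypothetical helper for illustration; assumes users were randomly
    assigned to arms and that both samples are reasonably large.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 200 of 10,000 users convert in arm A, 250 of 10,000 in arm B.
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=250, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Random assignment is what lets the observed difference be read as an effect of the intervention rather than of who happened to see it, which is the sense in which the test is prescriptive.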
Slide 80
prescriptive modeling, e.g.,
Slide 81
prescriptive modeling, e.g.,
Slide 82
prescriptive modeling, e.g.,
Slide 83
descriptive:
predictive:
Explore
Learning
Test
prescriptive:
Optimizing
Reporting
Slide 84
descriptive:
predictive:
Explore
Learning
Test
prescriptive:
Optimizing
Reporting
Slide 85
common requirements in
data science:
Slide 86
common requirements in
data science:
1. people
2. ideas
3. things
cf. John Boyd, USAF
Slide 87
data science: ideas
Slide 88
data skills
data science and…

data engineering
data embeds
data product
data multiliteracies
cf. “data scientists at work”, ch 1
Slide 89
data science: ideas

new mindset > new toolset
Slide 90
data science: people
Slide 91
thanks to the data science team!
Slide 92
data science @ The New York Times
chris.wiggins@columbia.edu
chris.wiggins@nytimes.com
@chrishwiggins
references: bit.ly/brownrefs
Slide 93