The Art and Science of Data-Driven Journalism

Slide 0

The Art and Science of Data-Driven Journalism. Alexander B. Howard, Tow Fellow, Columbia University. May 30, 2014

Slide 1

You know something, John Snow.

Slide 2

This John Snow knew something.

Slide 3

Newspapers have used data for centuries. Source: The Guardian

Slide 4

1960s: computer-assisted reporting (CAR). Bob Woodward, via Cliff1066

Slide 5

Traditional tools applying tech to journalism… Calculators and graphs; mainframes and PCs; spreadsheets; databases; text and code editors; statistics; programming

Slide 6

In the 1990s, government and civil society spread the Internet globally

Slide 7

In the 2000s, mobile phones and social networking connected us ever more

Slide 8

In the 2010s, data creation exploded. Image Credit: Real Time Rome from Senseable.MIT.edu

Slide 9

“Data-driven journalism is the future” Source: Tim Berners-Lee in the Guardian

Slide 10

…combined with new tools & context… Online spreadsheets and wikis; data visualization tools; open source frameworks; code sharing; agile development; cloud storage and processing (EC2 & Heroku); more data and more access; privacy and security risks

Slide 11

2014: data journalism is the present. Gathering, cleaning, organizing, analyzing, visualizing and publishing data to support the creation of acts of journalism.

Slide 12

Slide 13

Trendy but not new. The collection, protection and interrogation of data as a source, complementing traditional “shoe leather” investigative reporting that relies on witnesses, experts and authorities.

Slide 14

Slide 15

Dollars for Docs

Slide 16

The Guardian

Slide 17

Chicago Tribune: Flame retardants

Slide 18

Slide 19

A tangled web

Slide 20

Slide 21

Los Angeles Times

Slide 22

Slide 23

La Nacion

Slide 24

Reuters: Connected China

Slide 25

Slide 26

Slide 27

Slide 28

Best practices?

Slide 29

Report it out

Slide 30

Slide 31

Show people something new about the world

Slide 32

Slide 33

Tell a story

Slide 34

Center for Public Integrity

Slide 35

Storytelling still matters. “We use these tools to find and tell stories. We use them like we use a telephone. The story is still the thing.” - Anthony DeBarros, USA Today. Source: Data Journalism and the Big Picture

Slide 36

Make it personal

Slide 37

Slide 38

Understand the context for the data

Slide 39

Slide 40

Show your data

Slide 41

Slide 42

Show your work

Slide 43

Slide 44

Share your code

Slide 45

Slide 46

Consider ethics

Slide 47

Questions: Is the data clean? Is the data representative? What biases might be hidden in the data? Was the data legally obtained? Does the data contain personally identifiable information (PII)?
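
A few of these questions can be given a rough first pass in code before the reporting starts. The following is a minimal sketch, not a method from the presentation: the function `audit_rows`, the sample rows, and the simple email pattern are all hypothetical and chosen only for illustration. It counts empty cells, exact duplicate rows, and values that look like email addresses (one possible sign of PII).

```python
import re
from collections import Counter

# Deliberately simple, illustrative pattern; real PII detection needs far more care.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def audit_rows(rows):
    """Count empty cells, exact duplicate rows, and email-like values."""
    empty_cells = 0
    email_like = 0
    for row in rows:
        for value in row.values():
            text = str(value).strip()
            if not text:
                empty_cells += 1
            elif EMAIL_RE.fullmatch(text):
                email_like += 1

    # Rows are duplicates when every field matches exactly.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    duplicate_rows = sum(n - 1 for n in counts.values() if n > 1)

    return {
        "empty_cells": empty_cells,
        "duplicate_rows": duplicate_rows,
        "email_like_values": email_like,
    }


# Hypothetical sample data (not from any real dataset).
sample_rows = [
    {"name": "A. Reader", "city": "Chicago", "contact": "reader@example.org"},
    {"name": "A. Reader", "city": "Chicago", "contact": "reader@example.org"},
    {"name": "B. Source", "city": "", "contact": "n/a"},
]

report = audit_rows(sample_rows)
print(report)
```

Counts like these only flag where to look. Whether the data is representative, biased, or legally obtained still has to be reported out.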

Slide 48

Collection: Who gathered the data? How? Was it clear how the data would be used? Can people opt out of collection or usage? “Notice and consent” is not enough. “Privacy by design” applies to news apps.

Slide 49

Slide 50

Data Analysis & Numeracy: N = ? Average vs. median. Statistical significance? Correlation != causation. Regression to the mean.
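
The "average vs. median" point can be illustrated with a short sketch using Python's standard `statistics` module. The salary figures below are hypothetical, invented only to show how a single outlier pulls the average far from the typical value while the median barely moves.

```python
from statistics import mean, median

# Hypothetical salaries (illustrative numbers, not real data):
# five typical earners and one executive.
salaries = [30_000, 32_000, 35_000, 38_000, 40_000, 1_000_000]

average_salary = mean(salaries)
median_salary = median(salaries)

# Always report N alongside any summary statistic.
print(f"N = {len(salaries)}")
print(f"Average: {average_salary:,.0f}")
print(f"Median:  {median_salary:,.0f}")
```

With a sample this small and this skewed, the median is usually the more honest description of a "typical" value, and the small N is itself a reason for caution.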

Slide 51

Slide 52


Slide 53

Bad Data Viz: wtfviz.net

Slide 54

Present data with context, in context

Slide 55

Be aware of de-anonymization risks
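
One rough way to gauge this risk before publishing a dataset is to count how many records share each combination of indirectly identifying fields, an idea commonly known as k-anonymity. The sketch below is an assumption-laden illustration, not a technique named in the presentation: the helper `group_sizes`, the field names, and the records are all hypothetical.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )


# Hypothetical records with names already removed (not real data).
records = [
    {"zip": "60601", "birth_year": 1980, "sex": "F"},
    {"zip": "60601", "birth_year": 1980, "sex": "F"},
    {"zip": "60601", "birth_year": 1975, "sex": "M"},
    {"zip": "60614", "birth_year": 1990, "sex": "F"},
]

sizes = group_sizes(records, ["zip", "birth_year", "sex"])

# k is the size of the smallest group; k == 1 means at least one person is
# unique on these fields and could be re-identified by joining other data.
k = min(sizes.values())
unique_combos = [combo for combo, n in sizes.items() if n == 1]

print(f"k = {k}")
print(f"Unique combinations: {unique_combos}")
```

A low k does not prove that anyone will be re-identified, but it is a signal to coarsen fields (for example, publishing age ranges instead of birth years) or to withhold them.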

Slide 56

Emerging trends

Slide 57


Slide 58

Networked reporting of corruption. ICIJ: Offshore Leaks

Slide 59

International Consortium of Investigative Journalists: Offshoring $. 80 journalists, 40 countries, 260 gigabytes, 2.5 million files

Slide 60

Create your data. “If Stage 1 of data journalism was ‘find and scrape data,’ then… Stage 2 was ‘ask government agencies to release data’ in easy-to-use formats. Stage 3 is going to be ‘make your own data,’ and those sources of data are going to be automated and updated in real time.” - Javaun Moradi, Mozilla

Slide 61

Safecast open source Geiger counter

Slide 62

Networked accountability

Slide 63

Bus route in Nairobi, Kenya

Slide 64

Sensor Journalism

Slide 65

Slide 66

Slide 67

Citizens as Sensors: Andhra Pradesh

Slide 68

Drones + data collection

Slide 69

Privacy challenges

Slide 70

Slide 71

Open Data, FOIA & Press Freedom

Slide 72

An expanding number of data sources

Slide 73

Slide 74

Slide 75

Social data and crisis data

Slide 76

Open government data platforms

Slide 77

Slide 78

Slide 79

Fauxpen Data. In an age of “openwashing”… we need to: Evaluate licenses. Peruse the Terms of Service. Review the governance. Look at community. Check the format.

Slide 80

Slide 81

Slide 82

Center for Public Integrity

Slide 83

Accountability for “personalized redlining” Gun map graphic

Slide 84

Transparency for geographic profiling. WSJ: Websites vary prices, based upon user information

Slide 85

Monitoring predictive policing. Verge: Chicago crime and profiling. Geekwire: Predictive Policing

Slide 86

Investigating human tissue trafficking. ICIJ: The data behind skin and bone

Slide 87

Data + journalism + activism + responsive institutions = social change

Slide 88

The fun part: predictions, prognostications and recommendations!

Slide 89

1) Data will become even more of a strategic resource for media.

Slide 90

2) Better tools will emerge that democratize data skills.

Slide 91

3) News apps will explode as a primary way people consume data journalism.

Slide 92

4) Being digital first means being data-centric and mobile-friendly.

Slide 93

5) Expect more robo-journalism. Human relationships and storytelling still matter.

Slide 94

6) More journalists will need to study the social sciences and statistics. Source: Ed Yong

Slide 95

7) There will be higher standards for accuracy and corrections. Source: Jake Harris

Slide 96

8) Competency in security and data protection will become more important. Source: Jake Harris

Slide 97

9) Demand for more transparency on reader data collection and use. Source: eConsultancy

Slide 98

10) More conflicts over public records, data scraping, and ethics will arise. Gun map graphic

Slide 99

12) Data-driven personalization and predictive news in wearables.

Slide 100

13) More diverse newsrooms will produce better (data) journalism. A 2013 ASNE survey of 68 online news organizations found that 63% of them had no minorities. Source: The Atlantic

Slide 101

14) Be mindful of data-ism and bad data. Embrace skepticism.