The Art and Science of Data-Driven Journalism Alexander B. Howard Tow Fellow, Columbia University May 30, 2014
You know something, John Snow.
This John Snow knew something.
Newspapers have used data for centuries Source: The Guardian
1960s: computer-assisted reporting (CAR) Bob Woodward, via Cliff1066
Traditional tools applying tech to journalism… Calculators and Graphs Mainframe and PCs Spreadsheets Databases Text and code editors Statistics Programming
In the 1990s, government and civil society spread the Internet globally
In the 2000s, mobile phones and social networking connected us ever more
In the 2010s, data creation exploded. Image Credit: Real Time Rome from Senseable.MIT.edu
“Data-driven journalism is the future” Source: Tim Berners-Lee in the Guardian
…combined with new tools & context… Online spreadsheets and wikis Data visualization tools Open source frameworks Code sharing Agile development Cloud storage and processing (EC2 & Heroku) More data and more access Privacy and security risks
2014: data journalism is the present Gathering, cleaning, organizing, analyzing, visualizing and publishing data to support the creation of acts of journalism
Trendy but not new The collection, protection and interrogation of data as a source, complementing traditional “shoe leather” investigative reporting relying on witnesses, experts and authorities
Dollars for Docs
Chicago Tribune Flame retardants
A tangled web
Los Angeles Times
Reuters: Connected China
Report it out
Show people something new about the world
Tell a story
Center for Public Integrity
Storytelling still matters. “We use these tools to find and tell stories. We use them like we use a telephone. The story is still the thing.” - Anthony DeBarros, USA Today Source: Data Journalism and the Big Picture
Make it personal
Understand the context for the data
Show your data
Show your work
Share your code
Questions Is the data clean? Is the data representative? What biases might be hidden in the data? Was the data legally obtained? Does the data contain personally identifiable information (PII)?
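A first pass at "Is the data clean?" can be automated. The sketch below is a minimal illustration, not a complete audit: the sample rows, the column names and the `audit_rows` helper are all hypothetical. It counts blank cells, exact duplicate rows and values that fail to parse as numbers.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; the column names are assumptions for illustration.
SAMPLE_CSV = """name,city,payment
Dr. A,Chicago,1200
Dr. B,,300
Dr. A,Chicago,1200
Dr. C,Los Angeles,n/a
"""


def audit_rows(rows, numeric_fields):
    """Count blank cells, exact duplicate rows, and non-numeric values."""
    report = {"rows": 0, "blank_cells": 0, "duplicate_rows": 0, "bad_numbers": 0}
    seen = Counter()
    for row in rows:
        report["rows"] += 1
        # A cell counts as blank if it is empty or whitespace only.
        report["blank_cells"] += sum(
            1 for value in row.values() if not (value or "").strip()
        )
        # Every repeat of an already-seen row counts as one duplicate.
        key = tuple(row.items())
        seen[key] += 1
        if seen[key] > 1:
            report["duplicate_rows"] += 1
        # Non-blank values in numeric columns must parse as numbers.
        for field in numeric_fields:
            value = (row.get(field) or "").strip()
            if value:
                try:
                    float(value)
                except ValueError:
                    report["bad_numbers"] += 1
    return report


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = audit_rows(rows, numeric_fields=["payment"])
print(report)
```

Counts like these only flag where to look. Whether a duplicate is an error or a legitimate repeat payment is still a reporting question.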
Collection Who gathered the data? How? Was it clear how data would be used? Can people opt-out of collection or usage? “Notice and consent” is not enough “Privacy by design” applies to news apps
Data Analysis & Numeracy N = ? Average vs Median Statistical significance? Correlation != causation Regression to the mean
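The "Average vs Median" point is easy to see with a small worked example. The sketch below uses made-up salary figures (an assumption for illustration only) in which a single outlier pulls the mean far away from what a typical person earns, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical salaries in dollars; one extreme value skews the mean.
salaries = [32_000, 35_000, 38_000, 41_000, 45_000, 1_200_000]

n = len(salaries)       # always report N alongside any summary figure
avg = mean(salaries)    # sensitive to the single outlier
mid = median(salaries)  # midpoint of the two middle values

print(f"N = {n}")
print(f"mean   = {avg:,.0f}")
print(f"median = {mid:,.0f}")
```

Here the mean is roughly 232,000 while the median is 39,500, so a headline built on the "average salary" would describe almost nobody in the data.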
Bad Data Viz wtfviz.net
Present data with context, in context
Be aware of de-anonymization risks
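One rough way to gauge de-anonymization risk before publishing a dataset is to count how many records share each combination of quasi-identifiers (fields such as ZIP code, birth year and sex that are harmless alone but identifying together). The sketch below is a simplified k-anonymity style check under stated assumptions: the records, the field names and the `smallest_group` helper are hypothetical.

```python
from collections import Counter

# Hypothetical records with direct identifiers (names, IDs) already removed.
records = [
    {"zip": "60614", "birth_year": 1980, "sex": "F"},
    {"zip": "60614", "birth_year": 1980, "sex": "F"},
    {"zip": "60614", "birth_year": 1975, "sex": "M"},
    {"zip": "90012", "birth_year": 1990, "sex": "F"},
]


def smallest_group(records, quasi_identifiers):
    """Return the smallest group size k and the combinations seen only once."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    k = min(groups.values())
    unique = [combo for combo, count in groups.items() if count == 1]
    return k, unique


k, unique = smallest_group(records, ["zip", "birth_year", "sex"])
print(f"smallest group size k = {k}")
print(f"combinations matching exactly one record: {unique}")
```

A result of k = 1 means at least one person is unique on those fields alone and could be re-identified by anyone who knows them. Coarsening fields (for example, age ranges instead of birth years) is one common way to raise k before release.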
Networked reporting of corruption ICIJ: Offshore Leaks
International Consortium of Investigative Journalists Offshoring $80 journalists 40 countries 260 gigabytes 2.5 million files
Create your data “If Stage 1 of data journalism was “find and scrape data,” then… Stage 2 was “ask government agencies to release data” in easy to use formats. Stage 3 is going to be “make your own data”, and those sources of data are going to be automated and updated in real-time.” -Javaun Moradi, Mozilla
Safecast open source Geiger counter
Bus route in Nairobi, Kenya
Citizens as Sensors: Andhra Pradesh
Drones + data collection
Open Data, FOIA & Press Freedom
An expanding number of data sources
Social data and crisis data
Open government data platforms
Fauxpen Data In an age of “openwashing”… We need to: Evaluate licenses. Peruse the Terms of Service. Review the governance. Look at community. Check the format.
Center for Public Integrity
Accountability for “personalized redlining”
Transparency for geographic profiling WSJ: Websites vary prices, based upon user information
Monitoring predictive policing Verge: Chicago crime and profiling Geekwire: Predictive Policing
Investigating human tissue trafficking ICIJ: The data behind skin and bone
Data + journalism + activism + responsive institutions = social change
The fun part: predictions, prognostications and recommendations!
1) Data will become even more of a strategic resource for media.
2) Better tools will emerge that democratize data skills.
3) News apps will explode as a primary way people consume data journalism.
4) Being digital first means being data-centric and mobile-friendly.
5) Expect more robo-journalism. Human relationships and storytelling still matter.
6) More journalists will need to study the social sciences and statistics. Source: Ed Yong
7) There will be higher standards for accuracy and corrections. Source: Jake Harris
8) Competency in security and data protection will become more important. Source: Jake Harris
9) Demand for more transparency on reader data collection and use. Source: eConsultancy
10) More conflicts over public records, data scraping, and ethics will arise. Gun map graphic
12) Data-driven personalization and predictive news in wearables.
13) More diverse newsrooms will produce better (data) journalism. A 2013 ASNE survey of 68 online news organizations found that 63% of them had no minorities. Source: The Atlantic
14) Be mindful of data-ism and bad data. Embrace skepticism.