Computational Journalism 2017 Week 4: Computational Journalism Platforms

From the course Frontiers of Computational Journalism, Columbia University, Fall 2017 http://www.compjournalism.com/?p=206

Uploaded by

Jonathan Stray

Frontiers of

Computational Journalism
Columbia Journalism School
Week 4: Computational Journalism Platforms

September 29, 2017


This class
What do journalists do with documents
The Computational Journalism Workbench
Plate Notation
NYT Recommender
What do journalists do with documents?
Overview prototype running on Iraq security contractor docs, Feb 2012
Technical troubles with a new system meant that almost 70,000 North
Carolina residents received their food stamps late this summer. That's
8.5 percent of the clients the state currently serves every month. The
problem was eventually traced to web browser compatibility issues.
WRAL reporter Tyler Dukes obtained 4,500 pages of emails on paper
from various government departments and used DocumentCloud and
Overview to piece together this story.

https://blog.overviewdocs.com/completed-stories/
Used Overview's topic tree (TF-IDF clustering) to find a group
of key emails from a listserv.
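The topic tree above groups documents by term similarity. A minimal sketch of the underlying idea, TF-IDF weighting plus cosine similarity, in pure Python (the toy "emails" and the greedy comparison are illustrative assumptions, not Overview's actual implementation):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute sparse TF-IDF weight vectors for tokenized documents."""
    n = len(docs)
    # document frequency: in how many docs each term appears
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy emails: two about the food-stamp delay, one unrelated.
emails = [
    "food stamps delayed browser compatibility".split(),
    "food stamps late browser issue".split(),
    "union dispute committee transcript".split(),
]
vecs = tfidf_vectors(emails)
# The two related emails score higher than the unrelated pair,
# which is what lets a topic tree pull them into the same branch.
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))
```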
What do Journalists do with Documents, Stray 2016
1. Robust Import
The hardest feature to implement
The most requested, the most used
2. Robust Analysis
What researchers choose
News articles
Academic literature
NLP test data sets

What journalists deal with


PDF dumps
Printed, scanned emails
A million pages scraped from an antique site
CD full of random files
LAPD Crime Descriptions

VICTS AND SUSPS BECAME INV IN VERBA ARGUMENT SUSP
THEN BEGAN HITTING VICTS IN THE FACE
Entity recognition is not solved!
Entities found out of 150

Incredibly dirty source data. Current methods have low recall (~70%)
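One way to see why recall collapses on text like the LAPD descriptions above: matchers tuned to clean prose miss abbreviated, all-caps forms. A toy gazetteer matcher (the term list is a hypothetical illustration, not Overview's entity plugin):

```python
# Naive gazetteer-based entity matching, to illustrate low recall
# on dirty, abbreviated source text.
GAZETTEER = {"VICTIM", "SUSPECT", "LOS ANGELES"}

def find_entities(text):
    """Return gazetteer terms found as exact substrings of the text."""
    return {term for term in GAZETTEER if term in text.upper()}

clean = "THE SUSPECT HIT THE VICTIM IN LOS ANGELES"
dirty = "VICTS AND SUSPS BECAME INV IN VERBA ARGUMENT"

print(find_entities(clean))  # all three terms match
print(find_entities(dirty))  # nothing: "VICTS"/"SUSPS" aren't in the list
```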
3. Search, not exploration
A number of previous tools aim to help the user explore
a document collection (such as [6, 9, 10, 12]), though few
of these tools have been evaluated with users from a
specific target domain who bring their own data, making
us suspect that this imprecise term often masks a lack of
understanding of actual user tasks.

Overview: The Design, Adoption, and Analysis of a Visual Document
Mining Tool For Investigative Journalists, Brehmer et al., 2014
Suffolk County public safety committee transcript,
Reference to a body left on the street due to union dispute
(via Adam Playford, Newsday, 2014)
4. Quantitative Summaries
Count incident types by date. For Level 14, ProPublica, 2015
LAPD Underreported Serious Assaults, Skewing Crime Stats for 8 Years
Los Angeles Times, 2015
The Child Exchange, Reuters, 2014
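A quantitative summary like "count incident types by date" is a simple tally once documents are reduced to structured records. A sketch with stdlib `Counter` (the incident records are hypothetical, not the actual "For Level 14" data):

```python
from collections import Counter

# Hypothetical incident records extracted from documents: (date, type)
incidents = [
    ("2014-03-01", "restraint"),
    ("2014-03-01", "assault"),
    ("2014-03-02", "restraint"),
    ("2014-03-02", "restraint"),
]

# Tally incident types per date
counts = Counter(incidents)
for (date, kind), n in sorted(counts.items()):
    print(date, kind, n)
```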
5. Interactive Methods
Design Study Methodology: Reflections from the Trenches and the Stacks,
Sedlmair et al, 2012
Extracting yes/no answers from database of Foreign Corrupt Practices
Act cases. Comparison by Ariana Giorgi
6. Clarity and Accuracy
We used a machine-learning method
known as latent Dirichlet allocation to
identify the topics in all 14,400 petitions
and to then categorize the briefs. This
enabled us to identify which lawyers
did which kind of work for which sorts
of petitioners. For example, in cases
where workers sue their employers, the
lawyers most successful getting cases
before the court were far more likely to
represent the employers rather than
the employees.

The Echo Chamber, Reuters, 2014


Evaluation Methods for Topic Models
Wallach et al., 2009
Interpretation refers to the facility with which an
analyst makes inferences about the data through
the lens of a model abstraction. Trust refers to the
actual and perceived accuracy of an analyst's
inferences.

Interpretation and Trust: Designing Model-driven Visualizations
for Text Analysis, Chuang et al., 2012
Overview prototype running on Wikileaks cables, early 2012
Overview circa 2014
Overviewdocs.com today
Overview Entity and Multisearch plugins
Overview plugin API
Computational Journalism Workbench
cjworkbench.org
Plate Notation
Probability graphs

Node = variable
Edge = dependence (sampled from)
Filled node = observed data
Choose a topic for each word

Both PLSA and LDA model each document as a distribution over
topics. Each word belongs to a single topic.
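The plate diagram that follows reads as a generative recipe. A pure-Python sketch of LDA's generative story for one document (the small K, V, N and the concentration values are toy assumptions; Dirichlet draws are built from Gamma samples):

```python
import random

random.seed(0)

def dirichlet(alpha, k):
    """Sample a symmetric Dirichlet via normalized Gamma draws."""
    xs = [random.gammavariate(alpha, 1.0) for _ in range(k)]
    s = sum(xs)
    return [x / s for x in xs]

def categorical(probs):
    """Sample an index from a discrete distribution."""
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

K, V, N = 3, 5, 10      # topics, vocabulary size, words in doc
alpha, beta = 0.1, 0.1  # concentration parameters

topics = [dirichlet(beta, V) for _ in range(K)]  # K word distributions
theta = dirichlet(alpha, K)  # this doc's distribution over topics

doc = []
for _ in range(N):
    z = categorical(theta)       # choose a topic for each word...
    w = categorical(topics[z])   # ...then a word from that topic
    doc.append(w)
print(doc)  # a document of N word ids, each drawn from a single topic
```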
LDA Plate Notation
[Plate diagram: two concentration parameters; topics in doc; topic
chosen for each word; observed word in doc; words in topics.
Plates: N words in doc, D docs, K topics.]
New York Times recommender
Combining collaborative filtering
and topic modeling
Collaborative Topic Modeling
[Plate diagram extending LDA: content side has topic concentration,
topics in doc, topic for word, word in doc, over K topics.
Collaborative side has user topic selections, variation in per-user
topics, topics for user, and the user's rating of a doc, driven by the
weight of topics in doc. Two predictions compared: content only
vs. content + social.]
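The "content only" vs. "content + social" comparison can be sketched numerically. In collaborative topic models of this kind, an article's latent vector is its topic proportions plus an offset learned from reader clicks, and a score is the dot product with a user vector. All numbers below are hypothetical:

```python
# Hypothetical learned quantities, collaborative-topic-model style:
theta = [0.7, 0.2, 0.1]        # article's topic proportions (content)
epsilon = [0.05, -0.02, 0.30]  # offset learned from reader clicks
user = [0.1, 0.0, 0.9]         # reader's latent preference vector

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Content only: works even for brand-new articles with no clicks yet.
content_only = dot(user, theta)

# Content + social: the click-derived offset shifts the item vector,
# so reading behavior can correct what the text alone suggests.
content_plus_social = dot(user, [t + e for t, e in zip(theta, epsilon)])

print(content_only, content_plus_social)
```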

You might also like