Computational Journalism 2017 Week 4: Computational Journalism Platforms
Computational Journalism
Columbia Journalism School
Week 4: Computational Journalism Platforms
https://blog.overviewdocs.com/completed-stories/
Used Overview's topic tree (TF-IDF clustering) to find a group
of key emails from a listserv.
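The idea behind TF-IDF clustering can be sketched in a few lines of Python. This is a generic illustration, not Overview's actual implementation: the function names (tfidf_vectors, cosine) and the toy emails are assumptions made up for this example. Each document becomes a vector of term weights, and documents whose vectors point in similar directions end up grouped together.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy emails, invented for illustration only.
emails = [
    "budget meeting moved to friday",
    "friday budget meeting agenda attached",
    "lunch order for the office party",
]
vecs = tfidf_vectors(emails)
sim_related = cosine(vecs[0], vecs[1])    # share "budget", "meeting", "friday"
sim_unrelated = cosine(vecs[0], vecs[2])  # share no terms
```

A clustering step would then group documents by these similarities; the two budget emails land together while the lunch email stays apart.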
What do Journalists do with Documents, Stray 2016
1. Robust Import
The hardest feature to implement
The most requested, the most used
2. Robust Analysis
What researchers choose
News articles
Academic literature
NLP test data sets
Incredibly dirty source data. Current methods have low recall (~70%)
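To make the recall figure concrete: recall is the fraction of the items that should have been found that a method actually found. A minimal sketch, assuming a hypothetical set of ten names that really appear in a document set and a method that recovers seven of them (all names and numbers here are invented for illustration):

```python
def recall(found, relevant):
    """Fraction of the relevant items that appear among the found items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(found) & relevant) / len(relevant)


# Hypothetical ground truth: ten names that really occur in the documents.
relevant_names = {f"person_{i}" for i in range(10)}

# Hypothetical method output: seven correct names plus one false positive.
found_names = {f"person_{i}" for i in range(7)} | {"not_a_person"}

score = recall(found_names, relevant_names)  # 7 of 10 relevant names found
```

At 70% recall, three of every ten relevant items are silently missed, and the false positive does not affect the score at all; that would show up in precision instead.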
3. Search, not exploration
A number of previous tools aim to help the user explore
a document collection (such as [6, 9, 10, 12]), though few
of these tools have been evaluated with users from a
specific target domain who bring their own data, making
us suspect that this imprecise term often masks a lack of
understanding of actual user tasks.
Node = variable
Edge = dependence (sampled from)
Filled node = observed data
Choose a topic for each word
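The generative story in the diagram can be sketched as code. This assumes an LDA-style topic model, which the slide does not name explicitly; the vocabulary, the two topic distributions, and the function names are hypothetical. Each sampled quantity corresponds to a node, each "sampled from" step to an edge, and only the words would be observed (filled nodes).

```python
import random


def sample_dirichlet(rng, alpha, k):
    """Draw a k-dimensional symmetric Dirichlet sample via normalized gammas."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(rng, topic_word_dists, vocab, n_words, alpha=0.5):
    """Generate one document: topic mixture, then a topic and a word per position."""
    k = len(topic_word_dists)
    theta = sample_dirichlet(rng, alpha, k)  # per-document topic mixture (hidden)
    topics, words = [], []
    for _ in range(n_words):
        # Choose a topic for each word, sampled from the document's mixture.
        z = rng.choices(range(k), weights=theta)[0]
        # Choose the word itself, sampled from that topic's word distribution.
        w = rng.choices(vocab, weights=topic_word_dists[z])[0]
        topics.append(z)  # hidden variable
        words.append(w)   # observed data
    return theta, topics, words


# Hypothetical vocabulary and two hand-written topics, for illustration only.
vocab = ["budget", "tax", "vote", "goal", "match", "team"]
topic_word_dists = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a "politics" topic
    [0.0, 0.0, 0.0, 0.3, 0.3, 0.4],  # a "sports" topic
]
rng = random.Random(0)
theta, topics, words = generate_document(rng, topic_word_dists, vocab, n_words=8)
```

Fitting a topic model runs this story in reverse: given only the observed words, infer the hidden mixtures and topic assignments.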
Figure labels: user rating of doc; topics in doc; topics for user; variation in per-user topics; weight of user selections (collaborative).
Comparison: content only vs. content + social.