Using data to improve
student research
EasyBib is an automatic
bibliography composer.
Students use it to cite
sources for their research.
We teach information
literacy.
18% of all student papers include plagiarism (1)
50% likelihood of using a credible vs. non-credible source (1)
4% increase in the use of paper mills and cheating sites (1)
~16% of students are adequately prepared for college (2)
Source: (1) TurnItIn; (2) Both Sides Now: Librarians Looking at Information Literacy from High School and College.
That’s how we felt too...
The problem is becoming
bigger.
Unprepared students
make for unprepared
adults.
It’s not just students who
plagiarize:
•Pal Schmitt, former president
of Hungary
•German education minister
•Jayson Blair (former New
York Times writer)
•Jonah Lehrer, journalist and
author
•Fareed Zakaria (reporter,
author, host)
We are in the right place
to figure it out.
Over half of all
students in the
US (40M)
Over half a billion
citations
We asked ourselves the
following questions:
•What are students using in their
research?
•How good are their sources?
•How can we help them?
We started with the
basics.
_gaq.push([
  'citations._trackEvent',
  citationTitle,
  citationPublisher,
  citationId
]);
Here’s what we found.
Top sources 2010
•Wikipedia
•Google
1. The New York Times
2. CIA World Factbook
3. Oracle Thinkquest
4. Buzzle
5. US BLS
6. Dictionary.com
7. CDC
8. PBS
9. eHow
Source: EasyBib Google Analytics Oct 2010-Nov 2010 data.
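A ranking like the one above can be derived from tracked citation events by counting them per publisher. The following is only a minimal sketch: the `events` array and the `topPublishers` helper are hypothetical illustrations, not EasyBib data or code, and in practice the counts came from Google Analytics reports.

```javascript
// Hypothetical sample of tracked citation events (illustrative, not real data).
const events = [
  { title: "Photosynthesis", publisher: "Wikipedia", id: 1 },
  { title: "Cell biology", publisher: "Wikipedia", id: 2 },
  { title: "Flu season update", publisher: "CDC", id: 3 },
  { title: "Election coverage", publisher: "The New York Times", id: 4 },
  { title: "World War II", publisher: "Wikipedia", id: 5 },
  { title: "Science section", publisher: "The New York Times", id: 6 },
];

// Count events per publisher and return the n most-cited publishers.
// Ties are broken alphabetically so the output is deterministic.
function topPublishers(allEvents, n) {
  const counts = new Map();
  for (const e of allEvents) {
    counts.set(e.publisher, (counts.get(e.publisher) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n)
    .map(([publisher, count]) => ({ publisher, count }));
}

console.log(topPublishers(events, 3));
```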
What could we do?
•Warn them when their source’s
credibility is in question
•Analyze the quality of their full
bibliography
•Make it easier to not plagiarize
•Suggest better sources
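The first two ideas above (warning on questionable sources and analyzing a whole bibliography) could be sketched roughly as follows. Everything here is an assumption made for illustration: the `CREDIBILITY` table, its ratings, and the `auditBibliography` helper are hypothetical, and the slides do not say how EasyBib actually rates a source.

```javascript
// Hypothetical ratings table; real ratings would need an agreed definition of credibility.
const CREDIBILITY = new Map([
  ["cdc.gov", "credible"],
  ["nytimes.com", "credible"],
  ["ehow.com", "questionable"],
]);

// Extract the host name from a URL, ignoring a leading "www.".
function hostOf(url) {
  return new URL(url).hostname.replace(/^www\./, "");
}

// Return the share of credible sources plus a warning for every other source.
function auditBibliography(urls) {
  const warnings = [];
  let credible = 0;
  for (const url of urls) {
    const rating = CREDIBILITY.get(hostOf(url)) || "unknown";
    if (rating === "credible") {
      credible += 1;
    } else {
      warnings.push({ url, rating });
    }
  }
  const credibleShare = urls.length ? credible / urls.length : 0;
  return { credibleShare, warnings };
}

console.log(auditBibliography([
  "https://www.cdc.gov/flu",
  "http://www.ehow.com/how_1.html",
]));
```

Sources missing from the table are reported as "unknown" rather than silently treated as good or bad, so a student still gets a prompt to evaluate them.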
Define credibility.
Improve citation quality
Gave students access to
their own analytics
To combat plagiarism, we
built an audit trail for notes
So after all this...
Does it blend (tm) ?
1. Wikipedia
2. Bio.com
3. History.com
4. PBS
5. Mayo Clinic
6. CDC
7. The New York Times
8. BBC
9. CNN
10. WebMD
11. US BLS
• Wikipedia still on top,
but ...
• No content farms, no
Google...
• WebMD is questionable,
but its credibility can be
argued for.
Source: Apr-May 2013 Google Analytics data
We have to admit, it’s getting
better...
Help students find better
sources
How does the Research
engine currently work?
Cloudant (CouchDB)
MySQL
Lucene/Solr
Slow, asynchronous, lots of moving
parts.
Starting to do a bit more
StatsD::increment($metrics);
$response = $rediska->publish(
array('realtime'),
$citation
);
There’s a lot more we can
do, and data will help us.
Cloudant Search
•Full-text search integrated into Cloudant
•Lucene syntax
•Indexing is easy
function(doc){
index("title", doc.title, {"store": "yes"});
}
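•Grouping of sources via chained map-reduce
// Stage 1: count citations per title; dbcopy writes the reduced rows to citationGroup
map: function(doc){
  if (doc.title){ emit({"title": doc.title}, 1); }
}
reduce: _sum
dbcopy: citationGroup
------
// Stage 2 (runs on citationGroup, whose docs carry "key" and "value"): index titles by count
map: function(doc){
  if (doc.key && doc.key.title){ emit(doc.value, doc.key.title); }
}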
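To make the data flow of the chained map-reduce concrete, here is a plain JavaScript simulation that runs outside Cloudant. It is a sketch under assumptions: `stage1` and `stage2` are hypothetical stand-ins for the two views, the sample documents are invented, and the copied rows are assumed to have the shape `{ key, value }`.

```javascript
// Invented citation documents; one has no title and should be skipped.
const citations = [
  { title: "Moby-Dick" },
  { title: "Hamlet" },
  { title: "Moby-Dick" },
  {},
  { title: "Moby-Dick" },
];

// Stage 1: emit ({title}, 1) per document and sum per key,
// mimicking the map function combined with the _sum reduce.
function stage1(docs) {
  const sums = new Map();
  for (const doc of docs) {
    if (doc.title) {
      sums.set(doc.title, (sums.get(doc.title) || 0) + 1);
    }
  }
  // Each reduced row becomes a document in the target database.
  return [...sums].map(([title, value]) => ({ key: { title }, value }));
}

// Stage 2: emit (count, title) for each copied row and sort by key,
// as a view would, so titles come out ordered by citation count.
function stage2(groupDocs) {
  const rows = [];
  for (const doc of groupDocs) {
    if (doc.key && doc.key.title) {
      rows.push({ key: doc.value, value: doc.key.title });
    }
  }
  return rows.sort((a, b) => a.key - b.key);
}

console.log(stage2(stage1(citations)));
```

Querying the second stage in descending key order would then list the most-cited titles first.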
Live data analysis.
Crowdsourcing.
•Use Cloudant Search to power
feedback on sources (# of times
cited in real time, quality of
bibliographies derived from)
•Allow users to submit their own
credibility evaluations and aggregate
results
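Aggregating user-submitted credibility evaluations could be as simple as averaging ratings per source once enough votes are in. This is a minimal sketch, not a description of what EasyBib built: the 1 to 5 rating scale, the `minVotes` threshold, and the `aggregateEvaluations` helper are all assumptions.

```javascript
// Average user ratings per source. A mean is only reported once a source
// has at least minVotes evaluations; otherwise it stays null.
function aggregateEvaluations(evaluations, minVotes = 2) {
  const bySource = new Map();
  for (const { source, rating } of evaluations) {
    const entry = bySource.get(source) || { total: 0, votes: 0 };
    entry.total += rating;
    entry.votes += 1;
    bySource.set(source, entry);
  }
  const result = {};
  for (const [source, { total, votes }] of bySource) {
    result[source] = { votes, mean: votes >= minVotes ? total / votes : null };
  }
  return result;
}

// Hypothetical submissions on an assumed 1 (poor) to 5 (excellent) scale.
console.log(aggregateEvaluations([
  { source: "cdc.gov", rating: 5 },
  { source: "cdc.gov", rating: 4 },
  { source: "ehow.com", rating: 2 },
]));
```

Withholding the mean until a minimum number of votes arrives keeps a single enthusiastic or hostile user from deciding a source's rating.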
SourceRank!
Credibility weighting + crowdsourcing
Synchronous & realtime via Cloudant Search
Value nodes based on nearest neighbors
And other things...
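The slides do not say how SourceRank is computed. As one possible reading of "value nodes based on nearest neighbors", the sketch below blends each source's own credibility weight with the average weight of its neighbors. Every element is an assumption: the weights, the neighbor lists (for example, sources cited together in the same bibliographies), the `alpha` blend factor, and the `sourceRank` function itself.

```javascript
// One-pass score: alpha * own weight + (1 - alpha) * mean weight of neighbors.
// A source with no known neighbors keeps its own weight.
function sourceRank(weights, neighbors, alpha = 0.5) {
  const scores = {};
  for (const source of Object.keys(weights)) {
    const known = (neighbors[source] || []).filter((n) => n in weights);
    const neighborMean = known.length
      ? known.reduce((sum, n) => sum + weights[n], 0) / known.length
      : weights[source];
    scores[source] = alpha * weights[source] + (1 - alpha) * neighborMean;
  }
  return scores;
}

// Hypothetical credibility weights in [0, 1] and co-citation neighbors.
const weights = { "cdc.gov": 1.0, "nytimes.com": 0.8, "ehow.com": 0.2 };
const neighbors = {
  "cdc.gov": ["nytimes.com"],
  "nytimes.com": ["cdc.gov", "ehow.com"],
  "ehow.com": ["nytimes.com"],
};

console.log(sourceRank(weights, neighbors));
```

Under this reading, a weak source that mostly appears alongside strong ones is pulled upward, and a strong source that keeps poor company is pulled down.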
Driving growth
We have the largest UGC citation
set. Making this searchable
creates a “moat.”
The more people that use EasyBib,
the better the tool becomes.
What about other data
analytics tools?
Too stretched to learn more complex tools
(looking for easy answers)
Costs (GA is free!)
EMR, Hadoop, Redshift, Cloudant Search:
This is what’s next.
Questions?
Darshan Somashekar
@darshan
darshan@imagineeasy.com

Easybib Open Analytics NYC
