DMAP – Data Mining Automation for Platforms
Parang Saraf
Ph.D. Candidate
Discovery Analytics Center
Department of Computer Science
Virginia Tech
parang@vt.edu
  
What is DMAP?
•  DMAP is a system for aggregating official information about a company from a wide range of data sources
– News Media
– Official Company Blog Posts
– Official Twitter and Facebook Handles
•  The intuitive presentation of articles helps in analyzing them faster and better
  
Objective
1.  Overview of the system architecture
2.  Example News data files
3.  Ravens Interface
4.  Overview of the web architecture
5.  Example RSS data files
6.  Horse’s Mouth Interface
7.  Example Facebook and Twitter data files
8.  Gossipers Interface
9.  Moving from Proof of Concept to Production System
  
DMAP - Objective
•  A one stop platform for displaying everything official related to a company
– News Sources (Ravens)
– Official Blogs (Horse’s Mouth)
– Official Twitter and Facebook Accounts (Gossipers)
– Other Information (Facts)
    •  Company Information
    •  Web activity information
    •  Financial Information
  
System Architecture
  
System Architecture
Data Collection Through Official APIs
•  We use official APIs to collect data – No Scraping
•  Why use Bing and not Google News?
•  Example search result data
•  Bing Cost Calculation
  
System Architecture
Data Cleaning and Enrichment
•  Structured data doesn’t mean clean data
•  Fetch the original article by following the URL in the search results
•  Remove Boilerplate
•  Filter again for the keyword
•  Enrich article with Basis Entity Extractor
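
The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not DMAP's actual code: the function names, the tag-skipping rule used as a stand-in for boilerplate removal, and the whole-word keyword match are all hypothetical. Fetching the article and the entity-extraction call are left out.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping elements assumed to be boilerplate."""

    # Hypothetical rule: treat these containers as non-article content.
    SKIP = {"script", "style", "nav", "header", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every skipped container.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def remove_boilerplate(html):
    """Return the text left after dropping the skipped containers."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def mentions_keyword(text, keyword):
    """Case-insensitive whole-word match for the search keyword."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def clean_article(html, keyword):
    """Strip boilerplate, then filter again for the keyword.

    Returns the cleaned text, or None when the keyword only appeared
    in the discarded parts of the page.
    """
    text = remove_boilerplate(html)
    return text if mentions_keyword(text, keyword) else None
```

Filtering after boilerplate removal matters because a search hit can come from a menu or sidebar rather than from the article body.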
  
System Architecture
Duplicate Detection and Data Loading
•  Search for duplicate content
•  Load data in the database
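
The slides do not say how duplicate content is detected. One common approach, shown here purely as an assumed stand-in, compares word shingles with Jaccard similarity; the shingle size `k=3`, the `0.8` threshold, and the function names are illustrative choices.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in the text (lowercased)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts become a single shingle; empty texts yield no shingles.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_duplicate(candidate, stored_articles, threshold=0.8):
    """True when the candidate is near-identical to any stored article."""
    cand = shingles(candidate)
    return any(jaccard(cand, shingles(s)) >= threshold for s in stored_articles)
```

A check like this would run before the load step, so that only articles for which `is_duplicate` returns `False` reach the database.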
  
Ravens Interface
•  Results control options
•  Dynamically generated Google Trends chart
•  Dynamically generated Word Clouds
•  Article Detailed View
•  Article specific word clouds
What happens behind-the-scenes when you click the green “SUBMIT” button?
  
System Architecture
Web Architecture
Model Controller View (MVC) Framework
[Diagram labels: Server-Side, Model, Controller, View, Client-Side; 40% / 60%]
  
System Architecture
What happens after clicking “Submit”
•  Client-side Sanity Check
•  Checks written
•  If everything is as expected, sends the query parameters to the server
  
System Architecture
What happens after clicking “Submit”
1.  Performs basic sanity checks
2.  Get qualifying article set based on search parameters
3.  Identify the 10 search results from this set based on page number
4.  Do a frequency count of all entities (People, Location, Organization and Product) and pick the top 50 in each category and determine their font size based on frequency.
5.  Generate Google Trends chart data
6.  Return everything back to the client
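
Steps 3 and 4 above could look something like the sketch below. The page size of 10 and the top-50 cutoff come from the slide; the article layout (an `entities` dict keyed by category), the linear font scaling, and the 12 to 48 pixel range are assumptions made for illustration.

```python
from collections import Counter

PAGE_SIZE = 10  # from the slide: 10 search results per page
TOP_N = 50      # from the slide: top 50 entities per category
CATEGORIES = ("People", "Location", "Organization", "Product")


def page_slice(articles, page_number, page_size=PAGE_SIZE):
    """Return the articles shown on a 1-based page number."""
    start = (page_number - 1) * page_size
    return articles[start:start + page_size]


def word_cloud_entries(articles, category, top_n=TOP_N, min_px=12, max_px=48):
    """Count entities of one category and map frequency to a font size.

    Font size is scaled linearly between min_px and max_px (an assumed
    rule; the slide only says size is based on frequency).
    """
    counts = Counter()
    for article in articles:
        counts.update(article.get("entities", {}).get(category, []))
    top = counts.most_common(top_n)
    if not top:
        return []
    hi, lo = top[0][1], top[-1][1]
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return [
        {
            "text": name,
            "count": n,
            "size": round(min_px + (n - lo) * (max_px - min_px) / span),
        }
        for name, n in top
    ]
```

The resulting list of `text`/`size` records is the kind of payload a client-side word-cloud layout can consume directly.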
  	
  
System Architecture
What happens after clicking “Submit”
1.  Display the returned search results
2.  Embed dynamically generated Google Search Trends
3.  Dynamically Generate word clouds using D3
  
RSS Reader
•  Sample blog feed
•  Problem: How to know if someone has published a new post?
•  Use a feed reader – Inoreader:
–  How to subscribe to a blog in Inoreader
–  Checks blogs at regular intervals for new posts
•  Wrote scripts that download “new” posts found by Inoreader
•  Sample blog post example fetched through API
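
Independent of any particular feed reader, the "is there a new post?" check amounts to remembering which item identifiers have already been downloaded. The sketch below uses standard RSS 2.0 element names (`item`, `guid`, `link`, `title`); the function names and the in-memory `seen_ids` set are assumptions, and it does not call the Inoreader API.

```python
import xml.etree.ElementTree as ET


def parse_items(rss_xml):
    """Extract an id and title for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        # Prefer <guid>; fall back to <link> when the feed omits it.
        item_id = item.findtext("guid") or item.findtext("link")
        items.append({"id": item_id, "title": item.findtext("title", default="")})
    return items


def new_posts(rss_xml, seen_ids):
    """Return items not seen on earlier polls and record them as seen."""
    fresh = []
    for item in parse_items(rss_xml):
        if item["id"] not in seen_ids:
            seen_ids.add(item["id"])
            fresh.append(item)
    return fresh
```

Calling `new_posts` on every poll with the same `seen_ids` set yields each post once; a production version would persist that set between runs.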
  
Horse’s Mouth
•  Selected companies carry over
•  Now articles are limited only to the official blogs
  
Twitter API
•  Have to abide by time-based rates while fetching data
•  Tweets and replies both count towards tweets from an account
– We are only interested in tweets
– Discard reply tweets
– There can be a lag while fetching tweets
•  Sample Tweet fetched through API
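
A rough sketch of the two constraints above: dropping replies and staying inside a time-based request window. It assumes tweet objects shaped like Twitter REST API v1.1 JSON, where `in_reply_to_status_id` is null for non-replies; the `WindowBudget` class, its limit, and its window length are hypothetical placeholders, not Twitter's actual quotas.

```python
import time


def original_tweets(tweets):
    """Keep only tweets that are not replies to another status."""
    return [t for t in tweets if t.get("in_reply_to_status_id") is None]


class WindowBudget:
    """Allow at most `limit` calls per `window_seconds` (placeholder values)."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.window_start = clock()
        self.used = 0

    def acquire(self):
        """Consume one call if allowed and return 0.0; otherwise return
        the number of seconds to wait before the window resets."""
        now = self.clock()
        if now - self.window_start >= self.window:
            # A new window has begun, so the call counter starts over.
            self.window_start, self.used = now, 0
        if self.used < self.limit:
            self.used += 1
            return 0.0
        return self.window - (now - self.window_start)
```

A fetch loop would call `acquire()` before each request and sleep for the returned number of seconds when it is non-zero.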
  
Facebook API
•  Can’t be automated. Have to generate a new authorization every time before fetching data
•  Pictures are provided in low resolution
•  Sample Facebook post fetched through API
  
Gossipers
•  Solves the feed personalization problem in the true sense
•  Twitter allows embedding of tweets:
–  Leads to the exact same presentation format as they appear on twitter.com
–  The media links are all embedded and work
•  Facebook doesn’t allow embedding of posts due to privacy issues
–  Have to recreate the presentation look and style
–  Media links don’t work
•  Pagination works individually for both of them
  
Facts
•  Will show the following:
– Static information about the company
– Financial Information
– Web usage data
– Other miscellaneous information
  
Thank You
  
Suncorp - Integrating Process Mining at Australia's Largest Insurer
Process mining Evangelist
 
717239550-Hotel-Management-Ppt-Final.pptx
717239550-Hotel-Management-Ppt-Final.pptx717239550-Hotel-Management-Ppt-Final.pptx
717239550-Hotel-Management-Ppt-Final.pptx
dharmendrasingh31102
 
文凭证书美国SDSU文凭圣地亚哥州立大学学生证学历认证查询
文凭证书美国SDSU文凭圣地亚哥州立大学学生证学历认证查询文凭证书美国SDSU文凭圣地亚哥州立大学学生证学历认证查询
文凭证书美国SDSU文凭圣地亚哥州立大学学生证学历认证查询
Taqyea
 
2024-Media-Literacy-Index-Of-Ukrainians-ENG-SHORT.pdf
2024-Media-Literacy-Index-Of-Ukrainians-ENG-SHORT.pdf2024-Media-Literacy-Index-Of-Ukrainians-ENG-SHORT.pdf
2024-Media-Literacy-Index-Of-Ukrainians-ENG-SHORT.pdf
OlhaTatokhina1
 
Decision Trees in Artificial-Intelligence.pdf
Decision Trees in Artificial-Intelligence.pdfDecision Trees in Artificial-Intelligence.pdf
Decision Trees in Artificial-Intelligence.pdf
Saikat Basu
 
50_questions_full.pptxdddddddddddddddddd
50_questions_full.pptxdddddddddddddddddd50_questions_full.pptxdddddddddddddddddd
50_questions_full.pptxdddddddddddddddddd
emir73065
 
Impact_of_AI_on_Everyday_Life and info.pptx
Impact_of_AI_on_Everyday_Life and info.pptxImpact_of_AI_on_Everyday_Life and info.pptx
Impact_of_AI_on_Everyday_Life and info.pptx
swatibhusari5
 
E-Book-TOEFL-Masuk-PTN.pdf hahahahaahahahah
E-Book-TOEFL-Masuk-PTN.pdf hahahahaahahahahE-Book-TOEFL-Masuk-PTN.pdf hahahahaahahahah
E-Book-TOEFL-Masuk-PTN.pdf hahahahaahahahah
RyanRahardjo2
 
Customer Segmentation using K-Means clustering
Customer Segmentation using K-Means clusteringCustomer Segmentation using K-Means clustering
Customer Segmentation using K-Means clustering
Ingrid Nyakerario
 
4. Multivariable statistics_Using Stata_2025.pdf
4. Multivariable statistics_Using Stata_2025.pdf4. Multivariable statistics_Using Stata_2025.pdf
4. Multivariable statistics_Using Stata_2025.pdf
axonneurologycenter1
 
Process Mining and Official Statistics - CBS
Process Mining and Official Statistics - CBSProcess Mining and Official Statistics - CBS
Process Mining and Official Statistics - CBS
Process mining Evangelist
 
录取通知书加拿大TMU毕业证多伦多都会大学电子版毕业证成绩单
录取通知书加拿大TMU毕业证多伦多都会大学电子版毕业证成绩单录取通知书加拿大TMU毕业证多伦多都会大学电子版毕业证成绩单
录取通知书加拿大TMU毕业证多伦多都会大学电子版毕业证成绩单
Taqyea
 
...lab.Lab123456789123456789123456789123456789
...lab.Lab123456789123456789123456789123456789...lab.Lab123456789123456789123456789123456789
...lab.Lab123456789123456789123456789123456789
Ghh
 

DMAP: Data Aggregation and Presentation Framework

  • 1. DMAP – Data Mining Automation for Platforms
      Parang Saraf, Ph.D. Candidate
      Discovery Analytics Center, Department of Computer Science, Virginia Tech
      parang@vt.edu
  • 2. What is DMAP?
      • DMAP is a system for aggregating official information about a company from a wide range of data sources
          – News Media
          – Official Company Blog Posts
          – Official Twitter and Facebook Handles
      • The intuitive presentation of articles helps in analyzing them faster and better
  • 3. Objective
      1. Overview of the system architecture
      2. Example News data files
      3. Ravens Interface
      4. Overview of the web architecture
      5. Example RSS data files
      6. Horse’s Mouth Interface
      7. Example Facebook and Twitter data files
      8. Gossipers Interface
      9. Moving from Proof of Concept to Production System
  • 4. DMAP - Objective
      • A one-stop platform for displaying everything official related to a company
          – News Sources (Ravens)
          – Official Blogs (Horse’s Mouth)
          – Official Twitter and Facebook Accounts (Gossipers)
          – Other Information (Facts)
              • Company Information
              • Web activity information
              • Financial Information
  • 6. System Architecture: Data Collection Through Official APIs
      • We use official APIs to collect data – No Scraping
      • Why use Bing and not Google News?
      • Example search result data
      • Bing Cost Calculation
  • 7. System Architecture: Data Cleaning and Enrichment
      • Structured data doesn’t mean clean data
      • Fetch the original article by following the URL in the search results
      • Remove Boilerplate
      • Filter again for the keyword
      • Enrich article with Basis Entity Extractor
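The "remove boilerplate, then filter again for the keyword" steps can be illustrated with a minimal sketch. The slides do not say which boilerplate remover was used, so `strip_boilerplate` below is a deliberately crude, hypothetical stand-in (it keeps only lines with enough words), and all function names and the `min_words` threshold are assumptions made for illustration.

```python
import re

def strip_boilerplate(lines, min_words=8):
    """Crude stand-in for boilerplate removal (an assumption, not the
    method used in DMAP): keep only lines that look like body text."""
    return [ln.strip() for ln in lines if len(ln.split()) >= min_words]

def mentions_keyword(text, keyword):
    """Case-insensitive whole-word match for the search keyword."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def clean_article(raw_text, keyword):
    """Return the cleaned article body, or None if the keyword only
    appeared in the stripped boilerplate (menus, related links, ...)."""
    body = " ".join(strip_boilerplate(raw_text.splitlines()))
    return body if mentions_keyword(body, keyword) else None
```

The second filter matters because a search hit may come from a navigation bar or a "related stories" link rather than from the article itself; once that text is stripped, the article may no longer mention the company at all.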
  • 8. System Architecture: Duplicate Detection and Data Loading
      • Search for duplicate content
      • Load data in the database
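The slides do not describe how duplicate content is detected. One common approach, shown here purely as an illustrative sketch and not as DMAP's method, compares word shingles with Jaccard similarity before an article is loaded. The shingle size `k` and the `threshold` are assumed values.

```python
def shingles(text, k=3):
    """Set of k-word shingles of the lowercased text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_duplicate(new_text, stored_texts, threshold=0.8):
    """True if new_text is near-identical to any already stored text."""
    new_set = shingles(new_text)
    return any(jaccard(new_set, shingles(old)) >= threshold
               for old in stored_texts)
```

An article would then be loaded into the database only when `is_duplicate` returns `False`. Comparing against every stored article is quadratic overall, so a production system would likely need an index or hashing scheme instead.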
  • 9. Ravens Interface
      • Results control options
      • Dynamically generated Google Trends chart
      • Dynamically generated Word Clouds
      • Article Detailed View
      • Article specific word clouds
      What happens behind the scenes when you click the green “SUBMIT” button?
  • 10. System Architecture: Web Architecture
      Model-View-Controller (MVC) Framework
      [Diagram labels: Server-Side, Model, Controller, View, Client-Side; split labeled 40% and 60%]
  • 11. System Architecture: What happens after clicking “Submit”
      • Client-side Sanity Check
      • Checks written
      • If everything is as expected, sends the query parameters to the server
  • 12. System Architecture: What happens after clicking “Submit”
      1. Perform basic sanity checks
      2. Get the qualifying article set based on the search parameters
      3. Identify the 10 search results from this set based on the page number
      4. Do a frequency count of all entities (People, Location, Organization and Product), pick the top 50 in each category, and determine their font size based on frequency
      5. Generate Google Trends chart data
      6. Return everything back to the client
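Steps 3 and 4 above can be sketched as follows. The page size of 10 and the top-50 cutoff come from the slide; the linear mapping from frequency to font size, the pixel range, and all function names are assumptions, since the slides only say the size is "based on frequency".

```python
from collections import Counter

PAGE_SIZE = 10   # results per page, as stated on the slide
TOP_N = 50       # entities kept per category, as stated on the slide

def page_slice(articles, page_number, page_size=PAGE_SIZE):
    """Return the articles for a 1-based page number."""
    start = (page_number - 1) * page_size
    return articles[start:start + page_size]

def word_cloud_entries(entities, top_n=TOP_N, min_px=10, max_px=40):
    """Count entity mentions, keep the top_n, and scale font size
    linearly between min_px and max_px (an assumed scaling rule)."""
    counts = Counter(entities).most_common(top_n)
    if not counts:
        return []
    hi, lo = counts[0][1], counts[-1][1]
    span = (hi - lo) or 1  # avoid division by zero when all counts match
    return [
        {"text": name, "count": c,
         "size": min_px + (c - lo) * (max_px - min_px) / span}
        for name, c in counts
    ]
```

`word_cloud_entries` would be called once per entity category (People, Location, Organization, Product), and the resulting lists returned to the client alongside the page of results.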
  • 13. System Architecture: What happens after clicking “Submit”
      1. Display the returned search results
      2. Embed dynamically generated Google Search Trends
      3. Dynamically generate word clouds using D3
  • 14. RSS Reader
      • Sample blog feed
      • Problem: How to know if someone has published a new post?
      • Use a feed reader – Inoreader:
          – How to subscribe to a blog in Inoreader
          – Checks blogs at regular intervals for new posts
      • Wrote scripts that download “new” posts found by Inoreader
      • Sample blog post example fetched through API
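Downloading only the "new" posts comes down to remembering which posts have already been stored. The sketch below shows that bookkeeping only; it does not call the Inoreader API, and the post dictionaries and their `"id"` field are hypothetical placeholders for whatever the feed reader returns.

```python
def select_new_posts(fetched_posts, seen_ids):
    """Return posts whose id has not been seen before, and record them.

    fetched_posts: list of dicts with a hypothetical "id" key.
    seen_ids: set of ids already downloaded (updated in place).
    """
    new_posts = []
    for post in fetched_posts:
        if post["id"] not in seen_ids:
            seen_ids.add(post["id"])
            new_posts.append(post)
    return new_posts
```

In practice `seen_ids` would be persisted between runs (for example in the same database the posts are loaded into), so that a scheduled script picks up where the previous run stopped.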
  • 15. Horse’s Mouth
      • Selected companies carry over
      • Now articles are limited only to the official blogs
  • 16. Twitter API
      • Have to abide by time-based rates while fetching data
      • Tweets and replies both count towards tweets from an account
          – We are only interested in tweets
          – Discard reply tweets
          – There can be a lag while fetching tweets
      • Sample Tweet fetched through API
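Discarding replies and pacing requests can be sketched as below. The sketch assumes each fetched tweet is a dictionary in which `in_reply_to_status_id` is empty for ordinary tweets, as in Twitter's v1.1 tweet objects; the function names and the example limit are illustrative assumptions, not values taken from the slides.

```python
def drop_replies(tweets):
    """Keep only original tweets; replies carry a non-null
    in_reply_to_status_id in the assumed tweet format."""
    return [t for t in tweets if t.get("in_reply_to_status_id") is None]

def min_delay_seconds(requests_per_window, window_seconds):
    """Smallest even spacing between requests that stays within a
    time-based rate limit."""
    return window_seconds / requests_per_window
```

For a hypothetical limit of 900 requests per 15-minute window, `min_delay_seconds(900, 15 * 60)` gives 1.0 second between requests. Because replies still count towards the number of items fetched, a page may contain few usable tweets after filtering, which is one way the lag mentioned above can arise.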
  • 17. Facebook API
      • Can’t be automated. Have to generate a new authorization every time before fetching data
      • Pictures are provided in low resolution
      • Sample Facebook post fetched through API
  • 18. Gossipers
      • Solves the feed personalization problem in the true sense
      • Twitter allows embedding of tweets:
          – Leads to the exact same presentation format as they appear on twitter.com
          – The media links are all embedded and work
      • Facebook doesn’t allow embedding of posts due to privacy issues
          – Have to recreate the presentation look and style
          – Media links don’t work
      • Pagination works individually for both of them
  • 19. Facts
      • Will show the following:
          – Static information about the company
          – Financial Information
          – Web usage data
          – Other miscellaneous information