Predictive Text Analytics Seth Grimes Alta Plana Corporation 301-270-0795 -- http://altaplana.com -- @SethGrimes Predictive Analytics World October 20, 2009
Context Counts
What is Analytics?
http://www.tropicalisland.de/NYC_New_York_Brooklyn_Bridge_from_World_Trade_Center_b.jpg
$x(t) = t, \quad y(t) = \tfrac{1}{2} a \left(e^{t/a} + e^{-t/a}\right) = a \cosh(t/a)$
http://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg
c:\temp\Marcus.pdf
What is Analytics?
"SUMLEV","STATE","COUNTY","STNAME","CTYNAME","YEAR","POPESTIMATE",
50,19,1,"Iowa","Adair County",1,8243,4036,4207,446,225,221,994,509
50,19,1,"Iowa","Adair County",2,8243,4036,4207,446,225,221,994,509
50,19,1,"Iowa","Adair County",3,8212,4020,4192,442,222,220,987,505
50,19,1,"Iowa","Adair County",4,8095,3967,4128,432,208,224,935,488
50,19,1,"Iowa","Adair County",5,8003,3924,4079,405,186,219,928,495
50,19,1,"Iowa","Adair County",6,7961,3892,4069,384,183,201,907,472
50,19,1,"Iowa","Adair County",7,7875,3855,4020,366,179,187,871,454
50,19,1,"Iowa","Adair County",8,7795,3817,3978,343,162,181,841,439
50,19,1,"Iowa","Adair County",9,7714,3777,3937,338,159,179,805,417
The “Unstructured Data” Challenge
www.stanford.edu/%7ernusse/wntwindow.html
Axin and Frat1 interact with dvl and GSK, bridging Dvl to GSK in Wnt-mediated regulation of LEF-1.
Wnt proteins transduce their signals through dishevelled (Dvl) proteins to inhibit glycogen synthase kinase 3beta (GSK), leading to the accumulation of cytosolic beta-catenin and activation of TCF/LEF-1 transcription factors. To understand the mechanism by which Dvl acts through GSK to regulate LEF-1, we investigated the roles of Axin and Frat1 in Wnt-mediated activation of LEF-1 in mammalian cells. We found that Dvl interacts with Axin and with Frat1, both of which interact with GSK. Similarly, the Frat1 homolog GBP binds Xenopus Dishevelled in an interaction that requires GSK. We also found that Dvl, Axin and GSK can form a ternary complex bridged by Axin, and that Frat1 can be recruited into this complex probably by Dvl. The observation that the Dvl-binding domain of either Frat1 or Axin was able to inhibit Wnt-1-induced LEF-1 activation suggests that the interactions between Dvl and Axin and between Dvl and Frat may be important for this signaling pathway. Furthermore, Wnt-1 appeared to promote the disintegration of the Frat1-Dvl-GSK-Axin complex, resulting in the dissociation of GSK from Axin. Thus, formation of the quaternary complex may be an important step in Wnt signaling, by which Dvl recruits Frat1, leading to Frat1-mediated dissociation of GSK from Axin.
www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=10428961&dopt=Abstract
Modelling Text
- Metadata: e.g., title, author, date
- Statistics: typically via vector space methods, e.g., term frequency, cooccurrence, proximity
- Linguistics:
  - Lexicons, gazetteers, phrase books
  - Word morphology, parts of speech, syntactic rules
  - Larger-scale structure including discourse
- Machine learning
“Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance are extracted and printed out to become the auto-abstract.” H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.
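
As a rough illustration of the idea in the Luhn quote, the sketch below scores words by frequency, scores each sentence by the average frequency of its words, and extracts the top-scoring sentence. It is a simplification and not Luhn's actual procedure; the stop-word list, the tokenizer, the sample text, and the function names are all assumptions made for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not from the slides).
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "it", "on"}

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def auto_abstract(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word significance: frequency across the whole text, ignoring stop words.
    freq = Counter(w for w in tokenize(text) if w not in STOP_WORDS)

    def score(sentence):
        words = [w for w in tokenize(sentence) if w not in STOP_WORDS]
        # Average significance, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return [sentences[i] for i in keep]

sample = (
    "Text analytics turns text into data. "
    "The weather was pleasant. "
    "Analytics on text data supports prediction."
)
summary = auto_abstract(sample, n_sentences=1)
print(summary)
```

The off-topic middle sentence scores lowest because its words occur only once in the sample.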
Text Modelling Document content can be considered an unordered “bag of words.” Particular documents are points in a high-dimensional vector space. Salton, Wong & Yang, “A Vector Space Model for Automatic Indexing,” November 1975.
Text Modelling
We might construct a term-document matrix ...
D1 = "I like databases"
D2 = "I hate hate databases"

      I   like   hate   databases
D1    1   1      0      1
D2    1   0      2      1

and use a weighting such as TF-IDF (term frequency–inverse document frequency) in computing the cosine of the angle between weighted doc-vectors to determine similarity.
http://en.wikipedia.org/wiki/Term-document_matrix
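
The matrix and similarity computation for D1 and D2 can be sketched in a few lines of Python. The smoothed IDF formula used here is one common variant and is an assumption; the slide does not say which weighting was intended. With only two documents, the unsmoothed form log(N/df) would zero out every shared term.

```python
import math
from collections import Counter

docs = {"D1": "I like databases", "D2": "I hate hate databases"}

# Term-document counts, following the matrix on the slide.
vocab = ["I", "like", "hate", "databases"]
counts = {name: Counter(text.split()) for name, text in docs.items()}
matrix = {name: [counts[name][t] for t in vocab] for name in docs}

def idf(term):
    """Smoothed inverse document frequency (one common variant; an assumption)."""
    df = sum(1 for c in counts.values() if c[term] > 0)
    return math.log((1 + len(docs)) / (1 + df)) + 1.0

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Weight each raw count by the IDF of its term.
tfidf = {name: [tf * idf(t) for tf, t in zip(row, vocab)] for name, row in matrix.items()}

raw_similarity = cosine(matrix["D1"], matrix["D2"])
weighted_similarity = cosine(tfidf["D1"], tfidf["D2"])
print(f"raw counts: {raw_similarity:.3f}, TF-IDF: {weighted_similarity:.3f}")
```

The weighting boosts "like" and "hate", the terms that distinguish the two documents, so the weighted similarity comes out lower than the raw-count similarity.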
Text Modelling
Analytical methods make text tractable.
- Latent semantic indexing utilizing singular value decomposition for term reduction / feature selection.
  - Creates a new, reduced concept space.
  - Takes care of synonymy, polysemy, stemming, etc.
- Classification technologies / methods:
  - Naive Bayes.
  - Support Vector Machine.
  - K-nearest neighbor.
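
To make one of the classification methods listed above concrete, here is a minimal multinomial Naive Bayes sketch with add-one smoothing, in plain Python. The training sentences, labels, and function names are hypothetical and chosen only to illustrate the technique.

```python
import math
from collections import Counter, defaultdict

def train(labelled_docs):
    """Collect per-class document counts and per-class word counts."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    for text, label in labelled_docs:
        class_docs[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for c in word_counts.values() for w in c}
    return class_docs, word_counts, vocab

def classify(text, model):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data for illustration only.
training = [
    ("i like databases", "positive"),
    ("i like text analytics", "positive"),
    ("i hate hate databases", "negative"),
    ("i hate slow search", "negative"),
]
model = train(training)
prediction = classify("i like search", model)
print(prediction)
```

The classifier treats each document as a bag of words, so it inherits the limitations of that model.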
Text Modelling In the form of  query-document similarity , this is Information Retrieval 101. See, for instance, Salton & Buckley, “Term-Weighting Approaches in Automatic Text Retrieval,” 1988. A useful basic tech paper: Russ Albright, SAS, “Taming Text with the SVD,” 2004. Given the complexity of human language, statistical models may fall short.
Semantics “This rather unsophisticated argument on ‘significance’ avoids such linguistic implications as grammar and syntax... No attention is paid to the logical and semantic relationships the author has established.” -- Hans Peter Luhn, 1958
New York Times, September 8, 1957. Anaphora / coreference: “They”
Semantics
Why do we need linguistics?
The Dow fell 46.58, or 0.42 percent, to 11,002.14. The Standard & Poor's 500 index fell 1.44, or 0.11 percent, to 1,263.85, and the Nasdaq composite gained 6.84, or 0.32 percent, to 2,162.78.
The Dow gained 46.58, or 0.42 percent, to 11,002.14. The Standard & Poor's 500 index fell 1.44, or 0.11 percent, to 1,263.85, and the Nasdaq composite fell 6.84, or 0.32 percent, to 2,162.78.
(Example: Luca Scagliarini, Expert System.)
John pushed Max. He fell.
John pushed Max. He laughed.
(Example: Laure Vieu and Patrick Saint-Dizier.)
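
The two Dow passages show why a purely statistical model can fall short. The short Python check below, using whitespace tokenization as a simplifying assumption, confirms that they have identical bags of words even though they report opposite market movements.

```python
from collections import Counter

original = (
    "The Dow fell 46.58, or 0.42 percent, to 11,002.14. "
    "The Standard & Poor's 500 index fell 1.44, or 0.11 percent, to 1,263.85, "
    "and the Nasdaq composite gained 6.84, or 0.32 percent, to 2,162.78."
)
swapped = (
    "The Dow gained 46.58, or 0.42 percent, to 11,002.14. "
    "The Standard & Poor's 500 index fell 1.44, or 0.11 percent, to 1,263.85, "
    "and the Nasdaq composite fell 6.84, or 0.32 percent, to 2,162.78."
)

def bag_of_words(text):
    """Unordered token counts; word order and syntax are discarded."""
    return Counter(text.lower().split())

same_bag = bag_of_words(original) == bag_of_words(swapped)
same_text = original == swapped
print(f"identical bags: {same_bag}, identical text: {same_text}")
```

Any model built only on these counts assigns both passages the same vector. Telling them apart requires knowing which subject goes with which verb.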
Text Analytics
Text analytics automates what researchers, writers, scholars, and all the rest of us have been doing for years.
Text analytics --
- Applies linguistic and/or statistical techniques to extract concepts and patterns that can be applied to categorize and classify documents, audio, video, images.
- Transforms “unstructured” information into data for application of traditional analysis techniques.
- Unlocks meaning and relationships in large volumes of information that were previously unprocessable by computer.
- ... models text.
Predictive Text Analytics?
Let’s consider three interpretations:
- (Predictive text) analytics: prediction applied to text.
- (Predictive analytics) from text sources: analysis of information extracted from text.
- Predictive (text analytics): clustering and classification of the text at document & feature levels.
Predictive Text
Basic modelling to facilitate functions such as:
- Completion
- Disambiguation: use dictionaries, context
- Error correction
http://en.wikipedia.org/wiki/File:ITap_on_Motorola_C350.jpg
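
Completion and dictionary-based disambiguation on a phone keypad can be sketched as follows. The word list and its frequencies are hypothetical stand-ins for a real dictionary, and the function names are made up for this example. It shows the general idea only, not the method of any particular product.

```python
from collections import defaultdict

KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}

# Hypothetical word frequencies standing in for a real dictionary.
WORD_FREQ = {"good": 50, "home": 40, "gone": 20, "hood": 5, "hello": 30, "help": 25}

def to_keys(word):
    """Digit sequence a user would press to type the word."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())

# Index the dictionary by key sequence for fast lookup.
INDEX = defaultdict(list)
for word in WORD_FREQ:
    INDEX[to_keys(word)].append(word)

def disambiguate(keys):
    """Words that match the full key sequence, most frequent first."""
    return sorted(INDEX.get(keys, []), key=lambda w: -WORD_FREQ[w])

def complete(keys):
    """Words whose key sequence starts with the keys typed so far."""
    matches = [w for w in WORD_FREQ if to_keys(w).startswith(keys)]
    return sorted(matches, key=lambda w: -WORD_FREQ[w])

print(disambiguate("4663"))
print(complete("435"))
```

Four different words share the key sequence 4663, so frequency alone decides the ranking here. Using context, as the slide suggests, would mean conditioning that ranking on the preceding words.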
Predictive Text Marti Hearst in Search User Interfaces: “Search logs suggest that from 10-15% of queries contain spelling or typographical errors. Fittingly, one important query reformulation tool is spelling suggestions or corrections.”
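
One simple way to generate spelling suggestions is to rank known terms by edit distance from the query, as sketched below. The vocabulary is hypothetical. Production systems typically also draw on query logs and term frequencies, which this example leaves out.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

# Hypothetical vocabulary; a real system would use its index terms.
VOCAB = ["analytics", "predictive", "sentiment", "semantics"]

def suggest(query, max_distance=2):
    """Vocabulary terms within max_distance edits, closest first."""
    scored = sorted((edit_distance(query, w), w) for w in VOCAB)
    return [w for d, w in scored if d <= max_distance]

print(suggest("analitics"))
```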
Predictive Analytics, Text Sources
“The bulk of information value is perceived as coming from data in relational tables. The reason is that data that is structured is easy to mine and analyze.” – Prabhakar Raghavan, Yahoo Research, former CTO of enterprise-search vendor Verity (now part of Autonomy), 2004
Yet it’s a truism that 80% of enterprise information is in “unstructured” form.
Sources
Consider:
- E-mail, news & blog articles, microblogging, forum postings, and other social media.
- Contact-center notes and transcripts; recorded conversations.
- Surveys, feedback forms, warranty claims, case reports.
- And every other sort of document imaginable.
These sources may contain “traditional” data.
The Dow fell 46.58, or 0.42 percent, to 11,002.14. The Standard & Poor's 500 index fell 1.44, or 0.11 percent, to 1,263.85, and the Nasdaq composite gained 6.84, or 0.32 percent, to 2,162.78.
Search++ Search is typically answer #1.
Beyond Search
Text analytics extracts and classifies by –
- Entities: names, e-mail addresses, phone numbers
- Concepts: abstractions of entities.
- Facts and relationships.
- Events.
- Abstract attributes, e.g., “expensive,” “comfortable”
- Opinions, sentiments: attitudinal data.
– for search indexes, knowledge bases, and databases.
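
For the simplest entity types in the list, e-mail addresses and phone numbers, pattern matching is often enough. The sketch below uses deliberately simplified regular expressions and a made-up sample note, both assumptions. Names, concepts, and opinions need the linguistic and statistical machinery discussed earlier.

```python
import re

# Simplified patterns; real-world e-mail and phone formats are far more varied.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")  # US-style NNN-NNN-NNNN only

def extract_entities(text):
    """Return the pattern-matched entities found in text, grouped by type."""
    return {"email": EMAIL.findall(text), "phone": PHONE.findall(text)}

# Hypothetical contact-center note.
note = "Call 301-270-0795 or write to info@example.com about the warranty claim."
entities = extract_entities(note)
print(entities)
```

The extracted values can then be loaded into a database column or search-index field, which is the "for search indexes, knowledge bases, and databases" step on the slide.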
Knowledge Discovery
Visualizing Interrelationships
Applications
Text analytics has applications in –
- Intelligence & law enforcement.
- Life sciences.
- Media & publishing including social-media analysis.
- Competitive intelligence.
- Voice of the Customer: CRM, product management & marketing.
- Legal, tax & regulatory, compliance.
- HR & recruiting.
Great lift potential when coupled with transactional & operational data.
Document processing -- This slide and the next show dynamic, clustered search results from Grokker (now gone)…
… with a zoomable display. Clustering here identifies cohesive groupings of retrieved documents.
Sentiment Analysis
“Sentiment analysis is the task of identifying positive and negative opinions, emotions, and evaluations.” -- Wilson, Wiebe & Hoffmann, 2005, “Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis”
Steps include: 1) detection, 2) classification, 3) measurement:
- WW&H (for example) used over 8,000 subjectivity indicators.
- Polarity: positive, negative, (both,) or neutral.
- Intensity.
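
The three steps can be sketched with a toy lexicon-based scorer. The six-word lexicon and its strengths are hypothetical (a real subjectivity lexicon has thousands of entries). The sketch deliberately ignores context such as negation, so it illustrates the pipeline shape and is not a usable classifier.

```python
# Hypothetical mini-lexicon: word -> signed strength.
LEXICON = {"good": 1, "great": 2, "comfortable": 1,
           "bad": -1, "terrible": -2, "expensive": -1}

def analyze(sentence):
    """Return (polarity, intensity) for one sentence."""
    words = [w.strip(".,!?;:").lower() for w in sentence.split()]
    # 1) Detection: find the subjectivity indicators present.
    hits = [LEXICON[w] for w in words if w in LEXICON]
    pos = sum(s for s in hits if s > 0)
    neg = -sum(s for s in hits if s < 0)
    # 2) Classification: positive, negative, both, or neutral.
    if not hits:
        polarity = "neutral"
    elif pos and neg:
        polarity = "both"
    elif pos:
        polarity = "positive"
    else:
        polarity = "negative"
    # 3) Measurement: total strength of the indicators found.
    intensity = pos + neg
    return polarity, intensity

print(analyze("The room was comfortable but expensive."))
print(analyze("Great service!"))
```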
Sentiment Analysis
Complications.
- Levels:
  - Corpus / data space, i.e., across multiple sources.
  - Document.
  - Statement / sentence.
  - Entity / topic / concept.
- Language characteristics: jargon, slang, irony, ambiguity, anaphora, polysemy, synonymy, etc.
- Context is key. Discourse analysis comes into play.
- Sentiment holder ≠ object: Geithner said unemployment will worsen…
Steps in the Right Direction
... and Missteps
- Unfiltered duplicates
- External reference
- “Kind” = type, variety, not sentiment.
- Naïve misclassification
Beyond Polarity
“We present a system that adds an emotional dimension to an activity that Internet users engage in frequently, search.” -- Sood & Vasserman & Hoffman, 2009, “ESSE: Exploring Mood on the Web”
The three prominent mood groups that emerged from K-Means Clustering on the set of LiveJournal mood labels:
- Happy: Energetic, Bouncy, Happy, Hyper, Cheerful, Ecstatic, Excited, Jubilant, Giddy, Giggly
- Sad: Confused, Crappy, Crushed, Depressed, Distressed, Envious, Gloomy, Guilty, Intimidated, Jealous, Lonely, Rejected, Sad, Scared
- Angry: Aggravated, Angry, Bitchy, Enraged, Infuriated, Irate, Pissed off
Seth Grimes Alta Plana Corporation 301-270-0795 – http://altaplana.com @SethGrimes
Classification with Memes–Uber case study
Seth Grimes
 
Aspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion AnalysisAspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion Analysis
Seth Grimes
 
Content AI: From Potential to Practice
Content AI: From Potential to PracticeContent AI: From Potential to Practice
Content AI: From Potential to Practice
Seth Grimes
 
Text Analytics Market Insights: What's Working and What's Next
Text Analytics Market Insights: What's Working and What's NextText Analytics Market Insights: What's Working and What's Next
Text Analytics Market Insights: What's Working and What's Next
Seth Grimes
 
An Industry Perspective on Subjectivity, Sentiment, and Social
An Industry Perspective on Subjectivity, Sentiment, and SocialAn Industry Perspective on Subjectivity, Sentiment, and Social
An Industry Perspective on Subjectivity, Sentiment, and Social
Seth Grimes
 
The Insight Value of Social Sentiment
The Insight Value of Social SentimentThe Insight Value of Social Sentiment
The Insight Value of Social Sentiment
Seth Grimes
 
Text Analytics 2014: User Perspectives on Solutions and Providers
Text Analytics 2014: User Perspectives on Solutions and ProvidersText Analytics 2014: User Perspectives on Solutions and Providers
Text Analytics 2014: User Perspectives on Solutions and Providers
Seth Grimes
 


Predictive Text Analytics

  • 1. Predictive Text Analytics Seth Grimes Alta Plana Corporation 301-270-0795 -- http://altaplana.com -- @SethGrimes Predictive Analytics World October 20, 2009
  • 3. What is Analytics? http://www.tropicalisland.de/NYC_New_York_Brooklyn_Bridge_from_World_Trade_Center_b.jpg $x(t) = t$, $y(t) = \tfrac{1}{2}\, a \left(e^{t/a} + e^{-t/a}\right) = a \cosh(t/a)$ http://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg c:\temp\Marcus.pdf
  • 4. What is Analytics?
        "SUMLEV","STATE","COUNTY","STNAME","CTYNAME","YEAR","POPESTIMATE",
        50,19,1,"Iowa","Adair County",1,8243,4036,4207,446,225,221,994,509
        50,19,1,"Iowa","Adair County",2,8243,4036,4207,446,225,221,994,509
        50,19,1,"Iowa","Adair County",3,8212,4020,4192,442,222,220,987,505
        50,19,1,"Iowa","Adair County",4,8095,3967,4128,432,208,224,935,488
        50,19,1,"Iowa","Adair County",5,8003,3924,4079,405,186,219,928,495
        50,19,1,"Iowa","Adair County",6,7961,3892,4069,384,183,201,907,472
        50,19,1,"Iowa","Adair County",7,7875,3855,4020,366,179,187,871,454
        50,19,1,"Iowa","Adair County",8,7795,3817,3978,343,162,181,841,439
        50,19,1,"Iowa","Adair County",9,7714,3777,3937,338,159,179,805,417
  • 5. www.stanford.edu/%7ernusse/wntwindow.html Axin and Frat1 interact with dvl and GSK, bridging Dvl to GSK in Wnt-mediated regulation of LEF-1. Wnt proteins transduce their signals through dishevelled (Dvl) proteins to inhibit glycogen synthase kinase 3beta (GSK), leading to the accumulation of cytosolic beta-catenin and activation of TCF/LEF-1 transcription factors. To understand the mechanism by which Dvl acts through GSK to regulate LEF-1, we investigated the roles of Axin and Frat1 in Wnt-mediated activation of LEF-1 in mammalian cells. We found that Dvl interacts with Axin and with Frat1, both of which interact with GSK. Similarly, the Frat1 homolog GBP binds Xenopus Dishevelled in an interaction that requires GSK. We also found that Dvl, Axin and GSK can form a ternary complex bridged by Axin, and that Frat1 can be recruited into this complex probably by Dvl. The observation that the Dvl-binding domain of either Frat1 or Axin was able to inhibit Wnt-1-induced LEF-1 activation suggests that the interactions between Dvl and Axin and between Dvl and Frat may be important for this signaling pathway. Furthermore, Wnt-1 appeared to promote the disintegration of the Frat1-Dvl-GSK-Axin complex, resulting in the dissociation of GSK from Axin. Thus, formation of the quaternary complex may be an important step in Wnt signaling, by which Dvl recruits Frat1, leading to Frat1-mediated dissociation of GSK from Axin. www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=10428961&dopt=Abstract The “Unstructured Data” Challenge
  • 6. Modelling Text
      • Metadata: e.g., title, author, date.
      • Statistics: typically via vector space methods, e.g., term frequency, co-occurrence, proximity.
      • Linguistics: lexicons, gazetteers, phrase books; word morphology, parts of speech, syntactic rules; larger-scale structure including discourse.
      • Machine learning.
  • 7. “Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance are extracted and printed out to become the auto-abstract.” H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.
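The scoring idea in Luhn's quote can be sketched in a few lines of Python. This is a simplified illustration, not Luhn's actual procedure: it scores each sentence by the average corpus frequency of its non-stop words, and the stop-word list and the function names `tokenize` and `auto_abstract` are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "to", "in", "on", "it", "by"}


def tokenize(text):
    """Lower-case the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def auto_abstract(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word significance: frequency of each non-stop word in the whole text.
    freq = Counter(w for w in tokenize(text) if w not in STOP_WORDS)

    def score(sentence):
        # Sentence significance: mean frequency of its non-stop words.
        words = [w for w in tokenize(sentence) if w not in STOP_WORDS]
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]
```

Averaging rather than summing keeps long sentences from winning on length alone; that is a design choice of this sketch, not something the slide specifies.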
  • 8. Text Modelling Document content can be considered an unordered “bag of words.” Particular documents are points in a high-dimensional vector space. Salton, Wong & Yang, “A Vector Space Model for Automatic Indexing,” November 1975.
  • 9. Text Modelling We might construct a term-document matrix ...
        D1 = "I like databases"
        D2 = "I hate hate databases"

             I   like   hate   databases
        D1   1   1      0      1
        D2   1   0      2      1

      ... and use a weighting such as TF-IDF (term frequency–inverse document frequency) in computing the cosine of the angle between weighted doc-vectors to determine similarity. http://en.wikipedia.org/wiki/Term-document_matrix
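As a minimal sketch of the slide's example, the following Python builds the count vectors for D1 and D2, applies one common TF-IDF variant (raw term count times log(N/df), an assumption, since the slide does not say which variant it has in mind), and computes cosine similarity. The function names are illustrative.

```python
import math
from collections import Counter

docs = {"D1": "I like databases", "D2": "I hate hate databases"}

# Term counts per document; terms are lower-cased whitespace tokens.
counts = {name: Counter(text.lower().split()) for name, text in docs.items()}
vocab = ["i", "like", "hate", "databases"]  # column order used on the slide


def count_vector(name):
    """Row of the term-document matrix for one document."""
    return [counts[name][t] for t in vocab]


def idf(term):
    """Inverse document frequency: log(N / number of docs containing term)."""
    df = sum(1 for c in counts.values() if c[term] > 0)
    return math.log(len(counts) / df)


def tfidf_vector(name):
    """Raw term count weighted by IDF (one of several TF-IDF variants)."""
    return [counts[name][t] * idf(t) for t in vocab]


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


raw_similarity = cosine(count_vector("D1"), count_vector("D2"))
weighted_similarity = cosine(tfidf_vector("D1"), tfidf_vector("D2"))
```

With only two documents, the terms shared by both ("I", "databases") get an IDF of zero under this variant, so the weighted vectors overlap nowhere and the similarity drops from about 0.47 on raw counts to 0. Smoothed IDF variants avoid that collapse.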
  • 10. Text Modelling Analytical methods make text tractable.
      • Latent semantic indexing, utilizing singular value decomposition for term reduction / feature selection. Creates a new, reduced concept space. Takes care of synonymy, polysemy, stemming, etc.
      • Classification technologies / methods: Naive Bayes, Support Vector Machine, K-nearest neighbor.
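Of the classification methods named on this slide, Naive Bayes is the easiest to show in a few lines. The following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, using bag-of-words tokens; the class name, method names, and training sentences are invented for this illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens, with add-one smoothing."""

    def __init__(self):
        self.class_docs = Counter()              # documents seen per label
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()

    def train(self, text, label):
        self.class_docs[label] += 1
        for word in text.lower().split():
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def predict(self, text):
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data, for illustration only.
classifier = NaiveBayes()
classifier.train("great product love it", "pos")
classifier.train("love this great service", "pos")
classifier.train("terrible product hate it", "neg")
classifier.train("hate this awful service", "neg")
```

Working in log space avoids numeric underflow when many small probabilities are multiplied.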
  • 11. Text Modelling In the form of query-document similarity, this is Information Retrieval 101. See, for instance, Salton & Buckley, “Term-Weighting Approaches in Automatic Text Retrieval,” 1988. A useful basic tech paper: Russ Albright, SAS, “Taming Text with the SVD,” 2004. Given the complexity of human language, statistical models may fall short.
  • 12. Semantics “This rather unsophisticated argument on ‘significance’ avoids such linguistic implications as grammar and syntax... No attention is paid to the logical and semantic relationships the author has established.” -- Hans Peter Luhn, 1958
  • 13. New York Times, September 8, 1957 Anaphora / coreference: “They”
  • 14. Semantics Why do we need linguistics?
      • The Dow fell 46.58, or 0.42 percent, to 11,002.14. The Standard & Poor's 500 index fell 1.44, or 0.11 percent, to 1,263.85, and the Nasdaq composite gained 6.84, or 0.32 percent, to 2,162.78.
      • The Dow gained 46.58, or 0.42 percent, to 11,002.14. The Standard & Poor's 500 index fell 1.44, or 0.11 percent, to 1,263.85, and the Nasdaq composite fell 6.84, or 0.32 percent, to 2,162.78.
        (Example: Luca Scagliarini, Expert System.)
      • John pushed Max. He fell.
      • John pushed Max. He laughed.
        (Example: Laure Vieu and Patrick Saint-Dizier.)
  • 15.  
  • 16.  
  • 17.  
  • 18.  
  • 19.  
  • 20.  
  • 21. Text Analytics Text analytics automates what researchers, writers, scholars, and all the rest of us have been doing for years. Text analytics:
      • Applies linguistic and/or statistical techniques to extract concepts and patterns that can be applied to categorize and classify documents, audio, video, images.
      • Transforms “unstructured” information into data for application of traditional analysis techniques.
      • Unlocks meaning and relationships in large volumes of information that were previously unprocessable by computer.
      • ... models text.
  • 22. Predictive Text Analytics? Let’s consider three interpretations:
      • Predictive text analytics: prediction applied to text.
      • Predictive analytics from text sources: analysis of information extracted from text.
      • Predictive text analytics: clustering and classification of the text at document & feature levels.
  • 23. Predictive Text Analytics Predictive text analytics Prediction applied to text. Predictive analytics from text sources Analysis of information extracted from text. Predictive text analytics Clustering and classification of the text at document & feature levels.
  • 24. Predictive Text Basic modelling to facilitate functions such as:
      • Completion.
      • Disambiguation: use dictionaries, context.
      • Error correction.
      http://en.wikipedia.org/wiki/File:ITap_on_Motorola_C350.jpg
  • 25. Predictive Text Marti Hearst in Search User Interfaces: “Search logs suggest that from 10-15% of queries contain spelling or typographical errors. Fittingly, one important query reformulation tool is spelling suggestions or corrections.”
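Two of the predictive-text functions listed above, completion and error correction, can be sketched with a word-frequency dictionary. This is a toy illustration under stated assumptions: the `WORD_FREQ` counts are invented, completion ranks prefix matches by frequency, and correction picks the dictionary word with the smallest Levenshtein edit distance, breaking ties by frequency. Real systems also use context, which this sketch ignores.

```python
from collections import Counter

# Hypothetical word frequencies, standing in for counts from a real corpus.
WORD_FREQ = Counter({
    "text": 50, "test": 30, "analysis": 25,
    "analytics": 20, "prediction": 12, "predictive": 10,
})


def complete(prefix, k=3):
    """Return up to k dictionary words starting with prefix, most frequent first."""
    matches = [(w, c) for w, c in WORD_FREQ.items() if w.startswith(prefix)]
    matches.sort(key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in matches[:k]]


def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def correct(word):
    """Return word if known, else the closest dictionary word by edit distance."""
    if word in WORD_FREQ:
        return word
    return min(WORD_FREQ, key=lambda w: (edit_distance(word, w), -WORD_FREQ[w], w))
```

For example, `complete("anal")` ranks "analysis" ahead of "analytics" because of its higher count, and `correct("analitics")` resolves to "analytics", one substitution away.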
  • 26. Predictive Text Analytics Predictive text analytics Prediction applied to text. Predictive analytics from text sources Analysis of information extracted from text. Predictive text analytics Clustering and classification of the text at document & feature levels.
  • 27. Predictive Analytics, Text Sources “The bulk of information value is perceived as coming from data in relational tables. The reason is that data that is structured is easy to mine and analyze.” -- Prabhakar Raghavan, Yahoo Research, former CTO of enterprise-search vendor Verity (now part of Autonomy), 2004. Yet it’s a truism that 80% of enterprise information is in “unstructured” form.
  • 28. Sources Consider:
      • E-mail, news & blog articles, microblogging, forum postings, and other social media.
      • Contact-center notes and transcripts; recorded conversations.
      • Surveys, feedback forms, warranty claims, case reports.
      • And every other sort of document imaginable.
      These sources may contain “traditional” data: The Dow fell 46.58, or 0.42 percent, to 11,002.14. The Standard & Poor's 500 index fell 1.44, or 0.11 percent, to 1,263.85, and the Nasdaq composite gained 6.84, or 0.32 percent, to 2,162.78.
  • 29. Search++ Search is typically answer #1.
  • 30. Beyond Search Text analytics extracts and classifies by:
      • Entities: names, e-mail addresses, phone numbers.
      • Concepts: abstractions of entities.
      • Facts and relationships.
      • Events.
      • Abstract attributes, e.g., “expensive,” “comfortable.”
      • Opinions, sentiments: attitudinal data.
      The results feed search indexes, knowledge bases, and databases.
  • 33. Applications Text analytics has applications in:
      • Intelligence & law enforcement.
      • Life sciences.
      • Media & publishing, including social-media analysis.
      • Competitive intelligence.
      • Voice of the Customer: CRM, product management & marketing.
      • Legal, tax & regulatory, compliance.
      • HR & recruiting.
      Great lift potential when coupled with transactional & operational data.
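The simplest kind of entity extraction named on the "Beyond Search" slide, pulling e-mail addresses and phone numbers out of free text, can be sketched with regular expressions. The patterns below are deliberately loose assumptions for illustration (US-style NNN-NNN-NNNN phone numbers, a permissive e-mail shape); names, concepts, and relationships need the linguistic and statistical methods discussed earlier.

```python
import re

# Illustrative patterns only; production extractors handle many more formats.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def extract_entities(text):
    """Return the e-mail addresses and phone numbers found in text."""
    return {
        "email": EMAIL_RE.findall(text),
        "phone": PHONE_RE.findall(text),
    }


# The e-mail address here is a made-up placeholder.
sample = "Contact Alta Plana at 301-270-0795 or info@example.com."
entities = extract_entities(sample)
```

The extracted values are structured records that can be loaded into a search index or database, which is the hand-off the slide describes.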
  • 34. Predictive Text Analytics Predictive text analytics Prediction applied to text. Predictive analytics from text sources Analysis of information extracted from text. Predictive text analytics Clustering and classification of the text at document & feature levels.
  • 35. Document processing -- This slide and the next show dynamic, clustered search results from Grokker (now gone)…
  • 36. … with a zoomable display. Clustering here identifies cohesive groupings of retrieved documents.
  • 37.  
  • 38. Sentiment Analysis “Sentiment analysis is the task of identifying positive and negative opinions, emotions, and evaluations.” -- Wilson, Wiebe & Hoffmann, 2005, “Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis.” Steps include 1) detection, 2) classification, 3) measurement:
      • WW&H (for example) used over 8,000 subjectivity indicators.
      • Polarity: positive, negative, (both,) or neutral.
      • Intensity.
  • 39. Sentiment Analysis Complications.
      • Levels: corpus / data space, i.e., across multiple sources; document; statement / sentence; entity / topic / concept.
      • Language characteristics: jargon, slang, irony, ambiguity, anaphora, polysemy, synonymy, etc.
      • Context is key. Discourse analysis comes into play.
      • Sentiment holder ≠ object: Geithner said unemployment will worsen…
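The three steps named on the sentiment slide (detection, classification, measurement) can be sketched with a tiny subjectivity lexicon. This is a toy illustration, not the Wilson, Wiebe & Hoffmann method: the six-word `LEXICON`, its intensity values, and the one-word negation rule are all assumptions made for the example.

```python
import re

# Hypothetical lexicon: word -> signed intensity (sign = polarity).
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}
NEGATORS = {"not", "never", "no"}


def score_sentence(sentence):
    """Return (polarity, score) for one sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score, hits = 0, 0
    for i, token in enumerate(tokens):
        if token in LEXICON:                     # 1) detection
            value = LEXICON[token]
            if i > 0 and tokens[i - 1] in NEGATORS:
                value = -value                   # flip polarity after a negator
            score += value                       # 3) measurement
            hits += 1
    if hits == 0:                                # 2) classification
        polarity = "neutral"
    elif score > 0:
        polarity = "positive"
    elif score < 0:
        polarity = "negative"
    else:
        polarity = "both"
    return polarity, score
```

Applied to the earlier example documents, "I hate hate databases" scores as negative with intensity -4, while "I like databases" comes out neutral only because "like" is missing from this small lexicon, which shows how much a lexicon approach depends on coverage and context.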
  • 40. Steps in the Right Direction
  • 41.  
  • 42. ... and Missteps
      • Unfiltered duplicates.
      • External reference.
      • “Kind” = type, variety, not sentiment.
      • Naïve misclassification.
  • 43. Beyond Polarity “We present a system that adds an emotional dimension to an activity that Internet users engage in frequently, search.” -- Sood & Vasserman & Hoffman, 2009, “ESSE: Exploring Mood on the Web”
  • 44. The three prominent mood groups that emerged from K-Means Clustering on the set of LiveJournal mood labels:

        Happy       Sad           Angry
        Energetic   Confused      Aggravated
        Bouncy      Crappy        Angry
        Happy       Crushed       Bitchy
        Hyper       Depressed     Enraged
        Cheerful    Distressed    Infuriated
        Ecstatic    Envious       Irate
        Excited     Gloomy        Pissed off
        Jubilant    Guilty
        Giddy       Intimidated
        Giggly      Jealous
                    Lonely
                    Rejected
                    Sad
                    Scared
  • 45. Predictive Text Analytics Predictive text analytics Prediction applied to text. Predictive analytics from text sources Analysis of information extracted from text. Predictive text analytics Clustering and classification of the text at document & feature levels.
  • 46. Seth Grimes Alta Plana Corporation 301-270-0795 -- http://altaplana.com -- @SethGrimes

Editor's Notes

  • #33: This course is, in essence, about the information enterprises have and how they use it and how they could better use it. First we look at enterprise information in light of business goals in order to characterize the “unstructured” information gap. We then look at how that information, or at least the textual variety, may be structured for use. Then we look at a few uses, at enriching search, surely one of today’s killer apps, and at enhancing business intelligence via search.