SlideShare a Scribd company logo
Text Analytics for DummiesSeth GrimesAlta Plana Corporation@sethgrimes– 301-270-0795 -- http://altaplana.comText Analytics Summit 2010WorkshopMay 24, 2010
IntroductionSeth Grimes –Principal Consultant with Alta Plana Corporation.Contributing Editor, IntelligentEnterprise.com.Channel Expert, BeyeNETWORK.com.Contributor, KDnuggets.com.Instructor, The Data Warehousing Institute, tdwi.org.Founding Chair, Sentiment Analysis Symposium.Founding Chair, Text Analytics Summit.
PerspectivesPerspective #1: You’re a business analyst or other “end user” or a consultant/integrator.You (or your clients) have lots of text.  You want an automated way to deal with it.Perspective #2: You work in IT.You support end users who have lots of text.Perspective #3: You work for a solution provider.Welcome to my Reeducation Camp.Perspective #4:  Other?You just want to learn about text analytics.
Value in Data“The bulk of information value is perceived as coming from data in relational tables. The reason is that data that is structured is easy to mine and analyze.”-- Prabhakar Raghavan, Yahoo ResearchYet it’s a truism that 80% of enterprise-relevant information originates in “unstructured” form.
Unstructured SourcesConsider:Web pages, E-mail, news & blog articles, forum postings, and other social media.Contact-center notes and transcripts.Surveys, feedback forms, warranty claims.And every kind of corporate documents imaginable.These sources may contain “traditional” data.The Dow fell 46.58, or 0.42 percent, to 11,002.14. The Standard & Poor's 500 index fell 1.44, or 0.11 percent, to 1,263.85, and the Nasdaq composite gained 6.84, or 0.32 percent, to 2,162.78.
Key Message -- #1If you are not analyzing text – if you're analyzing only transactional information – you're missing opportunity or incurring risk...“Industries such as travel and hospitality and retail live and die on customer experience.” -- Clarabridge CEO Sid BanerjeeThis is why you’re here.It’s the “Unstructured Data” challenge.
Key Message -- #2Text analytics can boost business results...“Organizations embracing text analytics all report having an epiphany moment when they suddenly knew more than before.” -- Philip Russom, the Data Warehousing Institute...via established BI / data-mining programs, or independently.Text Analytics is an answer to the “Unstructured Data” challenge
Key Message -- #3Some folks may need to expand their views of what BI and business analytics are about.Others can do text analytics without worrying about BI or data mining.Let’s deal with text-BI first...
Text-BI: Back to the FutureBusiness intelligence (BI) as defined in 1958:In this paper, business is a collection of activities carried on for whatever purpose, be it science, technology, commerce, industry, law, government, defense, et cetera...  The notion of intelligence is also defined here... as “the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal.”-- Hans Peter Luhn, “A Business Intelligence System,”IBM Journal, October 1958
Document input and processingKnowledge handling is key
Business IntelligenceTraditional BI feeds off:Traditional BI feeds off:"SUMLEV","STATE","COUNTY","STNAME","CTYNAME","YEAR","POPESTIMATE",50,19,1,"Iowa","Adair County",1,8243,4036,4207,446,225,221,994,50950,19,1,"Iowa","Adair County",2,8243,4036,4207,446,225,221,994,50950,19,1,"Iowa","Adair County",3,8212,4020,4192,442,222,220,987,50550,19,1,"Iowa","Adair County",4,8095,3967,4128,432,208,224,935,48850,19,1,"Iowa","Adair County",5,8003,3924,4079,405,186,219,928,49550,19,1,"Iowa","Adair County",6,7961,3892,4069,384,183,201,907,47250,19,1,"Iowa","Adair County",7,7875,3855,4020,366,179,187,871,45450,19,1,"Iowa","Adair County",8,7795,3817,3978,343,162,181,841,43950,19,1,"Iowa","Adair County",9,7714,3777,3937,338,159,179,805,417It runs off:
Business IntelligenceTraditional BI produces:http://www.pentaho.com/products/dashboards/
Unstructured SourcesSome information doesn’t come from a data file.Axin and Frat1 interact with dvl and GSK, bridging Dvl to GSK in Wnt-mediated regulation of LEF-1.Wnt proteins transduce their signals through dishevelled (Dvl) proteins to inhibit glycogen synthase kinase 3beta (GSK), leading to the accumulation of cytosolic beta-catenin and activation of TCF/LEF-1 transcription factors. To understand the mechanism by which Dvl acts through GSK to regulate LEF-1, we investigated the roles of Axin and Frat1 in Wnt-mediated activation of LEF-1 in mammalian cells. We found that Dvl interacts with Axin and with Frat1, both of which interact with GSK. Similarly, the Frat1 homolog GBP binds Xenopus Dishevelled in an interaction that requires GSK. We also found that Dvl, Axin and GSK can form a ternary complex bridged by Axin, and that Frat1 can be recruited into this complex probably by Dvl. The observation that the Dvl-binding domain of either Frat1 or Axin was able to inhibit Wnt-1-induced LEF-1 activation suggests that the interactions between Dvl and Axin and between Dvl and Frat may be important for this signaling pathway. Furthermore, Wnt-1 appeared to promote the disintegration of the Frat1-Dvl-GSK-Axin complex, resulting in the dissociation of GSK from Axin. Thus, formation of the quaternary complex may be an important step in Wnt signaling, by which Dvl recruits Frat1, leading to Frat1-mediated dissociation of GSK from Axin.www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=10428961&dopt=Abstractwww.stanford.edu/%7ernusse/wntwindow.html
Unstructured SourcesSources may mix fact and sentiment:When you walk in the foyer of the hotel it seems quite inviting but the room was very basis and smelt very badly of stale cigarette smoke, it would have been nice to be asked if we wanted a non smoking room, I know the room was very cheap but I found this very off putting to have to sleep with the smell, and it was to cold to leave the window open. Excellent location for restaurants and barsOverall I would never sell/buy a Motorola V3 unless it is demanded. My life would be way better without this phone being around (I am being 100% serious) Motorola should pay me directly for all the problems I have had with these phones. :-(
Text and ApplicationsWhat do people do with electronic documents?Publish, Manage, and Archive.Index and Search.Categorize and Classify according to metadata & contents.Information Extraction.For textual documents, text analytics enhances #1 & #2 and enables #3 & #4.You need linguistics to do #1 & #4 well, to deal with Semantics.Search is not enough...
SearchSearch, a.k.a. Information Retrieval, is just a start.Search doesn’t help you discover things you’re unaware of.Search results often lack relevance.Search finds documents, not knowledge.Articles from a forum siteArticles from 1987
Search + SemanticsText analytics adds semantic understanding of –Entities: names, e-mail addresses, phone numbers.Concepts: abstractions of entities.Facts and relationships.Abstract attributes, e.g., “expensive,” “comfortable.”Opinions, sentiments: attitudinal information.
Information AccessText analytics enables results that suit the information and the user, e.g., answers –
Presentation of search results can be enhanced by knowledge discovery, e.g., clustering.touchgraph.com/ TGGoogleBrowser.php?start=text%20analytics
Information AccessText analytics transforms Information Retrieval (IR) into Information Access (IA).Search terms become queries.Indexed pages are mined for larger-scale structure, for instance, information categories.Search results are presented intelligently.Capabilities include Information Extraction (IE).Text analytics ≈ text data mining.
Beyond SearchText Mining = Data Mining of textual sources.Clustering and Classification.Link Analysis.Association Rules.Predictive Modelling.Regression.Forecasting.Text Mining = Knowledge Discovery in Text.
Text Analytics Uncovers Structure
Text Analytics DefinitionText analytics automates what researchers, writers, scholars, and all the rest of us have been doing for years.  Text analytics --Applies linguistic and/or statistical techniques to extract concepts and patterns that can be applied to categorize and classify documents, audio, video, images.Transforms “unstructured” information into data for application of traditional analysis techniques.Unlocks meaning and relationships in large volumes of information that were previously unprocessable by computer.
Text Analytics PipelineTypical steps in text analytics include –Retrieve documents for analysis. Apply statistical &/ linguistic &/ structural techniques to identify, tag, and extract entities, concepts, relationships, and events (features) within document sets.Apply statistical pattern-matching & similarity techniques to classify documents and organize extracted features according to a specified or generated categorization / taxonomy.– via a pipeline of statistical & linguistic steps.Let’s look at them...
Text Analytics for Dummies 2010
Text Analytics for Dummies 2010
“Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance are extracted and printed out to become the auto-abstract.”H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.
Text ModellingThe text content of a document can be considered an unordered “bag of words.”Particular documents are points in a high-dimensional vector space.Salton, Wong & Yang, “A Vector Space Model for Automatic Indexing,” November 1975.
Text ModellingWe might construct a document-term matrix...D1 = "I like databases"D2 = "I hate hate databases"and use a weighting such as TF-IDF (term frequency–inverse document frequency)…in computing the cosine of the angle between weighted doc-vectors to determine similarity.http://en.wikipedia.org/wiki/Term-document_matrix
Text ModellingAnalytical methods make text tractable.Latent semantic indexing utilizing singular value decomposition for term reduction / feature selection.Creates a new, reduced concept space.Takes care of synonymy, polysemy, stemming, etc.Classification technologies / methods:Naive Bayes.Support Vector Machine.K-nearest neighbor.
Text ModellingIn the form of query-document similarity, this is Information Retrieval 101.See, for instance, Salton & Buckley, “Term-Weighting Approaches in Automatic Text Retrieval,” 1988.If we want to get more out of text, we have to do still more...
“Tri-grams” here are pretty good at describing the Whatness of the source text. Yet...“This rather unsophisticated argument on ‘significance’ avoids such linguistic implications as grammar and syntax... No attention is paid to the logical and semantic relationships the author has established.”-- Hans Peter Luhn, 1958
Why Do We Need Linguistics?The Dow fell 46.58, or 0.42 percent, to 11,002.14. The Standard & Poor's 500 index gained1.44, or 0.11 percent, to 1,263.85.The Dow gained 46.58, or 0.42 percent, to 11,002.14. The Standard & Poor's 500 index fell 1.44, or 0.11 percent, to 1,263.85.John pushed Max. He fell.John pushed Max. He laughed.Time flies like an arrow.  Fruit flies like a banana.(Luca Scagliarini, Expert System; Laure Vieu and Patrick Saint-Dizier; Groucho Marx.)
New York Times,September 8, 1957Anaphora / coreference: “They”
Text Analytics for Dummies 2010
Text Analytics for Dummies 2010
Text Analytics for Dummies 2010
Information ExtractionWhen we understand, for instance, parts of speech (POS) – <subject> <verb> <object> – we’re in a position to discern facts and relationships...
Text Analytics for Dummies 2010
Information ExtractionLet's see text augmentation (tagging) in action.  We'll use GATE, an open-source tool, text from sentiment-analysis article used earlier...
Text Analytics for Dummies 2010
Text Analytics for Dummies 2010
Text Analytics for Dummies 2010
Information ExtractionFor content analysis, key in on extracting information.Annotated text is typically marked up with XML.If extraction to databases: Entities and concepts (features) are like dimensions in a standard BI model.  Both classes of object are hierarchically organized and have attributes.We can have both discovered and predetermined classifications (taxonomies) of text features.
An IBM representation: “The standard features are stored in the STANDARD_KW table, keywords with their occurrences in the KEYWORD_KW_OCC table, and the text list features in the TEXTLIST_TEXT table. Every feature table contains the DOC_ID as a reference to the DOCUMENT table.”http://www.ibm.com/developerworks/db2/library/techarticle/dm-0804nicola/
Semi-Structured SourcesAn e-mail message is “semi-structured,” which facilitates extracting metadata --Date: Sun, 13 Mar 2005 19:58:39 -0500From: Adam L. Buchsbaum <alb@research.att.com>To: Seth Grimes <grimes@altaplana.com>Subject: Re: Papers on analysis on streaming dataseth, you should contact diveshsrivastava, divesh@research.att.comregarding at&t labs data streaming technology.AdamSurveys are also typically s-s in a different way...
Structured &‘Unstructured’ InformationThe respondent is invited to explain his/her attitude:
Structured &‘Unstructured’ InformationWe typically look at frequencies and distributions of coded-response questions:Linkage of responses to coded ratings helps in analyses.
Sentiment Analysis“Sentiment analysis is the task of identifying positive and negative opinions, emotions, and evaluations.” -- Wilson, Wiebe & Hoffman, 2005, “Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis”From Dell’s IdeaStorm.com --“Dell really... REALLY need to stop overcharging... and when i say overcharing... i mean atleast double what you would pay to pick up the ram yourself.”
Sentiment AnalysisApplications include:Brand / Reputation Management.Competitive intelligence.Customer Experience Management.Enterprise Feedback Management.Quality improvement.Trend spotting.
Steps in the Right Direction
Text Analytics for Dummies 2010
... And Missteps“Kind” = type, variety, not a sentiment.Complete misclassificationExternal referenceUnfiltered duplicates
Sentiment ComplicationsThere are many complications.Sentiment may be of interest at multiple levels.Corpus / data space, i.e., across multiple sources.Document.Statement / sentence.Entity / topic / concept.Human language is noisy and chaotic!Jargon, slang, irony, ambiguity, anaphora, polysemy, synonymy, etc.Context is key.  Discourse analysis comes into play.Must distinguish the sentiment holder from the object: Greenspan said the recession will…
ApplicationsText analytics has applications in –Intelligence & law enforcement.Life sciences.Media & publishing including social-media analysis and contextual advertizing.Competitive intelligence.Voice of the Customer: CRM, product management & marketing.Legal, tax & regulatory (LTR) including compliance.Recruiting.
Getting to Web 3.0Text analytics enables Web 3.0 and the Semantic Web.Automated content categorization and classification.Text augmentation: metadata generation, content tagging.Information extraction to databases.Exploratory analysis and visualization.
Users’ PerspectiveI estimate a $425 million global market in 2009, up from $350 in 2008.  I foresee 25% growth in 2010.Last year, I published a study report, “Text Analytics 2009: User Perspectives on Solutions and Providers.”I relayed findings from a survey that asked…
Primary ApplicationsWhat are your primary applications where text comes into play?
AnalyzedTextual InformationWhat textual information are you analyzing or do you plan to analyze?Current users responded:
ExtractedInformaitonDo you need (or expect to need) to extract or analyze:
Questions?Discussion?Thanks!
Text Analytics for DummiesSeth GrimesAlta Plana Corporation@sethgrimes– 301-270-0795 -- http://altaplana.comText Analytics Summit 2010WorkshopMay 24, 2010
Ad

More Related Content

What's hot (20)

Unstructured data processing webinar 06272016
Unstructured data processing webinar 06272016Unstructured data processing webinar 06272016
Unstructured data processing webinar 06272016
George Roth
 
Data Science - Part XI - Text Analytics
Data Science - Part XI - Text AnalyticsData Science - Part XI - Text Analytics
Data Science - Part XI - Text Analytics
Derek Kane
 
Getting Started with Unstructured Data
Getting Started with Unstructured DataGetting Started with Unstructured Data
Getting Started with Unstructured Data
Christine Connors
 
Text Analytics Applied (LIDER roadmapping presentation)
Text Analytics Applied (LIDER roadmapping presentation)Text Analytics Applied (LIDER roadmapping presentation)
Text Analytics Applied (LIDER roadmapping presentation)
Seth Grimes
 
Conceptual foundations of text mining and preprocessing steps nfaoui el_habib
Conceptual foundations of text mining and preprocessing steps nfaoui el_habibConceptual foundations of text mining and preprocessing steps nfaoui el_habib
Conceptual foundations of text mining and preprocessing steps nfaoui el_habib
El Habib NFAOUI
 
Entity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsEntity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutions
CloudTechnologies
 
Entity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsEntity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutions
Pvrtechnologies Nellore
 
Finding Influential People from Historical News Repository (Master's thesis w...
Finding Influential People from Historical News Repository (Master's thesis w...Finding Influential People from Historical News Repository (Master's thesis w...
Finding Influential People from Historical News Repository (Master's thesis w...
Aayushee Gupta
 
Directed versus undirected network analysis of student essays
Directed versus undirected network analysis of student essaysDirected versus undirected network analysis of student essays
Directed versus undirected network analysis of student essays
Roy Clariana
 
Approaches for Keyword Query Routing
Approaches for Keyword Query RoutingApproaches for Keyword Query Routing
Approaches for Keyword Query Routing
IJERA Editor
 
Mind the Semantic Gap
Mind the Semantic GapMind the Semantic Gap
Mind the Semantic Gap
Panos Alexopoulos
 
How many truths can you handle?
How many truths can you handle?How many truths can you handle?
How many truths can you handle?
Panos Alexopoulos
 
Nuts and bolts
Nuts and boltsNuts and bolts
Nuts and bolts
NBER
 
Textmining Introduction
Textmining IntroductionTextmining Introduction
Textmining Introduction
guest0edcaf
 
Data Science - Part I - Sustaining Predictive Analytics Capabilities
Data Science - Part I - Sustaining Predictive Analytics CapabilitiesData Science - Part I - Sustaining Predictive Analytics Capabilities
Data Science - Part I - Sustaining Predictive Analytics Capabilities
Derek Kane
 
Implementing Semantic Search
Implementing Semantic SearchImplementing Semantic Search
Implementing Semantic Search
Paul Wlodarczyk
 
Detailed Investigation of Text Classification and Clustering of Twitter Data ...
Detailed Investigation of Text Classification and Clustering of Twitter Data ...Detailed Investigation of Text Classification and Clustering of Twitter Data ...
Detailed Investigation of Text Classification and Clustering of Twitter Data ...
ijtsrd
 
Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm
Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm
Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm
IJECEIAES
 
Zomato eda report
Zomato eda reportZomato eda report
Zomato eda report
vidit jain
 
Text Analytics Today
Text Analytics TodayText Analytics Today
Text Analytics Today
Seth Grimes
 
Unstructured data processing webinar 06272016
Unstructured data processing webinar 06272016Unstructured data processing webinar 06272016
Unstructured data processing webinar 06272016
George Roth
 
Data Science - Part XI - Text Analytics
Data Science - Part XI - Text AnalyticsData Science - Part XI - Text Analytics
Data Science - Part XI - Text Analytics
Derek Kane
 
Getting Started with Unstructured Data
Getting Started with Unstructured DataGetting Started with Unstructured Data
Getting Started with Unstructured Data
Christine Connors
 
Text Analytics Applied (LIDER roadmapping presentation)
Text Analytics Applied (LIDER roadmapping presentation)Text Analytics Applied (LIDER roadmapping presentation)
Text Analytics Applied (LIDER roadmapping presentation)
Seth Grimes
 
Conceptual foundations of text mining and preprocessing steps nfaoui el_habib
Conceptual foundations of text mining and preprocessing steps nfaoui el_habibConceptual foundations of text mining and preprocessing steps nfaoui el_habib
Conceptual foundations of text mining and preprocessing steps nfaoui el_habib
El Habib NFAOUI
 
Entity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsEntity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutions
CloudTechnologies
 
Entity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsEntity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutions
Pvrtechnologies Nellore
 
Finding Influential People from Historical News Repository (Master's thesis w...
Finding Influential People from Historical News Repository (Master's thesis w...Finding Influential People from Historical News Repository (Master's thesis w...
Finding Influential People from Historical News Repository (Master's thesis w...
Aayushee Gupta
 
Directed versus undirected network analysis of student essays
Directed versus undirected network analysis of student essaysDirected versus undirected network analysis of student essays
Directed versus undirected network analysis of student essays
Roy Clariana
 
Approaches for Keyword Query Routing
Approaches for Keyword Query RoutingApproaches for Keyword Query Routing
Approaches for Keyword Query Routing
IJERA Editor
 
How many truths can you handle?
How many truths can you handle?How many truths can you handle?
How many truths can you handle?
Panos Alexopoulos
 
Nuts and bolts
Nuts and boltsNuts and bolts
Nuts and bolts
NBER
 
Textmining Introduction
Textmining IntroductionTextmining Introduction
Textmining Introduction
guest0edcaf
 
Data Science - Part I - Sustaining Predictive Analytics Capabilities
Data Science - Part I - Sustaining Predictive Analytics CapabilitiesData Science - Part I - Sustaining Predictive Analytics Capabilities
Data Science - Part I - Sustaining Predictive Analytics Capabilities
Derek Kane
 
Implementing Semantic Search
Implementing Semantic SearchImplementing Semantic Search
Implementing Semantic Search
Paul Wlodarczyk
 
Detailed Investigation of Text Classification and Clustering of Twitter Data ...
Detailed Investigation of Text Classification and Clustering of Twitter Data ...Detailed Investigation of Text Classification and Clustering of Twitter Data ...
Detailed Investigation of Text Classification and Clustering of Twitter Data ...
ijtsrd
 
Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm
Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm
Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm
IJECEIAES
 
Zomato eda report
Zomato eda reportZomato eda report
Zomato eda report
vidit jain
 
Text Analytics Today
Text Analytics TodayText Analytics Today
Text Analytics Today
Seth Grimes
 

Viewers also liked (19)

Text Analytics
Text Analytics Text Analytics
Text Analytics
Nicolas Morales
 
Machine Learning and Data Mining: 19 Mining Text And Web Data
Machine Learning and Data Mining: 19 Mining Text And Web DataMachine Learning and Data Mining: 19 Mining Text And Web Data
Machine Learning and Data Mining: 19 Mining Text And Web Data
Pier Luca Lanzi
 
Reverend billy
Reverend billy Reverend billy
Reverend billy
a_antonyan
 
singley+mackie Capabilities Deck
singley+mackie Capabilities Decksingley+mackie Capabilities Deck
singley+mackie Capabilities Deck
Matthew Levine
 
Enabling Exploration Through Text Analytics
Enabling Exploration Through Text AnalyticsEnabling Exploration Through Text Analytics
Enabling Exploration Through Text Analytics
Daniel Tunkelang
 
Text Similarities - PG Pushpin
Text Similarities - PG PushpinText Similarities - PG Pushpin
Text Similarities - PG Pushpin
jsurve
 
Text Analytics Summit 2009 - Roddy Lindsay - "Social Media, Happiness, Petaby...
Text Analytics Summit 2009 - Roddy Lindsay - "Social Media, Happiness, Petaby...Text Analytics Summit 2009 - Roddy Lindsay - "Social Media, Happiness, Petaby...
Text Analytics Summit 2009 - Roddy Lindsay - "Social Media, Happiness, Petaby...
guest5b1607
 
Hasil kajian Big Data di Indonesia
Hasil kajian Big Data di IndonesiaHasil kajian Big Data di Indonesia
Hasil kajian Big Data di Indonesia
Heru Sutadi
 
Text Mining Analytics 101
Text Mining Analytics 101Text Mining Analytics 101
Text Mining Analytics 101
Manohar Swamynathan
 
Text Analytics Past, Present & Future: An Industry View
Text Analytics Past, Present & Future: An Industry ViewText Analytics Past, Present & Future: An Industry View
Text Analytics Past, Present & Future: An Industry View
Seth Grimes
 
Log Data Mining
Log Data MiningLog Data Mining
Log Data Mining
Anton Chuvakin
 
Text Analytics Presentation LinkedIn
Text Analytics Presentation LinkedInText Analytics Presentation LinkedIn
Text Analytics Presentation LinkedIn
Stacy Faught
 
When to use the different text analytics tools - Meaning Cloud
When to use the different text analytics tools - Meaning CloudWhen to use the different text analytics tools - Meaning Cloud
When to use the different text analytics tools - Meaning Cloud
MeaningCloud
 
Elements of Text Mining Part - I
Elements of Text Mining Part - IElements of Text Mining Part - I
Elements of Text Mining Part - I
Jaganadh Gopinadhan
 
Analysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ DataAnalysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ Data
Seth Grimes
 
Log Mining: Beyond Log Analysis
Log Mining: Beyond Log AnalysisLog Mining: Beyond Log Analysis
Log Mining: Beyond Log Analysis
Anton Chuvakin
 
Writing A Literary Analysis Essay
Writing A Literary Analysis EssayWriting A Literary Analysis Essay
Writing A Literary Analysis Essay
Prof S
 
Critical Discourse Analysis adel thamery
 Critical Discourse Analysis adel thamery Critical Discourse Analysis adel thamery
Critical Discourse Analysis adel thamery
Adel Thamery
 
Machine Learning and Data Mining: 19 Mining Text And Web Data
Machine Learning and Data Mining: 19 Mining Text And Web DataMachine Learning and Data Mining: 19 Mining Text And Web Data
Machine Learning and Data Mining: 19 Mining Text And Web Data
Pier Luca Lanzi
 
Reverend billy
Reverend billy Reverend billy
Reverend billy
a_antonyan
 
singley+mackie Capabilities Deck
singley+mackie Capabilities Decksingley+mackie Capabilities Deck
singley+mackie Capabilities Deck
Matthew Levine
 
Enabling Exploration Through Text Analytics
Enabling Exploration Through Text AnalyticsEnabling Exploration Through Text Analytics
Enabling Exploration Through Text Analytics
Daniel Tunkelang
 
Text Similarities - PG Pushpin
Text Similarities - PG PushpinText Similarities - PG Pushpin
Text Similarities - PG Pushpin
jsurve
 
Text Analytics Summit 2009 - Roddy Lindsay - "Social Media, Happiness, Petaby...
Text Analytics Summit 2009 - Roddy Lindsay - "Social Media, Happiness, Petaby...Text Analytics Summit 2009 - Roddy Lindsay - "Social Media, Happiness, Petaby...
Text Analytics Summit 2009 - Roddy Lindsay - "Social Media, Happiness, Petaby...
guest5b1607
 
Hasil kajian Big Data di Indonesia
Hasil kajian Big Data di IndonesiaHasil kajian Big Data di Indonesia
Hasil kajian Big Data di Indonesia
Heru Sutadi
 
Text Analytics Past, Present & Future: An Industry View
Text Analytics Past, Present & Future: An Industry ViewText Analytics Past, Present & Future: An Industry View
Text Analytics Past, Present & Future: An Industry View
Seth Grimes
 
Text Analytics Presentation LinkedIn
Text Analytics Presentation LinkedInText Analytics Presentation LinkedIn
Text Analytics Presentation LinkedIn
Stacy Faught
 
When to use the different text analytics tools - Meaning Cloud
When to use the different text analytics tools - Meaning CloudWhen to use the different text analytics tools - Meaning Cloud
When to use the different text analytics tools - Meaning Cloud
MeaningCloud
 
Elements of Text Mining Part - I
Elements of Text Mining Part - IElements of Text Mining Part - I
Elements of Text Mining Part - I
Jaganadh Gopinadhan
 
Analysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ DataAnalysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ Data
Seth Grimes
 
Log Mining: Beyond Log Analysis
Log Mining: Beyond Log AnalysisLog Mining: Beyond Log Analysis
Log Mining: Beyond Log Analysis
Anton Chuvakin
 
Writing A Literary Analysis Essay
Writing A Literary Analysis EssayWriting A Literary Analysis Essay
Writing A Literary Analysis Essay
Prof S
 
Critical Discourse Analysis adel thamery
 Critical Discourse Analysis adel thamery Critical Discourse Analysis adel thamery
Critical Discourse Analysis adel thamery
Adel Thamery
 
Ad

Similar to Text Analytics for Dummies 2010 (20)

Demystifying analytics in e discovery white paper 06-30-14
Demystifying analytics in e discovery   white paper 06-30-14Demystifying analytics in e discovery   white paper 06-30-14
Demystifying analytics in e discovery white paper 06-30-14
Steven Toole
 
ADGS Computer Systems
ADGS Computer SystemsADGS Computer Systems
ADGS Computer Systems
Natalya Rostun, MBA
 
ADGS Computer Systems
ADGS Computer SystemsADGS Computer Systems
ADGS Computer Systems
Natalya Rostun, MBA
 
ADGS Computer Systems
ADGS Computer SystemsADGS Computer Systems
ADGS Computer Systems
Natalya Rostun, MBA
 
ADGS Computer Systems
ADGS Computer SystemsADGS Computer Systems
ADGS Computer Systems
Natalya Rostun, MBA
 
365 Data Science
365 Data Science365 Data Science
365 Data Science
IvanHo572682
 
Information Architecture Primer - Integrating search,tagging, taxonomy and us...
Information Architecture Primer - Integrating search,tagging, taxonomy and us...Information Architecture Primer - Integrating search,tagging, taxonomy and us...
Information Architecture Primer - Integrating search,tagging, taxonomy and us...
Dan Keldsen
 
Barga, roger. predictive analytics with microsoft azure machine learning
Barga, roger. predictive analytics with microsoft azure machine learningBarga, roger. predictive analytics with microsoft azure machine learning
Barga, roger. predictive analytics with microsoft azure machine learning
maldonadojorge
 
What is business analytics
What is business analyticsWhat is business analytics
What is business analytics
Sherpa Consulting
 
Accelerate Data Discovery
Accelerate Data Discovery   Accelerate Data Discovery
Accelerate Data Discovery
Attivio
 
Language First Protocol from QSi
Language First Protocol from QSiLanguage First Protocol from QSi
Language First Protocol from QSi
John O'Gorman
 
12 Things the Semantic Web Should Know about Content Analytics
12 Things the Semantic Web Should Know about Content Analytics12 Things the Semantic Web Should Know about Content Analytics
12 Things the Semantic Web Should Know about Content Analytics
Seth Grimes
 
Text analysis-semantic-search
Text analysis-semantic-searchText analysis-semantic-search
Text analysis-semantic-search
Diana Maynard
 
Big data Analytics Fundamentals Chapter 1
Big data Analytics Fundamentals Chapter 1Big data Analytics Fundamentals Chapter 1
Big data Analytics Fundamentals Chapter 1
karpagavalli38
 
The Analytics Stack Guidebook (Holistics)
The Analytics Stack Guidebook (Holistics)The Analytics Stack Guidebook (Holistics)
The Analytics Stack Guidebook (Holistics)
Truong Bomi
 
What Is Unstructured Data and Why Is It Essential for Business Success.pdf
What Is Unstructured Data and Why Is It Essential for Business Success.pdfWhat Is Unstructured Data and Why Is It Essential for Business Success.pdf
What Is Unstructured Data and Why Is It Essential for Business Success.pdf
Data Scraping and Data Extraction
 
Data quality and data profiling
Data quality and data profilingData quality and data profiling
Data quality and data profiling
Shailja Khurana
 
Of Unicorns, Yetis, and Error-Free Datasets (or what is data quality?)
Of Unicorns, Yetis, and Error-Free Datasets (or what is data quality?)Of Unicorns, Yetis, and Error-Free Datasets (or what is data quality?)
Of Unicorns, Yetis, and Error-Free Datasets (or what is data quality?)
Gianluca Tarasconi
 
How new ai based analytics ignite a productivity revolution in e discovery-final
How new ai based analytics ignite a productivity revolution in e discovery-finalHow new ai based analytics ignite a productivity revolution in e discovery-final
How new ai based analytics ignite a productivity revolution in e discovery-final
jcscholtes
 
The analytics-stack-guidebook
The analytics-stack-guidebookThe analytics-stack-guidebook
The analytics-stack-guidebook
Ashish Tiwari
 
Demystifying analytics in e discovery white paper 06-30-14
Demystifying analytics in e discovery   white paper 06-30-14Demystifying analytics in e discovery   white paper 06-30-14
Demystifying analytics in e discovery white paper 06-30-14
Steven Toole
 
Information Architecture Primer - Integrating search,tagging, taxonomy and us...
Information Architecture Primer - Integrating search,tagging, taxonomy and us...Information Architecture Primer - Integrating search,tagging, taxonomy and us...
Information Architecture Primer - Integrating search,tagging, taxonomy and us...
Dan Keldsen
 
Barga, roger. predictive analytics with microsoft azure machine learning
Barga, roger. predictive analytics with microsoft azure machine learningBarga, roger. predictive analytics with microsoft azure machine learning
Barga, roger. predictive analytics with microsoft azure machine learning
maldonadojorge
 
Accelerate Data Discovery
Accelerate Data Discovery   Accelerate Data Discovery
Accelerate Data Discovery
Attivio
 
Language First Protocol from QSi
Language First Protocol from QSiLanguage First Protocol from QSi
Language First Protocol from QSi
John O'Gorman
 
12 Things the Semantic Web Should Know about Content Analytics
12 Things the Semantic Web Should Know about Content Analytics12 Things the Semantic Web Should Know about Content Analytics
12 Things the Semantic Web Should Know about Content Analytics
Seth Grimes
 
Text analysis-semantic-search
Text analysis-semantic-searchText analysis-semantic-search
Text analysis-semantic-search
Diana Maynard
 
Big data Analytics Fundamentals Chapter 1
Big data Analytics Fundamentals Chapter 1Big data Analytics Fundamentals Chapter 1
Big data Analytics Fundamentals Chapter 1
karpagavalli38
 
The Analytics Stack Guidebook (Holistics)
The Analytics Stack Guidebook (Holistics)The Analytics Stack Guidebook (Holistics)
The Analytics Stack Guidebook (Holistics)
Truong Bomi
 
What Is Unstructured Data and Why Is It Essential for Business Success.pdf
What Is Unstructured Data and Why Is It Essential for Business Success.pdfWhat Is Unstructured Data and Why Is It Essential for Business Success.pdf
What Is Unstructured Data and Why Is It Essential for Business Success.pdf
Data Scraping and Data Extraction
 
Data quality and data profiling
Data quality and data profilingData quality and data profiling
Data quality and data profiling
Shailja Khurana
 
Of Unicorns, Yetis, and Error-Free Datasets (or what is data quality?)
Of Unicorns, Yetis, and Error-Free Datasets (or what is data quality?)Of Unicorns, Yetis, and Error-Free Datasets (or what is data quality?)
Of Unicorns, Yetis, and Error-Free Datasets (or what is data quality?)
Gianluca Tarasconi
 
How new ai based analytics ignite a productivity revolution in e discovery-final
How new ai based analytics ignite a productivity revolution in e discovery-finalHow new ai based analytics ignite a productivity revolution in e discovery-final
How new ai based analytics ignite a productivity revolution in e discovery-final
jcscholtes
 
The analytics-stack-guidebook
The analytics-stack-guidebookThe analytics-stack-guidebook
The analytics-stack-guidebook
Ashish Tiwari
 
Ad

More from Seth Grimes (20)

Recent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingRecent Advances in Natural Language Processing
Recent Advances in Natural Language Processing
Seth Grimes
 
Creating an AI Startup: What You Need to Know
Creating an AI Startup: What You Need to KnowCreating an AI Startup: What You Need to Know
Creating an AI Startup: What You Need to Know
Seth Grimes
 
NLP 2020: What Works and What's Next
NLP 2020: What Works and What's NextNLP 2020: What Works and What's Next
NLP 2020: What Works and What's Next
Seth Grimes
 
Efficient Deep Learning in Natural Language Processing Production, with Moshe...
Efficient Deep Learning in Natural Language Processing Production, with Moshe...Efficient Deep Learning in Natural Language Processing Production, with Moshe...
Efficient Deep Learning in Natural Language Processing Production, with Moshe...
Seth Grimes
 
From Customer Emotions to Actionable Insights, with Peter Dorrington
From Customer Emotions to Actionable Insights, with Peter DorringtonFrom Customer Emotions to Actionable Insights, with Peter Dorrington
From Customer Emotions to Actionable Insights, with Peter Dorrington
Seth Grimes
 
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AIIntro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Seth Grimes
 
Emotion AI
Emotion AIEmotion AI
Emotion AI
Seth Grimes
 
Text Analytics Market Trends
Text Analytics Market TrendsText Analytics Market Trends
Text Analytics Market Trends
Seth Grimes
 
Text Analytics for NLPers
Text Analytics for NLPersText Analytics for NLPers
Text Analytics for NLPers
Seth Grimes
 
Our FinTech Future – AI’s Opportunities and Challenges?
Our FinTech Future – AI’s Opportunities and Challenges? Our FinTech Future – AI’s Opportunities and Challenges?
Our FinTech Future – AI’s Opportunities and Challenges?
Seth Grimes
 
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Seth Grimes
 
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
Seth Grimes
 
Fairness in Machine Learning and AI
Fairness in Machine Learning and AIFairness in Machine Learning and AI
Fairness in Machine Learning and AI
Seth Grimes
 
Classification with Memes–Uber case study
Classification with Memes–Uber case studyClassification with Memes–Uber case study
Classification with Memes–Uber case study
Seth Grimes
 
Aspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion AnalysisAspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion Analysis
Seth Grimes
 
Content AI: From Potential to Practice
Content AI: From Potential to PracticeContent AI: From Potential to Practice
Content AI: From Potential to Practice
Seth Grimes
 
An Industry Perspective on Subjectivity, Sentiment, and Social
An Industry Perspective on Subjectivity, Sentiment, and SocialAn Industry Perspective on Subjectivity, Sentiment, and Social
An Industry Perspective on Subjectivity, Sentiment, and Social
Seth Grimes
 
The Insight Value of Social Sentiment
The Insight Value of Social SentimentThe Insight Value of Social Sentiment
The Insight Value of Social Sentiment
Seth Grimes
 
Text Analytics 2014: User Perspectives on Solutions and Providers
Text Analytics 2014: User Perspectives on Solutions and ProvidersText Analytics 2014: User Perspectives on Solutions and Providers
Text Analytics 2014: User Perspectives on Solutions and Providers
Seth Grimes
 
Social Data Sentiment Analysis
Social Data Sentiment AnalysisSocial Data Sentiment Analysis
Social Data Sentiment Analysis
Seth Grimes
 
Recent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingRecent Advances in Natural Language Processing
Recent Advances in Natural Language Processing
Seth Grimes
 
Creating an AI Startup: What You Need to Know
Creating an AI Startup: What You Need to KnowCreating an AI Startup: What You Need to Know
Creating an AI Startup: What You Need to Know
Seth Grimes
 
NLP 2020: What Works and What's Next
NLP 2020: What Works and What's NextNLP 2020: What Works and What's Next
NLP 2020: What Works and What's Next
Seth Grimes
 
Efficient Deep Learning in Natural Language Processing Production, with Moshe...
Efficient Deep Learning in Natural Language Processing Production, with Moshe...Efficient Deep Learning in Natural Language Processing Production, with Moshe...
Efficient Deep Learning in Natural Language Processing Production, with Moshe...
Seth Grimes
 
From Customer Emotions to Actionable Insights, with Peter Dorrington
From Customer Emotions to Actionable Insights, with Peter DorringtonFrom Customer Emotions to Actionable Insights, with Peter Dorrington
From Customer Emotions to Actionable Insights, with Peter Dorrington
Seth Grimes
 
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AIIntro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Seth Grimes
 
Text Analytics Market Trends
Text Analytics Market TrendsText Analytics Market Trends
Text Analytics Market Trends
Seth Grimes
 
Text Analytics for NLPers
Text Analytics for NLPersText Analytics for NLPers
Text Analytics for NLPers
Seth Grimes
 
Our FinTech Future – AI’s Opportunities and Challenges?
Our FinTech Future – AI’s Opportunities and Challenges? Our FinTech Future – AI’s Opportunities and Challenges?
Our FinTech Future – AI’s Opportunities and Challenges?
Seth Grimes
 
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Seth Grimes
 
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
Seth Grimes
 
Fairness in Machine Learning and AI
Fairness in Machine Learning and AIFairness in Machine Learning and AI
Fairness in Machine Learning and AI
Seth Grimes
 
Classification with Memes–Uber case study
Classification with Memes–Uber case studyClassification with Memes–Uber case study
Classification with Memes–Uber case study
Seth Grimes
 
Aspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion AnalysisAspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion Analysis
Seth Grimes
 
Content AI: From Potential to Practice
Content AI: From Potential to PracticeContent AI: From Potential to Practice
Content AI: From Potential to Practice
Seth Grimes
 
An Industry Perspective on Subjectivity, Sentiment, and Social
An Industry Perspective on Subjectivity, Sentiment, and SocialAn Industry Perspective on Subjectivity, Sentiment, and Social
An Industry Perspective on Subjectivity, Sentiment, and Social
Seth Grimes
 
The Insight Value of Social Sentiment
The Insight Value of Social SentimentThe Insight Value of Social Sentiment
The Insight Value of Social Sentiment
Seth Grimes
 
Text Analytics 2014: User Perspectives on Solutions and Providers
Text Analytics 2014: User Perspectives on Solutions and ProvidersText Analytics 2014: User Perspectives on Solutions and Providers
Text Analytics 2014: User Perspectives on Solutions and Providers
Seth Grimes
 
Social Data Sentiment Analysis
Social Data Sentiment AnalysisSocial Data Sentiment Analysis
Social Data Sentiment Analysis
Seth Grimes
 

Recently uploaded (20)

Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 

Text Analytics for Dummies 2010

  • 1. Text Analytics for DummiesSeth GrimesAlta Plana Corporation@sethgrimes– 301-270-0795 -- http://altaplana.comText Analytics Summit 2010WorkshopMay 24, 2010
  • 2. IntroductionSeth Grimes –Principal Consultant with Alta Plana Corporation.Contributing Editor, IntelligentEnterprise.com.Channel Expert, BeyeNETWORK.com.Contributor, KDnuggets.com.Instructor, The Data Warehousing Institute, tdwi.org.Founding Chair, Sentiment Analysis Symposium.Founding Chair, Text Analytics Summit.
  • 3. PerspectivesPerspective #1: You’re a business analyst or other “end user” or a consultant/integrator.You (or your clients) have lots of text. You want an automated way to deal with it.Perspective #2: You work in IT.You support end users who have lots of text.Perspective #3: You work for a solution provider.Welcome to my Reeducation Camp.Perspective #4: Other?You just want to learn about text analytics.
  • 4. Value in Data“The bulk of information value is perceived as coming from data in relational tables. The reason is that data that is structured is easy to mine and analyze.”-- Prabhakar Raghavan, Yahoo ResearchYet it’s a truism that 80% of enterprise-relevant information originates in “unstructured” form.
  • 5. Unstructured SourcesConsider:Web pages, E-mail, news & blog articles, forum postings, and other social media.Contact-center notes and transcripts.Surveys, feedback forms, warranty claims.And every kind of corporate documents imaginable.These sources may contain “traditional” data.The Dow fell 46.58, or 0.42 percent, to 11,002.14. The Standard & Poor's 500 index fell 1.44, or 0.11 percent, to 1,263.85, and the Nasdaq composite gained 6.84, or 0.32 percent, to 2,162.78.
  • 6. Key Message -- #1If you are not analyzing text – if you're analyzing only transactional information – you're missing opportunity or incurring risk...“Industries such as travel and hospitality and retail live and die on customer experience.” -- Clarabridge CEO Sid BanerjeeThis is why you’re here.It’s the “Unstructured Data” challenge.
  • 7. Key Message -- #2Text analytics can boost business results...“Organizations embracing text analytics all report having an epiphany moment when they suddenly knew more than before.” -- Philip Russom, the Data Warehousing Institute...via established BI / data-mining programs, or independently.Text Analytics is an answer to the “Unstructured Data” challenge
  • 8. Key Message -- #3Some folks may need to expand their views of what BI and business analytics are about.Others can do text analytics without worrying about BI or data mining.Let’s deal with text-BI first...
  • 9. Text-BI: Back to the FutureBusiness intelligence (BI) as defined in 1958:In this paper, business is a collection of activities carried on for whatever purpose, be it science, technology, commerce, industry, law, government, defense, et cetera... The notion of intelligence is also defined here... as “the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal.”-- Hans Peter Luhn, “A Business Intelligence System,”IBM Journal, October 1958
  • 10. Document input and processingKnowledge handling is key
  • 11. Business IntelligenceTraditional BI feeds off:Traditional BI feeds off:"SUMLEV","STATE","COUNTY","STNAME","CTYNAME","YEAR","POPESTIMATE",50,19,1,"Iowa","Adair County",1,8243,4036,4207,446,225,221,994,50950,19,1,"Iowa","Adair County",2,8243,4036,4207,446,225,221,994,50950,19,1,"Iowa","Adair County",3,8212,4020,4192,442,222,220,987,50550,19,1,"Iowa","Adair County",4,8095,3967,4128,432,208,224,935,48850,19,1,"Iowa","Adair County",5,8003,3924,4079,405,186,219,928,49550,19,1,"Iowa","Adair County",6,7961,3892,4069,384,183,201,907,47250,19,1,"Iowa","Adair County",7,7875,3855,4020,366,179,187,871,45450,19,1,"Iowa","Adair County",8,7795,3817,3978,343,162,181,841,43950,19,1,"Iowa","Adair County",9,7714,3777,3937,338,159,179,805,417It runs off:
  • 12. Business IntelligenceTraditional BI produces:http://www.pentaho.com/products/dashboards/
  • 13. Unstructured SourcesSome information doesn’t come from a data file.Axin and Frat1 interact with dvl and GSK, bridging Dvl to GSK in Wnt-mediated regulation of LEF-1.Wnt proteins transduce their signals through dishevelled (Dvl) proteins to inhibit glycogen synthase kinase 3beta (GSK), leading to the accumulation of cytosolic beta-catenin and activation of TCF/LEF-1 transcription factors. To understand the mechanism by which Dvl acts through GSK to regulate LEF-1, we investigated the roles of Axin and Frat1 in Wnt-mediated activation of LEF-1 in mammalian cells. We found that Dvl interacts with Axin and with Frat1, both of which interact with GSK. Similarly, the Frat1 homolog GBP binds Xenopus Dishevelled in an interaction that requires GSK. We also found that Dvl, Axin and GSK can form a ternary complex bridged by Axin, and that Frat1 can be recruited into this complex probably by Dvl. The observation that the Dvl-binding domain of either Frat1 or Axin was able to inhibit Wnt-1-induced LEF-1 activation suggests that the interactions between Dvl and Axin and between Dvl and Frat may be important for this signaling pathway. Furthermore, Wnt-1 appeared to promote the disintegration of the Frat1-Dvl-GSK-Axin complex, resulting in the dissociation of GSK from Axin. Thus, formation of the quaternary complex may be an important step in Wnt signaling, by which Dvl recruits Frat1, leading to Frat1-mediated dissociation of GSK from Axin.www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=10428961&dopt=Abstractwww.stanford.edu/%7ernusse/wntwindow.html
  • 14. Unstructured SourcesSources may mix fact and sentiment:When you walk in the foyer of the hotel it seems quite inviting but the room was very basis and smelt very badly of stale cigarette smoke, it would have been nice to be asked if we wanted a non smoking room, I know the room was very cheap but I found this very off putting to have to sleep with the smell, and it was to cold to leave the window open. Excellent location for restaurants and barsOverall I would never sell/buy a Motorola V3 unless it is demanded. My life would be way better without this phone being around (I am being 100% serious) Motorola should pay me directly for all the problems I have had with these phones. :-(
  • 15. Text and ApplicationsWhat do people do with electronic documents?Publish, Manage, and Archive.Index and Search.Categorize and Classify according to metadata & contents.Information Extraction.For textual documents, text analytics enhances #1 & #2 and enables #3 & #4.You need linguistics to do #1 & #4 well, to deal with Semantics.Search is not enough...
  • 16. SearchSearch, a.k.a. Information Retrieval, is just a start.Search doesn’t help you discover things you’re unaware of.Search results often lack relevance.Search finds documents, not knowledge.Articles from a forum siteArticles from 1987
  • 17. Search + SemanticsText analytics adds semantic understanding of –Entities: names, e-mail addresses, phone numbers.Concepts: abstractions of entities.Facts and relationships.Abstract attributes, e.g., “expensive,” “comfortable.”Opinions, sentiments: attitudinal information.
  • 18. Information AccessText analytics enables results that suit the information and the user, e.g., answers –
  • 19. Presentation of search results can be enhanced by knowledge discovery, e.g., clustering.touchgraph.com/ TGGoogleBrowser.php?start=text%20analytics
  • 20. Information AccessText analytics transforms Information Retrieval (IR) into Information Access (IA).Search terms become queries.Indexed pages are mined for larger-scale structure, for instance, information categories.Search results are presented intelligently.Capabilities include Information Extraction (IE).Text analytics ≈ text data mining.
  • 21. Beyond SearchText Mining = Data Mining of textual sources.Clustering and Classification.Link Analysis.Association Rules.Predictive Modelling.Regression.Forecasting.Text Mining = Knowledge Discovery in Text.
  • 23. Text Analytics DefinitionText analytics automates what researchers, writers, scholars, and all the rest of us have been doing for years. Text analytics --Applies linguistic and/or statistical techniques to extract concepts and patterns that can be applied to categorize and classify documents, audio, video, images.Transforms “unstructured” information into data for application of traditional analysis techniques.Unlocks meaning and relationships in large volumes of information that were previously unprocessable by computer.
  • 24. Text Analytics PipelineTypical steps in text analytics include –Retrieve documents for analysis. Apply statistical &/ linguistic &/ structural techniques to identify, tag, and extract entities, concepts, relationships, and events (features) within document sets.Apply statistical pattern-matching & similarity techniques to classify documents and organize extracted features according to a specified or generated categorization / taxonomy.– via a pipeline of statistical & linguistic steps.Let’s look at them...
  • 27. “Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance are extracted and printed out to become the auto-abstract.”H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.
  • 28. Text ModellingThe text content of a document can be considered an unordered “bag of words.”Particular documents are points in a high-dimensional vector space.Salton, Wong & Yang, “A Vector Space Model for Automatic Indexing,” November 1975.
  • 29. Text ModellingWe might construct a document-term matrix...D1 = "I like databases"D2 = "I hate hate databases"and use a weighting such as TF-IDF (term frequency–inverse document frequency)…in computing the cosine of the angle between weighted doc-vectors to determine similarity.http://en.wikipedia.org/wiki/Term-document_matrix
  • 30. Text ModellingAnalytical methods make text tractable.Latent semantic indexing utilizing singular value decomposition for term reduction / feature selection.Creates a new, reduced concept space.Takes care of synonymy, polysemy, stemming, etc.Classification technologies / methods:Naive Bayes.Support Vector Machine.K-nearest neighbor.
  • 31. Text ModellingIn the form of query-document similarity, this is Information Retrieval 101.See, for instance, Salton & Buckley, “Term-Weighting Approaches in Automatic Text Retrieval,” 1988.If we want to get more out of text, we have to do still more...
  • 32. “Tri-grams” here are pretty good at describing the Whatness of the source text. Yet...“This rather unsophisticated argument on ‘significance’ avoids such linguistic implications as grammar and syntax... No attention is paid to the logical and semantic relationships the author has established.”-- Hans Peter Luhn, 1958
  • 33. Why Do We Need Linguistics?The Dow fell 46.58, or 0.42 percent, to 11,002.14. The Standard & Poor's 500 index gained1.44, or 0.11 percent, to 1,263.85.The Dow gained 46.58, or 0.42 percent, to 11,002.14. The Standard & Poor's 500 index fell 1.44, or 0.11 percent, to 1,263.85.John pushed Max. He fell.John pushed Max. He laughed.Time flies like an arrow. Fruit flies like a banana.(Luca Scagliarini, Expert System; Laure Vieu and Patrick Saint-Dizier; Groucho Marx.)
  • 34. New York Times,September 8, 1957Anaphora / coreference: “They”
  • 38. Information ExtractionWhen we understand, for instance, parts of speech (POS) – <subject> <verb> <object> – we’re in a position to discern facts and relationships...
  • 40. Information ExtractionLet's see text augmentation (tagging) in action. We'll use GATE, an open-source tool, text from sentiment-analysis article used earlier...
  • 44. Information ExtractionFor content analysis, key in on extracting information.Annotated text is typically marked up with XML.If extraction to databases: Entities and concepts (features) are like dimensions in a standard BI model. Both classes of object are hierarchically organized and have attributes.We can have both discovered and predetermined classifications (taxonomies) of text features.
  • 45. An IBM representation: “The standard features are stored in the STANDARD_KW table, keywords with their occurrences in the KEYWORD_KW_OCC table, and the text list features in the TEXTLIST_TEXT table. Every feature table contains the DOC_ID as a reference to the DOCUMENT table.”http://www.ibm.com/developerworks/db2/library/techarticle/dm-0804nicola/
  • 46. Semi-Structured SourcesAn e-mail message is “semi-structured,” which facilitates extracting metadata --Date: Sun, 13 Mar 2005 19:58:39 -0500From: Adam L. Buchsbaum <[email protected]>To: Seth Grimes <[email protected]>Subject: Re: Papers on analysis on streaming dataseth, you should contact diveshsrivastava, [email protected] at&t labs data streaming technology.AdamSurveys are also typically s-s in a different way...
  • 47. Structured &‘Unstructured’ InformationThe respondent is invited to explain his/her attitude:
  • 48. Structured &‘Unstructured’ InformationWe typically look at frequencies and distributions of coded-response questions:Linkage of responses to coded ratings helps in analyses.
  • 49. Sentiment Analysis“Sentiment analysis is the task of identifying positive and negative opinions, emotions, and evaluations.” -- Wilson, Wiebe & Hoffman, 2005, “Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis”From Dell’s IdeaStorm.com --“Dell really... REALLY need to stop overcharging... and when i say overcharing... i mean atleast double what you would pay to pick up the ram yourself.”
  • 50. Sentiment AnalysisApplications include:Brand / Reputation Management.Competitive intelligence.Customer Experience Management.Enterprise Feedback Management.Quality improvement.Trend spotting.
  • 51. Steps in the Right Direction
  • 53. ... And Missteps“Kind” = type, variety, not a sentiment.Complete misclassificationExternal referenceUnfiltered duplicates
  • 54. Sentiment ComplicationsThere are many complications.Sentiment may be of interest at multiple levels.Corpus / data space, i.e., across multiple sources.Document.Statement / sentence.Entity / topic / concept.Human language is noisy and chaotic!Jargon, slang, irony, ambiguity, anaphora, polysemy, synonymy, etc.Context is key. Discourse analysis comes into play.Must distinguish the sentiment holder from the object: Greenspan said the recession will…
  • 55. ApplicationsText analytics has applications in –Intelligence & law enforcement.Life sciences.Media & publishing including social-media analysis and contextual advertizing.Competitive intelligence.Voice of the Customer: CRM, product management & marketing.Legal, tax & regulatory (LTR) including compliance.Recruiting.
  • 56. Getting to Web 3.0Text analytics enables Web 3.0 and the Semantic Web.Automated content categorization and classification.Text augmentation: metadata generation, content tagging.Information extraction to databases.Exploratory analysis and visualization.
  • 57. Users’ PerspectiveI estimate a $425 million global market in 2009, up from $350 in 2008. I foresee 25% growth in 2010.Last year, I published a study report, “Text Analytics 2009: User Perspectives on Solutions and Providers.”I relayed findings from a survey that asked…
  • 58. Primary ApplicationsWhat are your primary applications where text comes into play?
  • 59. AnalyzedTextual InformationWhat textual information are you analyzing or do you plan to analyze?Current users responded:
  • 60. ExtractedInformaitonDo you need (or expect to need) to extract or analyze:
  • 62. Text Analytics for DummiesSeth GrimesAlta Plana Corporation@sethgrimes– 301-270-0795 -- http://altaplana.comText Analytics Summit 2010WorkshopMay 24, 2010