SAS Global 2021 Introduction to Natural Language Processing
Natural Language Processing—An Introduction
Colleen M. Farrelly, Staticlysm
Brief bio –
Colleen M. Farrelly is a machine learning scientist whose expertise includes
supervised learning, unsupervised learning, psychometrics, topological data
analysis, and natural language processing. She has an analytics book in review
that touches upon the analysis of text data with topological data analysis tools.
Introduction
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
Text Data and Applications
• What do all of these have in common?
• Clinical case notes
• Chatbot conversations
• Client email interactions
• Court case summaries/transcripts
• Published research articles
• Tweets
• Voice recordings
Text Data and Applications
• Commonalities
• Text data
• Contain potentially informative features for predicting an outcome or categorizing data
• May contain information not available in structured datasets
• Offer linguistic insight into the speaker/writer
Example
Legal
• Imagine both the witness and the robber in these two examples.
• How might these observations impact the outcome of a police investigation?
• Statement 1:
• She pulled the gun, took the money, and ran.
• Statement 2:
• The petite blonde pulled a shotgun on the clerk at station 2, filled a bag with cash from the register, and absconded with the money and a handful of pens.
• How many suspects might the police have to stop to find Bonnie and Clyde?
• Which witness statement might have more impact on a jury?
• How might differences in clinical case notes by clinicians inform health outcome models? How might they reflect on the individual clinician?
Making Sense of Text Data
• Natural language processing (NLP)
• Collection of tools to parse human language into something understandable by algorithms
• What is said
• Computational linguistics
• Deriving insight about human behavior or traits based on text data
• How it’s said
Common NLP Tools
An Overview
Parsing Documents/Sentences
An Example
• Tokens (words or punctuation)
• Punctuation (non-word tokens)
• Stop words (very common, low-information words such as "into" or "the")
• Root words (stemming/lemmatizing)
Bonnie hopped into Clyde’s new car.
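A minimal, standard-library sketch of the four parsing steps applied to the example sentence. The stop-word list and the suffix-stripping `crude_stem` function are hypothetical toys written for illustration; they are not NLTK's or spaCy's behavior, and a real project would use a proper stemmer (for example NLTK's `PorterStemmer`) or a dictionary-based lemmatizer.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only;
# toolkits such as NLTK and spaCy ship much larger, curated lists.
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "into"}

def tokenize(text):
    """Split text into word tokens, split-off possessives, and punctuation tokens."""
    # \w+ matches a word, ['’]\w+ matches a clitic such as "’s",
    # and [^\w\s] matches any single punctuation character.
    return re.findall(r"\w+|['’]\w+|[^\w\s]", text)

def punctuation(tokens):
    """Return the tokens that contain no letters or digits."""
    return [tok for tok in tokens if not any(ch.isalnum() for ch in tok)]

def remove_stop_words(tokens):
    """Drop punctuation and stop words, keeping the content-bearing tokens."""
    return [tok for tok in tokens
            if tok.lower() not in STOP_WORDS and any(ch.isalnum() for ch in tok)]

def crude_stem(word):
    """Strip one common English suffix (a toy stand-in for a real stemmer)."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            # Undo consonant doubling, e.g. "hopp" -> "hop".
            if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiouls":
                word = word[:-1]
            break
    return word

sentence = "Bonnie hopped into Clyde’s new car."
tokens = tokenize(sentence)
content = remove_stop_words(tokens)
stems = [crude_stem(tok) for tok in content]
```

Here `tokens` splits the possessive "’s" off "Clyde", `punctuation(tokens)` isolates the final period, the stop word "into" is dropped, and "hopped" is reduced to the root "hop". The regex is intentionally simple: contractions such as "don’t" would be split into "don" and "’t".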
Tagging Features
• Parts of speech
• Clauses
• Grammatical relations
• Entity recognition
Bonnie hopped into Clyde’s new car.
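To make tagging concrete, here is a toy lookup tagger plus a naive entity heuristic for the same sentence. The lexicon is hand-written and hypothetical (tags follow the Universal POS tag names), unknown words fall back to the common "most frequent tag" baseline of NOUN, and entities are simply runs of proper nouns. Trained taggers in spaCy or Stanford CoreNLP use statistical models instead and also produce clauses and grammatical relations, which this sketch does not attempt.

```python
# Hypothetical hand-built lexicon mapping lowercased tokens to Universal POS tags.
LEXICON = {
    "bonnie": "PROPN", "clyde": "PROPN", "hopped": "VERB", "into": "ADP",
    "’s": "PART", "new": "ADJ", "car": "NOUN", ".": "PUNCT",
}

def tag(tokens):
    """Look up each token's part of speech, falling back to NOUN for unknown words."""
    return [(tok, LEXICON.get(tok.lower(), "NOUN")) for tok in tokens]

def candidate_entities(tagged):
    """Group runs of adjacent proper nouns into candidate named-entity spans."""
    entities, current = [], []
    for tok, pos in tagged:
        if pos == "PROPN":
            current.append(tok)
        elif current:
            entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities

tokens = ["Bonnie", "hopped", "into", "Clyde", "’s", "new", "car", "."]
tagged = tag(tokens)
entities = candidate_entities(tagged)
```

With this lexicon, `entities` holds "Bonnie" and "Clyde"; a real entity recognizer would also assign a type such as PERSON.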
Deriving Sentiment
• Language-dependent
• Sentiment dictionaries
• Positive/negative/neutral (AFINN, for instance)
• Emotion groups from psychological models
Bonnie hopped into Clyde’s new car.
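Dictionary-based sentiment usually works by summing per-word scores and then thresholding the total. The sketch below assumes an AFINN-style integer scale from -5 to +5, but the words and values in `TOY_LEXICON` are invented for illustration and are not taken from the real AFINN list.

```python
# Hypothetical AFINN-style lexicon: integer scores from -5 (very negative) to +5 (very positive).
# These words and values are made up for illustration; they are NOT AFINN's actual entries.
TOY_LEXICON = {"great": 3, "happy": 3, "thanks": 2, "bad": -3, "angry": -3, "robbed": -2}

def sentiment_score(tokens):
    """Sum the lexicon scores of all tokens; words missing from the lexicon contribute 0."""
    return sum(TOY_LEXICON.get(tok.lower(), 0) for tok in tokens)

def polarity(score):
    """Map a summed score onto a positive/negative/neutral label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

example = ["Bonnie", "hopped", "into", "Clyde", "’s", "new", "car", "."]
label = polarity(sentiment_score(example))
```

Because none of the example sentence's words appear in this toy lexicon, it scores 0 and is labeled neutral. Summing scores ignores negation ("not happy") and context, which is one reason results are language- and lexicon-dependent.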
Vectorizing/Summarizing Results
• Many options for turning NLP results into usable data in machine learning and statistical tools:
• Vectorization
• Word frequency matrices
• Summary tables
Bonnie hopped into Clyde’s new car.
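A word frequency matrix (document-term matrix) is one of the simplest vectorizations: one row per document, one column per vocabulary term, and each cell counts occurrences. A minimal standard-library sketch, assuming documents have already been tokenized and lowercased (the two example documents are hypothetical):

```python
from collections import Counter

def document_term_matrix(docs):
    """Build a sorted vocabulary and a count matrix (rows = documents, columns = terms)."""
    counts = [Counter(doc) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

docs = [
    ["bonnie", "hopped", "into", "the", "car"],
    ["clyde", "drove", "the", "car", "the", "car"],
]
vocab, matrix = document_term_matrix(docs)
```

Each row of `matrix` can be passed to a statistical or machine learning tool as a numeric feature vector; libraries such as scikit-learn provide the same idea at scale, often with TF-IDF weighting instead of raw counts.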
Using Statistical Tools to Understand NLP
An Overview
Summary Statistics
• Common summary statistic uses:
1. Conversation length (example: engagement metric)
2. Swear count (example: escalation marker)
3. Conversation sentiment over time (example: engagement and satisfaction)
4. Keyword frequency (example: products with the most issues)
Use as Machine Learning Features
• Examples combining NLP data with data from structured databases:
1. Clustering (example: types of churn from client feedback and account data)
2. Predictive modeling (example: patient outcomes from case notes and medical records)
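As a sketch of the clustering use case, the code below joins hypothetical text-derived features (average feedback sentiment, complaint count) with a hypothetical structured field (account tenure in months) by client ID, standardizes each column, and runs a bare-bones k-means. All client IDs and values are invented, and the deterministic "first k points" initialization is a simplification; libraries such as scikit-learn use better initializations and multiple restarts.

```python
import math

# Hypothetical per-client features: [avg feedback sentiment, complaint count] from text,
# and [tenure in months] from an account database.
text_features = {"c1": [-4.0, 3.0], "c2": [2.0, 0.0], "c3": [-3.0, 4.0], "c4": [3.0, 1.0]}
account_features = {"c1": [2.0], "c2": [24.0], "c3": [3.0], "c4": [30.0]}

def combine(text_feats, account_feats):
    """Join the two feature sources on client ID, keeping clients present in both."""
    ids = sorted(text_feats.keys() & account_feats.keys())
    return ids, [text_feats[i] + account_feats[i] for i in ids]

def standardize(rows):
    """Z-score each column so no single feature dominates the distance calculation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0 for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def kmeans(points, k, iters=20):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = [list(p) for p in points[:k]]  # simple deterministic init: first k points
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

ids, rows = combine(text_features, account_features)
labels = kmeans(standardize(rows), k=2)
```

In this toy data, clients c1 and c3 (negative feedback, many complaints, short tenure) land in one cluster and c2 and c4 in the other, which is the kind of grouping an analyst might then interpret as distinct churn profiles.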
Psychometric Applications
• Applications in published research:
1. Personality trait identification in industrial psychology research
2. Author identification in plagiarism software
3. Quantification of release risk in justice systems
4. Quantification of relapse risk in mental health applications
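Psychometric and authorship work often relies on "how it’s said" features such as function-word and pronoun rates, in the spirit of pronoun-focused work like Pennebaker (2011) in the literature list. The sketch below computes one such feature, the share of words that are first-person singular pronouns; the pronoun set and the simple regex tokenizer are assumptions for illustration, not a validated instrument.

```python
import re

# Hypothetical pronoun set for illustration; validated dictionaries are far more extensive.
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}

def first_person_rate(text):
    """Return the share of word tokens that are first-person singular pronouns."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(w in FIRST_PERSON_SINGULAR for w in words) / len(words)

rate_singular = first_person_rate("I told my sister I was fine")
rate_plural = first_person_rate("We told our sister we were fine")
```

The first sentence has 3 first-person singular pronouns out of 7 words; the second has none. Rates like this become one column in a feature matrix alongside other linguistic and structured variables. Note that contractions such as "I’m" are not counted by this simple tokenizer.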
Other Uses of NLP
Other Common NLP Applications
• Chatbots
• Personal assistants
• Translation services
• Sentence completion
In General
Useful References/Software
Main NLP Software Options
• NLTK (Python)
• spaCy (Python)
• Stanford CoreNLP (Java)
• Spark NLP from John Snow Labs (Apache Spark)
Some NLP Literature
• Dunnmon, J. A., Ratner, A. J., Saab, K., Khandwala, N., Markert, M., Sagreiya, H., ... & Ré, C. (2020). Cross-modal data programming enables rapid medical machine learning. Patterns, 100019.
• Maas, A., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011, June). Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (pp. 142-150).
• Pennebaker, J. W. (2011). The secret life of pronouns. New Scientist, 211(2828), 42-45.
• Polsley, S., Jhunjhunwala, P., & Huang, R. (2016, December). CaseSummarizer: A system for automated summarization of legal texts. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations (pp. 258-262).
• Velupillai, S., Suominen, H., Liakata, M., Roberts, A., Shah, A. D., Morley, K., ... & Chapman, W. (2018). Using clinical natural language processing for health outcomes research: Overview and actionable suggestions for future advances. Journal of Biomedical Informatics, 88, 11-19.
Thank you!
Contact Information
cfarrelly@med.miami.edu