NLTK: Natural Language Processing made easy. Elvis Joel D’Souza, Gopikrishnan Nambiar, Ashutosh Pandey
WHAT: Session Objective To introduce the Natural Language Toolkit (NLTK), an open-source library that simplifies the implementation of Natural Language Processing (NLP) in Python.
HOW: Session Layout This session is divided into 3 parts: (1) Python, the programming language; (2) Natural Language Processing (NLP), the concept; (3) Natural Language Toolkit (NLTK), the tool for NLP implementation in Python.
 
Why Python?
Data Structures Python has 4 built-in data structures: List, Tuple, Dictionary, Set
List A list in Python is an ordered group of items (or elements). It is a very general structure, and list elements don't have to be of the same type.
listOfWords = ['this', 'is', 'a', 'list', 'of', 'words']
listOfRandomStuff = [1, 'pen', 'costs', 'Rs.', 6.50]
Tuple A tuple in Python is much like a list except that it is immutable (unchangeable) once created. Tuples are generally used for data that should not be edited. Example: (100, 10, 0.01, 'hundred') holds a number, its square root, its reciprocal, and the number in words.
Return a tuple
def func(x, y):
    # code to compute a and b
    return (a, b)
One very useful application of tuples is returning multiple values from a function. Returning multiple values in many other languages requires creating an object or container of some type.
Dictionary A dictionary in Python is a collection of values that are accessed by key rather than by position. (Since Python 3.7 a dictionary preserves insertion order, but lookups are still by key.) Example: {1: 'one', 2: 'two', 3: 'three'} Here, the key is a number and the value is its name in words.
Sets Python also has an implementation of the mathematical set. Unlike sequence objects such as lists and tuples, in which each element is indexed, a set is an unordered collection of objects. Sets also cannot have duplicate members: a given object appears in a set 0 or 1 times.
SetOfBrowsers = set(['IE', 'Firefox', 'Opera', 'Chrome'])
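The four structures can be tried side by side. The following is a minimal sketch; the names (list_of_random_stuff, facts, quotient_and_remainder, number_names, browsers) are made up for illustration and are not from the slides.

```python
# A list is ordered and mutable, and may mix types.
list_of_random_stuff = [1, 'pen', 'costs', 'Rs.', 6.50]
list_of_random_stuff.append('only')

# A tuple is immutable: assigning to an item raises TypeError.
facts = (100, 10, 0.01, 'hundred')
try:
    facts[0] = 99
    tuple_was_changed = True
except TypeError:
    tuple_was_changed = False

# Returning a tuple lets a function hand back several values at once.
def quotient_and_remainder(x, y):  # hypothetical helper
    return (x // y, x % y)

q, r = quotient_and_remainder(17, 5)

# A dictionary maps keys to values.
number_names = {1: 'one', 2: 'two', 3: 'three'}
two = number_names[2]

# A set silently drops duplicate members.
browsers = set(['IE', 'Firefox', 'Opera', 'Chrome', 'Firefox'])

print(len(list_of_random_stuff), tuple_was_changed, q, r, two, len(browsers))
# prints: 6 False 3 2 two 4
```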
Control Statements
Decision Control - If num = 3
Loop Control - While number  = 10
Loop Control - For
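The code shown on the three control-statement slides is not preserved in this transcript. As a stand-in, here is a minimal sketch of if, while and for; the starting values num = 3 and number = 10 come from the slide titles, and everything else is an assumption.

```python
# Decision control: if / elif / else
num = 3
if num > 0:
    sign = 'positive'
elif num == 0:
    sign = 'zero'
else:
    sign = 'negative'

# Loop control: while repeats as long as its condition holds
number = 10
countdown = []
while number > 7:
    countdown.append(number)
    number -= 1

# Loop control: for walks over the items of a sequence
squares = []
for n in range(1, 4):
    squares.append(n * n)

print(sign, countdown, squares)
# prints: positive [10, 9, 8] [1, 4, 9]
```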
Functions - Syntax
def functionname(arg1, arg2, ...):
    statement1
    statement2
    return variable
Functions - Example
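The example from this slide is not preserved in the transcript. A small function in the same spirit might look like the sketch below; count_words is a hypothetical name chosen for illustration.

```python
def count_words(sentence):
    """Return the number of whitespace-separated words in a sentence."""
    words = sentence.split()
    return len(words)

total = count_words('the cat chased the dog')
print(total)  # prints: 5
```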
Modules A module is a file containing Python definitions and statements.  The file name is the module name with the suffix .py appended. A module can be  imported by another program to make use of its functionality.
Import
import math
The import keyword tells Python that we need the 'math' module. This statement makes all the functions in the module accessible in the program through the math. prefix.
Using Modules – An Example
print(math.sqrt(100))
math is a module and sqrt is a function in it. math.sqrt(100) returns 10.0, which print() writes to the standard output.
Natural Language Processing (NLP)
Natural Language Processing The term  natural language processing  encompasses a broad set of techniques for automated generation, manipulation, and analysis of natural or human languages
Why NLP Applications that process large amounts of text require NLP expertise, for example: indexing and searching large texts, speech understanding, information extraction, automatic summarization.
Stemming Stemming is the process for reducing inflected (or sometimes derived) words to their stem, base or root form – generally a written word form.  The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root.  When you apply stemming on 'cats', the result is 'cat'
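To make the idea concrete, here is a deliberately naive suffix-stripping sketch. It is not the Porter algorithm or any stemmer that ships with NLTK; the suffix list and the three-character minimum are assumptions chosen only to show how related words can map to one stem, even when that stem is not a valid word.

```python
SUFFIXES = ('ing', 'ed', 'es', 's')  # hypothetical, deliberately tiny rule list

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

for w in ['cats', 'walked', 'walking', 'walks', 'ponies']:
    print(w, '->', naive_stem(w))
# cats -> cat
# walked -> walk
# walking -> walk
# walks -> walk
# ponies -> poni
```

The last line shows a stem ('poni') that is not itself a valid root, which is acceptable as long as related forms end up with the same stem.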
Part of speech tagging(POS Tagging) Part-of-speech (POS) tag: A word can be classified into one or more lexical or part-of-speech categories  such as nouns, verbs, adjectives, and articles, to name a few. A POS tag is a symbol representing such a lexical category, e.g., NN (noun), VB (verb), JJ (adjective), AT (article).
POS tagging - continued Given a sentence and a set of POS tags, a common language processing task is to automatically assign POS tags to each word in the sentence.  State-of-the-art POS taggers can achieve accuracy as high as 96%.
POS Tagging – An Example The/ARTICLE ball/NOUN is/VERB red/ADJECTIVE
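The simplest conceivable tagger just looks each word up in a table. The sketch below does that with a hand-written lexicon (an assumption for illustration; real taggers learn from a tagged corpus) and uses the tag symbols mentioned earlier: NN, VB, JJ, AT.

```python
# Hypothetical hand-written lexicon mapping lowercase words to POS tags.
LEXICON = {'the': 'AT', 'ball': 'NN', 'is': 'VB', 'red': 'JJ'}

def lookup_tag(sentence, default='NN'):
    """Tag each word by table lookup; unknown words get the default tag."""
    return [(word, LEXICON.get(word.lower(), default))
            for word in sentence.split()]

tagged = lookup_tag('The ball is red')
print(tagged)
# prints: [('The', 'AT'), ('ball', 'NN'), ('is', 'VB'), ('red', 'JJ')]
```

A lookup table cannot handle words that take different tags in different contexts, which is one reason practical taggers are statistical.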
Parsing Parsing a sentence involves the use of linguistic knowledge of a language to discover the way in which a sentence is structured
Parsing – An Example The/ARTICLE boy/NOUN went/VERB home/NOUN. The sentence splits into a noun phrase (NP) "The boy" and a verb phrase (VP) "went home".
Challenges We will often imply additional information in spoken language by the way we place stress on words.  The sentence "I never said she stole my money" demonstrates the importance stress can play in a sentence, and thus the inherent difficulty a natural language processor can have in parsing it.
Depending on which word the speaker stresses, the sentence can have several distinct meanings. Here is an example:
"I never said she stole my money" (stress on "I"): Someone else said it, but I didn't.
"I never said she stole my money" (stress on "never"): I simply didn't ever say it.
"I never said she stole my money" (stress on "said"): I might have implied it in some way, but I never explicitly said it.
"I never said she stole my money" (stress on "she"): I said someone took it; I didn't say it was she.
"I never said she stole my money" (stress on "stole"): I just said she probably borrowed it.
"I never said she stole my money" (stress on "my"): I said she stole someone else's money.
"I never said she stole my money" (stress on "money"): I said she stole something, but not my money.
NLTK Natural Language Toolkit
Design Goals
Exploring Corpora A corpus is a large collection of text that is used either to train an NLP program or as input to one. In NLTK, a corpus of plain-text files can be loaded using the PlaintextCorpusReader class.
 
Loading your own corpus
>>> from nltk.corpus import PlaintextCorpusReader
>>> corpus_root = r'C:\text'  # raw string, so \t is not read as a tab
>>> wordlists = PlaintextCorpusReader(corpus_root, '.*')  # '.*' matches every file
>>> wordlists.fileids()
['README', 'connectives', 'propernames', 'web2', 'web2a', 'words']
>>> wordlists.words('connectives')
['the', 'of', 'and', 'to', 'a', 'in', 'that', 'is', ...]
NLTK Corpora Gutenberg corpus, Brown corpus, WordNet, Stopwords, Shakespeare corpus, Treebank, and many more…
Computing with Language: Simple Statistics Frequency Distributions
>>> from nltk import FreqDist
>>> from nltk.book import text1  # text1 is Moby Dick
>>> fdist1 = FreqDist(text1)
>>> fdist1
<FreqDist with 260819 outcomes>
>>> vocabulary1 = fdist1.keys()  # frequency-sorted list in NLTK 2; NLTK 3 offers fdist1.most_common(50)
>>> vocabulary1[:50]
[',', 'the', '.', 'of', 'and', 'a', 'to', ';', 'in', 'that', "'", '-', 'his', 'it', 'I', 's', 'is', 'he', 'with', 'was', 'as', '"', 'all', 'for', 'this', '!', 'at', 'by', 'but', 'not', '--', 'him', 'from', 'be', 'on', 'so', 'whale', 'one', 'you', 'had', 'have', 'there', 'But', 'or', 'were', 'now', 'which', '?', 'me', 'like']
>>> fdist1['whale']
906
Cumulative Frequency Plot for the 50 Most Frequent Words in Moby Dick
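The idea behind a frequency distribution and its cumulative plot can be reproduced with the standard library alone. The sketch below uses collections.Counter on a made-up toy text (an assumption; it stands in for Moby Dick) and computes the running totals that a cumulative frequency plot would draw.

```python
from collections import Counter
from itertools import accumulate

text = 'the whale and the sea and the ship'.split()  # hypothetical toy text
fdist = Counter(text)

total_tokens = sum(fdist.values())   # number of tokens counted
the_count = fdist['the']             # count for a single word
top_two = fdist.most_common(2)       # most frequent words first

# Running total of counts, most frequent word first: the y-values of a
# cumulative frequency plot.
counts = [count for _, count in fdist.most_common()]
cumulative = list(accumulate(counts))

print(total_tokens, the_count, top_two, cumulative)
# prints: 8 3 [('the', 3), ('and', 2)] [3, 5, 6, 7, 8]
```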
POS tagging
WordNet Lemmatizer
Parsing
>>> from nltk.parse import ShiftReduceParser
>>> sr = ShiftReduceParser(grammar)  # grammar: a context-free grammar defined beforehand
>>> sentence1 = 'the cat chased the dog'.split()
>>> sentence2 = 'the cat chased the dog on the rug'.split()
>>> for t in sr.parse(sentence1):  # called nbest_parse() in NLTK 2
...     print(t)
(S (NP (DT the) (N cat)) (VP (V chased) (NP (DT the) (N dog))))
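What a shift-reduce parser does can be sketched in plain Python. This is not NLTK's implementation: the grammar and lexicon below are assumptions chosen to fit the example sentence, and the strategy is purely greedy (reduce whenever a rule matches, no backtracking), so it only handles simple cases.

```python
# Hypothetical toy grammar: right-hand side (tuple of labels) -> left-hand side.
RULES = {
    ('DT', 'N'): 'NP',
    ('V', 'NP'): 'VP',
    ('NP', 'VP'): 'S',
}
LEXICON = {'the': 'DT', 'cat': 'N', 'dog': 'N', 'chased': 'V'}

def try_reduce(stack):
    """Apply one rule to the top of the stack; return True if one matched."""
    for rhs, lhs in RULES.items():
        n = len(rhs)
        if len(stack) >= n and tuple(node[0] for node in stack[-n:]) == rhs:
            children = stack[-n:]
            del stack[-n:]
            stack.append((lhs,) + tuple(children))
            return True
    return False

def shift_reduce(tokens):
    """Shift each word onto the stack, reducing greedily after every shift."""
    stack = []
    for word in tokens:
        stack.append((LEXICON[word], word))  # shift a (tag, word) leaf
        while try_reduce(stack):             # reduce as long as a rule applies
            pass
    return stack

def bracket(node):
    """Render a tree as a bracketed string such as (NP (DT the) (N cat))."""
    label, rest = node[0], node[1:]
    if isinstance(rest[0], str):             # leaf: (tag, word)
        return '(%s %s)' % (label, rest[0])
    return '(%s %s)' % (label, ' '.join(bracket(child) for child in rest))

stack = shift_reduce('the cat chased the dog'.split())
parse = bracket(stack[0])
print(parse)
# prints: (S (NP (DT the) (N cat)) (VP (V chased) (NP (DT the) (N dog))))
```

A parse succeeds when the stack ends up holding a single S tree. A greedy strategy like this one can reduce too early on longer sentences and get stuck even when a valid parse exists.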
Authorship Attribution An Example
Find nltk @  <python-installation>\Lib\site-packages\nltk
The Road Ahead
Python: http://www.python.org; A Byte of Python, Swaroop CH, http://www.swaroopch.com/notes/python
Natural Language Processing: Speech and Language Processing, Jurafsky and Martin; Foundations of Statistical Natural Language Processing, Manning and Schütze
Natural Language Toolkit: http://www.nltk.org (for the NLTK book and documentation); upcoming book from O'Reilly