About NLP
NLP offers exact answers to a question, i.e. it does not return unnecessary or unwanted
information.
-NLP is unpredictable
-NLP is unable to adapt to new domains and has limited functionality, which is why
NLP systems are usually built for a single, specific task only.
How does NLP work?
1. Sentence Segmentation
Sentence Segmentation breaks a paragraph into separate sentences.
Example:- Independence Day is one of the important festivals for every Indian citizen. It is
celebrated on the 15th of August each year ever since India got independence from the British
rule. The day celebrates independence in the true sense.
1. "Independence Day is one of the important festivals for every Indian citizen."
2. "It is celebrated on the 15th of August each year ever since India got independence from the
British rule."
3. "The day celebrates independence in the true sense."
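A minimal sketch of sentence segmentation, assuming a naive rule: split after ".", "!" or "?" whenever whitespace follows. The helper name split_sentences is hypothetical. This rule would wrongly split abbreviations such as "e.g." or "Dr.", which real segmenters handle with extra rules or trained models.

```python
import re

def split_sentences(paragraph):
    """Naive segmenter: split after '.', '!' or '?' when whitespace follows."""
    # The lookbehind keeps each punctuation mark attached to the sentence it ends.
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]

paragraph = (
    "Independence Day is one of the important festivals for every Indian citizen. "
    "It is celebrated on the 15th of August each year ever since India got "
    "independence from the British rule. The day celebrates independence in the true sense."
)

for i, sentence in enumerate(split_sentences(paragraph), start=1):
    print(f'{i}. "{sentence}"')
```

Run on the example paragraph, this prints the three numbered sentences listed above.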
e.g. In lemmatization, the words intelligence, intelligent, and intelligently have the root word
intelligent, which
2. Word Tokenization
A word tokenizer breaks a sentence into linguistic units (tokens) such as words, punctuation marks,
numbers, and alphanumeric strings.
Tokens do not need to be decomposed further for subsequent processing.
A tokenizer must also handle abbreviations, hyphenated words, and numerical and special expressions.
e.g. Coursera offers Corporate Training, Online Training, and Winter Training.
Word Tokenizer output:
"Coursera", "offers", "Corporate", "Training", ",", "Online", "Training", ",", "and", "Winter", "Training", "."
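A minimal regular-expression tokenizer, assuming a simple rule set: decimal numbers stay whole, words may be joined by hyphens or apostrophes, and every other non-space character becomes its own punctuation token. The name simple_word_tokenize and the pattern are hypothetical (this is not NLTK's word_tokenize), and abbreviations such as "U.S." are not handled.

```python
import re

# Order matters: decimal numbers first, then words (optionally joined by
# hyphens or apostrophes), then any single punctuation character.
TOKEN_PATTERN = re.compile(r"\d+\.\d+|\w+(?:[-']\w+)*|[^\w\s]")

def simple_word_tokenize(sentence):
    """Split a sentence into word, number and punctuation tokens."""
    return TOKEN_PATTERN.findall(sentence)

print(simple_word_tokenize(
    "Coursera offers Corporate Training, Online Training, and Winter Training."
))
```

On the Coursera sentence this yields the tokens shown above, commas and final period included. Keeping "state-of-the-art" or "3.5" as one token is one way to address the hyphenated-word and numerical-expression cases.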
3. Stemming
Stemming normalizes words into their base form or root form.
e.g. The words celebrates, celebrated, and celebrating all originate from the single root word
"celebrate." The issue with stemming is that it sometimes produces a root word that
has no meaning.
E.g. intelligence, intelligent, and intelligently all originate from the single root word
"intelligen."
In English, the word "intelligen" does not have any meaning.
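A toy suffix-stripping stemmer, assuming a short hypothetical suffix list and a minimum stem length of 3. Because its rules are an assumption, its stems ("celebrat", "intellig") differ from the slide's, but they show the same issue: a stem need not be a real English word. Real stemmers such as the Porter stemmer use a much larger, ordered rule set.

```python
# Hypothetical suffix list, longest suffixes first so "ently" wins over "ly".
SUFFIXES = ["ingly", "ently", "ence", "ent", "ing", "ly", "ed", "es", "s"]

def simple_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

for w in ["celebrates", "celebrated", "celebrating",
          "intelligence", "intelligent", "intelligently"]:
    print(w, "->", simple_stem(w))
```

The min_stem guard stops short words such as "sing" from being cut down to a single letter.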
TF-IDF
Term Frequency :- TF of a term or word is the number of times the term appears in a
document divided by the total number of words in the document.
TF = (frequency of the word in the sentence) / (total number of words in the sentence)
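The TF formula can be checked on the Coursera sentence from the tokenization example. This sketch assumes punctuation has already been removed, leaving 9 words; the helper name term_frequency is hypothetical.

```python
from collections import Counter

def term_frequency(words):
    """TF of each word: its count divided by the total number of words."""
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}

words = "Coursera offers Corporate Training Online Training and Winter Training".split()
tf = term_frequency(words)
print(tf["Training"])  # 3 / 9, about 0.333
```

"Training" occurs 3 times among 9 words, so its TF is 3/9; every other word occurs once and gets 1/9.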
Inverse Document Frequency :- IDF of a term measures how rare the term is across the
corpus: it is based on the inverse of the proportion of documents that contain the term.
Words found in only a small percentage of documents (e.g., technical jargon terms) receive
higher importance values than words common across all documents (e.g., a, the, and).
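IDF = (Total number of sentences) / (Number of sentences in which the word is present)
In practice, the logarithm of this ratio is usually taken, so a word present in every sentence gets an IDF of 0.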
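A minimal sketch of IDF over a small made-up corpus of three sentences, treating each sentence as a document as the formula above does. The helper name and corpus are hypothetical; use_log=False gives the plain ratio from the slide, and use_log=True gives the common logarithmic variant. The final lines combine the two measures into a TF-IDF weight, TF × IDF, the standard way the scores are multiplied together.

```python
import math

def inverse_document_frequency(sentences, use_log=False):
    """IDF per word: total sentences / sentences containing the word.
    With use_log=True, return the natural log of that ratio instead."""
    n = len(sentences)
    containing = {}
    for words in sentences:
        for word in set(words):  # count each sentence at most once per word
            containing[word] = containing.get(word, 0) + 1
    return {
        word: math.log(n / df) if use_log else n / df
        for word, df in containing.items()
    }

sentences = [s.lower().split() for s in [
    "the cat sat on the mat",
    "the dog sat",
    "a cat ran",
]]
idf = inverse_document_frequency(sentences)
print(idf["the"], idf["mat"])  # 1.5 3.0

# TF-IDF of "mat" in the first sentence: TF * IDF (log variant here).
first = sentences[0]
log_idf = inverse_document_frequency(sentences, use_log=True)
tfidf_mat = first.count("mat") / len(first) * log_idf["mat"]
tfidf_the = first.count("the") / len(first) * log_idf["the"]
```

Although "the" appears twice in the first sentence, its TF-IDF weight ends up lower than that of "mat", because "the" also occurs in another sentence while "mat" is unique to this one.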
Conclusion
Natural Language Processing is the practice of teaching machines
to understand and interpret conversational inputs from humans.
Thank You
Sirisha Reddy
10’B’
Roll No:-50