Exp 5

Department of Computer Science & Engineering (AI&ML)

BE SEM: VII AY: 2024-25

Subject: Natural Language Processing Lab

Aim: Implementation of (i) Named Entity Recognition (NER) using NLTK.

Theory: Named Entity Recognition (NER) is a technique in Natural Language Processing (NLP) used to identify and classify entities in a text into predefined categories such as names of people, organizations, locations, dates, and more.

Imagine you have a box of mixed candies, and you want to sort them into different groups:
chocolates, gummies, and lollipops. NER does something similar with words in a sentence. It
"looks" at the sentence and sorts certain words into categories like:

● People's names (e.g., "Alice", "John")
● Organizations (e.g., "Google", "United Nations")
● Locations (e.g., "New York", "Mount Everest")
● Dates (e.g., "July 4th", "2023")
● Miscellaneous (e.g., titles of books or movies)

For example, in the sentence "Alice visited the Eiffel Tower in Paris on July 4th," an NER
system might identify:

● Alice as a Person
● Eiffel Tower as a Location
● Paris as a Location
● July 4th as a Date

NER helps computers understand the context of the text better by identifying and categorizing
important pieces of information.

Key Components of NER



1. Entity Types: These are predefined categories into which entities are classified. Common
types include:
○ Person (PER): Names of people.
○ Organization (ORG): Names of companies, agencies, institutions.
○ Location (LOC): Names of countries, cities, landmarks.
○ Miscellaneous (MISC): Other entities such as events, works of art, dates, times,
etc.
2. Tokenization: The first step in NER is breaking down the text into smaller pieces called
tokens, typically words or phrases.
3. Feature Extraction: Extracting useful information from the text to help in identifying
entities. Features can include:
○ Lexical features: The actual words and their parts of speech.
○ Contextual features: The surrounding words and their parts of speech.
○ Orthographic features: Capitalization, punctuation, and other text-specific
details.
4. Model Training: NER systems are often trained on labeled datasets where the entities are
manually annotated. Common algorithms used include:
○ Rule-based methods: Using predefined linguistic rules.
○ Machine Learning: Using models like Conditional Random Fields (CRFs) or
Support Vector Machines (SVMs).
○ Deep Learning: Using neural networks, especially Recurrent Neural Networks
(RNNs) and Transformers like BERT (Bidirectional Encoder Representations
from Transformers).
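The three feature families in point 3 are easiest to see as a feature dictionary built for a single token, which is the input format CRF-style taggers typically consume. A small sketch (the feature names here are illustrative, not a fixed standard):

```python
def token_features(tokens, pos_tags, i):
    """Build an illustrative feature dict for token i, mixing
    lexical, contextual, and orthographic cues."""
    word = tokens[i]
    return {
        "word.lower": word.lower(),                 # lexical
        "pos": pos_tags[i],                         # lexical
        "is_title": word.istitle(),                 # orthographic
        "is_upper": word.isupper(),                 # orthographic
        # contextual: neighbouring words, with boundary markers
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

tokens = ["Alice", "visited", "Paris"]
pos_tags = ["NNP", "VBD", "NNP"]
print(token_features(tokens, pos_tags, 0))
# → {'word.lower': 'alice', 'pos': 'NNP', 'is_title': True,
#    'is_upper': False, 'prev_word': '<BOS>', 'next_word': 'visited'}
```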

Steps in NER

1. Preprocessing: Clean and prepare the text data, including tokenization and removing
irrelevant characters.
2. Feature Extraction: Extract features from the tokens to provide input to the model.
3. Model Application: Use the trained model to identify and classify entities in the text.
4. Post-processing: Refine and format the output for the desired application.
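The four steps can be wired together end to end. The sketch below stands in a toy rule-based "model" (capitalized, non-sentence-initial words are guessed to be entities) for the trained model of step 3; it is illustrative only, and notably misses sentence-initial names like "Alice":

```python
import re

def preprocess(text):
    # Step 1: drop stray punctuation and tokenize on whitespace.
    return re.sub(r"[^\w\s']", " ", text).split()

def extract_features(tokens):
    # Step 2: a tiny orthographic feature set per token.
    return [{"word": t, "is_cap": t[:1].isupper()} for t in tokens]

def apply_model(features):
    # Step 3: toy rule in place of a trained CRF or neural tagger:
    # capitalized non-initial tokens are labelled as entities.
    return [(f["word"], "ENTITY" if i > 0 and f["is_cap"] else "O")
            for i, f in enumerate(features)]

def postprocess(labeled):
    # Step 4: merge consecutive entity tokens into spans.
    spans, current = [], []
    for word, label in labeled:
        if label == "ENTITY":
            current.append(word)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

text = "Alice visited the Eiffel Tower in Paris"
print(postprocess(apply_model(extract_features(preprocess(text)))))
# → ['Eiffel Tower', 'Paris']
```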

Conclusion: NER is a fundamental component of many NLP applications, enabling systems to understand and process text in a way that recognizes and utilizes important entities.

Department of Computer Science & Engineering-(AI&ML) | APSIT
