
UNIT V APPLICATIONS 9

AI applications – Language Models – Information Retrieval – Information Extraction – Natural Language Processing – Machine Translation – Speech Recognition – Robot – Hardware – Perception – Planning – Moving

There are over a trillion pages of information on the Web, almost all of it in natural language. An agent that wants to do knowledge acquisition needs to understand (at least partially) the ambiguous, messy languages that humans use. We examine the problem from the point of view of specific information-seeking tasks: text classification, information retrieval, and information extraction. One common factor in addressing these tasks is the use of language models: models that predict the probability distribution of language expressions.

LANGUAGE MODELS:
Formal languages, such as the programming languages Java or Python, have precisely defined language models. A language can be defined as a set of strings; "print(2 + 2)" is a legal program in the language Python, whereas "2)+(2 print" is not.

Natural languages, such as English or Spanish, cannot be characterized as a definitive set of sentences. Natural languages are also ambiguous: a single string can have more than one meaning. Finally, natural languages are difficult to deal with because they are very large and constantly changing. Thus, our language models are, at best, an approximation.

N-gram character models

A written text is composed of characters: letters, digits, punctuation, and spaces in English. Thus, one of the simplest language models is a probability distribution over sequences of characters. We write P(c1:N) for the probability of a sequence of N characters, c1 through cN. In one Web collection, P("the") = 0.027 and P("zgq") = 0.000000002. A sequence of written symbols of length n is called an n-gram, with special cases "unigram" for 1-gram, "bigram" for 2-gram, and "trigram" for 3-gram.

A model of the probability distribution of n-letter sequences is thus called an n-gram model. An n-gram model is defined as a Markov chain of order n − 1. In a Markov chain, the probability of character ci depends only on the immediately preceding characters, not on any other characters. So in a trigram model

P(ci | c1:i−1) = P(ci | ci−2:i−1).

We can define the probability of a sequence of characters P(c1:N) under the trigram model by first factoring with the chain rule and then using the Markov assumption:

P(c1:N) = ∏ i=1..N P(ci | c1:i−1) = ∏ i=1..N P(ci | ci−2:i−1).

One approach to language identification is to first build a trigram character model of each candidate language, P(ci | ci−2:i−1, L), where the variable L ranges over languages. For each language, the model is built by counting trigrams in a corpus of that language.
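As a minimal sketch, assuming smoothed trigram probability tables have already been built for each candidate language (the model layout, names, and floor value below are illustrative), language identification can then be done by scoring the text under each model and picking the best:

import math

def sequence_log_prob(text, trigram_probs, floor=1e-9):
    # Log-probability of a character sequence under one language's trigram model.
    logp = 0.0
    for i in range(2, len(text)):
        context, char = text[i-2:i], text[i]
        logp += math.log(trigram_probs.get(context, {}).get(char, floor))
    return logp

def identify_language(text, models):
    # Return the language whose trigram model assigns the highest probability.
    return max(models, key=lambda lang: sequence_log_prob(text, models[lang]))

# Usage with hypothetical models:
# models = {"english": english_trigrams, "spanish": spanish_trigrams}
# print(identify_language("hello there general", models))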

Smoothing n-gram models

The major complication of n-gram models is that the training corpus provides only an estimate of the true probability distribution. For common character sequences such as " th", any English corpus will give a good estimate: about 1.5% of all trigrams. On the other hand, " ht" is very uncommon: no dictionary word starts with ht. It is likely that the sequence would have a count of zero in a training corpus of standard English. Does that mean we should assign P(" ht") = 0?

If we did, then the text "The program issues an http request" would have an English probability of zero, which seems wrong. We have a problem in generalization: we want our language models to generalize well to texts they haven't seen yet. Just because we have never seen " http" before does not mean that our model should claim that it is impossible. Thus, we will adjust our language model so that sequences that have a count of zero in the training corpus will be assigned a small nonzero probability (and the other counts will be adjusted downward slightly so that the probabilities still sum to 1). The process of adjusting the probability of low-frequency counts is called smoothing.

Linear interpolation smoothing is a backoff model that combines trigram, bigram, and unigram

models by linear interpolation. It defines the probability estimate as

P*(ci | ci−2:i−1) = λ3 P(ci | ci−2:i−1) + λ2 P(ci | ci−1) + λ1 P(ci), where λ1 + λ2 + λ3 = 1.
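As a minimal sketch, assuming the unigram, bigram, and trigram probability tables already exist (the table layout and λ values below are illustrative defaults, not learned weights), the interpolated estimate can be computed as follows:

def interpolated_prob(c, context, unigram, bigram, trigram,
                      lambdas=(0.1, 0.3, 0.6)):
    # P*(c | context) as a weighted mix of unigram, bigram, and trigram estimates.
    # context is the two preceding characters; the lambdas must sum to 1.
    l1, l2, l3 = lambdas
    p_uni = unigram.get(c, 0.0)
    p_bi = bigram.get(context[-1], {}).get(c, 0.0)
    p_tri = trigram.get(context, {}).get(c, 0.0)
    return l3 * p_tri + l2 * p_bi + l1 * p_uni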

Model evaluation

With so many possible n-gram models (unigram, bigram, trigram, interpolated smoothing with different values of λ, and so on), how do we choose among them? We can evaluate a model with cross-validation. Split the corpus into a training corpus and a validation corpus. Determine the parameters of the model from the training data. Then evaluate the model on the validation corpus. The evaluation can be a task-specific metric, such as measuring accuracy on language identification. Alternatively, we can have a task-independent model of language quality: calculate the probability assigned to the validation corpus by the model; the higher the probability, the better. This metric is inconvenient because the probability of a large corpus will be a very small number, and floating-point underflow becomes an issue.
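A standard workaround, sketched below, is to sum log probabilities rather than multiply raw probabilities, and to report a per-character perplexity (the exponential of the average negative log probability; lower is better). Here char_prob is assumed to be any callable returning P(c | context), for example the interpolated estimate sketched above.

import math

def corpus_log_prob(text, char_prob):
    # Sum of log P(c_i | c_{i-2:i-1}) over the validation text.
    return sum(math.log(char_prob(text[i], text[i-2:i]))
               for i in range(2, len(text)))

def perplexity(text, char_prob):
    # Per-character perplexity of the validation text under the model.
    n = max(len(text) - 2, 1)
    return math.exp(-corpus_log_prob(text, char_prob) / n)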

Information retrieval

Information retrieval is the task of finding documents that are relevant to a user’s need for
information. The best-known examples of information retrieval systems are search engines
on the World Wide Web.

An information retrieval (henceforth IR) system can be characterized by

1. A corpus of documents. Each system must decide what it wants to treat as a document: a
paragraph, a page, or a multipage text.
2. Queries posed in a query language. A query specifies what the user wants to know. The query language can be just a list of words, such as [AI book]; or it can specify a phrase of words that must be adjacent, as in ["AI book"]; it can contain Boolean operators as in [AI AND book]; it can include non-Boolean operators such as [AI NEAR book] or [AI book site:www.aaai.org].
3. A result set. This is the subset of documents that the IR system judges to be relevant to
the query. By relevant, we mean likely to be of use to the person who posed the query,
for the particular information need expressed in the query.
4. A presentation of the result set. This can be as simple as a ranked list of document titles or as complex as a rotating color map of the result set projected onto a three-dimensional space, rendered as a two-dimensional display.

The earliest IR systems worked on a Boolean keyword model. Each word in the document collection is
treated as a Boolean feature that is true of a document if the word occurs in the document and false if it
does not.

IR scoring functions

A scoring function takes a document and a query and returns a numeric score; the most relevant documents have the highest scores. In the BM25 function, the score is a linear weighted combination of scores for each of the words that make up the query. Three factors affect the weight of a query term: first, the frequency with which the term appears in a document (also known as TF for term frequency); second, the inverse document frequency of the term, or IDF (a term that appears in almost every document tells us little about relevance); and third, the length of the document (a very long document may mention the query words without actually being about the query).

The BM25 function takes all three of these into account. We assume we have created
an index of the N documents in the corpus so that we can look up TF(qi, dj), the count of
the number of times word qi appears in document dj. We also assume a table of document
frequency counts, DF(qi), that gives the number of documents that contain the word qi.
Then, given a document dj and a query consisting of the words q1:N, the standard BM25 score is

BM25(dj, q1:N) = Σ i=1..N IDF(qi) · TF(qi, dj) · (k + 1) / (TF(qi, dj) + k · (1 − b + b · |dj| / L)),

where |dj| is the length of document dj in words, L is the average document length in the corpus, and k and b are tuned parameters (typical values are around k = 2.0 and b = 0.75). IDF(qi) is the inverse document frequency of word qi, IDF(qi) = log((N − DF(qi) + 0.5) / (DF(qi) + 0.5)).
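A minimal sketch of this scoring function, assuming the index supplies the TF and DF tables described above (the parameter defaults and data layout are illustrative):

import math

def bm25_score(query_words, doc_id, tf, df, doc_len, avg_len, n_docs,
               k=2.0, b=0.75):
    # Score one document against a list of query words, per the formula above.
    score = 0.0
    for q in query_words:
        f = tf.get((q, doc_id), 0)                 # TF(q, d)
        d_f = df.get(q, 0)                         # DF(q)
        idf = math.log((n_docs - d_f + 0.5) / (d_f + 0.5))
        denom = f + k * (1 - b + b * doc_len[doc_id] / avg_len)
        score += idf * f * (k + 1) / denom
    return score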

IR system evaluation

Imagine that an IR system has returned a result set for a single query, for which we know which documents are and are not relevant, out of a corpus of 100 documents. From the counts of relevant and non-relevant documents inside and outside the result set we can compute two key measures: precision, the proportion of documents in the result set that are actually relevant, and recall, the proportion of all the relevant documents in the collection that appear in the result set.
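For illustration only (the counts below are hypothetical, not taken from the original table), the two measures are computed as follows:

true_positives = 30    # relevant documents in the result set
false_positives = 10   # non-relevant documents in the result set
false_negatives = 20   # relevant documents NOT in the result set

precision = true_positives / (true_positives + false_positives)  # 30/40 = 0.75
recall = true_positives / (true_positives + false_negatives)     # 30/50 = 0.60
print(f"precision={precision:.2f}, recall={recall:.2f}")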

IR refinements

The BM25 scoring function uses a word model that treats all words as completely independent, but we know that some words are correlated: "couch" is closely related to both "couches" and "sofa." Many IR systems attempt to account for these correlations. For example, if the query is [couch], it would be a shame to exclude from the result set those documents that mention "COUCH" or "couches" but not "couch." Most IR systems do case folding of "COUCH" to "couch," and some use a stemming algorithm to reduce "couches" to the stem form "couch," both in the query and the documents. This typically yields a small increase in recall (on the order of 2% for English). However, it can harm precision. For example, stemming "stocking" to "stock" will tend to decrease precision for queries about either foot coverings or financial instruments, although it could improve recall for queries about warehousing. Stemming algorithms based on rules (e.g., remove "-ing") cannot avoid this problem, but algorithms based on dictionaries (don't remove "-ing" if the word is already listed in the dictionary) can.
IR can be improved by considering metadata, that is, data outside of the text of the document. Examples include human-supplied keywords and publication data. On the Web, hypertext links between documents are a crucial source of information.

The PageRank algorithm

The PageRank for a page p is defined as

PR(p) = (1 − d) / N + d · Σ i PR(in_i) / C(in_i),

where N is the total number of pages in the corpus, the in_i are the pages that link in to p, C(in_i) is the count of the total number of out-links on page in_i, and d is a damping factor, typically around 0.85.
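A minimal sketch of the iterative computation of this formula (the link structure, damping factor, and iteration count below are illustrative):

def pagerank(links, d=0.85, iterations=50):
    # links maps each page to the list of pages it links to.
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_pr = {}
        for p in pages:
            # Sum PR(in_i) / C(in_i) over every page in_i that links to p.
            inlink_sum = sum(pr[q] / len(links[q]) for q in pages if p in links[q])
            new_pr[p] = (1 - d) / n + d * inlink_sum
        pr = new_pr
    return pr

# Usage on a toy graph:
# pagerank({"A": ["B"], "B": ["A", "C"], "C": ["A"]})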

The HITS algorithm

The Hyperlink-Induced Topic Search algorithm, also known as "Hubs and Authorities" or HITS, is another influential link-analysis algorithm. HITS differs from PageRank in several ways. First, it is a
query-dependent measure: it rates pages with respect to a query. That means that it must be computed
anew for each query—a computational burden that most search engines have elected not to take on.
Given a query, HITS first finds a set of pages that are relevant to the query. It does that by intersecting hit
lists of query words, and then adding pages in the link neighborhood of these pages—pages that link to or
are linked from one of the pages in the original relevant set. Each page in this set is considered an
authority on the query to the degree that other pages in the relevant set point to it. A page is considered
a hub to the degree that it points to other authoritative pages in the relevant set. Just as with PageRank,
we don’t want to merely count the number of links; we want to give more value to the high-quality hubs
and authorities. Thus, as with PageRank, we iterate a process that updates the authority score of a page to
be the sum of the hub scores of the pages that point to it, and the hub score to be the sum of the authority
scores of the pages it points to. If we then normalize the scores and repeat k times, the process will
converge.
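A minimal sketch of this iteration, assuming the query-specific relevant set and its internal link structure have already been collected (the iteration count is illustrative):

import math

def hits(links, k=20):
    # links maps each page in the relevant set to the pages it points to
    # within that set.
    pages = list(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(k):
        # Authority score: sum of hub scores of the pages pointing to p.
        new_auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
        # Hub score: sum of authority scores of the pages p points to.
        new_hub = {p: sum(new_auth[q] for q in links[p] if q in new_auth) for p in pages}
        # Normalize so that the scores do not grow without bound.
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}
    return hub, auth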

Question answering

Question answering is a somewhat different task, in which the query really is a question, and the answer
is not a ranked list of documents but rather a short response—a sentence, or even just a phrase.

INFORMATION EXTRACTION
Information extraction is the process of acquiring knowledge by skimming a text and looking for
occurrences of a particular class of object and for relationships among objects. A
typical task is to extract instances of addresses from Web pages, with database fields for
street, city, state, and zip code; or instances of storms from weather reports, with fields for
temperature, wind speed, and precipitation. In a limited domain, this can be done with high
accuracy. As the domain gets more general, more complex linguistic models and more complex learning
techniques are necessary.

Finite-state automata for information extraction


The simplest type of information extraction system is an attribute-based extraction system
that assumes that the entire text refers to a single object and the task is to extract attributes of
that object. For example, we mentioned in Section 12.7 the problem of extracting from the text "IBM ThinkBook 970. Our price: $399.00" the set of attributes {Manufacturer=IBM, Model=ThinkBook970, Price=$399.00}. We can address this problem by defining a template (also
known as a pattern) for each attribute we would like to extract. The template is
defined by a finite state automaton, the simplest example of which is the regular expression,
or regex. Regular expressions are used in Unix commands such as grep, in programming
languages such as Perl, and in word processors such as Microsoft Word. The details vary
slightly from one tool to another and so are best learned from the appropriate manual, but
here we show how to build up a regular expression template for prices in dollars:
[0-9] matches any digit from 0 to 9
[0-9]+ matches one or more digits
[.][0-9][0-9] matches a period followed by two digits
([.][0-9][0-9])? matches a period followed by two digits, or nothing
[$][0-9]+([.][0-9][0-9])? matches $249.99 or $1.23 or $1000000 or . . .
Templates are often defined with three parts: a prefix regex, a target regex, and a postfix regex. For prices, the target regex is as above, the prefix would look for strings such as "price:" and the postfix could be empty. The idea is that some clues about an attribute come from the attribute value itself and some come from the surrounding text.
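A minimal sketch of such a prefix-plus-target template using Python's re module (the prefix pattern and example text are illustrative):

import re

price_target = r"[$][0-9]+([.][0-9][0-9])?"
price_template = re.compile(r"(price:\s*)(" + price_target + ")", re.IGNORECASE)

text = "IBM ThinkBook 970. Our price: $399.00"
match = price_template.search(text)
if match:
    print(match.group(2))   # prints $399.00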

One step up from attribute-based extraction systems are relational extraction systems,
which deal with multiple objects and the relations among them. Thus, when these systems
see the text ―$249.99,‖ they need to determine not just that it is a price, but also which object
has that price. A typical relational-based extraction system is FASTUS, which handles news
stories about corporate mergers and acquisitions.

Bridgestone Sports Co. said Friday it has set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be shipped to Japan.

From this story the system extracts relations describing the joint venture: the companies involved, the product (golf clubs), and the relationships among them.

A relational extraction system can be built as a series of cascaded finite-state transducers.

That is, the system consists of a series of small, efficient finite-state automata (FSAs), where
each automaton receives text as input, transduces the text into a different format, and passes
it along to the next automaton. FASTUS consists of five stages:
1. Tokenization
2. Complex-word handling
3. Basic-group handling
4. Complex-phrase handling
5. Structure merging
FASTUS’s first stage is tokenization, which segments the stream of characters into tokens
(words, numbers, and punctuation). For English, tokenization can be fairly simple; just separating
characters at white space or punctuation does a fairly good job. Some tokenizers also
deal with markup languages such as HTML, SGML, and XML.
The second stage handles complex words, including collocations such as "set up" and "joint venture," as well as proper names such as "Bridgestone Sports Co." These are recognized by a combination of lexical entries and finite-state grammar rules. For example, a company name might be recognized by the rule

CapitalizedWord+ ("Company" | "Co" | "Inc" | "Ltd")
The third stage handles basic groups, meaning noun groups and verb groups. The idea is to chunk these into units that will be managed by the later stages. We will see how to write a complex description of noun and verb phrases in Chapter 23, but here we have simple rules that only approximate the complexity of English but have the advantage of being representable by finite-state automata.

Probabilistic models for information extraction

When information extraction must be attempted from noisy or varied input, simple finite-state
approaches fare poorly. It is too hard to get all the rules and their priorities right; it is better
to use a probabilistic model rather than a rule-based model. The simplest probabilistic model
for sequences with hidden state is the hidden Markov model, or HMM.
An HMM models a progression through a sequence of hidden states, xt, with an observation et at each step. To apply HMMs to information extraction, we can either build one big HMM for all the attributes or build a separate HMM for each attribute. The observations are the words of the text, and the hidden states indicate whether we are in the target, prefix, or postfix part of the attribute template, or in the background (not part of the template at all).

Conditional random fields for information extraction

An HMM is a generative model; it models the full joint


probability of observations and hidden states, and thus can be used to generate samples. That
is, we can use the HMM model not only to parse a text and recover the speaker and date,
but also to generate a random instance of a text containing a speaker and a date. Since we’re
not interested in that task, it is natural to ask whether we might be better off with a model
that doesn’t bother modeling that possibility. All we need in order to understand a text is a
discriminative model, one that models the conditional probability of the hidden attributes
given the observations (the text). Given a text e1:N , the conditional model finds the hidden
state sequence X1:N that maximizes P(X1:N | e1:N).
Modeling this directly gives us some freedom. We don’t need the independence assumptions of the
Markov model—we can have an xt that is dependent on x1. A framework
for this type of model is the conditional random field, or CRF, which models a conditional
probability distribution of a set of target variables given a set of observed variables. Like
Bayesian networks, CRFs can represent many different structures of dependencies among the
variables. One common structure is the linear-chain conditional random field for representing Markov dependencies among variables in a temporal sequence. Thus, HMMs are the
temporal version of naive Bayes models, and linear-chain CRFs are the temporal version of
logistic regression, where the predicted target is an entire state sequence rather than a single
binary variable.
Let e1:N be the observations (e.g., words in a document), and x1:N be the sequence of
hidden states (e.g., the prefix, target, and postfix states). A linear-chain conditional random
field defines a conditional probability distribution:
P(x1:N | e1:N) = α exp( Σ i=1..N F(xi−1, xi, e, i) ),

where α is a normalization factor (to make sure the probabilities sum to 1), and F is a feature function defined as the weighted sum of a collection of k component feature functions:

F(xi−1, xi, e, i) = Σ k λk fk(xi−1, xi, e, i).
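For illustration only (the state names, weights, and features below are hypothetical examples, not a full CRF trainer), component feature functions and their weighted sum might look like this:

def f_capitalized_target(x_prev, x, e, i):
    # 1 if the current hidden state is TARGET and the current word is capitalized.
    return 1.0 if x == "TARGET" and e[i][:1].isupper() else 0.0

def f_prefix_to_target(x_prev, x, e, i):
    # 1 if the state transitions from PREFIX to TARGET at this position.
    return 1.0 if x_prev == "PREFIX" and x == "TARGET" else 0.0

FEATURES = [(1.2, f_capitalized_target), (0.8, f_prefix_to_target)]  # (lambda_k, f_k)

def F(x_prev, x, e, i):
    # Weighted sum of component feature functions, as in the formula above.
    return sum(lam * f(x_prev, x, e, i) for lam, f in FEATURES)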

Machine translation

Machine translation is the automatic translation of text from one natural language (the source)
to another (the target).

A translator (human or machine) often needs to understand the actual situation described in the source,
not just the individual words.

Machine translation systems


All translation systems must model the source and target languages, but systems vary in the
type of models they use. Some systems attempt to analyze the source language text all the way
into an interlingua knowledge representation and then generate sentences in the target language from that
representation. This is difficult because it involves three unsolved problems:
creating a complete knowledge representation of everything; parsing into that representation;
and generating sentences from that representation.

Other systems are based on a transfer model. They keep a database of translation rules
(or examples), and whenever the rule (or example) matches, they translate directly. Transfer
can occur at the lexical, syntactic, or semantic level. For example, a strictly syntactic rule maps English [Adjective Noun] to French [Noun Adjective]. A mixed syntactic and lexical rule maps French [S1 "et puis" S2] to English [S1 "and then" S2]. Figure 23.12 diagrams the various transfer points.

Statistical machine translation

Having seen how complex the translation task can be, it should come as no surprise that the most successful machine translation systems are built by training a probabilistic
model using statistics gathered from a large corpus of text. This approach does not need
a complex ontology of interlingua concepts, nor does it need handcrafted grammars of the
source and target languages, nor a hand-labeled treebank. All it needs is data—sample translations from
which a translation model can be learned. To translate a sentence in, say, English
(e) into French (f), we find the string of words f* that maximizes

f* = argmax_f P(f | e) = argmax_f P(e | f) P(f).
Here the factor P(f) is the target language model for French; it says how probable a given sentence is in
French. P(e|f) is the translation model; it says how probable an English
sentence is as a translation for a given French sentence. Similarly, P(f | e) is a translation
model from English to French.

In diagnostic applications like medicine, it is easier to model the domain in the causal direction:
P(symptoms | disease) rather than P(disease | symptoms). But in translation both
directions are equally easy. The earliest work in statistical machine translation did apply
Bayes’ rule—in part because the researchers had a good language model, P(f), and wanted
to make use of it, and in part because they came from a background in speech recognition,
which is a diagnostic problem. We follow their lead in this chapter, but we note that recent work in
statistical machine translation often optimizes P(f | e) directly, using a more
sophisticated model that takes into account many of the features from the language model.

The translation model is learned from a bilingual corpus—a collection of parallel texts,
each an English/French pair

given a source English sentence, e,


finding a French translation f is a matter of three steps:
1. Break the English sentence into phrases e1, . . . , en.
2. For each phrase ei, choose a corresponding French phrase fi. We use the notation
P(fi | ei) for the phrasal probability that fi is a translation of ei.
3. Choose a permutation of the phrases f1, . . . , fn. We will specify this permutation in a
way that seems a little complicated, but is designed to have a simple probability distribution: For each fi,
we choose a distortion di, which is the number of words that
phrase fi has moved with respect to fi−1; positive for moving to the right, negative for
moving to the left, and zero if fi immediately follows fi−1.

Consider an example of the process. At the top, the sentence "There is a smelly wumpus sleeping in 2 2" is broken into five phrases, e1, . . . , e5. Each of them is translated into a corresponding phrase fi, and then these are permuted into the order f1, f3, f4, f2, f5. We specify the permutation in terms of the distortions di of each French phrase, defined as

di = START(fi) − END(fi−1) − 1,

where START(fi) is the ordinal number of the first word of phrase fi in the French sentence, and END(fi−1) is the ordinal number of the last word of phrase fi−1. In Figure 23.13 we see that f5, "à 2 2," immediately follows f4, "qui dort," and thus d5 = 0. Phrase f2, however, has moved one word to the right of f1, so d2 = 1. As a special case we have d1 = 0, because f1 starts at position 1 and END(f0) is defined to be 0 (even though f0 does not exist).
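A minimal sketch of this distortion computation (the phrase spans are illustrative; positions are 1-based word indices in the French sentence):

def distortions(spans):
    # spans[i] = (start, end) word positions of French phrase f_{i+1}.
    d = []
    prev_end = 0                       # END(f_0) is defined to be 0
    for start, end in spans:
        d.append(start - prev_end - 1)
        prev_end = end
    return d

# Usage: if f1 occupies words 1-3 and f2 starts at word 5, then d2 = 5 - 3 - 1 = 1.
# print(distortions([(1, 3), (5, 6)]))  # prints [0, 1]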
All that remains is to learn the phrasal and distortion probabilities. We sketch the procedure; see the notes
at the end of the chapter for details.
1. Find parallel texts: First, gather a parallel bilingual corpus.

2. Segment into sentences: The unit of translation is a sentence, so we will have to break
the corpus into sentences.
3. Align sentences: For each sentence in the English version, determine what sentence(s)
it corresponds to in the French version.
4. Align phrases: Within a sentence, phrases can be aligned by a process that is similar to
that used for sentence alignment, but requiring iterative improvement.
5. Extract distortions: Once we have an alignment of phrases we can define distortion
probabilities. Simply count how often distortion occurs in the corpus for each distance
d = 0, ±1, ±2, . . ., and apply smoothing.
6. Improve estimates with EM: Use expectation–maximization to improve the estimates
of P(f | e) and P(d) values. We compute the best alignments with the current values
of these parameters in the E step, then update the estimates in the M step and iterate the
process until convergence.

Speech recognition

Speech recognition is the task of identifying a sequence of words uttered by a speaker, given
the acoustic signal.
people interact with speech recognition systems every day to navigate voice mail systems,
search the Web from mobile phones, and other applications. Speech is an attractive option
when hands-free operation is necessary, as when operating machinery.
Speech recognition is difficult because the sounds made by a speaker are ambiguous
and, well, noisy. As a well-known example, the phrase "recognize speech" sounds almost the same as "wreck a nice beach" when spoken quickly. Even this short example shows several of the issues that make speech problematic. First, segmentation: written words in English have spaces between them, but in fast speech there are no pauses in "wreck a nice" that would distinguish it as a multiword phrase as opposed to the single word "recognize." Second, coarticulation: when speaking quickly the "s" sound at the end of "nice" merges with the "b" sound at the beginning of "beach," yielding something that is close to a "sp." Another problem that does not show up in this example is homophones, words like "to," "too," and "two" that sound the same but differ in meaning.
We can view speech recognition as a problem in most-likely-sequence explanation. As
we saw in Section 15.2, this is the problem of computing the most likely sequence of state
variables, x1:t, given a sequence of observations e1:t. In this case the state variables are the
words, and the observations are sounds. More precisely, an observation is a vector of features
extracted from the audio signal. As usual, the most likely sequence can be computed with the
help of Bayes’ rule to be:
argmax word1:t P(word1:t | sound1:t) = argmax word1:t P(sound1:t | word1:t) P(word1:t).

Here P(sound1:t | word1:t) is the acoustic model. It describes the sounds of words, for example that "ceiling" begins with a soft "c" and sounds the same as "sealing." P(word1:t) is known as the language model. It specifies the prior probability of each utterance, for example, that "ceiling fan" is about 500 times more likely as a word sequence than "sealing fan."

Acoustic model

Sound waves are periodic changes in pressure that propagate through the air. When these
waves strike the diaphragm of a microphone, the back-and-forth movement generates an
electric current. An analog-to-digital converter measures the size of the current—which approximates
the amplitude of the sound wave—at discrete intervals called the sampling rate.
Speech sounds, which are mostly in the range of 100 Hz (100 cycles per second) to 1000 Hz,
are typically sampled at a rate of 8 kHz. (CDs and mp3 files are sampled at 44.1 kHz.) The
precision of each measurement is determined by the quantization factor; speech recognizers
typically keep 8 to 12 bits. That means that a low-end system, sampling at 8 kHz with 8-bit
quantization, would require nearly half a megabyte per minute of speech.
Since we only want to know what words were spoken, not exactly what they sounded like, we don't need to keep all that information. We only need to distinguish between different speech sounds. Linguists have identified about 100 speech sounds, or phones, that can be composed to form all the words in all known human languages. Roughly speaking, a phone is the sound that corresponds to a single vowel or consonant, but there are some complications: combinations of letters, such as "th" and "ng", produce single phones, and some letters produce different phones in different contexts (e.g., the "a" in rat and rate).
A phoneme is the smallest unit of sound that has a distinct meaning to speakers of a particular
language.
First, we observe that although the sound frequencies in speech may be several kHz,
the changes in the content of the signal occur much less often, perhaps at no more than 100
Hz. Therefore, speech systems summarize the properties of the signal over time slices called
frames. A frame length of about 10 milliseconds (i.e., 80 samples at 8 kHz) is short enough
to ensure that few short-duration phenomena will be missed. Overlapping frames are used to
make sure that we don’t miss a signal because it happens to fall on a frame boundary.
Each frame is summarized by a vector of features. Picking out features from a speech signal is like listening to an orchestra and saying "here the French horns are playing loudly and the violins are playing softly." We'll give a brief overview of the features in a typical system. First, a Fourier transform is used to determine the amount of acoustic energy at about a dozen frequencies. Then we compute a measure called the mel frequency cepstral coefficient (MFCC) for each frequency. We also compute the total energy in the frame. That gives thirteen features; for each one we compute the difference between this frame and the previous frame, and the difference between differences, for a total of 39 features. These are continuous-valued; the easiest way to fit them into the HMM framework is to discretize the values.
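A minimal sketch of extracting such 39-dimensional feature vectors, assuming the third-party librosa library is available (the file name and 8 kHz sampling rate are illustrative):

import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=8000)        # load audio, resampled to 8 kHz
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # 13 coefficients per frame
delta = librosa.feature.delta(mfcc)                   # frame-to-frame differences
delta2 = librosa.feature.delta(mfcc, order=2)         # differences of differences

features = np.vstack([mfcc, delta, delta2])           # 39 features per frame
print(features.shape)                                 # (39, number_of_frames)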
