
TSA@1

Uploaded by

kiruthikaa T
Copyright
© All Rights Reserved

1. INFORMATION RETRIEVAL SYSTEM:


Information Retrieval (IR) can be defined as a software program that deals
with the organization, storage, retrieval, and evaluation of information from
document repositories, particularly textual information.
IR Model
An Information Retrieval (IR) model selects and ranks the documents that the user requires, as expressed in the form of a query. Documents and queries are represented in the same way, so that document selection and ranking can be formalized by a matching function that returns a retrieval status value (RSV) for each document in the collection. Many IR systems represent document contents by a set of descriptors, called terms, belonging to a vocabulary V.
Probabilistic models, for example, estimate the probability of relevance rel of each document d to a query q with respect to a set $R_q$ of training documents:
$$P(\mathrm{rel} \mid d, q, R_q)$$
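The matching function can be illustrated with the vector space model, one common choice that is assumed here rather than stated in these notes: each document and the query become term-frequency vectors over the vocabulary V, and the RSV is their cosine similarity. The helper names `to_terms`, `rsv` and `rank` and the three-document collection are hypothetical. No stemming or stopword removal is done, so "document" and "documents" count as different terms.

```python
from collections import Counter
import math

def to_terms(text: str) -> Counter:
    """Represent a text as a bag of lowercase terms (term -> frequency)."""
    return Counter(text.lower().split())

def rsv(doc: Counter, query: Counter) -> float:
    """Retrieval status value: cosine similarity of two term-frequency vectors."""
    dot = sum(doc[t] * query[t] for t in query)
    norm_d = math.sqrt(sum(v * v for v in doc.values()))
    norm_q = math.sqrt(sum(v * v for v in query.values()))
    if norm_d == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_d * norm_q)

def rank(docs: dict[str, str], query: str) -> list[tuple[str, float]]:
    """Score every document in the collection and sort by descending RSV."""
    q = to_terms(query)
    scores = [(doc_id, rsv(to_terms(text), q)) for doc_id, text in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy collection.
collection = {
    "d1": "information retrieval ranks documents",
    "d2": "neural networks model sequences",
    "d3": "retrieval of text documents from a repository",
}
for doc_id, score in rank(collection, "retrieval documents"):
    print(doc_id, round(score, 3))
```

Because every document gets a score, the same function supports both selection (keep scores above a threshold) and ranking (sort by score).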

The User Task: The user first has to translate their information need into a query. In an information retrieval system, the query is a set of words that convey the semantics of the required information, whereas in a data retrieval system, a query expression conveys the constraints that the retrieved objects must satisfy. Example: a user sets out to search for one thing but ends up looking at something else. This means that the user is browsing, not searching. The above figure shows the interaction of the user through these different tasks.
• Logical View of the Documents: Historically, documents were represented through a set of index terms or keywords. Nowadays, modern computers can represent a document by its full set of words, and the set of representative keywords can then be reduced, for example by eliminating stopwords, i.e. articles and connectives. Such operations are called text operations; they reduce the complexity of the document representation from full text to a set of index terms.
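A minimal sketch of such a text operation, assuming a deliberately tiny, hypothetical stopword list (real systems use much longer lists and often add stemming): the full text is split into words, punctuation is stripped, and stopwords are dropped, leaving the index terms.

```python
# Hypothetical, deliberately tiny stopword list (articles, connectives, a few function words).
STOPWORDS = {"a", "an", "the", "and", "or", "but", "of", "in", "to", "is"}

def index_terms(full_text: str) -> list[str]:
    """Reduce a full-text document to its index terms by dropping stopwords."""
    words = [w.strip(".,;:!?").lower() for w in full_text.split()]
    return [w for w in words if w and w not in STOPWORDS]

doc = "The index is a data structure for the fast retrieval of information."
print(index_terms(doc))
# ['index', 'data', 'structure', 'for', 'fast', 'retrieval', 'information']
```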
Past, Present, and Future of Information Retrieval
1. Early Developments: As the need for information grew, it became necessary to build data structures for faster access. The index is the data structure used for fast retrieval of information. For centuries, indexes were built by manually categorizing information into hierarchies.
2. Information Retrieval in Libraries: Libraries were the first to adopt IR systems for information retrieval. The first generation consisted of automating previous technologies, with search based on author name and title. The second generation added searching by subject heading, keywords, etc. The third generation added graphical interfaces, electronic forms, hypertext features, etc.
3. The Web and Digital Libraries: The Web is cheaper than many other sources of information, provides greater access through digital communication networks, and gives people the freedom to publish on a much larger medium.
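In modern IR systems, the index mentioned under Early Developments is typically an inverted index: a map from each term to the documents that contain it, so a query only touches the documents listed under its terms. The sketch below assumes whitespace tokenization and Boolean AND queries; `build_inverted_index`, `search_all` and the sample records are illustrative names, not part of any library.

```python
from collections import defaultdict

def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of ids of the documents that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search_all(index: dict[str, set[str]], query: str) -> set[str]:
    """Boolean AND query: ids of documents containing every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

# Hypothetical catalogue records.
library = {
    "d1": "library catalog searched by author",
    "d2": "search by subject heading and keywords",
    "d3": "author and title search",
}
index = build_inverted_index(library)
print(sorted(search_all(index, "author")))        # ['d1', 'd3']
print(sorted(search_all(index, "author title")))  # ['d3']
```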
Advantages of Information Retrieval
1. Efficient Access: Information retrieval techniques let users quickly locate and retrieve relevant information from vast amounts of data.
2. Personalization of Results: User profiling and personalization techniques
are used in information retrieval models to tailor search results to individual
preferences and behaviors.
3. Scalability: Information retrieval models are capable of handling increasing
data volumes.
4. Precision: These systems can provide highly accurate and relevant search results, reducing the likelihood of irrelevant information appearing in them.
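Precision, together with its usual companion measure recall, can be computed directly from a result list. In this sketch (hypothetical document ids; the retrieved list is assumed to contain no duplicates), precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved.

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query."""
    hits = sum(1 for d in retrieved if d in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = ["d1", "d2", "d3", "d4"]   # what the system returned
relevant = {"d1", "d3", "d7"}          # what the user actually needed
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```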
Disadvantages of Information Retrieval
1. Information Overload: When a lot of information is available, users often
face information overload, making it difficult to find the most useful and
relevant material.
2. Lack of Context: Information retrieval systems may fail to understand the
context of a user’s query, potentially leading to inaccurate results.
3. Privacy and Security Concerns: As information retrieval systems often
access sensitive user data, they can raise privacy and security concerns.
4. Maintenance Challenges: Keeping these systems up-to-date and effective
requires ongoing efforts, including regular updates, data cleaning, and
algorithm adjustments.
5. Bias and Fairness: Ensuring that information retrieval systems do not exhibit biases and provide fair, unbiased results is a crucial challenge, especially in contexts like web search engines and recommendation systems.
2. Recurrent Neural Networks (RNN):
RNNs are a type of neural network that can be used to model sequence data. RNNs are derived from feedforward networks and, somewhat like human memory, carry information about earlier inputs forward when processing later ones. Simply put, recurrent neural networks can model sequential data in a way that standard feedforward networks can't.

All of the inputs and outputs in standard neural networks are independent of one another. However, in some circumstances, such as predicting the next word of a sentence, the previous words are needed, so they must be remembered. RNNs were created to solve this problem by using a hidden layer. The most important component of an RNN is the hidden state, which remembers specific information about a sequence.

RNNs have a memory (the hidden state) that stores information about what has been calculated so far. An RNN uses the same parameters for every input, because it performs the same task at every time step of the sequence to produce the output.
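The memory and parameter sharing described above can be written as a recurrence. One common formulation, assumed here (a simple Elman-style RNN with tanh activation; the symbols are not from the original notes), uses the same weight matrices $W_{xh}$, $W_{hh}$, $W_{hy}$ and biases $b_h$, $b_y$ at every time step $t$, while the hidden state $h_t$ carries the memory from step $t-1$ to step $t$:

$$h_t = \tanh\left(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h\right), \qquad y_t = W_{hy}\, h_t + b_y$$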

RNN Architecture

RNNs are a type of neural network that has hidden states and allows past outputs to be used as inputs. Their typical structure is shown in the figure.

Here’s a breakdown of its key components:

• Input Layer: This layer receives the sequence data one element per time step. For example, in a sentence, it might receive one word at a time as a vector representation.

• Hidden Layer: The heart of the RNN, the hidden layer contains a set of interconnected neurons. Each neuron processes the current input together with the hidden state from the previous time step. This “state” captures the network’s memory of past inputs, allowing it to interpret the current element in context.

• Activation Function: This function introduces non-linearity into the network, enabling it to learn complex patterns. It transforms the combination of the current input and the previous hidden state before passing it on.

• Output Layer: The output layer generates the network’s prediction based on the processed information. In a language model, for example, it might predict the next word in the sequence.

• Recurrent Connection: A key distinction of RNNs is the recurrent connection within the hidden layer. This connection passes the hidden state (the network’s memory) to the next time step. It’s like passing a baton in a relay race, carrying information about previous inputs forward.
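The components above can be put together in a small forward pass. This is a minimal sketch in plain Python (no deep-learning library), assuming the simple Elman-style update with tanh and randomly initialized weights; the function names, layer sizes and the one-hot input sequence are hypothetical. The same `W_xh`, `W_hh` and `W_hy` are reused at every step.

```python
import math
import random

def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def vec_add(*vs: list[float]) -> list[float]:
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vs)]

def rand_matrix(rows: int, cols: int) -> list[list[float]]:
    """Small random weights (hypothetical initialization)."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Run a simple RNN over xs; return (output per step, final hidden state)."""
    h = [0.0] * len(W_hh)                       # initial hidden state: no memory yet
    outputs = []
    for x in xs:                                # input layer: one element per time step
        pre = vec_add(matvec(W_xh, x), matvec(W_hh, h), b_h)  # input + previous state
        h = [math.tanh(z) for z in pre]         # activation; h feeds the next step
        outputs.append(vec_add(matvec(W_hy, h), b_y))         # output layer
    return outputs, h

random.seed(0)
INPUT, HIDDEN, OUTPUT = 3, 4, 2                 # hypothetical layer sizes
W_xh = rand_matrix(HIDDEN, INPUT)
W_hh = rand_matrix(HIDDEN, HIDDEN)
W_hy = rand_matrix(OUTPUT, HIDDEN)
b_h, b_y = [0.0] * HIDDEN, [0.0] * OUTPUT

# A hypothetical three-step sequence of one-hot "word" vectors.
sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
outputs, h_last = rnn_forward(sequence, W_xh, W_hh, W_hy, b_h, b_y)
print(len(outputs), len(outputs[0]))            # 3 2
```

The number of weights depends only on the layer sizes, not on the length of `sequence`, which is what lets an RNN process inputs of any length.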

The Architecture of a Traditional RNN



RNN architectures vary depending on the problem being solved, from those with a single input and output to those with many inputs and outputs (with variations in between). Below are some examples of RNN architectures:

• One To One: There is a single input and a single output. A one-to-one architecture is used in traditional neural networks.

• One To Many: A single input can produce numerous outputs. One-to-many networks are used in music generation, for example.

• Many To One: A single output is produced by combining many inputs from distinct time steps. Sentiment analysis and emotion identification use such networks, in which the class label is determined by a sequence of words.

• Many To Many: There are numerous options for many-to-many; for example, two inputs might yield three outputs. Machine translation systems, such as English-to-French (or French-to-English) translation systems, use many-to-many networks.
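The four patterns differ only in how inputs are fed in and which outputs are kept. The sketch below uses a toy RNN with a scalar hidden state and hypothetical fixed weights `W_X`, `W_H`, `W_Y`. The many-to-many case with unequal lengths (two inputs, three outputs) is written as an encoder-decoder, a common way of building translation models that is assumed here rather than taken from these notes.

```python
import math

# Hypothetical fixed weights for a toy RNN with a scalar hidden state.
W_X, W_H, W_Y = 0.8, 0.5, 1.5

def step(x: float, h: float) -> tuple[float, float]:
    """One recurrent step: returns (new hidden state, output)."""
    h_new = math.tanh(W_X * x + W_H * h)
    return h_new, W_Y * h_new

def encode(xs: list[float]) -> float:
    """Read a whole input sequence and return the final hidden state."""
    h = 0.0
    for x in xs:
        h, _ = step(x, h)
    return h

def decode(h: float, n_steps: int, x: float = 0.0) -> list[float]:
    """Generate n_steps outputs, feeding each output back in as the next input."""
    ys = []
    for _ in range(n_steps):
        h, y = step(x, h)
        ys.append(y)
        x = y
    return ys

def one_to_one(x: float) -> float:
    return step(x, 0.0)[1]                 # single input, single output

def one_to_many(x: float, n_steps: int) -> list[float]:
    return decode(0.0, n_steps, x)         # e.g. generating a melody from a seed

def many_to_one(xs: list[float]) -> float:
    return W_Y * encode(xs)                # e.g. one sentiment score per sentence

def many_to_many(xs: list[float]) -> list[float]:
    """Same-length case: one output per input step."""
    h, ys = 0.0, []
    for x in xs:
        h, y = step(x, h)
        ys.append(y)
    return ys

def many_to_many_unequal(xs: list[float], n_out: int) -> list[float]:
    return decode(encode(xs), n_out)       # encoder-decoder; decoder starts from input 0.0

print(len(many_to_many_unequal([0.2, 0.4], 3)))  # 3
```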

Advantages and Disadvantages of RNNs

Advantages of RNNs:

• Handle sequential data effectively, including text, speech, and time series.

• Process inputs of any length, unlike feedforward neural networks, which expect a fixed-size input.

• Share weights across time steps, so the number of parameters does not grow with sequence length, which makes training more efficient.

Disadvantages of RNNs:

• Prone to vanishing and exploding gradient problems, which hinder learning.

• Training can be challenging, especially for long sequences.

• Computationally slow, because time steps must be processed one after another rather than in parallel.
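Why gradients vanish or explode can be seen with a deliberately simplified model, assumed for illustration: a scalar linear recurrence $h_t = w\,h_{t-1}$ with no input and no activation. By the chain rule, the gradient of $h_T$ with respect to $h_0$ is a product of $T$ identical factors, $w^T$, so it shrinks towards zero when $|w| < 1$ and blows up when $|w| > 1$. In a real RNN the factors are Jacobian matrices that also include activation derivatives, but the same repeated multiplication is responsible.

```python
def gradient_through_time(w: float, steps: int) -> float:
    """Return d h_T / d h_0 for the scalar linear recurrence h_t = w * h_{t-1}."""
    grad = 1.0
    for _ in range(steps):
        # Chain rule: each time step multiplies the gradient by d h_t / d h_{t-1} = w.
        grad *= w
    return grad

for w in (0.5, 1.0, 1.5):
    print(w, [gradient_through_time(w, T) for T in (1, 10, 50)])
```

With `w = 0.5` the gradient after 50 steps is below 1e-15 (vanishing), while with `w = 1.5` it exceeds 1e8 (exploding).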


3. ANALYSIS IN NATURAL LANGUAGE PROCESSING:

Lexical Analysis

• Definition: Breaks language down into units (lexemes) such as words, and categorizes them into parts of speech (POS).
• Example: The sentence "The cat is sleeping." can be broken down into the words "The" (determiner), "cat" (noun), "is" (verb), and "sleeping" (verb).
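The lexical step in the example can be sketched as tokenization followed by a lexicon lookup. The four-word `LEXICON` is hypothetical and hand-written for this sentence only; practical systems use a trained part-of-speech tagger instead.

```python
import re

# Hypothetical hand-written lexicon covering only the example sentence.
LEXICON = {"the": "determiner", "cat": "noun", "is": "verb", "sleeping": "verb"}

def tokenize(sentence: str) -> list[str]:
    """Split a sentence into word lexemes, dropping punctuation."""
    return re.findall(r"[A-Za-z']+", sentence)

def tag(sentence: str) -> list[tuple[str, str]]:
    """Attach a part-of-speech label to each token by lexicon lookup."""
    return [(tok, LEXICON.get(tok.lower(), "unknown")) for tok in tokenize(sentence)]

print(tag("The cat is sleeping."))
# [('The', 'determiner'), ('cat', 'noun'), ('is', 'verb'), ('sleeping', 'verb')]
```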

Syntactic Analysis

• Definition: Examines the structure of sentences to check their grammatical correctness.

• Example: The sentence "She loves reading books" is grammatically correct. Conversely, "She love reading books" is not, because of a subject-verb agreement error.

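A toy version of the subject-verb agreement check from the example, assuming a hypothetical grammar that only covers sentences starting with a personal pronoun followed by a present-tense verb, and treating "verb ends in -s" as the third-person singular marker (irregular verbs are not handled). Real syntactic analysis uses a full grammar and a parser; `agreement_ok` is an illustrative name.

```python
from typing import Optional

# Hypothetical toy grammar: pronoun subjects only.
THIRD_SINGULAR = {"he", "she", "it"}
OTHER_PRONOUNS = {"i", "you", "we", "they"}

def agreement_ok(sentence: str) -> Optional[bool]:
    """True/False for 'Pronoun Verb ...' sentences; None if outside the toy grammar."""
    words = sentence.lower().rstrip(".!?").split()
    if len(words) < 2:
        return None
    subject, verb = words[0], words[1]
    if subject in THIRD_SINGULAR:
        return verb.endswith("s")          # she loves, he plays
    if subject in OTHER_PRONOUNS:
        return not verb.endswith("s")      # they play, we read
    return None

print(agreement_ok("She loves reading books"))  # True
print(agreement_ok("She love reading books"))   # False
```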
Semantic Analysis

• Definition: Determines the literal meaning of words and sentences.

• Example: The sentence "The apple is blue" is syntactically correct but semantically incorrect, because apples are not typically blue.

Discourse Integration

• Definition: Considers previous sentences to clarify the meaning of ambiguous language.

• Example: In the text "John went to the park. He enjoyed it," discourse integration clarifies that "it" refers to the park.
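A crude discourse-integration heuristic, shown only as a sketch: link each pronoun to the most recently mentioned earlier noun of a compatible type. The `PEOPLE` and `THINGS` word lists are hypothetical, and real coreference resolution uses much richer syntactic and semantic cues. On the example text it links "it" to the park and, likewise, "He" to John.

```python
# Hypothetical mini-lexicon marking which nouns are people and which are things.
PEOPLE = {"john", "mary"}
THINGS = {"park", "ball", "book"}
PRONOUNS = {"he": PEOPLE, "she": PEOPLE, "it": THINGS}

def resolve_pronouns(text: str) -> dict[str, str]:
    """Link each pronoun to the most recent earlier noun of a compatible type."""
    seen: list[str] = []               # nouns mentioned so far, in order
    links: dict[str, str] = {}
    for raw in text.split():
        word = raw.strip(".,!?").lower()
        if word in PRONOUNS:
            candidates = [n for n in seen if n in PRONOUNS[word]]
            if candidates:
                links[word] = candidates[-1]   # recency: pick the latest match
        elif word in PEOPLE or word in THINGS:
            seen.append(word)
    return links

print(resolve_pronouns("John went to the park. He enjoyed it."))
# {'he': 'john', 'it': 'park'}
```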

Pragmatic Analysis

• Definition: Understands the intended meaning behind language, beyond the literal interpretation.
• Example: The phrase "It's raining cats and dogs" is understood pragmatically to mean it’s raining heavily, not literally that animals are falling from the sky.
4. The calculated bi-gram probabilities and the final probability for the sentence "They play in a big garden" using a bi-gram language model with Laplace smoothing.
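The training corpus for this exercise is not included in these notes, so the sketch below uses a small hypothetical three-sentence corpus; its numbers are not the answer to the original exercise. It applies the Laplace (add-one) smoothed estimate $P(w_i \mid w_{i-1}) = \frac{C(w_{i-1} w_i) + 1}{C(w_{i-1}) + V}$, where $C$ counts occurrences in the corpus and $V$ is the vocabulary size, and multiplies the bigram probabilities to get the sentence probability. Sentence-boundary markers `<s>` and `</s>` are added and both are counted in $V$; conventions for this vary between textbooks.

```python
from collections import Counter

# Hypothetical training corpus (the original exercise's corpus is not available).
CORPUS = [
    "They play in a big garden",
    "They play in the park",
    "Children play in a garden",
]

def tokens_of(sentence: str) -> list[str]:
    """Lowercase, split on whitespace and add sentence-boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]

def train_bigram(corpus: list[str]) -> tuple[Counter, Counter]:
    """Count unigrams and bigrams over the whole corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        toks = tokens_of(sentence)
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams

def laplace_prob(prev, word, unigrams, bigrams, vocab_size):
    """Add-one smoothed P(word | prev) = (C(prev word) + 1) / (C(prev) + V)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def sentence_prob(sentence, unigrams, bigrams):
    """Return each (prev, word, probability) step and the product of the probabilities."""
    vocab_size = len(unigrams)              # V includes <s> and </s> here
    toks = tokens_of(sentence)
    steps, total = [], 1.0
    for prev, word in zip(toks, toks[1:]):
        p = laplace_prob(prev, word, unigrams, bigrams, vocab_size)
        steps.append((prev, word, p))
        total *= p
    return steps, total

unigrams, bigrams = train_bigram(CORPUS)
steps, total = sentence_prob("They play in a big garden", unigrams, bigrams)
for prev, word, p in steps:
    print(f"P({word} | {prev}) = {p:.4f}")
print(f"P(sentence) = {total:.3e}")
```

Add-one smoothing guarantees that an unseen bigram still gets a small non-zero probability, so a single unseen word pair no longer drives the whole sentence probability to zero.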
