The document discusses various natural language processing algorithms including translated sentence mining, paraphrase mining, semantic textual similarity, semantic search, and clustering. These algorithms are used to find semantically similar sentences across languages, rephrase questions and texts, determine the similarity between questions and answers, improve search accuracy by understanding search intent, and categorize data to speed up access to answers.
Each algorithm is listed below with its name, the idea behind it, its benefit in the project, and the name of the team member responsible for it.
1- Translated Sentence Mining (اسراء رجب عبد الرازق عرفان)
The idea: Translated sentence mining describes the process of finding the closest similar sentences across several different languages. For example, given one set of sentences from one language and another set from a different language, we want to find all the similar sentence pairs between the two languages, so we used translated sentence mining.
The benefit in the project: It is used to find sentences with the same meaning in different languages. When we search for a question in a specific language, we can find the answer in a source that is not in the language of the question.
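As an illustration only, the following minimal sketch shows one way to implement this idea with the sentence-transformers library; the multilingual model name and the example sentences are assumptions, not project data. A multilingual bi-encoder embeds sentences from two languages into the same vector space, and cosine similarity picks the closest cross-lingual pairs.

# Sketch: translated sentence mining with a multilingual bi-encoder.
# Model name and sentences are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

english = ["How do I reset my password?", "The weather is nice today."]
arabic = ["كيف أعيد تعيين كلمة المرور؟", "الطقس جميل اليوم."]

emb_en = model.encode(english, convert_to_tensor=True)
emb_ar = model.encode(arabic, convert_to_tensor=True)

# Cosine similarity between every English/Arabic sentence pair.
scores = util.cos_sim(emb_en, emb_ar)

# For each English sentence, report the closest Arabic sentence.
for i, sentence in enumerate(english):
    best = scores[i].argmax().item()
    print(sentence, "<->", arabic[best], f"(score={scores[i][best]:.2f})")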
2- Image Search (أسماء عبدالباسط فتحى عبدالباسط)
The idea: It is based on the idea of converting the image to its vector and converting the text to its vector, then comparing them and extracting the most appropriate sentences for that image.
The benefit in the project: The overall project can benefit from it, and it facilitates the search process, as it is possible to search for an image in this book with the help of this algorithm.

3- Paraphrase Mining (اشراق أشرف السيد عبد الحليم)
The idea: Paraphrase mining is the task of finding paraphrases (texts with identical or similar meaning) in a large corpus of sentences. A paraphrase is a restatement of a text, passage, or work that gives the meaning in another form. Given a list of sentences / texts, this function performs paraphrase mining: it compares all sentences against all other sentences and returns a list of the pairs with the highest cosine similarity score.
The benefit in the project: We use it to rephrase the question in more than one possible way, in different terms but with the same meaning. It is also used heavily in the translation process from one language to another, because a translation gives the same meaning in different ways; so when the question is in one language and the answer is in another, the answer is translated into the language of the question to give the same desired meaning.
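As a hedged illustration of the image search idea, the sketch below uses a CLIP-style model from the sentence-transformers library to embed an image and candidate sentences into the same vector space; the model name, image path, and sentences are assumptions for illustration only.

# Sketch: image-to-text search with a CLIP model.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

clip_model = SentenceTransformer("clip-ViT-B-32")

# Encode the image and the candidate sentences into the same vector space.
# "page_illustration.jpg" is a hypothetical file name.
img_emb = clip_model.encode(Image.open("page_illustration.jpg"), convert_to_tensor=True)
sentences = [
    "A diagram of a neural network.",
    "A photo of a cat.",
    "A city skyline at night.",
]
txt_emb = clip_model.encode(sentences, convert_to_tensor=True)

# Pick the sentence that best matches the image.
scores = util.cos_sim(img_emb, txt_emb)
print("Best matching sentence:", sentences[scores.argmax().item()])

For paraphrase mining, a minimal sketch using the library's paraphrase_mining utility (the sentences are invented) looks like this:

# Sketch: paraphrase mining over a small corpus.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "How can I open a new account?",
    "What are the steps to create an account?",
    "The library closes at 8 pm.",
]

# Compares every sentence against every other sentence and returns
# (score, index_i, index_j) triples sorted by decreasing cosine similarity.
pairs = util.paraphrase_mining(model, sentences)
for score, i, j in pairs[:3]:
    print(f"{sentences[i]}  <->  {sentences[j]}  (score={score:.2f})")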
4- Cross- / Bi-Encoders (فاطمة السعيد محمد شومان)
The idea: Bi-Encoder vs. Cross-Encoder: a Bi-Encoder produces a sentence embedding for a given sentence. We pass sentences A and B to a BERT model independently, which results in the sentence embeddings u and v. These sentence embeddings can then be compared using cosine similarity. In contrast, for a Cross-Encoder we pass both sentences simultaneously to the Transformer network. It then produces an output value between 0 and 1 indicating the similarity of the input sentence pair. For example, it computes the score between a query and all possible sentences in a corpus using a Cross-Encoder for semantic textual similarity (STS), and then outputs the most similar sentences for the given query.
The benefit in the project: Using a Cross-Encoder for semantic textual similarity (STS), we know the degree of similarity between the question and the answer.
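The following minimal sketch contrasts the two approaches using the sentence-transformers library; the model names and the sentence pair are illustrative assumptions.

# Sketch: Bi-Encoder (independent embeddings + cosine similarity) vs.
# Cross-Encoder (both sentences passed through the network together).
from sentence_transformers import SentenceTransformer, CrossEncoder, util

sentence_a = "A man is eating food."
sentence_b = "A man is eating a piece of bread."

# Bi-Encoder: encode each sentence independently, then compare u and v.
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
u = bi_encoder.encode(sentence_a, convert_to_tensor=True)
v = bi_encoder.encode(sentence_b, convert_to_tensor=True)
print("Bi-Encoder cosine similarity:", util.cos_sim(u, v).item())

# Cross-Encoder: a single forward pass over the sentence pair yields one
# similarity score (roughly between 0 and 1 for this STS-trained model).
cross_encoder = CrossEncoder("cross-encoder/stsb-roberta-base")
score = cross_encoder.predict([(sentence_a, sentence_b)])
print("Cross-Encoder score:", score[0])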
5- Semantic Search (أية كمال أيوب محمد)
The idea: Semantic search is a data searching technique in which a search query aims not only to match keywords, but to determine the search intent and the contextual meaning of the words a person is using for the search. It provides more meaningful search results by evaluating and understanding the search phrase and finding the most relevant results in a website, database, or any other data repository.
The benefit in the project: It is used to improve search accuracy by understanding the content of the search query.
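As an illustration, a minimal semantic search sketch with the sentence-transformers semantic_search utility might look as follows; the corpus, query, and model name are invented for the example.

# Sketch: semantic search over a small corpus.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "The museum is open from 9 am to 5 pm.",
    "Tickets can be booked online.",
    "The cafeteria serves vegetarian meals.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query = "When can I visit the museum?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Returns the top_k corpus entries with the highest cosine similarity to the query.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], f"(score={hit['score']:.2f})")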
6- Retrieve & Re-Rank (ايمان ايمن محمد سليمان)
The idea: Question answering retrieval is improved by using Retrieve & Re-Rank. We first use a retrieval system that retrieves a large list of e.g. 100 candidates; this can be either lexical search, e.g. with ElasticSearch, or dense retrieval with a bi-encoder. A re-ranker based on a Cross-Encoder then scores these candidates against the query and the best-scoring results are returned, as sketched below.
The benefit in the project: You can use this framework to compute sentence / text embeddings for more than 100 languages. These embeddings can then be compared, e.g. with cosine similarity, to find sentences with a similar meaning. This can be useful for semantic textual similarity, semantic search, or paraphrase mining.
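A minimal Retrieve & Re-Rank sketch with the sentence-transformers library is shown below; the bi-encoder and Cross-Encoder model names and the passages are illustrative assumptions.

# Sketch: retrieve candidates with a bi-encoder, then re-rank them
# with a Cross-Encoder.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

bi_encoder = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

passages = [
    "The capital of France is Paris.",
    "Paris is known for the Eiffel Tower.",
    "Berlin is the capital of Germany.",
]
passage_embeddings = bi_encoder.encode(passages, convert_to_tensor=True)

query = "What is the capital of France?"
query_embedding = bi_encoder.encode(query, convert_to_tensor=True)

# Step 1: dense retrieval with the bi-encoder (keep the top candidates).
hits = util.semantic_search(query_embedding, passage_embeddings, top_k=3)[0]

# Step 2: re-rank the retrieved candidates with the Cross-Encoder.
pairs = [(query, passages[hit["corpus_id"]]) for hit in hits]
rerank_scores = cross_encoder.predict(pairs)
for hit, score in sorted(zip(hits, rerank_scores), key=lambda x: x[1], reverse=True):
    print(passages[hit["corpus_id"]], f"(re-rank score={score:.2f})")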
7- Clustering (ايمان محمد على)
The idea:
1- Convert the data from mixed, general data into specific features.
2- Divide the mixed data into separate categories based on the qualities they share.
The benefit in the project: The clusters are searched for the categories closest to the answer, which speeds up access to the answer to the question.
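As an illustration of these two steps, the sketch below (assuming sentence-transformers for the embedding step and scikit-learn's KMeans for the grouping step; the sentences and the number of clusters are invented) first converts the mixed data into feature vectors and then divides them into separate categories.

# Sketch: cluster sentence embeddings into categories.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "How do I reset my password?",
    "I forgot my password.",
    "What are the opening hours?",
    "When is the office open?",
]

# Step 1: convert the mixed data into specific features (embeddings).
embeddings = model.encode(sentences)

# Step 2: divide the data into separate categories based on similarity.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(embeddings)

for sentence, label in zip(sentences, labels):
    print(f"cluster {label}: {sentence}")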