
NLP Assignment

Name: Ridhyal Chauhan

Registration No: RA2211056010047


Problem-1: Continuous Bag of Words (CBOW)

(a) List of Target Words for Each Context Window


Given the sentence: "The quick brown fox jumps over the lazy dog."

With a context window of size 2 (one context word on each side of the target), the target words
and their corresponding context windows are:

| Context Window | Target Word |
|----------------|-------------|
| [The, brown]   | quick       |
| [quick, fox]   | brown       |
| [brown, jumps] | fox         |
| [fox, over]    | jumps       |
| [jumps, the]   | over        |
| [over, lazy]   | the         |
| [the, dog]     | lazy        |
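These pairs can be generated programmatically. Below is a minimal Python sketch (illustrative only) that follows the same convention as the table: one context word on each side of the target, skipping the first and last words, which lack a full context.

```python
# Minimal sketch: build CBOW (context, target) pairs for the example sentence.
# Convention assumed here (matching the table above): one context word on each
# side of the target; the first and last words are skipped.
sentence = "The quick brown fox jumps over the lazy dog"
words = sentence.split()

pairs = []
for i in range(1, len(words) - 1):
    context = [words[i - 1], words[i + 1]]  # word before and word after
    pairs.append((context, words[i]))

for context, target in pairs:
    print(context, "->", target)
# ['The', 'brown'] -> quick
# ['quick', 'fox'] -> brown
# ... and so on for the remaining targets
```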

(b) How CBOW Works


The CBOW model predicts a target word using the surrounding context words. The steps
involved are listed below, and a short code sketch after the list illustrates them:
1. **Input Representation**: Each context word is converted into a one-hot encoded vector
or an embedding vector.
2. **Averaging the Context Vectors**: The embeddings of context words are averaged or
summed to form a single vector.
3. **Feeding into a Neural Network**: This averaged vector is passed through a neural
network (usually a single hidden layer).
4. **Output Layer (Softmax Function)**: The network outputs probabilities for all words in
the vocabulary, and the most probable word is chosen as the predicted target word.
5. **Backpropagation & Training**: The model adjusts weights based on prediction errors
to improve accuracy over multiple iterations.
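
A minimal NumPy sketch of one CBOW training step is shown below to make steps 1–5 concrete. The vocabulary, embedding size, and learning rate are arbitrary values chosen for the illustration, not part of the assignment.

```python
import numpy as np

# Toy setup: small vocabulary and embedding dimension (both arbitrary).
vocab = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
word_to_id = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 10

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))    # input (embedding) matrix
W_out = rng.normal(scale=0.1, size=(D, V))   # output projection matrix

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# One training example: context ["the", "brown"], target "quick".
context_ids = [word_to_id["the"], word_to_id["brown"]]
target_id = word_to_id["quick"]

# Steps 1-2: look up the context embeddings and average them.
h = W_in[context_ids].mean(axis=0)           # shape (D,)

# Steps 3-4: project to vocabulary scores and apply softmax.
probs = softmax(h @ W_out)                   # shape (V,)

# Step 5: cross-entropy gradient and a single SGD update.
lr = 0.1
grad_scores = probs.copy()
grad_scores[target_id] -= 1.0                # d(loss)/d(scores)
grad_h = W_out @ grad_scores                 # gradient w.r.t. the averaged context
W_out -= lr * np.outer(h, grad_scores)
for idx in context_ids:
    W_in[idx] -= lr * grad_h / len(context_ids)

print("P(target | context) =", probs[target_id])
```
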
Problem-2: Skip-gram vs GloVe Model

(a) Skip-gram Model (Word2Vec) Processing the Sentence


Given the sentence: "Natural language processing is amazing."

The Skip-gram model predicts context words given a target word. The steps are:

1. Target Word Selection: A word is chosen as the center (target) word.


2. Context Window Definition: With a window size of 2, it considers two words before and
after the target word.
3. Prediction Pairs Generation: The model generates training pairs in the form (target word,
context word).

For example, with a window size of 2, the Skip-gram model generates pairs like:

| Target Word | Context Words                      |
|-------------|------------------------------------|
| Natural     | (language, processing)             |
| language    | (Natural, processing, is)          |
| processing  | (Natural, language, is, amazing)   |
| is          | (language, processing, amazing)    |
| amazing     | (processing, is)                   |
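These pairs can be produced with a short Python sketch (illustrative only; the sentence and window size follow the example above).

```python
# Minimal sketch: generate Skip-gram (target, context) pairs with window size 2.
sentence = "Natural language processing is amazing"
words = sentence.split()
window = 2

pairs = []
for i, target in enumerate(words):
    lo, hi = max(0, i - window), min(len(words), i + window + 1)
    for j in range(lo, hi):
        if j != i:
            pairs.append((target, words[j]))

for target, context in pairs:
    print(target, "->", context)
# Natural -> language
# Natural -> processing
# language -> Natural
# ... and so on
```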

(b) Difference Between Skip-gram and GloVe


The Skip-gram and GloVe models differ in their approach to learning word embeddings.

1. Skip-gram Model (Word2Vec)


- Predicts context words given a target word.
- Trained using local context windows.
- Maximizes the probability of seeing correct context words for a target word.
- Performs well with small datasets and infrequent words.

2. GloVe Model
- Uses a word co-occurrence matrix instead of predicting context words.
- Trained using a global co-occurrence matrix.
- Factorizes the matrix to capture word relationships.
- Requires a large corpus for effective training.

In summary, Skip-gram is a **predictive model**, while GloVe is a **count-based model**.


Skip-gram learns embeddings through context prediction, whereas GloVe captures word
relationships by analyzing word co-occurrences across the entire corpus.
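
To make the count-based nature of GloVe concrete, the sketch below builds the kind of global word-word co-occurrence matrix that GloVe starts from. The toy corpus and window size are invented for the illustration, and the weighted least-squares factorization that GloVe actually performs on these counts is omitted.

```python
# Minimal sketch: count global word-word co-occurrences within a symmetric
# window, the raw statistics that GloVe factorizes into word vectors.
from collections import defaultdict

corpus = [
    "natural language processing is amazing",
    "language models learn word representations",
]
window = 2

cooc = defaultdict(float)
for line in corpus:
    words = line.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                cooc[(w, words[j])] += 1.0   # GloVe also weights by 1/distance

for (w, c), count in sorted(cooc.items()):
    print(f"{w:>15s}  {c:>15s}  {count:4.1f}")
```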
