
DEEP LEARNING WEEK 9

1. Which of the following is a disadvantage of one-hot encoding?


a) It requires a large amount of memory to store the vectors
b) It can result in a high-dimensional sparse representation
c) It cannot capture the semantic similarity between words
d) All of the above
Answer: d) All of the above
Explanation: One-hot encoding has several disadvantages: it requires a large amount of
memory to store the vectors, it results in a high-dimensional sparse representation, and it
cannot capture the semantic similarity between words.
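A minimal NumPy sketch of these drawbacks, using a made-up four-word vocabulary: each vector is as long as the vocabulary, almost entirely zero, and any two distinct words have zero cosine similarity.

```python
import numpy as np

vocab = ["king", "queen", "apple", "banana"]   # toy vocabulary for illustration
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word: str) -> np.ndarray:
    v = np.zeros(len(vocab))        # dimensionality grows with vocabulary size
    v[index[word]] = 1.0            # exactly one non-zero entry -> sparse
    return v

king, queen = one_hot("king"), one_hot("queen")
cosine = king @ queen / (np.linalg.norm(king) * np.linalg.norm(queen))
print(cosine)   # 0.0 -- related words look completely unrelated
```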
2. Which of the following is true about the input representation in the CBOW model?
a. Each word is represented as a one-hot vector
b. Each word is represented as a continuous vector
c. Each word is represented as a sequence of one-hot vectors
d. Each word is represented as a sequence of continuous vectors
Answer: a. Each word is represented as a one-hot vector
Solution: In the CBOW model, each word in the context is represented as a one-hot vector,
which is then multiplied by a weight matrix to obtain a continuous vector representation.
These vector representations are then averaged to obtain a single vector representation of the
context.
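A minimal sketch of this forward pass, assuming a toy vocabulary and randomly initialised weight matrices (the names and sizes below are illustrative, not a reference implementation):

```python
import numpy as np

V, d = 6, 4                          # vocabulary size, embedding dimension
W_in = np.random.randn(V, d)         # input weight matrix (one row per word)
W_out = np.random.randn(d, V)        # output weight matrix

def one_hot(i: int) -> np.ndarray:
    v = np.zeros(V)
    v[i] = 1.0
    return v

context_ids = [1, 3, 4]              # indices of the context words
# Each context word is a one-hot vector; multiplying by W_in selects its row.
context_vecs = [one_hot(i) @ W_in for i in context_ids]
h = np.mean(context_vecs, axis=0)    # average into a single context vector

scores = h @ W_out                                # one score per vocabulary word
probs = np.exp(scores) / np.exp(scores).sum()     # softmax over possible targets
print(probs.argmax())                             # predicted target word index
```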
3. Which of the following is an advantage of the CBOW model compared to the Skip-gram
model?
a. It is faster to train
b. It requires less memory
c. It performs better on rare words
d. All of the above
Answer: a) It is faster to train
Solution: The CBOW model is faster to train than the Skip-gram model because it involves
predicting a single target word given its context, whereas the Skip-gram model involves
predicting multiple context words given a single target word.
4. Which of the following is an advantage of using the skip-gram method over the bag-of-words
approach?
a) The skip-gram method is faster to train
b) The skip-gram method performs better on rare words
c) The bag-of-words approach is more accurate
d) The bag-of-words approach is better for short texts
Answer: b)
Solution: The skip-gram method performs better on rare words: every occurrence of a rare
word is used as a target to predict several surrounding context words, so the model gets more
training signal for infrequent words than it does when the context is averaged, as in the
bag-of-words approach.
5. What is the role of the softmax function in the skip-gram method?
a) To calculate the dot product between the target word and the context words
b) To transform the dot product into a probability distribution
c) To calculate the distance between the target word and the context words

d) To adjust the weights of the neural network during training
Answer: b) To transform the dot product into a probability distribution
Solution: The softmax function is used in the skip-gram method to transform the dot
product between the target word and the context words into a probability distribution. This
distribution represents the likelihood of seeing each context word given the target word, and
is used to train the model by minimizing the cross-entropy loss between the predicted and
actual distributions.
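A minimal sketch of this step, with an illustrative vocabulary size and random vectors: the dot products are the logits, the softmax turns them into a probability distribution, and the cross-entropy loss is the negative log-probability of the observed context word.

```python
import numpy as np

V, d = 8, 5
v_target = np.random.randn(d)        # input vector of the target word
U = np.random.randn(V, d)            # output vectors, one per vocabulary word

scores = U @ v_target                # dot products (logits)
scores -= scores.max()               # subtract max for numerical stability
probs = np.exp(scores) / np.exp(scores).sum()   # softmax -> probability distribution

observed_context = 3                 # index of the actual context word
loss = -np.log(probs[observed_context])         # cross-entropy for this pair
print(probs.sum(), loss)             # probabilities sum to 1.0
```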
6. Suppose we are learning the representations of words using GloVe representations. If we
observe that the cosine similarity between two representations v_i and v_j for words 'i' and 'j'
is very high, which of the following statements is true? (parameters b_i = 0.02 and b_j = 0.05)

a) X_ij = 0.03
b) X_ij = 0.8
c) X_ij = 0.35
d) X_ij = 0

Answer: b)
Solution: Since the word representations are similar, we know v_i^T v_j is high. Here
v_i^T v_j = X_ij − b_i − b_j, so X_ij must also be high, and the only high value among the options is X_ij = 0.8.
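A tiny arithmetic check of the relation used in this solution (the candidate X_ij values come from the options above):

```python
b_i, b_j = 0.02, 0.05
options = {"a": 0.03, "b": 0.8, "c": 0.35, "d": 0.0}

# Implied dot product v_i . v_j = X_ij - b_i - b_j for each option.
implied = {k: x - b_i - b_j for k, x in options.items()}
print(implied)                        # roughly {'a': -0.04, 'b': 0.73, 'c': 0.28, 'd': -0.07}

# A high cosine similarity means a high dot product, so the largest value wins.
print(max(implied, key=implied.get))  # 'b'
```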
7. We add incorrect pairs into our corpus to maximize the probability of words that occur in
the same context and minimize the probability of words that occur in different contexts.
This technique is called:

a) Hierarchical softmax
b) Contrastive estimation
c) Negative sampling
d) GloVe representations

Answer: c)
Solution: The process of adding incorrect (negative) pairs to the training set is called negative sampling.
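A minimal sketch of the negative-sampling objective, with illustrative random vectors and three sampled negatives (not tied to any particular library):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d, k = 5, 3                            # embedding size, number of negative samples
v_target = np.random.randn(d)          # target word vector
u_context = np.random.randn(d)         # true (correct) context word vector
U_negative = np.random.randn(k, d)     # k randomly sampled "incorrect" context vectors

# Push up the score of the correct pair, push down the scores of the incorrect pairs.
loss = -np.log(sigmoid(u_context @ v_target)) \
       - np.log(sigmoid(-(U_negative @ v_target))).sum()
print(loss)
```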
8. What is the computational complexity of computing the softmax function in the output layer
of a neural network?
a) O(n)
b) O(n²)
c) O(n log n)
d) O(log n)
Answer: a)
Explanation: The computational complexity of computing the softmax function in the
output layer of a neural network is O(n), where n is the number of output classes.
9. How does Hierarchical Softmax reduce the computational complexity of computing the
softmax function?
a) It replaces the softmax function with a linear function
b) It uses a binary tree to approximate the softmax function
c) It uses a heuristic to compute the softmax function faster

d) It does not reduce the computational complexity of computing the softmax function
Answer: b)
Explanation: Hierarchical Softmax uses a binary tree to approximate the softmax function.
This reduces the computational complexity of computing the softmax function from O(n) to
O(log n).
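A minimal sketch of this idea, assuming an illustrative three-node path for one word (in a real model the path is determined by the word's position in the tree, e.g. a Huffman tree):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d = 5
h = np.random.randn(d)                 # hidden/context vector
# Path to one leaf (word): (inner-node vector, branch direction) pairs.
# A balanced tree over n words has only about log2(n) such nodes.
path = [(np.random.randn(d), +1),      # +1 = left branch, -1 = right branch
        (np.random.randn(d), -1),
        (np.random.randn(d), +1)]

prob = 1.0
for node_vec, direction in path:
    prob *= sigmoid(direction * (node_vec @ h))   # one binary decision per node
print(prob)   # P(word | context), computed with ~log2(n) operations instead of n
```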
10. What is the disadvantage of using Hierarchical Softmax?
a) It requires more memory to store the binary tree
b) It is slower than computing the softmax function directly
c) It is less accurate than computing the softmax function directly
d) It is more prone to overfitting than computing the softmax function directly
Answer: a)
Explanation: The disadvantage of using Hierarchical Softmax is that it requires more
memory to store the binary tree. This can be a problem when dealing with large datasets or
models with a large number of output classes.
