Unit - 4 DL
Transfer Learning: Advantages and Limitations
Advantages:
1. Saves Time: Reusing pre-trained weights avoids training large models from scratch.
2. Improved Accuracy: Features learned on large datasets often boost performance on related tasks.
Limitations:
1. Domain Mismatch: The pre-trained model may not perform well if the new task domain is too different.
2. Overfitting: Fine-tuning on a small dataset can cause the model to overfit.
Introduction to Natural Language Processing (NLP)
What is NLP?
Natural Language Processing (NLP) is a branch of artificial intelligence (AI) that
focuses on enabling computers to understand, interpret, and respond to human
language in a meaningful way. It combines computational linguistics, machine
learning, and deep learning techniques to process and analyze text and speech
data.
Key Components of NLP
1. Syntax: The grammatical structure of sentences.
2. Semantics: The meaning of words and sentences.
3. Pragmatics: How context contributes to meaning.
4. Morphology: The internal structure and formation of words.
Common NLP Tasks
1. Tokenization: Splitting text into sentences, words, or subword units (see the short example after this list).
2. Sentiment Analysis: Determining the emotional tone (positive, negative, or neutral) of text.
3. Machine Translation: Automatically translating text between languages.
4. Text Summarization: Producing a short version of a document that keeps the key information.
5. Speech Recognition: Converting spoken audio into text.
6. Language Generation: Producing fluent natural-language text.
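As a concrete illustration of tokenization, the snippet below uses the simple word-level Tokenizer utility from Keras (one option among many; spaCy or NLTK would work equally well):

from tensorflow.keras.preprocessing.text import Tokenizer

# Fit a word-level tokenizer on a tiny corpus
tokenizer = Tokenizer()
tokenizer.fit_on_texts(["NLP enables computers to understand text."])
print(tokenizer.word_index)  # e.g., {'nlp': 1, 'enables': 2, 'computers': 3, ...}
print(tokenizer.texts_to_sequences(["computers understand text"]))  # [[3, 5, 6]]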
Applications of NLP
1. Chatbots and Virtual Assistants:
Assistants like Siri and Alexa use NLP to understand and respond to user requests.
2. Search Engines:
Google and Bing use NLP for understanding queries and ranking results.
3. Sentiment Analysis:
Businesses analyze reviews and social media posts to gauge customer opinion.
4. Translation Tools:
Tools like Google Translate rely on NLP for accurate language translation.
5. Healthcare:
NLP extracts structured information from clinical notes and medical records.
6. Document Summarization:
Automatically condensing long reports or articles into short summaries.
Approaches to NLP
1. Rule-Based Methods:
Hand-crafted linguistic rules and pattern matching; interpretable but brittle.
2. Machine Learning:
Statistical models (e.g., Naive Bayes, SVMs) learn patterns from labeled text.
3. Deep Learning:
Neural networks such as RNNs and transformers learn representations directly from large corpora.
Challenges in NLP
1. Ambiguity:
Words and sentences can carry multiple meanings (e.g., "bank" as a riverbank or a financial institution).
2. Context Understanding:
Meaning often depends on context that spans sentences or requires world knowledge.
3. Domain-Specific Language:
Jargon in fields like medicine or law differs sharply from everyday language.
4. Low-Resource Languages:
Many languages lack the large annotated corpora that modern models need.
Popular NLP Libraries
1. spaCy:
Industrial-strength library for fast tokenization, tagging, parsing, and named-entity recognition.
2. Gensim:
Topic modeling and word embeddings (Word2Vec, LDA).
3. TextBlob:
Simple API for common tasks such as sentiment analysis and noun-phrase extraction.
Future of NLP
1. Improved Contextual Understanding:
More advanced models like GPT-4 and BERT improve context handling.
2. Multilingual NLP:
Models increasingly handle many languages within a single system.
3. Ethical NLP:
Growing focus on reducing bias and protecting user privacy in language systems.
Vector Space Model (VSM)
The Vector Space Model represents words and documents as vectors in a high-dimensional space.
1. Dimensions:
Dimensions are typically derived from features like terms, context, or co-occurrence frequencies.
2. Semantic Similarity:
Items whose vectors point in similar directions are treated as semantically similar; cosine similarity is the standard measure.
3. Applications:
Information retrieval.
Document clustering.
Example
1. Word Vector Creation:
Each word is encoded over three illustrative features.
Vector representation:
"cat" → [1, 0, 1]
"dog" → [1, 0, 1]
"fish" → [0, 1, 0]
2. Document Representation:
Consider two documents, D1 and D2, counted over a three-term vocabulary.
Term frequency vectors (their cosine similarity is computed in the snippet below):
D1: [1, 1, 1]
D2: [0, 1, 1]
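The two documents' similarity can then be checked directly; a quick NumPy computation of cosine similarity:

import numpy as np

d1 = np.array([1, 1, 1])
d2 = np.array([0, 1, 1])
# cosine similarity = (d1 . d2) / (|d1| * |d2|)
cos = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))
print(round(float(cos), 3))  # 0.816 -> fairly similar documents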
Advantages of VSM
1. Simple and Effective:
Easy to implement and works well for ranking and retrieval tasks.
2. Language Agnostic:
Applies to any language whose text can be tokenized into terms.
Limitations of VSM
1. High Dimensionality:
Vectors are as long as the vocabulary, making them large and sparse.
2. No Contextual Understanding:
Word order and context are ignored (a bag-of-words view).
3. Assumes Independence:
Terms are treated as independent dimensions, ignoring correlations such as synonymy.
Extensions of VSM
1. Word Embeddings:
Dense, learned vectors (e.g., Word2Vec, GloVe) address sparsity and capture similarity; they are covered below.
2. Contextual Models:
Models such as BERT produce embeddings that change with the surrounding context.
Applications of VSM
1. Information Retrieval:
Rank documents by their similarity to a query vector.
2. Document Clustering:
Group documents with similar term vectors.
3. Semantic Analysis:
Estimate similarity in meaning between texts.
4. Recommender Systems:
Match users with documents or items whose vectors are similar.
The Skip-Gram Model
Objective
The Skip-Gram model, introduced as part of Word2Vec, aims to predict the
context (surrounding words) given a target word.
Architecture
Input: A single target word (e.g., "dog").
Output: A probability distribution over the words in the target's context window.
Core Idea: Words that appear in similar contexts will have similar vector
representations.
Training Steps
1. Input Representation:
The target word is encoded as a one-hot vector over the vocabulary.
2. Projection Layer:
The one-hot vector selects one row of the weight (embedding) matrix W, giving the target word's dense vector.
3. Output Layer:
A softmax over the vocabulary estimates the probability of each possible context word.
4. Optimization:
Weights are updated to maximize the probability of the observed context words; negative sampling or hierarchical softmax is typically used for efficiency. (A minimal Gensim run follows this section.)
Advantages
Captures semantic similarity well.
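A minimal Skip-Gram run using Gensim (a library also listed in these notes; the toy corpus and hyperparameters are illustrative):

from gensim.models import Word2Vec

corpus = [["the", "dog", "barks"],
          ["the", "cat", "meows"],
          ["dogs", "and", "cats", "are", "pets"]]
# sg=1 selects the Skip-Gram architecture
model = Word2Vec(corpus, vector_size=50, window=2, sg=1, min_count=1, epochs=100)
print(model.wv.most_similar("dog", topn=3))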
The CBOW (Continuous Bag of Words) Model
Objective
The CBOW model predicts a target word based on its context words.
Architecture
Input: Context words (a set of surrounding words).
Output: A probability distribution over the vocabulary for the target (center) word.
Core Idea: Words in similar contexts are likely to have similar meanings.
Training Steps
1. Input Representation:
Each context word is encoded as a one-hot vector.
2. Projection Layer:
The context words' embedding vectors are looked up and averaged into a single vector.
3. Output Layer:
A softmax over the vocabulary predicts the target (center) word.
4. Optimization:
Weights are updated to maximize the probability of the true target word. (A minimal Gensim run follows this section.)
Advantages
Faster to train than Skip-Gram.
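The same Gensim call trains CBOW when sg=0 (the default); only the architecture flag changes:

from gensim.models import Word2Vec

corpus = [["the", "dog", "barks"], ["the", "cat", "meows"]]
# sg=0 selects the CBOW architecture
model = Word2Vec(corpus, vector_size=50, window=2, sg=0, min_count=1, epochs=100)
print(model.wv["dog"][:5])  # first five embedding dimensions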
GloVe (Global Vectors for Word Representation)
Objective
GloVe is a count-based method that constructs word vectors using the co-
occurrence statistics of words in a corpus.
Core Idea
Words that co-occur frequently in a corpus will have similar representations. For
example: "ice" co-occurs more often with "solid" while "steam" co-occurs more often with "gas", and ratios of such co-occurrence probabilities separate the two words.
Key Features
Matrix Construction:
Build a word-word co-occurrence matrix X, where X_ij counts how often word j appears in the context of word i across the corpus.
Matrix Factorization:
Learn word vectors w_i and context vectors w~_j whose dot product approximates log X_ij.
Objective Function:
Minimize J = sum_ij f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2, where f is a weighting function that down-weights very rare and very frequent pairs. (A snippet for loading pre-trained GloVe vectors follows this section.)
Advantages
Combines local (context-based) and global (corpus-wide) information.
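Pre-trained GloVe vectors can be loaded through Gensim's downloader; the model name below is one of the published handles (it triggers a sizeable download on first use):

import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-50")  # 50-dimensional GloVe vectors
print(glove.most_similar("ice", topn=3))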
Evaluating Word Embeddings
Downstream Tasks
Evaluate embeddings based on their performance in tasks like:
Text classification.
Sentiment analysis.
Applications of Word Embeddings
a. Analogy Reasoning
Embeddings support analogies via vector arithmetic, e.g., king - man + woman ≈ queen (see the snippet after this list).
Knowledge extraction: Identify relationships in large datasets.
b. Sentiment Analysis
Represent words in sentiment analysis models to classify text polarity.
c. Machine Translation
Word embeddings help align representations of similar words across languages.
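Analogy reasoning reduces to vector arithmetic over these embeddings; continuing with the GloVe vectors from above:

import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-50")  # same vectors as in the GloVe section
# king - man + woman ≈ queen
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=1))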
Comparison of Methods
Feature            Skip-Gram                           CBOW                          GloVe
Prediction task    Context words from a target word    Target word from its context  None (count-based)
Training signal    Local context windows               Local context windows         Global co-occurrence counts
Speed              Slower to train                     Faster to train               Efficient once counts are built
Noted strength     Captures semantic similarity well   Training speed                Combines local and global information
1. Image Segmentation
Divides an image into meaningful regions by assigning a label to every pixel.
Types:
1. Semantic Segmentation:
Labels each pixel with a class; all objects of the same class share one label.
2. Instance Segmentation:
Also separates individual object instances of the same class.
Popular Architectures:
1. U-Net:
Encoder-decoder CNN with skip connections, widely used in medical imaging (a minimal Keras sketch follows this list).
2. Mask R-CNN:
Extends Faster R-CNN with a mask-prediction branch for instance segmentation.
3. DeepLab:
Uses atrous (dilated) convolutions to capture context at multiple scales.
Applications:
1. Autonomous Vehicles:
Segmenting roads, lanes, pedestrians, and vehicles.
2. Satellite Imagery:
Land-cover mapping and building or crop detection.
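The model.summary() call below relies on a unet() builder that did not survive in the notes; here is a minimal sketch of one (a single downsampling stage; filter counts and input size are illustrative):

import tensorflow as tf
from tensorflow.keras import layers

def unet(input_shape=(128, 128, 1)):
    inputs = layers.Input(input_shape)
    # Encoder
    c1 = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
    p1 = layers.MaxPooling2D()(c1)
    # Bottleneck
    b = layers.Conv2D(32, 3, activation="relu", padding="same")(p1)
    # Decoder with a skip connection from the encoder
    u1 = layers.Conv2DTranspose(16, 2, strides=2, padding="same")(b)
    u1 = layers.concatenate([u1, c1])
    c2 = layers.Conv2D(16, 3, activation="relu", padding="same")(u1)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c2)  # per-pixel mask
    return tf.keras.Model(inputs, outputs)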
model = unet()
model.summary()
2. Object Detection
Locates and classifies objects in an image, producing bounding boxes with class labels.
Popular Architecture:
Faster R-CNN:
Combines region proposal networks (RPNs) with CNNs for faster object detection (a runnable sketch follows this list).
Applications:
1. Retail:
Shelf monitoring, inventory counting, and automated checkout.
2. Healthcare:
Detecting abnormalities such as tumors or fractures in medical scans.
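For experimentation, a pre-trained detector can be pulled from TensorFlow Hub; the model handle and output keys below follow common TF Hub detection exports but should be treated as assumptions:

import tensorflow as tf
import tensorflow_hub as hub

detector = hub.load("https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2")
image = tf.zeros((1, 320, 320, 3), dtype=tf.uint8)  # stand-in for a real image batch
outputs = detector(image)
print(outputs["detection_boxes"].shape, outputs["detection_scores"].shape)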
3. Automatic Image Captioning
Generates a natural-language description of an image.
Key Techniques:
1. Encoder-Decoder Architecture:
A CNN encodes the image into feature vectors; an RNN/LSTM decoder generates the caption word by word (a minimal sketch follows this section).
2. Attention Mechanism:
Allows the model to focus on specific parts of the image while generating each word.
3. Vision-Language Transformers:
Models like CLIP and BLIP utilize transformers for improved image-text understanding.
Applications:
1. Accessibility:
Generating alt-text descriptions for visually impaired users.
2. Social Media:
Suggesting captions or alt-text for uploaded photos.
3. E-Commerce:
Generating product descriptions from product images.
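A minimal sketch of the encoder-decoder pattern described above, framed as next-word prediction (vocabulary size and layer dimensions are illustrative):

import tensorflow as tf
from tensorflow.keras import layers

VOCAB = 10000

# Encoder: a pre-trained CNN summarizes the image as one feature vector
cnn = tf.keras.applications.MobileNetV2(include_top=False, pooling="avg")
img_state = layers.Dense(256)(cnn.output)

# Decoder: an LSTM conditioned on the image predicts the next caption word
caption_in = layers.Input(shape=(None,), dtype="int32")
x = layers.Embedding(VOCAB, 256)(caption_in)
x = layers.LSTM(256)(x, initial_state=[img_state, img_state])
next_word = layers.Dense(VOCAB, activation="softmax")(x)

captioner = tf.keras.Model([cnn.input, caption_in], next_word)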
Comparison of Tasks
Task                 Description                                Output                    Techniques
Image Segmentation   Label every pixel of an image              Pixel-wise masks          U-Net, Mask R-CNN, DeepLab
Object Detection     Locate and classify objects                Bounding boxes + labels   Faster R-CNN (RPN + CNN)
Image Captioning     Generate textual descriptions for images   Sentences or phrases      Encoder-Decoder, Attention, Transformers
Conclusion
Deep learning has enabled significant advancements in computer vision tasks like
image segmentation, object detection, and automatic image captioning. These
tasks find applications in autonomous vehicles, healthcare, e-commerce, and
accessibility technologies. Modern architectures, including transformers, continue
to push the boundaries of these applications.
Generative Adversarial Networks (GANs)
A GAN pits two networks against each other:
1. Generator:
Creates synthetic samples (e.g., images) from random noise.
2. Discriminator:
Tries to distinguish real samples from the generator's fakes.
How GANs Work:
1. Noise Input:
The generator receives a random noise vector.
2. Synthetic Image:
The generator transforms the noise into a fake image.
3. Classification:
The discriminator scores real and synthetic images.
4. Feedback:
The discriminator's errors update both networks, so each improves against the other.
Applications of GANs
1. Image Generation:
Creating photorealistic faces, artwork, and scenes.
2. Data Augmentation:
Generating additional training samples for scarce classes.
3. Super-Resolution:
Upscaling low-resolution images with plausible detail.
4. Text-to-Image:
Synthesizing images from textual descriptions.
import tensorflow as tf
from tensorflow.keras.layers import Dense, Reshape, Flatten, LeakyReLU
from tensorflow.keras.models import Sequential

# Generator model: maps a 100-dim noise vector to a 28x28 grayscale image
def build_generator():
    model = Sequential([
        Dense(256, input_dim=100),
        LeakyReLU(0.2),
        Dense(512),
        LeakyReLU(0.2),
        Dense(1024),
        LeakyReLU(0.2),
        Dense(28 * 28 * 1, activation="tanh"),  # tanh pairs with inputs scaled to [-1, 1]
        Reshape((28, 28, 1))
    ])
    return model

# Discriminator model: classifies 28x28 images as real (1) or fake (0)
def build_discriminator():
    model = Sequential([
        Flatten(input_shape=(28, 28, 1)),
        Dense(512),
        LeakyReLU(0.2),
        # the notes truncate here; a standard completion:
        Dense(256),
        LeakyReLU(0.2),
        Dense(1, activation="sigmoid")
    ])
    return model

# Compile GAN
generator = build_generator()
discriminator = build_discriminator()
discriminator.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
Training GANs:
Training alternates between the two networks: the discriminator learns on batches of real and generated images, then the generator is updated through the frozen discriminator so that its fakes get classified as real, as sketched below.
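A minimal sketch of this alternating loop on MNIST, reusing the generator and discriminator compiled above (step count and batch size are illustrative):

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential

# Real images scaled to [-1, 1] to match the generator's tanh output
(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = (x_train.astype("float32") - 127.5) / 127.5
x_train = x_train.reshape(-1, 28, 28, 1)

# Combined model: generator followed by a frozen discriminator
discriminator.trainable = False
gan = Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

batch = 64
for step in range(1000):
    # 1) Train the discriminator on real and fake batches
    real = x_train[np.random.randint(0, len(x_train), batch)]
    noise = np.random.normal(0, 1, (batch, 100))
    fake = generator.predict(noise, verbose=0)
    discriminator.train_on_batch(real, np.ones((batch, 1)))
    discriminator.train_on_batch(fake, np.zeros((batch, 1)))
    # 2) Train the generator to make the discriminator output "real"
    noise = np.random.normal(0, 1, (batch, 100))
    gan.train_on_batch(noise, np.ones((batch, 1)))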
Video-to-Text
How It Works
1. Feature Extraction:
Use a CNN (e.g., ResNet, Inception) to extract spatial features from video frames.
2. Sequence Processing:
An RNN/LSTM (or a transformer) aggregates the frame features and models their temporal order.
3. Caption Generation:
A language decoder generates the textual description word by word.
Applications of Video-to-Text
1. Video Summarization:
Producing short textual summaries of long videos.
2. Accessibility:
Describing video content for visually or hearing impaired users.
3. Content Recommendation:
Generated descriptions make videos searchable and easier to match to user interests.
import tensorflow as tf
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding, TimeDistributed

# build_video_to_text_model is sketched below
video_to_text_model = build_video_to_text_model(vocab_size=10000)
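The build_video_to_text_model helper referenced above is not defined in the notes; a minimal sketch, assuming per-frame CNN features are already extracted and captions have a fixed maximum length (all sizes are illustrative):

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

def build_video_to_text_model(vocab_size, feature_dim=2048, max_caption_len=20):
    return Sequential([
        # Encode the sequence of frame features into one video vector
        LSTM(256, input_shape=(None, feature_dim)),
        # Repeat the video vector once per output word position
        RepeatVector(max_caption_len),
        # Decode a word distribution at each caption position
        LSTM(256, return_sequences=True),
        TimeDistributed(Dense(vocab_size, activation="softmax")),
    ])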
Challenges in Video-to-Text
1. Temporal Dependencies:
Capturing relationships that span many frames (long-range temporal context) is difficult.
2. Dataset Complexity:
Paired video-caption datasets are expensive to create, and annotations vary widely in style and detail.
Conclusion
GANs excel in generating realistic images and videos, finding applications in
data augmentation, content creation, and super-resolution tasks.
Attention Mechanisms in Computer Vision
Attention lets a network weight the most informative parts of its input. Common types:
1. Spatial Attention:
Weights image regions by importance.
2. Channel Attention:
Weights feature channels (maps) by importance.
3. Temporal Attention:
Weights frames or time steps in videos.
4. Self-Attention:
Relates every element of the input to every other element.
1. Self-Attention
Computes attention scores between every pair of input elements.
How it Works: Each element is projected into query, key, and value vectors; outputs are weighted sums of values, with weights given by softmax(QK^T / sqrt(d)).
Applications: Transformers, Vision Transformers (ViT), image classification.
2. Spatial Attention
Focuses on specific spatial regions of an image.
How it Works: A learned spatial map (e.g., from pooled features passed through a convolution) re-weights each location of the feature map.
Applications: Object detection and segmentation (e.g., the spatial branch of CBAM).
3. Channel Attention
Determines which feature maps (channels) are important.
How it Works: Global pooling summarizes each channel; a small MLP then produces per-channel weights, as in SENet.
Applications: Image classification and feature recalibration (SENet, CBAM).
4. Multi-Head Attention
Divides the input into multiple subspaces and computes attention for each subspace.
How it Works: Queries, keys, and values are split across several heads that attend in parallel; the heads' outputs are concatenated and projected.
Applications: Transformers, ViT, DETR.
5. Attention U-Net
Overview: A U-Net variant that adds attention gates to the skip connections.
How it Works: Attention gates learn to suppress irrelevant encoder features before they are merged into the decoder.
Applications: Medical image segmentation.
Applications of Attention in Computer Vision
1. Image Classification:
Vision Transformers apply self-attention to sequences of image patches.
2. Object Detection:
DETR uses attention to relate image features to object queries.
3. Image Segmentation:
Attention U-Net and similar models weight the most relevant regions.
4. Super-Resolution:
Attention helps reconstruct fine detail by focusing on informative regions.
5. Anomaly Detection:
Attention maps highlight unusual regions in images.
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten, LayerNormalization, MultiHeadAttention, Dropout
from tensorflow.keras.models import Model

class VisionTransformer(Model):
    def __init__(self, num_patches, projection_dim, num_heads, transformer_units, num_classes):
        super(VisionTransformer, self).__init__()
        self.num_patches = num_patches
        self.projection_dim = projection_dim
        # Learnable class token and position embeddings
        self.class_token = self.add_weight(shape=(1, 1, projection_dim), initializer="random_normal")
        self.position_embedding = self.add_weight(shape=(1, num_patches + 1, projection_dim), initializer="random_normal")
        # The rest of the class is cut off in the notes; a minimal single-block completion:
        self.norm = LayerNormalization()
        self.attention = MultiHeadAttention(num_heads=num_heads, key_dim=projection_dim)
        self.mlp = tf.keras.Sequential([Dense(units, activation="gelu") for units in transformer_units])
        self.head = Dense(num_classes)

    def call(self, patches):
        # patches: (batch, num_patches, projection_dim) of already-projected patch embeddings
        batch = tf.shape(patches)[0]
        cls = tf.repeat(self.class_token, batch, axis=0)
        x = tf.concat([cls, patches], axis=1) + self.position_embedding
        x = x + self.attention(self.norm(x), self.norm(x))  # self-attention with residual
        x = x + self.mlp(x)  # assumes transformer_units[-1] == projection_dim
        return self.head(x[:, 0])  # classify from the class token
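Instantiating the model on dummy patch embeddings (all sizes are illustrative):

vit = VisionTransformer(num_patches=64, projection_dim=64, num_heads=4,
                        transformer_units=[128, 64], num_classes=10)
logits = vit(tf.random.normal((2, 64, 64)))  # (batch, num_patches, projection_dim)
print(logits.shape)  # (2, 10)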
Advantages
1. Global Context:
Attention captures long-range dependencies that local convolutions miss.
2. Versatility:
The same mechanism applies across classification, detection, and segmentation.
Challenges
1. Computational Cost:
Self-attention scales quadratically with the number of image patches or tokens.
2. Large Datasets:
Transformers typically need very large training sets to outperform CNNs.
Conclusion
Attention mechanisms have significantly advanced computer vision, enabling
state-of-the-art performance in tasks like image classification, object detection,
and segmentation. While models like Vision Transformers and DETR lead the
way, hybrid approaches combining CNNs with attention mechanisms (e.g., CBAM,
SENet) continue to be effective for resource-constrained applications.