Transformers for Natural Language Processing and Computer Vision
Third Edition
Denis Rothman
BIRMINGHAM—MUMBAI
Transformers for Natural Language Processing and Computer Vision
Third Edition
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK.
ISBN 978-1-80512-872-4
www.packt.com
Contributors
Denis authored an AI resource optimizer for IBM and luxury brands, leading to an Advanced Planning and Scheduling (APS) solution used worldwide.
https://www.packt.link/Transformers
Contents
Preface
Who this book is for
What this book covers
To get the most out of this book
Get in touch
1. What Are Transformers?
How constant time complexity O(1) changed our lives forever
O(1) attention conquers O(n) recurrent methods
Attention layer
Recurrent layer
The magic of the computational time complexity of an attention layer
Computational time complexity with a CPU
Computational time complexity with a GPU
Computational time complexity with a TPU
TPU-LLM
A brief journey from recurrent to attention
A brief history
From one token to an AI revolution
From one token to everything
Foundation Models
From general purpose to specific tasks
The role of AI professionals
The future of AI professionals
What resources should we use?
Decision-making guidelines
The rise of transformer seamless APIs and assistants
Choosing ready-to-use API-driven libraries
Choosing a cloud platform and transformer model
Summary
Questions
References
Further reading
2. Getting Started with the Architecture of the Transformer Model
The rise of the Transformer: Attention Is All You Need
The encoder stack
Input embedding
Positional encoding
Sublayer 1: Multi-head attention
Sublayer 2: Feedforward network
The decoder stack
Output embedding and position encoding
The attention layers
The FFN sublayer, the post-LN, and the linear layer
Training and performance
Hugging Face transformer models
Summary
Questions
References
Further reading
3. Emergent vs Downstream Tasks: The Unseen Depths of Transformers
The paradigm shift: What is an NLP task?
Inside the head of the attention sublayer of a transformer
Exploring emergence with ChatGPT
Investigating the potential of downstream tasks
Evaluating models with metrics
Accuracy score
F1-score
MCC
Human evaluation
Benchmark tasks and datasets
Defining the SuperGLUE benchmark tasks
Running downstream tasks
The Corpus of Linguistic Acceptability (CoLA)
Stanford Sentiment Treebank (SST-2)
Microsoft Research Paraphrase Corpus (MRPC)
Winograd schemas
Summary
Questions
References
Further reading
4. Advancements in Translations with Google Trax, Google Translate, and Gemini
Defining machine translation
Human transductions and translations
Machine transductions and translations
Evaluating machine translations
Preprocessing a WMT dataset
Preprocessing the raw data
Finalizing the preprocessing of the datasets
Evaluating machine translations with BLEU
Geometric evaluations
Applying a smoothing technique
Translations with Google Trax
Installing Trax
Creating the Original Transformer model
Initializing the model using pretrained weights
Tokenizing a sentence
Decoding from the Transformer
De-tokenizing and displaying the translation
Translation with Google Translate
Translation with a Google Translate AJAX API Wrapper
Implementing googletrans
Translation with Gemini
Gemini’s potential
Summary
Questions
References
Further reading
5. Diving into Fine-Tuning through BERT
The architecture of BERT
The encoder stack
Preparing the pretraining input environment
Pretraining and fine-tuning a BERT model
Fine-tuning BERT
Defining a goal
Hardware constraints
Installing Hugging Face Transformers
Importing the modules
Specifying CUDA as the device for torch
Loading the CoLA dataset
Creating sentences, label lists, and adding BERT tokens
Activating the BERT tokenizer
Processing the data
Creating attention masks
Splitting the data into training and validation sets
Converting all the data into torch tensors
Selecting a batch size and creating an iterator
BERT model configuration
Loading the Hugging Face BERT uncased base model
Optimizer grouped parameters
The hyperparameters for the training loop
The training loop
Training evaluation
Predicting and evaluating using the holdout dataset
Exploring the prediction process
Evaluating using the Matthews correlation coefficient
Matthews correlation coefficient evaluation for the whole dataset
Building a Python interface to interact with the model
Saving the model
Creating an interface for the trained model
Interacting with the model
Summary
Questions
References
Further reading
6. Pretraining a Transformer from Scratch through RoBERTa
Training a tokenizer and pretraining a transformer
Building KantaiBERT from scratch
Step 1: Loading the dataset
Step 2: Installing Hugging Face transformers
Step 3: Training a tokenizer
Step 4: Saving the files to disk
Step 5: Loading the trained tokenizer files
Step 6: Checking resource constraints: GPU and CUDA
Step 7: Defining the configuration of the model
Step 8: Reloading the tokenizer in transformers
Step 9: Initializing a model from scratch
Exploring the parameters
Step 10: Building the dataset
Step 11: Defining a data collator
Step 12: Initializing the trainer
Step 13: Pretraining the model
Step 14: Saving the final model (+tokenizer + config) to disk
Step 15: Language modeling with FillMaskPipeline
Pretraining a Generative AI customer support model on X data
Step 1: Downloading the dataset
Step 2: Installing Hugging Face transformers
Step 3: Loading and filtering the data
Step 4: Checking resource constraints: GPU and CUDA
Step 5: Defining the configuration of the model
Step 6: Creating and processing the dataset
Step 7: Initializing the trainer
Step 8: Pretraining the model
Step 9: Saving the model
Step 10: User interface to chat with the Generative AI agent
Further pretraining
Limitations
Next steps
Summary
Questions
References
Further reading
7. The Generative AI Revolution with ChatGPT
GPTs as GPTs
Improvement
Diffusion
New application sectors
Self-service assistants
Development assistants
Pervasiveness
The architecture of OpenAI GPT transformer models
The rise of billion-parameter transformer models
The increasing size of transformer models
Context size and maximum path length
From fine-tuning to zero-shot models
Stacking decoder layers
GPT models
OpenAI models as assistants
ChatGPT provides source code
GitHub Copilot code assistant
General-purpose prompt examples
Getting started with ChatGPT – GPT-4 as an assistant
1. GPT-4 helps to explain how to write source code
2. GPT-4 creates a function to show the YouTube presentation of GPT-4 by Greg Brockman on March 14, 2023
3. GPT-4 creates an application for WikiArt to display images
4. GPT-4 creates an application to display IMDb reviews
5. GPT-4 creates an application to display a newsfeed
6. GPT-4 creates a k-means clustering (KMC) algorithm
Getting started with the GPT-4 API
Running our first NLP task with GPT-4
Step 1: Installing OpenAI and Step 2: Entering the API key
Step 3: Running an NLP task with GPT-4
Key hyperparameters
Running multiple NLP tasks
Retrieval Augmented Generation (RAG) with GPT-4
Installation
Document retrieval
Augmented retrieval generation
Summary
Questions
References
Further reading
8. Fine-Tuning OpenAI GPT Models
Risk management
Fine-tuning a GPT model for completion (generative)
1. Preparing the dataset
1.1. Preparing the data in JSON
1.2. Converting the data to JSONL
2. Fine-tuning an original model
3. Running the fine-tuned GPT model
4. Managing fine-tuned jobs and models
Before leaving
Summary
Questions
References
Further reading
9. Shattering the Black Box with Interpretable Tools
Transformer visualization with BertViz
Running BertViz
Step 1: Installing BertViz and importing the modules
Step 2: Load the models and retrieve attention
Step 3: Head view
Step 4: Processing and displaying attention heads
Step 5: Model view
Step 6: Displaying the output probabilities of attention heads
Streaming the output of the attention heads
Visualizing word relationships using attention scores with pandas
exBERT
Interpreting Hugging Face transformers with SHAP
Introducing SHAP
Explaining Hugging Face outputs with SHAP
Transformer visualization via dictionary learning
Transformer factors
Introducing LIME
The visualization interface
Other interpretable AI tools
LIT
PCA
Running LIT
OpenAI LLMs explain neurons in transformers
Limitations and human control
Summary
Questions
References
Further reading
10. Investigating the Role of Tokenizers in Shaping Transformer Models
Matching datasets and tokenizers
Best practices
Step 1: Preprocessing
Step 2: Quality control
Step 3: Continuous human quality control
Word2Vec tokenization
Case 0: Words in the dataset and the dictionary
Case 1: Words not in the dataset or the dictionary
Case 2: Noisy relationships
Case 3: Words in a text but not in the dictionary
Case 4: Rare words
Case 5: Replacing rare words
Exploring sentence and WordPiece tokenizers to understand the efficiency of subword tokenizers for transformers
Word and sentence tokenizers
Sentence tokenization
Word tokenization
Regular expression tokenization
Treebank tokenization
White space tokenization
Punkt tokenization
Word punctuation tokenization
Multi-word tokenization
Subword tokenizers
Unigram language model tokenization
SentencePiece
Byte-Pair Encoding (BPE)
WordPiece
Exploring in code
Detecting the type of tokenizer
Displaying token-ID mappings
Analyzing and controlling the quality of token-ID mappings
Summary
Questions
References
Further reading
11. Leveraging LLM Embeddings as an Alternative to Fine-Tuning
LLM embeddings as an alternative to fine-tuning
From prompt design to prompt engineering
Fundamentals of text embedding with NLTK and Gensim
Installing libraries
1. Reading the text file
2. Tokenizing the text with Punkt
Preprocessing the tokens
3. Embedding with Gensim and Word2Vec
4. Model description
5. Accessing a word and vector
6. Exploring Gensim’s vector space
7. TensorFlow Projector
Implementing question-answering systems with embedding-based search techniques
1. Installing the libraries and selecting the models
2. Implementing the embedding model and the GPT model
2.1 Evaluating the model with a knowledge base: GPT can answer questions
2.2 Add a knowledge base
2.3 Evaluating the model without a knowledge base: GPT cannot answer questions
3. Prepare search data
4. Search
5. Ask
5.1. Example question
5.2. Troubleshooting wrong answers
Transfer learning with Ada embeddings
1. The Amazon Fine Food Reviews dataset
1.2. Data preparation
2. Running Ada embeddings and saving them for future reuse
3. Clustering
3.1. Find the clusters using k-means clustering
3.2. Display clusters with t-SNE
4. Text samples in the clusters and naming the clusters
Summary
Questions
References
Further reading
12. Toward Syntax-Free Semantic Role Labeling with ChatGPT and GPT-4
Getting started with cutting-edge SRL
Entering the syntax-free world of AI
Defining SRL
Visualizing SRL
SRL experiments with ChatGPT with GPT-4
Basic sample
Difficult sample
Questioning the scope of SRL
The challenges of predicate analysis
Redefining SRL
From task-specific SRL to emergence with ChatGPT
1. Installing OpenAI
2. GPT-4 dialog function
3. SRL
Sample 1 (basic)
Sample 2 (basic)
Sample 3 (basic)
Sample 4 (difficult)
Sample 5 (difficult)
Sample 6 (difficult)
Summary
Questions
References
Further reading
13. Summarization with T5 and ChatGPT
Designing a universal text-to-text model
The rise of text-to-text transformer models
A prefix instead of task-specific formats
The T5 model
Text summarization with T5
Hugging Face
Selecting a Hugging Face transformer model
Initializing the T5-large transformer model
Getting started with T5
Exploring the architecture of the T5 model
Summarizing documents with T5-large
Creating a summarization function
A general topic sample
The Bill of Rights sample
A corporate law sample
From text-to-text to new word predictions with OpenAI ChatGPT
Comparing T5 and ChatGPT’s summarization methods
Pretraining
Specific versus non-specific tasks
Summarization with ChatGPT
Summary
Questions
References
Further reading
14. Exploring Cutting-Edge LLMs with Vertex AI and PaLM 2
Architecture
Pathways
Client
Resource manager
Intermediate representation
Compiler
Scheduler
Executor
PaLM
Parallel layer processing that increases training speed
Shared input-output embeddings, which saves memory
No biases, which improves training stability
Rotary Positional Embedding (RoPE) improves model quality
SwiGLU activations improve model quality
PaLM 2
Improved performance, faster, and more efficient
Scaling laws, optimal model size, and the number of parameters
State-of-the-art (SOA) performance and a new training methodology
Assistants
Gemini
Google Workspace
Google Colab Copilot
Vertex AI PaLM 2 interface
Vertex AI PaLM 2 assistant
Vertex AI PaLM 2 API
Question answering
Question-answer task
Summarization of a conversation
Sentiment analysis
Multi-choice problems
Code
Fine-tuning
Creating a bucket
Fine-tuning the model
Summary
Questions
References
Further reading
15. Guarding the Giants: Mitigating Risks in Large Language Models
The emergence of functional AGI
Cutting-edge platform installation limitations
Auto-BIG-bench
WandB
When will AI agents replicate?
Function: `create_vocab`
Process:
Function: `scrape_wikipedia`
Process:
Function: `create_dataset`
Process:
Classes: `TextDataset`, `Encoder`, and `Decoder`
Function: `count_parameters`
Function: `main`
Process:
Saving and Executing the Model
Risk management
Hallucinations and memorization
Memorization
Risky emergent behaviors
Disinformation
Influence operations
Harmful content
Privacy
Cybersecurity
Risk mitigation tools with RLHF and RAG
1. Input and output moderation with transformers and a rule base
2. Building a knowledge base for ChatGPT and GPT-4
Adding keywords
3. Parsing the user requests and accessing the KB
4. Generating ChatGPT content with a dialog function
Token control
Moderation
Summary
Questions
References
Further reading
16. Beyond Text: Vision Transformers in the Dawn of Revolutionary AI
From task-agnostic models to multimodal vision transformers
ViT – Vision Transformer
The basic architecture of ViT
Step 1: Splitting the image into patches
Step 2: Building a vocabulary of image patches
Step 3: The transformer
Vision transformers in code
A feature extractor simulator
The transformer
Configuration and shapes
CLIP
The basic architecture of CLIP
CLIP in code
DALL-E 2 and DALL-E 3
The basic architecture of DALL-E
Getting started with the DALL-E 2 and DALL-E 3 API
Creating a new image
Creating a variation of an image
From research to mainstream AI with DALL-E
GPT-4V, DALL-E 3, and divergent semantic association
Defining divergent semantic association
Creating an image with ChatGPT Plus with DALL-E
Implementing the GPT-4V API and experimenting with DAT
Example 1: A standard image and text
Example 2: Divergent semantic association, moderate divergence
Example 3: Divergent semantic association, high divergence
Summary
Questions
References
Further reading
17. Transcending the Image-Text Boundary with Stable Diffusion
Transcending image generation boundaries
Part I: Defining text-to-image with Stable Diffusion
1. Text embedding using a transformer encoder
2. Random image creation with noise
3. Stable Diffusion model downsampling
4. Decoder upsampling
5. Output image
Running the Keras Stable Diffusion implementation
Part II: Running text-to-image with Stable Diffusion
Generative AI Stable Diffusion for a Divergent Association Task (DAT)
Part III: Video
Text-to-video with Stability AI animation
Text-to-video, with a variation of OpenAI CLIP
A video-to-text model with TimeSformer
Preparing the video frames
Putting the TimeSformer to work to make predictions on the video frames
Summary
Questions
References
Further reading
18. Hugging Face AutoTrain: Training Vision Models without Coding
Goal and scope of this chapter
Getting started
Uploading the dataset
No coding?
Training models with AutoTrain
Deploying a model
Running our models for inference
Retrieving validation images
The program will now attempt to classify the validation images. We will see how a vision transformer reacts to this image.
Inference: image classification
Validation experimentation on the trained models
ViTForImageClassification
SwinForImageClassification 1
BeitForImageClassification
SwinForImageClassification 2
ConvNextForImageClassification
ResNetForImageClassification
Trying the top ViT model with a corpus
Summary
Questions
References
Further reading
19. On the Road to Functional AGI with HuggingGPT and its Peers
Defining F-AGI
Installing and importing
Validation set
Level 1 image: easy
Level 2 image: difficult
Level 3 image: very difficult
HuggingGPT
Level 1: Easy
Level 2: Difficult
Level 3: Very difficult
CustomGPT
Google Cloud Vision
Level 1: Easy
Level 2: Difficult
Level 3: Very difficult
Model chaining: Chaining Google Cloud Vision to ChatGPT
Model Chaining with Runway Gen-2
Midjourney: Imagine a ship in the galaxy
Gen-2: Make this ship sail the sea
Summary
Questions
References
Further reading
20. Beyond Human-Designed Prompts with Generative Ideation
Part I: Defining generative ideation
Automated ideation architecture
Scope and limitations
Part II: Automating prompt design for generative image design
ChatGPT/GPT-4 HTML presentation
ChatGPT with GPT-4 provides the text for the presentation
ChatGPT with GPT-4 provides a graph in HTML to illustrate the presentation
Llama 2
A brief introduction to Llama 2
Implementing Llama 2 with Hugging Face
Midjourney
Discord API for Midjourney
Microsoft Designer
Part III: Automated generative ideation with Stable Diffusion
1. No prompt: Automated instruction for GPT-4
2. Generative AI (prompt generation) using ChatGPT with
GPT-4
3. and 4. Generative AI with Stable Diffusion and displaying images
The future is yours!
The future of development through VR-AI
The groundbreaking shift: Parallelization of development through the fusion of VR and AI
Opportunities and risks
Summary
Questions
References
Further reading
Appendix: Answers to the Questions
Other Books You May Enjoy
Index
Preface
For the past few years, we have been witnessing the expansion of social networks versus physical encounters, e-commerce versus physical shopping, digital newspapers versus print, streaming versus physical theaters, remote doctor consultations versus in-person visits, and remote work instead of on-site tasks, along with similar trends in hundreds of other domains. This digital activity is now increasingly driven by transformer copilots in hundreds of applications.
Think of how many humans it would take to review the billions of messages posted on social networks every day, decide whether they are legal and ethical, and then extract the information they contain.
You will learn the architecture of the Original Transformer, Google BERT,
GPT-4, PaLM 2, T5, ViT, Stable Diffusion, and several other models. You
will fine-tune transformers, train models from scratch, and learn to use
powerful APIs.
You will stay close to the market and its demand for language understanding in fields such as media, social media, and research papers. You will learn how to improve Generative AI models with Retrieval Augmented Generation (RAG), embedding-based searches, prompt engineering, and automated ideation with AI-generated prompts.
Throughout the book, you will work hands-on with Python, PyTorch, and
TensorFlow. You will be introduced to the key AI language understanding
neural network models. You will then learn how to explore and implement
transformers.
You will learn the skills required not only to adapt to the present market but
also to acquire the vision to face innovative projects and AI evolutions. This
book aims to give readers both the knowledge and the vision to select the
right models and environment for any given project.
Who this book is for
This book is not an introduction to Python programming or machine
learning concepts. Instead, it focuses on deep learning for machine
translation, speech-to-text, text-to-speech, language modeling, question
answering, and many more NLP domains, as well as computer vision
multimodal tasks.
Readers who can benefit the most from this book are: