Fine-tuned LLM vs RAG
Short Notes

Fine-tuning a large language model (LLM) involves adapting a pre-trained model to a specific task or domain by updating its weights using a new dataset. This process is resource-intensive but enables the model to better handle specialized tasks or respond to domain-specific queries. Here's a step-by-step explanation:
1. Understand the Requirements
Before fine-tuning, determine:
Objective: Why fine-tune the model? Examples include sentiment analysis, summarization, or domain-specific generation.
Dataset: Ensure you have a high-quality, task-specific dataset.
Resources: Fine-tuning requires substantial computational power (e.g., GPUs, TPUs).
2. Prepare the Environment
Hardware: Use a machine with multiple GPUs or TPUs.
Framework: Install a deep learning framework like PyTorch or TensorFlow.
Libraries: Install necessary libraries such as Hugging Face's transformers or accelerate.

pip install transformers datasets accelerate
3. Select the Pre-trained Model
Choose an appropriate pre-trained LLM from a library like the Hugging Face Model Hub (e.g., GPT, BERT, T5).
Considerations: Select a model that aligns with your task (e.g., T5 for summarization, GPT for generation); a small loading sketch follows this step.
Model Size: Larger models provide better performance but require more resources.
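As a purely illustrative sketch (the model names below are just common examples, not recommendations from these notes), the task-aligned Auto classes in transformers look like this:

from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM

# Generation-style tasks (e.g., GPT-2) use a causal LM head.
gpt_model = AutoModelForCausalLM.from_pretrained("gpt2")

# Text-to-text tasks such as summarization (e.g., T5) use a seq2seq head.
t5_model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")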
4. Prepare the Dataset
Your dataset should be:
Task-Specific: Include input-output pairs relevant to the task.
Cleaned: Remove irrelevant or noisy data.
Tokenized: Use the same tokenizer as the pre-trained model.
Example for text-to-text tasks:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
encoded_dataset = dataset.map(lambda x:
    tokenizer(x["text"], truncation=True, padding="max_length"),
    batched=True)
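The snippet above assumes a dataset object already exists. A minimal, assumed setup using the datasets library (the column name "text" and the file name data.json are placeholders):

from datasets import Dataset, load_dataset

# Either build a tiny in-memory dataset...
dataset = Dataset.from_dict({"text": ["First training example.", "Second training example."]})

# ...or load one from a local JSON file (placeholder path).
# dataset = load_dataset("json", data_files="data.json", split="train")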
5. Define the Training Pipeline
Set up the model for fine-tuning:
Load Pre-trained Model: Use a model compatible with your task.
Define Loss Function: Use CrossEntropyLoss for classification tasks or a task-specific loss.
Choose Optimizer: Commonly used optimizers include AdamW.
Scheduler: Use learning rate schedulers like linear decay with warm-up (a sketch follows the model-loading line below).

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
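A minimal sketch of the AdamW optimizer and linear warm-up schedule mentioned above (the step counts and learning rate are illustrative assumptions; the Trainer used in step 7 creates equivalent defaults on its own):

import torch
from transformers import get_linear_schedule_with_warmup

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,      # assumed warm-up length
    num_training_steps=1000,   # assumed total number of update steps
)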
6. Training Configuration
Define hyperparameters:
Batch Size: Balance batch size with available GPU memory.
Learning Rate: Use a small learning rate (e.g., 5e-5).
Epochs: Train for enough epochs to reach convergence but avoid overfitting.
Gradient Accumulation: Use if batch size is limited by memory.

7. Leverage Accelerated Training
Use libraries like Hugging Face's Accelerate for distributed
training.
Example:
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=3,
    save_steps=10_000,
    save_total_limit=2,
    fp16=True,  # Use mixed precision for faster training
)
# train_dataset and eval_dataset are assumed to be splits of the tokenized dataset from step 4
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()

8. Monitor Training
Validation Loss: Monitor to prevent overfitting.
Metrics: Track task-specific metrics (e.g., BLEU for translation, F1-score for classification); a sketch of a metrics hook follows.
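A minimal sketch of such a hook for a classification-style fine-tune (an assumption on top of these notes: scikit-learn is installed and the model outputs logits over class labels):

import numpy as np
from sklearn.metrics import f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"f1": f1_score(labels, predictions, average="macro")}

# Passed to the Trainer as: Trainer(..., compute_metrics=compute_metrics)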
9. Save the Fine-tuned Model
After training:

model.save_pretrained("./fine_tuned_model")
tokenizer.save_pretrained("./fine_tuned_model")
10. Evaluate the Model
Test the model on unseen data to assess performance. Use evaluation scripts tailored to the task; a small generation sketch follows.
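A minimal evaluation sketch, assuming the causal LM saved above and a held-out prompt (the prompt text is a placeholder):

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("./fine_tuned_model")
tokenizer = AutoTokenizer.from_pretrained("./fine_tuned_model")

prompt = "Placeholder prompt from the held-out set"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))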
11. Optimize and Deploy
Quantization: Reduce model size and inference time using techniques like ONNX or TensorRT.
Deployment: Serve the model using Flask, FastAPI, or a cloud service (e.g., AWS, GCP).
Example with Flask:

from flask import Flask, request, jsonify
from transformers import AutoTokenizer, AutoModelForCausalLM

app = Flask(__name__)
model = AutoModelForCausalLM.from_pretrained("./fine_tuned_model")
tokenizer = AutoTokenizer.from_pretrained("./fine_tuned_model")

@app.route("/generate", methods=["POST"])
def generate():
    data = request.json
    inputs = tokenizer(data["text"], return_tensors="pt")
    outputs = model.generate(inputs["input_ids"], max_length=50)
    return jsonify({"response": tokenizer.decode(outputs[0], skip_special_tokens=True)})

app.run()
12. Maintain and Update
Regularly evaluate and fine-tune the model with new data to
ensure optimal performance as requirements evolve.
By following these steps, you can effectively fine-tune an
LLM for your specific needs.
RAG:
Creating a Retrieval-Augmented Generation (RAG) model
involves combining a retriever component, which fetches
relevant information from a knowledge base, and a generator
component, which uses the retrieved context to generate
responses. This is particularly useful when dealing with
domain-specific data or when the knowledge exceeds the
model's capacity. Below is a detailed step-by-step guide to
creating a RAG model and training it on a particular dataset.
1. Understand RAG Architecture
A RAG model has two main components:
Retriever: Extracts relevant documents or knowledge snippets
based on the input query.
Generator: Generates answers or content using the query and
the retrieved context.
2. Prerequisites
Programming Language: Python.
Framework: Hugging Face's transformers and datasets
libraries, along with FAISS (for retrieval).
Hardware: A GPU/TPU-enabled system is recommended for
efficient training.
Install required libraries:
pip install transformers datasets faiss-cpu accelerate

3. Prepare the Dataset
Format: Organize your data into two parts:
1. Knowledge Base (KB): Contains all possible context
snippets (e.g., documents, sentences).
2. Query-Answer Pairs: Training dataset with input queries
and corresponding answers.
For example, in JSON format:

{
  "knowledge_base": [
    {"id": "1", "text": "Python is a versatile programming language."},
    {"id": "2", "text": "It is widely used in data science and AI."}
  ],
  "query_answer_pairs": [
    {"query": "What is Python?", "answer": "Python is a versatile programming language."}
  ]
}
Load the data:

from datasets import Dataset

knowledge_base = Dataset.from_dict({"text": ["Python is a versatile programming language.", "It is widely used in data science and AI."]})
query_answer_pairs = Dataset.from_dict({"query": ["What is Python?"], "answer": ["Python is a versatile programming language."]})
4. Build the Retriever
The retriever indexes the knowledge base and retrieves relevant snippets for a given query. FAISS (Facebook AI Similarity Search) is commonly used for this.
4.1 Tokenize the Knowledge Base
Use a pre-trained tokenizer to encode the knowledge base.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Pad to a fixed length so every document yields a vector of the same size
knowledge_base = knowledge_base.map(lambda x:
    {"embeddings": tokenizer(x["text"], truncation=True,
                             padding="max_length", return_tensors="np")["input_ids"]})

4.2 Index the Knowledge Base
Build a FAISS index for fast retrieval.

import faiss
import numpy as np

# Average the token IDs into one fixed-length vector per document (FAISS expects float32)
embeddings = np.array([np.mean(e, axis=0) for e in knowledge_base["embeddings"]]).astype("float32")
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)
4.3 Query the Retriever
Retrieve the top-k relevant documents for a query:

def retrieve(query, top_k=5):
    query_ids = tokenizer(query, truncation=True, padding="max_length",
                          return_tensors="np")["input_ids"].astype("float32")
    distances, indices = index.search(query_ids, top_k)
    # Skip the -1 padding indices FAISS returns when top_k exceeds the index size
    return [knowledge_base[int(i)]["text"] for i in indices[0] if i >= 0]

# Example
retrieved_docs = retrieve("What is Python?")
print(retrieved_docs)
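The retriever above indexes raw token IDs, which keeps the notes short but is a weak similarity signal. As an alternative sketch (an assumption on my part, not part of the original notes: it requires pip install sentence-transformers, and all-MiniLM-L6-v2 is just one common model choice), real sentence embeddings can be indexed instead:

import faiss
from sentence_transformers import SentenceTransformer  # assumed extra dependency

encoder = SentenceTransformer("all-MiniLM-L6-v2")
texts = list(knowledge_base["text"])
doc_vectors = encoder.encode(texts).astype("float32")  # shape: (num_docs, dim)

dense_index = faiss.IndexFlatL2(doc_vectors.shape[1])
dense_index.add(doc_vectors)

def retrieve_dense(query, top_k=2):
    query_vector = encoder.encode([query]).astype("float32")
    _, indices = dense_index.search(query_vector, top_k)
    return [texts[i] for i in indices[0] if i >= 0]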
5. Build the Generator
The generator uses the query and retrieved documents to generate responses.
5.1 Load a Pre-trained Generator
Select a generation model such as T5, BART, or GPT.

from transformers import AutoModelForSeq2SeqLM

generator = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
5.2 Prepare Input for the Generator
Combine the query and retrieved context into a single input for the generator.

def prepare_input(query, retrieved_docs):
    context = " ".join(retrieved_docs)
    input_text = f"Query: {query} Context: {context}"
    # Note: these notes reuse the BERT tokenizer from step 4; in practice the
    # generator's own tokenizer (here, BART's) would normally be used instead.
    return tokenizer(input_text, return_tensors="pt",
                     truncation=True, padding=True)

input_data = prepare_input("What is Python?", retrieved_docs)

6. Train the RAG Model
Fine-tune the generator using the query-answer pairs with
retrieved context.
6.1 Define Training Pipeline
Use Hugging Face's Trainer API for training.
from transformers import TrainingArguments, Trainer

def preprocess_function(examples):
    retrieved_docs = [retrieve(q) for q in examples["query"]]
    inputs = [prepare_input(q, docs)["input_ids"] for q, docs in
              zip(examples["query"], retrieved_docs)]
    targets = tokenizer(examples["answer"], truncation=True,
                        padding=True)["input_ids"]
    return {"input_ids": inputs, "labels": targets}

tokenized_data = query_answer_pairs.map(preprocess_function, batched=True)
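# Assumption (not in the original notes): the Trainer below expects separate
# train and eval splits; datasets' built-in train_test_split is one way to
# produce the "train"/"test" keys used further down.
tokenized_data = tokenized_data.train_test_split(test_size=0.2)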
training_args = TrainingArguments(
    output_dir="./rag_model",
    evaluation_strategy="epoch",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    save_steps=10_000,
    save_total_limit=2,
    fp16=True,  # Use mixed precision for faster training
)
trainer = Trainer(
    model=generator,
    args=training_args,
    train_dataset=tokenized_data["train"],
    eval_dataset=tokenized_data["test"],
)
trainer.train()

7. Evaluate the Model
After training, evaluate the model using unseen queries to
verify its performance.
def generate_response(query):
    retrieved_docs = retrieve(query)
    input_data = prepare_input(query, retrieved_docs)
    output = generator.generate(input_data["input_ids"], max_length=50)
    return tokenizer.decode(output[0], skip_special_tokens=True)

print(generate_response("What is Python?"))

8. Optimize and Deploy
8.1 Optimize for Inference
Convert the model to a format like ONNX for faster
inference.
pip install onnx transformers[onnx]
8.2 Deploy
Use a web framework like Flask or FastAPI to serve the RAG model.
Example:

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/rag", methods=["POST"])
def rag_endpoint():
    data = request.json
    response = generate_response(data["query"])
    return jsonify({"response": response})

app.run()

9. Maintain and Update
Periodically update the knowledge base and retrain the
retriever to incorporate new data, ensuring that the RAG
model remains up-to-date.
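As a minimal sketch of such an update, assuming the dense-retriever variant sketched in step 4 (the new document text is a placeholder):

# Append new documents to the existing FAISS index (assumes `encoder`,
# `texts`, and `dense_index` from the dense-retriever sketch above).
new_docs = ["Placeholder text for a newly added document."]
new_vectors = encoder.encode(new_docs).astype("float32")
dense_index.add(new_vectors)
texts.extend(new_docs)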
By following these steps, you can create and train a RAG
model on your specific dataset for tasks such as question
answering, document retrieval, or domain-specific chat
applications.
Fine-tuned vs RAG
The choice between using a fine-tuned model and a
Retrieval-Augmented Generation (RAG) system depends on
the nature of the problem, the data, and your goals. Below is
a detailed explanation of when to use each approach:
When to Use a Fine-Tuned Model
A fine-tuned model is a pre-trained model (e.g., GPT, T5, BERT) that has been specifically adjusted to perform well on a particular task using a labeled dataset.

Use Cases for Fine-Tuning
1. Domain-Specific Tasks with Limited Context Size:
When your task involves answering questions, generating
content, or classification on a small to medium-sized
dataset.
Example: Classifying medical texts or generating chatbot
responses in a closed domain like banking.
2. Well-Defined and Repetitive Tasks:
For tasks with clear patterns and predictable outputs, where
the model can learn to mimic these patterns.
Example: Converting product descriptions into summaries.
3. When Data Is Fully Labeled:
If you have a dataset with input-output pairs for supervised training.
Example: Translating text, summarizing documents, or predicting customer sentiment.
4. No Requirement for External Knowledge:
If the task relies only on the information contained in the
fine-tuned model's weights.
Example: Sentiment analysis, code generation for simple
algorithms.
5. Model Deployment in Controlled Environments:
When you're confident that the fine-tuned model will perform
well in your use case without needing external knowledge.
Example: Predicting financial trends using historical data.
Advantages of Fine-Tuning:
Performance: Can achieve high accuracy for specific tasks when trained with sufficient data.
Efficiency: Simpler architecture; no need to maintain external
retrieval systems.
Self-Contained: Does not rely on external data or knowledge bases, making it easier to deploy.

Challenges of Fine-Tuning:
Limited Knowledge: The model cannot access updated or
external knowledge after training.
Data Dependency: Requires large, high-quality labeled
datasets for fine-tuning.
Costly Updates: Retraining is necessary whenever new data is
introduced.
When to Use a RAG Model
A Retrieval-Augmented Generation (RAG) model combines a
retriever (e.g., FAISS, Elasticsearch) to fetch external
context and a generator (e.g., GPT, BART) to generate
answers based on the retrieved context.
Use Cases for RAG
1. Tasks Requiring Up-to-Date Information:
When the knowledge required to answer questions frequently
changes or is too large to be stored in the model's weights.
Example: Answering questions about current events, companypolicies, or legal updates.
2. Large Knowledge Base:
When the domain-specific knowledge exceeds the capacity of
a fine-tuned model.
Example: Technical support systems for complex products,
where the knowledge base contains hundreds of thousands of
documents.
3. Open-Domain Question Answering:
For generating responses in scenarios where the possible
questions span a wide range of topics.
Example: A chatbot for customer queries across various
industries.
4. Resource-Constrained Fine-Tuning:
When fine-tuning a large model is infeasible due to hardware or data constraints.
Example: Using RAG to leverage external documents without
retraining the generator.
5. Dynamic or Contextual Knowledge Retrieval:
When answers depend on context retrieved from specific data sources (e.g., databases, APIs, or documents).
Example: Personalized recommendations or context-aware
assistants.
6. Tasks Requiring Interpretability:
When you need transparency about where the information
comes from.
Example: In healthcare or legal applications, the retriever can
show the source of the information.
Advantages of RAG:
Scalability: Can handle massive, dynamic knowledge bases.
Up-to-Date: Easily updated by modifying the retriever's
indexed knowledge base.
Interpretability: Retrieved documents can justify or support
generated answers.
Cost Efficiency: No need to fine-tune the generator for every
dataset; update only the knowledge base.
Challenges of RAG:
Complexity: Requires maintaining both a retriever and
generator, making the system harder to manage.
Dependency on Retriever: Performance depends heavily on
the retriever's ability to fetch relevant documents.
Inference Latency: Retrieving documents can add significant
time to the inference process.
Knowledge Base Maintenance: Keeping the knowledge base
accurate and comprehensive is crucial.
Key Differences
In short: a fine-tuned model stores its knowledge in the model weights, so it is self-contained and simple to deploy but needs labeled data and retraining to absorb new information; a RAG system keeps knowledge in an external, easily updated index and can point to its sources, at the cost of maintaining a retriever and added retrieval latency.

When to Use Both Together
In some cases, you can combine both approaches:
Fine-Tune the Generator in a RAG System:
Fine-tune the generator on your specific domain to improve
its ability to work with retrieved knowledge.
Example: A chatbot for legal advice where the generator is
fine-tuned on legal terminology while still retrieving
documents dynamically.
By carefully assessing your task's requirements, data
characteristics, and resource availability, you can choose
between fine-tuning, RAG, or a hybrid approach for optimal
results.