Welcome to ServerlessToronto.org
Introduce Yourself:
- Where from? Why are you here?
- Looking for work or Offering work?
Help us serve you better: bit.ly/slsto
An Evening with Mark Ryan and Jerry Liu
• 6:00 - 6:10 Networking & Opening remarks
• 6:10 - 6:35 Mark Ryan: The LLM Landscape
• 6:35 - 7:15 Jerry Liu: Solving Core Challenges
in RAG Pipelines
• 7:15 - 7:45 Q&A
• 7:45 - 8:00 Manning Publications raffle
Why this Generative AI Talk?
1. Navigating the Tsunami: Understand the sweeping
changes the “GenAI tsunami” brings to industries and jobs.
2. Situational Awareness: Learn from AI leaders Mark Ryan
& Jerry Liu to gain a strategic view of the LLM and RAG
landscape.
3. Career Transformation: Learn to position yourself as the
architect of automation rather than its subject.
4. Practical Advice: Acquire actionable strategies to apply
Generative AI within your enterprise.
5. Interactive Learning: Engage in live Q&A to discuss and
clarify your AI dilemmas with experts.
Battle of Waterloo
What is Serverless Toronto about?
Serverless became New Agile & Mindset
#1 We started as Back-end FaaS Developers who enjoyed 'gluing together' other people's APIs and Managed Services
#2 We build bridges between Serverless Community (“Dev leg”), and Front-end, Voice-First & UX folks (“UX leg”)
#3 We're obsessed with creating business value (meaningful Products), focusing on Outcomes/Impact – NOT Outputs
#4 Achieve agility NOT by “sprinting” faster but working smarter (by using bigger building blocks & less Ops)
Serverless is a State of Mind…
Way too often, we, the IT folks, are obsessed with “pimping up
our cars” (infrastructure / code / pipelines) instead of “driving
business” forward & taking it places ☺
... It is a way to focus on business value.
It can be applied to any Tech stack, even On-Prem
Jared Short:
1. If the platform has it, use it
2. If the market has it, buy it
3. If you can reconsider requirements, do it
4. If you have to build it, own it.
Ben Kehoe: Serverless is about how you make
decisions, not about your choices.
Upcoming ServerlessToronto.org Meetups
Friday Lunch & Learn, April 19
Monday evening, May 6
Summer 2024
Knowledge Sponsor
1. Go to www.manning.com
2. Select *any* e-Book, Video course, or liveProject you want!
3. Add it to your shopping cart (no more than 1 item in the cart)
4. Raffle winners send me the email address they use in the Manning portal,
5. So the publisher can move the item to your Dashboard, as if purchased.
Fill out the Survey to win: bit.ly/slsto
Feature Presentations:
LLM Landscape
A Journey Through A Year of Evolution
Mark Ryan
Developer Knowledge Platform AI Lead, Google Cloud
ryanmark2014@gmail.com
Generative AI
Milestones
Major Generative AI Milestones: Part 1
Jun 2017
Attention Is All You Need: Seminal paper from Google that introduced transformers
Oct 2018
BERT: Google transformer-based language model
Feb 2019
GPT-2: OpenAI LLM
May 2020
GPT-3: OpenAI LLM
Jan 2021
DALLE: OpenAI image model
May 2021
LaMDA: Google LLM
Aug 2021
Codex: OpenAI code model
Apr 2022
DALLE 2: OpenAI image model
May 2022
Imagen: Google image model
PaLM: Google LLM
Gato: DeepMind multimodal model
Aug 2022
Stable Diffusion: Image model
Major Generative AI Milestones: Part 2
Nov 2022
ChatGPT: Consumer chat from OpenAI initially featuring GPT 3.5
Feb 2023
Bard: Consumer chat from Google
Mar 2023
GPT-4: OpenAI flagship model
ChatGPT Plugins: Connect to third-party applications
Apr 2023
CodeWhisperer: AWS AI coding assistant
May 2023
Vertex AI Gen AI: including curated set of Google, third-party, and open models
PaLM 2: Google flagship model
July 2023
Llama 2: Meta open source LLM licensed for commercial use.
Code Interpreter: OpenAI integrated sandbox environment for data upload and analysis
Aug 2023
Duet AI: AI Assistant for Google Cloud, including chat in console, and general purpose (VSCode) and SQL (BigQuery) code completion/interpretation
Sept 2023
DALLE 3: OpenAI image model
Nov 2023
Q: AWS chatbot
Grok: X chatbot
Dec 2023
Gemini: Google flagship multimodal (text / image / video) models
Feb 2024
Gemini 1.5 Pro: 1M context multimodal model
Gemma: Google open model
Sora: OpenAI text to video
Mar 2024
Claude 3: Anthropic flagship models
Devin: Cognition SWE AI
Ecosystem and
Vendor Landscape
The Emerging LLM Ecosystem
Vector databases
● Examples: Pinecone, Chroma, Vertex AI Vector Search
● Description: Store and find associations between embeddings, high-dimensional vector representations of data
● Use Case: Grounding LLM responses in a set of documents (example of RAG)
Encapsulated coding environments
● Examples: OpenAI Code Interpreter / Advanced Data Analysis
● Description: Upload datasets & ask questions to get visualizations and code running in a limited Python instance
● Use Case: Ad hoc data analysis
Plugins / extensions
● Examples: ChatGPT plugins / GPTs, Vertex AI extensions
● Description: Connect LLMs to third-party / external applications
● Use Case: Access current data / query & modify data that is external to the LLM
LLM app development frameworks
● Examples: LangChain, LlamaIndex, Autogen
● Description: LLM-centric framework to manage workflow (data sources, agents, models, etc)
● Use Case: Assembling LLM-based applications
Generative AI Landscape by Vendor
Categories: Prod. Suite Assistance; Developer / Ops Assistant; Consumer Chat; Enterprise Gen AI; Dev / Hobbyist Gen AI; Open Foundation Models
Google: Gemini for Google Workspace; Duet AI for Google Cloud; Gemini; Vertex AI; Google AI for Developers; Gemma
Microsoft: CoPilot 365; GitHub Copilot; Bing Chat; Azure OpenAI
OpenAI: ChatGPT; ChatGPT; ChatGPT Enterprise; ChatGPT
AWS: Q; CodeWhisperer; Bedrock / Titan
Anthropic: Claude 3*; Claude 3*
Meta: Llama 2*
Mistral: Mixtral 8x7B*
Twitter: @MarkRyanMkm
LinkedIn: www.linkedin.com/in/mark-ryan-31826743
YouTube: @markryan2475
RAG in 2024
Jerry Liu, LlamaIndex co-founder/CEO
LlamaIndex: Context Augmentation for your LLM app
Paradigms for inserting knowledge
Retrieval Augmentation - Fix the model, put context into the prompt
LLM
Before college the two main
things I worked on, outside of
school, were writing and
programming. I didn't write
essays. I wrote what
beginning writers were
supposed to write then, and
probably still are: short
stories. My stories were awful.
They had hardly any plot, just
characters with strong
feelings, which I imagined
made them deep...
Input Prompt
Here is the context:
Before college the
two main things…
Given the context,
answer the following
question:
{query_str}
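The prompt on this slide can be assembled in a few lines. This is a minimal sketch: the template text comes from the slide, while `build_prompt` and `context_str` are hypothetical names chosen for illustration.

```python
# Template text taken from the slide; {context_str} is an assumed placeholder name.
PROMPT_TEMPLATE = (
    "Here is the context:\n"
    "{context_str}\n"
    "Given the context, answer the following question:\n"
    "{query_str}\n"
)


def build_prompt(context_chunks, query_str):
    """Join the retrieved chunks and place them in the prompt ahead of the question."""
    context_str = "\n\n".join(context_chunks)
    return PROMPT_TEMPLATE.format(context_str=context_str, query_str=query_str)


prompt = build_prompt(
    ["Before college the two main things I worked on, outside of school, "
     "were writing and programming."],
    "What did the author work on before college?",
)
print(prompt)
```

The model itself stays fixed; only the prompt changes from query to query.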
Paradigms for inserting knowledge
Fine-tuning - baking knowledge into the weights of the network
LLM
Before college the two main things
I worked on, outside of school,
were writing and programming. I
didn't write essays. I wrote what
beginning writers were supposed to
write then, and probably still are:
short stories. My stories were
awful. They had hardly any plot,
just characters with strong feelings,
which I imagined made them
deep...
RLHF, Adam, SGD, etc.
RAG Stack
Current RAG Stack for building a QA System
[Diagram: Doc → Chunks → Vector Database → Chunks → LLM, covering Data Ingestion / Parsing and Data Querying]
5 Lines of Code in LlamaIndex!
Current RAG Stack (Data Ingestion/Parsing)
[Diagram: Doc → Chunks → Vector Database]
Process:
● Split up document(s) into even chunks.
● Each chunk is a piece of raw text.
● Generate embedding for each chunk (e.g.
OpenAI embeddings, sentence_transformer)
● Store each chunk into a vector database
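The ingestion steps above can be sketched with the standard library only. This is a toy illustration, not LlamaIndex code: `toy_embedding` is a hashed bag-of-words stand-in for a real embedding model, the "vector database" is a plain list, and chunking by word count is an assumption.

```python
import hashlib
import math


def split_into_chunks(text, chunk_size=8):
    """Split text into even chunks of chunk_size words (the last chunk may be shorter)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def toy_embedding(text, dim=16):
    """Hashed bag-of-words vector, L2-normalized. Stand-in for a real embedding model."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def ingest(documents, chunk_size=8):
    """Chunk every document, embed each chunk, and store the records in a list."""
    store = []
    for doc_id, text in documents.items():
        for i, chunk in enumerate(split_into_chunks(text, chunk_size)):
            store.append({"doc_id": doc_id, "chunk_id": i,
                          "text": chunk, "embedding": toy_embedding(chunk)})
    return store


store = ingest({
    "essay": "Before college the two main things I worked on, outside of school, "
             "were writing and programming. I didn't write essays."
})
print(len(store), "chunks stored")
```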
Current RAG Stack (Querying)
[Diagram: Vector Database → top-k Chunks → LLM]
Process:
● Find top-k most similar chunks from vector
database collection
● Plug into LLM response synthesis module
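A minimal sketch of the retrieval half of this step, assuming cosine similarity as the similarity measure and a list of records as the vector database. The three-dimensional vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(store, query_embedding, k=2):
    """Return the k (score, text) pairs most similar to the query embedding."""
    scored = [(cosine_similarity(item["embedding"], query_embedding), item["text"])
              for item in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


store = [
    {"text": "chunk about writing", "embedding": [1.0, 0.0, 0.0]},
    {"text": "chunk about programming", "embedding": [0.0, 1.0, 0.0]},
    {"text": "chunk about painting", "embedding": [0.0, 0.0, 1.0]},
]
results = top_k(store, query_embedding=[0.1, 0.9, 0.0], k=2)
# The retrieved text is what gets plugged into the response synthesis prompt.
context = "\n".join(text for _, text in results)
print(context)
```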
Stages: Retrieval, Synthesis
Response Synthesis: Create and refine
Response Synthesis: Tree Summarize
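The slides only name the two synthesis strategies. The sketch below shows one common reading of them, under stated assumptions: `llm` is any callable that maps a prompt string to a string, and the prompt wording and `fanout` parameter are invented for illustration.

```python
def create_and_refine(llm, query, chunks):
    """Answer from the first chunk, then refine that answer once per remaining chunk."""
    if not chunks:
        raise ValueError("need at least one chunk")
    answer = llm(f"Context: {chunks[0]}\nQuestion: {query}")
    for chunk in chunks[1:]:
        answer = llm(
            f"Existing answer: {answer}\nNew context: {chunk}\n"
            f"Refine the answer to: {query}"
        )
    return answer


def tree_summarize(llm, query, chunks, fanout=2):
    """Answer over groups of chunks, then combine those answers level by level."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = list(chunks)
    while True:
        level = [
            llm(f"Context: {' '.join(level[i:i + fanout])}\nQuestion: {query}")
            for i in range(0, len(level), fanout)
        ]
        if len(level) == 1:
            return level[0]


def echo_llm(prompt):
    """Stand-in for a real model call; reports how much text it was given."""
    return f"<answer built from {len(prompt)} prompt characters>"


print(create_and_refine(echo_llm, "What did the author do?", ["a", "b", "c", "d"]))
print(tree_summarize(echo_llm, "What did the author do?", ["a", "b", "c", "d"]))
```

With four chunks, the sequential version makes four dependent calls, while the tree version makes three and the calls within a level are independent of each other.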
Quickstart
https://colab.research.google.com/drive/1knQpGJLHj-LTTHqlZhgcjDH5F_nJIiY0?usp=sharing
Challenges with “Naive” RAG
RAG
[Diagram: Data → Data Parsing + Ingestion → Index → Retrieval → LLM + Prompts → Response; stages: Data Parsing & Ingestion, Data Querying]
Naive RAG
● Data Parsing + Ingestion: PyPDF, Sentence Splitting, Chunk Size 256
● Retrieval: Dense Retrieval, Top-k = 5
● LLM + Prompts: Simple QA Prompt
Easy to Prototype, Hard to Productionize
Naive RAG approaches tend to work well for simple questions over a simple,
small set of documents.
● “What are the main risk factors for Tesla?” (over Tesla 2021 10K)
● “What did the author do during his time at YC?” (Paul Graham essay)
Easy to Prototype, Hard to Productionize
But productionizing RAG over more questions and a larger set of data is hard!
Failure Modes:
● Response Quality: Bad Retrieval, Bad Response Generation
● Hard to Improve: Too many parameters to tune
● Systems: Latency, Cost, Security
Challenges with Naive RAG (Response Quality)
● Bad Retrieval
○ Low Precision: Not all chunks in retrieved set are relevant
■ Hallucination + Lost in the Middle Problems
○ Low Recall: Not all relevant chunks are retrieved.
■ Lacks enough context for LLM to synthesize an answer
○ Outdated information: The data is redundant or out of date.
● Bad Response Generation
○ Hallucination: Model makes up an answer that isn’t in the context.
○ Irrelevance: Model makes up an answer that doesn’t answer the question.
○ Toxicity/Bias: Model makes up an answer that’s harmful/offensive.
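Low precision and low recall can be measured directly once you have labeled which chunks are relevant to a query. A minimal sketch follows; the chunk ids are made up for illustration.

```python
def precision_recall(retrieved_ids, relevant_ids):
    """Precision: share of retrieved chunks that are relevant.
    Recall: share of relevant chunks that were retrieved."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Top-5 retrieval where only two of the three relevant chunks come back.
precision, recall = precision_recall(
    retrieved_ids=["c1", "c2", "c3", "c4", "c5"],
    relevant_ids=["c2", "c5", "c9"],
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```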
Difference with Traditional Software
[Diagram: Data → Extract → Transform → Load → Response]
Traditional software is defined by a set of programmatic rules.
Given an input, you can easily reason about the expected output.
Difference with Traditional Software
AI-powered software is defined by a
black-box set of parameters.
It is really hard to reason about what the
function space looks like.
The model parameters are tuned, the
surrounding parameters (prompt templates)
are not.
Difference with Traditional Software
If one component of the system is a
black-box, all components of the system
become black boxes.
The more components, the more parameters
you have to tune.
Every parameter affects the performance of
the end system.
There Are Too Many Parameters
Every parameter affects the performance of
the entire RAG pipeline.
Which parameters should a user tune?
There are too many options!
● Which PDF parser should I use?
● How do I chunk my documents?
● How do I process embedded tables and charts?
● Which embedding model should I use?
● What retrieval parameters should I use?
● Dense retrieval or sparse?
● Which LLM should I use?
Mapping Pain Points to Solutions
Solution
Categorize by pain point, and establish best practices
“Seven Failure Points When Engineering a Retrieval
Augmented Generation System”, Barnett et al.
“12 RAG Pain Points and Proposed Solutions”, by Wenqi Glantz
Pain Points
Response Quality Related
1. Context Missing in the Knowledge
Base
2. Context Missing in the Initial
Retrieval Pass
3. Context Missing After Reranking
4. Context Not Extracted
5. Output is in Wrong Format
6. Output has Incorrect Level of
Specificity
7. Output is Incomplete
Pain Points
Scalability
8. Can't Scale to Larger Data Volumes
11. Rate-Limit Errors
Security
12. LLM Security
Use Case Specific
9. Ability to QA Tabular Data
10. Ability to Parse PDFs
Let’s figure out solutions
1. Context Missing in the Knowledge Base
● Clean your data: Pick a good document parser (more on this later!)
● Add in Metadata: inject global context to each chunk
● Keep your data updated: Set up a recurring data ingestion pipeline. Upsert documents to prevent duplicates.
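Upserting can be sketched as "insert or replace by document id, and skip the write when the content has not changed". The dictionary-backed index and the `upsert` helper below are hypothetical; a real vector store would also re-embed the changed document.

```python
import hashlib


def upsert(index, doc_id, text):
    """Insert or replace a document keyed by doc_id; skip when content is unchanged."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    existing = index.get(doc_id)
    if existing is not None and existing["hash"] == digest:
        return "unchanged"
    index[doc_id] = {"hash": digest, "text": text}
    return "updated" if existing is not None else "inserted"


index = {}
print(upsert(index, "report.pdf", "first version"))
print(upsert(index, "report.pdf", "first version"))
print(upsert(index, "report.pdf", "second version"))
```

Re-running the ingestion pipeline on the same files then leaves one record per document instead of piling up duplicates.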
2. Context Missing in the Initial Retrieval Pass
Solution: Hyperparameter tuning for chunk
size and top-k
Solution: Reranking
Source: ColBERT
3. Context Missing After Reranking
Solution: try out fancier retrieval methods (small-to-big, auto-merging, auto-retrieval, ensembling, …)
Solution: fine-tune your embedding models to task-specific data
4. Context is there, but not extracted by the LLM
The context is there, but the LLM doesn’t understand it.
“Lost in the middle” problems.
https://x.com/GregKamradt/status/1722386725635580292?s=20
4. Context is there, but not extracted by the LLM
Solution: Prompt Compression
(LongLLMLingua)
Solution: LongContextReorder
Source: LongLLMLingua by Jiang et al.
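The idea behind reordering for "lost in the middle" can be sketched as follows: put the most relevant chunks at the start and end of the context and the least relevant ones in the middle. This is an illustrative implementation of that idea, not the library's own code; `reorder_for_long_context` is a hypothetical name.

```python
def reorder_for_long_context(ranked_chunks):
    """Given chunks sorted from most to least relevant, return an order where
    the strongest chunks sit at both ends and the weakest sit in the middle."""
    front = ranked_chunks[0::2]   # ranks 1, 3, 5, ... fill the start
    back = ranked_chunks[1::2]    # ranks 2, 4, 6, ... fill the end, reversed
    return front + back[::-1]


ranked = ["A", "B", "C", "D", "E"]  # A is the most relevant, E the least
print(reorder_for_long_context(ranked))
```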
5. Output is in Wrong Format
A lot of use cases require outputting the
answer in JSON format.
Solutions:
Better text prompting/output parsing
Use OpenAI function calling + JSON mode
Use token-level prompting (LMQL, Guidance)
Source: Guidance
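The "better output parsing" option can be sketched with the standard library: pull the JSON object out of the raw model text and check the keys you expect. `parse_json_answer` and the sample output are hypothetical; function calling or JSON mode would make the extraction step unnecessary.

```python
import json


def parse_json_answer(raw_output, required_keys):
    """Parse the text between the first '{' and the last '}' as JSON and
    check that every required key is present."""
    start, end = raw_output.find("{"), raw_output.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in output")
    data = json.loads(raw_output[start:end + 1])
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


raw = 'Sure! Here is the answer:\n{"company": "Tesla", "year": 2021}\nAnything else?'
parsed = parse_json_answer(raw, ["company", "year"])
print(parsed)
```

A `ValueError` here is a natural point to retry the request with a stricter prompt.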
7. Incomplete Answer
What if you have a complex multi-part
question?
Naive RAG is primarily good for answering
simple questions about specific facts.
7. Incomplete Answer
Solution: Add Agentic Reasoning
[Diagram: Query → RAG → Response, with "Agents?" labels; spectrum from Simple (Lower Cost, Lower Latency) to Advanced (Higher Cost, Higher Latency): Routing, One-Shot Query Planning, Tool Use, ReAct, Dynamic Planning + Execution]
8. Scaling your Data Pipeline
Pain points:
● Processing thousands/millions of docs is slow
● How do we efficiently handle document updates?
Reference Production Ingestion Stack
● Parallelize document processing
● HuggingFace TEI
● RabbitMQ Message Queue
● AWS EKS clusters
https://github.com/run-llama/llamaindex_aws_ingestion
10. Proper RAG over Complex Documents
How do we model complex docs
with embedded tables?
RAG with naive chunking +
retrieval → leads to hallucinations!
Embedded Table
Advanced Retrieval:
Embedded Tables
Instead: model data hierarchically.
Index tables/figures by their
summaries.
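Indexing a table by its summary can be sketched as: match the query against short summaries, but hand the full underlying element to the LLM. The word-overlap scoring, the node layout, and the sample table are all assumptions for illustration; a real system would compare embeddings.

```python
def words(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())


def retrieve_by_summary(nodes, query):
    """Pick the node whose summary overlaps most with the query, return its payload."""
    best = max(nodes, key=lambda node: len(words(node["summary"]) & words(query)))
    return best["payload"]


nodes = [
    {"summary": "table of quarterly revenue by region",
     "payload": "Region | Q1 | Q2\nNorth | x1 | x2\nSouth | y1 | y2"},
    {"summary": "paragraph describing the company history",
     "payload": "The company was founded by two engineers."},
]
print(retrieve_by_summary(nodes, "quarterly revenue by region"))
```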
The only missing component:
how do I parse out the tables from
the data?
Most PDF Parsing is Inadequate
Extracts into a messy format that is impossible to pass down into more advanced ingestion/retrieval algorithms.
Introducing LlamaParse
A genAI-native parser designed to let you build RAG over complex documents
https://github.com/run-llama/llama_parse
Introducing LlamaParse
Capabilities
✅ Extracts tables / charts
✅ Input natural language parsing instructions
✅ JSON mode
✅ Image Extraction
✅ Support for ~10+ document types (.pdf, .pptx, .docx, .xml)
Current PDFReader Llama Parse
LlamaParse Results
The best parser at table extraction == the only parser for advanced RAG
Expanded: https://drive.google.com/file/d/1fyQAg7nOtChQzhF2Ai7HEeKYYqdeWsdt/view?usp=sharing
Steerability
Default (no instructions)
Steerability
With Instructions
What’s next for RAG: Agents?
From RAG to Agents
[Diagram: Query → RAG → Response, with "Agents?" labels]
Agent Definition: Using LLMs for automated reasoning and tool selection
RAG is just one Tool: Agents can decide to use RAG with other tools
From Simple to Advanced Agents
Simple (Lower Cost, Lower Latency) → Advanced (Higher Cost, Higher Latency):
● Routing
● One-Shot Query Planning
● Tool Use
● ReAct
● Dynamic Planning + Execution
Routing
Simplest form of agentic reasoning.
Given user query and set of choices, output subset of choices to route query to.
Routing
Use Case: Joint QA and
Summarization
Guide
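Routing for the joint QA and summarization use case can be sketched as below. In practice the selector would be an LLM call that reads the choice descriptions; `keyword_selector` is a keyword-based stand-in, and all names here are hypothetical.

```python
def keyword_selector(query, descriptions):
    """Stand-in for an LLM selector: picks summarization when the query asks for a summary."""
    if "summar" in query.lower():
        return ["summarization"]
    return ["qa"]


def route(query, choices, selector):
    """Ask the selector which choices should handle the query, then run only those."""
    descriptions = {name: choice["description"] for name, choice in choices.items()}
    selected = selector(query, descriptions)
    return {name: choices[name]["run"](query) for name in selected}


choices = {
    "qa": {
        "description": "Answer questions about specific facts",
        "run": lambda q: f"QA pipeline handled: {q}",
    },
    "summarization": {
        "description": "Summarize a whole document",
        "run": lambda q: f"Summarizer handled: {q}",
    },
}
routed = route("Summarize the essay", choices, keyword_selector)
print(routed)
```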
Query Planning
Break down query into parallelizable sub-queries.
Each sub-query can be executed against any set of RAG pipelines
Query: Compare revenue growth of Uber and Lyft in 2021
● Sub-query "Describe revenue growth of Uber in 2021" → Uber 10-K → top-2 (Uber 10-K chunk 4, Uber 10-K chunk 8)
● Sub-query "Describe revenue growth of Lyft in 2021" → Lyft 10-K → top-2 (Lyft 10-K chunk 4, Lyft 10-K chunk 8)
Query Planning
Example: Compare
revenue of Uber and Lyft in
2021
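The Uber/Lyft example can be sketched as a plan of independent sub-queries run in parallel. The planner here is hard-coded where an LLM would normally produce the sub-queries, and the pipelines are placeholder functions; all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def plan(query):
    """Hypothetical planner: in practice an LLM would produce these sub-queries."""
    return [
        ("uber_10k", "Describe revenue growth of Uber in 2021"),
        ("lyft_10k", "Describe revenue growth of Lyft in 2021"),
    ]


def run_plan(sub_queries, pipelines):
    """Run each sub-query against its pipeline in parallel, keeping plan order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(pipelines[name], sub_query)
                   for name, sub_query in sub_queries]
        return [future.result() for future in futures]


# Placeholder pipelines; each would be a full RAG pipeline over one filing.
pipelines = {
    "uber_10k": lambda q: f"[Uber 10-K] {q}",
    "lyft_10k": lambda q: f"[Lyft 10-K] {q}",
}
answers = run_plan(plan("Compare revenue growth of Uber and Lyft in 2021"), pipelines)
print(answers)
```

The sub-answers would then go through one final synthesis call that produces the comparison.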
Query Planning Guide
Tool Use
Use an LLM to call an API
Infer the parameters of that
API
Tool Use
In normal RAG you just pass through the query.
But what if you used the LLM to infer all the parameters for the API interface?
A key capability in many QA use cases (auto-retrieval, text-to-SQL, and more)
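Tool use can be sketched as: the LLM emits the arguments as JSON, and the application checks them against the tool's signature before calling it. `search_filings` is a hypothetical tool and `llm_output` is a hard-coded stand-in for what a model might return.

```python
import inspect
import json


def search_filings(company: str, year: int, top_k: int = 2):
    """Hypothetical tool: a retrieval API with structured parameters."""
    return f"top-{top_k} chunks from {company} {year} 10-K"


def call_tool(tool, llm_arguments_json):
    """Check LLM-inferred arguments against the tool's signature, then call it."""
    args = json.loads(llm_arguments_json)
    allowed = set(inspect.signature(tool).parameters)
    unknown = set(args) - allowed
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    return tool(**args)


# Stand-in for the arguments an LLM might infer from
# "Describe revenue growth of Uber in 2021".
llm_output = '{"company": "Uber", "year": 2021}'
result = call_tool(search_filings, llm_output)
print(result)
```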
This is cool but
● How can an agent tackle sequential multi-part problems?
○ Let’s make it loop
● How can an agent maintain state over time?
○ Let’s add basic memory
Data Agents - Core Components
Agent Reasoning Loop
● ReAct Agent (any LLM)
● OpenAI Agent (only OAI)
Tools
Query Engine Tools (RAG
pipeline)
LlamaHub Tools (30+ tools to
external services)
ReAct: Reasoning + Acting with LLMs
Source: https://react-lm.github.io/
ReAct: Reasoning + Acting with LLMs
Add a loop around query decomposition + tool use
ReAct: Reasoning + Acting with LLMs
Superset of query planning + routing capabilities.
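The loop plus basic memory can be sketched as below. `scripted_llm_step` is a scripted stand-in for the LLM's thought/action decision, and the observations ("growth A", "growth B") are placeholders rather than real figures.

```python
def react_loop(llm_step, tools, question, max_steps=5):
    """Repeat thought → action → observation until the LLM chooses to finish."""
    memory = []  # basic memory: (thought, action, observation) per step
    for _ in range(max_steps):
        step = llm_step(question, memory)
        if step["action"] == "finish":
            return step["input"], memory
        observation = tools[step["action"]](step["input"])
        memory.append((step["thought"], step["action"], observation))
    return None, memory  # gave up after max_steps


def scripted_llm_step(question, memory):
    """Stand-in for the LLM: decides the next step from what has been observed so far."""
    companies = ["Uber", "Lyft"]
    if len(memory) < len(companies):
        company = companies[len(memory)]
        return {"thought": f"I need data for {company}",
                "action": "lookup", "input": company}
    summary = "; ".join(f"{c}: {obs}" for c, (_, _, obs) in zip(companies, memory))
    return {"thought": "I have everything", "action": "finish", "input": summary}


# Placeholder tool; a query engine over each 10-K would sit here.
tools = {"lookup": lambda company: {"Uber": "growth A", "Lyft": "growth B"}[company]}
answer, memory = react_loop(
    scripted_llm_step, tools, "Compare revenue growth of Uber and Lyft in 2021"
)
print(answer)
```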
ReAct + RAG Guide
Can we make this even better?
● Stop being so short-sighted - plan ahead at each step
● Parallelize execution where we can
LLMCompiler
Kim et al. 2023
An agent compiler for parallel multi-function planning + execution.
LLMCompiler
Plan out steps beforehand, and replan as necessary
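The "plan first, run independent steps in parallel" idea can be sketched as a small dependency graph executor. This is an illustration of the concept only, not the LLMCompiler implementation: the plan is written by hand where an LLM would generate it, replanning is left out, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def execute_plan(plan, tools):
    """plan maps task id → (tool name, argument, dependency ids).
    Tasks whose dependencies are done run together in parallel waves."""
    results = {}
    remaining = dict(plan)
    with ThreadPoolExecutor() as pool:
        while remaining:
            ready = [task_id for task_id, (_, _, deps) in remaining.items()
                     if all(dep in results for dep in deps)]
            if not ready:
                raise ValueError("plan has a cycle or a missing dependency")
            futures = {}
            for task_id in ready:
                tool_name, argument, deps = remaining.pop(task_id)
                dep_results = [results[dep] for dep in deps]
                futures[task_id] = pool.submit(tools[tool_name], argument, dep_results)
            for task_id, future in futures.items():
                results[task_id] = future.result()
    return results


# Placeholder tools; each receives its argument and the results of its dependencies.
tools = {
    "lookup": lambda company, deps: f"{company} figures",
    "compare": lambda topic, deps: f"compare {topic}: " + " vs ".join(deps),
}
plan = {
    "t1": ("lookup", "Uber", []),
    "t2": ("lookup", "Lyft", []),
    "t3": ("compare", "revenue growth", ["t1", "t2"]),
}
results = execute_plan(plan, tools)
print(results["t3"])
```

Here `t1` and `t2` run in the same wave, and `t3` starts only once both have finished.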
LLMCompiler Agent
Tree-based Planning
Tree of Thoughts (Yao et al. 2023)
Reasoning via Planning (Hao et al. 2023)
Language Agent Tree Search (Zhou et al. 2023)
Additional Requirements
● Observability: see the full trace of the agent
○ Observability Guide
● Control: Be able to guide the intermediate steps of an agent step-by-step
○ Lower-Level Agent API
● Customizability: Define your own agentic logic around any set of tools.
○ Custom Agent Guide
○ Custom Agent with Query Pipeline Guide
Additional Requirements
Possible through our
query pipeline syntax
Query Pipeline Guide
What’s next for RAG: Long Contexts?
Is RAG Dead?
https://x.com/Francis_YAO_/status/1759962812229800012?s=20
Gemini 1.5 Pro has a 1-10M token context window.
What does this mean for RAG?
Our Position
1. Frameworks are valuable whether or not RAG lives or dies
2. Certain RAG concepts will go away, but others will remain and evolve
Long Context LLMs will Solve the Following
1. Developers will worry less about tuning chunking algorithms
2. Developers will need to spend less time tuning retrieval and
chain-of-thought over single documents
3. Summarization will be easier
4. Personalized memory will be better and easier to build
Some Challenges Remain
1. 10M tokens is not enough for large document corpora (hundreds of
MB, GB)
2. Embedding models are lagging behind in context length
3. Cost and Latency
4. A KV Cache takes up a significant amount of GPU memory, and has
sequential dependencies
New RAG Architectures
1. Small to Big Retrieval over Documents
2. Intelligent Routing for Latency/Cost Tradeoffs
3. Retrieval Augmented KV Caching
www.ServerlessToronto.org
Reducing the gap between IT and Business needs
wang-Leveraging-the-Power-of-ChatGPT-and-Vector-Databases-in-the-FreeBSD-Expe...
Svetlin Ivanov
 
Cinci ug-january2011-anti-patterns
Cinci ug-january2011-anti-patternsCinci ug-january2011-anti-patterns
Cinci ug-january2011-anti-patterns
Steven Smith
 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge Graph
Trey Grainger
 
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (LlamaIndex).pdf

  • 1. Welcome to ServerlessToronto.org
    Introduce Yourself:
    - Where from? Why are you here?
    - Looking for work or Offering work?
    Help us serve you better: bit.ly/slsto
    An Evening with Mark Ryan and Jerry Liu
    • 6:00 - 6:10 Networking & Opening remarks
    • 6:10 - 6:35 Mark Ryan: The LLM Landscape
    • 6:35 - 7:15 Jerry Liu: Solving Core Challenges in RAG Pipelines
    • 7:15 - 7:45 Q&A
    • 7:45 - 8:00 Manning Publications raffle
  • 2. Why this Generative AI Talk?
    1. Navigating the Tsunami: Understand the sweeping changes the "GenAI tsunami" brings to industries and jobs.
    2. Situational Awareness: Learn from AI leaders Mark Ryan & Jerry Liu to gain a strategic view of the LLM and RAG landscape.
    3. Career Transformation: Learn to position yourself as the architect of automation rather than its subject.
    4. Practical Advice: Acquire actionable strategies to apply Generative AI within your enterprise.
    5. Interactive Learning: Engage in live Q&A to discuss and clarify your AI dilemmas with experts.
    (Image: Battle of Waterloo)
  • 3. What is Serverless Toronto about?
    Serverless became New Agile & Mindset
    #1 We started as Back-end FaaS Developers who enjoyed 'gluing together' other people's APIs and Managed Services
    #2 We build bridges between Serverless Community (“Dev leg”), and Front-end, Voice-First & UX folks (“UX leg”)
    #3 We're obsessed with creating business value (meaningful Products), focusing on Outcomes/Impact, NOT Outputs
    #4 Achieve agility NOT by “sprinting” faster but working smarter (by using bigger building blocks & less Ops)
  • 4. Serverless is a State of Mind…
    Way too often, we (the IT folks) have an obsession with “pimping up our cars” (infrastructure / code / pipelines) instead of “driving business” forward & taking them places ☺
  • 5. ... It is a way to focus on business value.
    It can be applied to any Tech stack, even On-Prem
    Jared Short:
    1. If the platform has it, use it
    2. If the market has it, buy it
    3. If you can reconsider requirements, do it
    4. If you have to build it, own it.
    Ben Kehoe: Serverless is about how you make decisions, not about your choices.
  • 6. Upcoming ServerlessToronto.org Meetups
    - Friday Lunch & Learn, April 19
    - Monday evening, May 6
    - Summer 2024
  • 7. Knowledge Sponsor
    1. Go to www.manning.com
    2. Select *any* e-Book, Video course, or liveProject you want!
    3. Add it to your shopping cart (no more than 1 item in the cart)
    4. Raffle winners will send me the email address they use in the Manning portal,
    5. So the publisher can move the item to your Dashboard, as if purchased.
    Fill out the Survey to win: bit.ly/slsto
  • 9. LLM Landscape: A Journey Through A Year of Evolution
    Mark Ryan, Developer Knowledge Platform AI Lead, Google Cloud
  • 11. Major Generative AI Milestones: Part 1
    Jun 2017  Attention Is All You Need: Seminal paper from Google that introduced transformers
    Oct 2018  BERT: Google transformer-based language model
    Feb 2019  GPT-2: OpenAI LLM
    May 2020  GPT-3: OpenAI LLM
    Jan 2021  DALLE: OpenAI image model
    May 2021  LaMDA: Google LLM
    Aug 2021  Codex: OpenAI code model
    Apr 2022  DALLE 2: OpenAI image model
    May 2022  Imagen: Google image model
              PaLM: Google LLM
              Gato: DeepMind multimodal model
    Aug 2022  Stable Diffusion: Image model
  • 12. Major Generative AI Milestones: Part 2
    Nov 2022   ChatGPT: Consumer chat from OpenAI initially featuring GPT 3.5
    Feb 2023   Bard: Consumer chat from Google
    Mar 2023   GPT-4: OpenAI flagship model
               ChatGPT Plugins: Connect to third-party applications
    Apr 2023   CodeWhisperer: AWS AI coding assistant
    May 2023   Vertex AI Gen AI: including curated set of Google, third-party, and open models
               PaLM 2: Google flagship model
    July 2023  Llama 2: Meta open source LLM licensed for commercial use
               Code Interpreter: OpenAI integrated sandbox environment for data upload and analysis
    Aug 2023   Duet AI: AI Assistant for Google Cloud, including chat in console, and general purpose (VS Code) and SQL (BigQuery) code completion/interpretation
    Sept 2023  DALLE 3: OpenAI image model
    Nov 2023   Q: AWS chatbot
               Grok: X chatbot
    Dec 2023   Gemini: Google flagship multimodal (text / image / video) models
    Feb 2024   Gemini Pro 1.5: 1M context multimodal model
               Gemma: Google open model
               Sora: OpenAI text to video
    Mar 2024   Claude 3: Anthropic flagship models
               Devin: Cognition SWE AI
  • 14. The Emerging LLM Ecosystem
    Vector databases
      Examples: Pinecone, Chroma, Vertex AI Vector Search
      Description: Store and find associations between embeddings, high-dimensional vector representations of data
      Use Case: Grounding LLM responses in a set of documents (example of RAG)
    Encapsulated coding environments
      Examples: OpenAI Code Interpreter / Advanced Data Analysis
      Description: Upload datasets & ask questions to get visualizations and code running in a limited Python instance
      Use Case: Ad hoc data analysis
    Plugins / extensions
      Examples: ChatGPT plugins / GPTs, Vertex AI extensions
      Description: Connect LLMs to third-party / external applications
      Use Case: Access current data / query & modify data that is external to the LLM
    LLM app development frameworks
      Examples: LangChain, LlamaIndex, Autogen
      Description: LLM-centric framework to manage workflow (data sources, agents, models, etc.)
      Use Case: Assembling LLM-based applications
  • 15. Generative AI Landscape by Vendor Vendor Prod. Suite Assistance Developer / Ops Assistant Consumer Chat Enterprise Gen AI Dev / Hobbyist Gen AI Open Foundation Models Google Gemini for Google Workspace Duet AI for Google Cloud Gemini Vertex AI Google AI for Developers Gemma Microsoft CoPilot 365 Github Copilot Bing Chat Azure OpenAI OpenAI ChatGPT ChatGPT ChatGPT Enterprise ChatGPT AWS ● Q ● CodeWhisperer Bedrock / Titan Anthropic Claude 3* Claude 3* Meta Llama 2* Mistral Mixtral 8x7B*
  • 17. RAG in 2024 Jerry Liu, LlamaIndex co-founder/CEO
  • 19. Paradigms for inserting knowledge Retrieval Augmentation - Fix the model, put context into the prompt LLM Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep... Input Prompt Here is the context: Before college the two main things… Given the context, answer the following question: {query_str}
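A minimal Python sketch of the retrieval-augmentation prompt on this slide. The template wording follows the slide; `build_prompt`, `context_str`, and the sample question are hypothetical names chosen for illustration.

```python
# The model stays fixed; knowledge is supplied through the prompt.
PROMPT_TEMPLATE = (
    "Here is the context:\n"
    "{context_str}\n\n"
    "Given the context, answer the following question: {query_str}"
)


def build_prompt(context_str: str, query_str: str) -> str:
    """Fill the template with retrieved context and the user query."""
    return PROMPT_TEMPLATE.format(context_str=context_str, query_str=query_str)


context = (
    "Before college the two main things I worked on, outside of school, "
    "were writing and programming."
)
prompt = build_prompt(context, "What did the author work on before college?")
print(prompt)
```

The resulting string is what would be sent to the LLM in place of the bare question.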
  • 20. Paradigms for inserting knowledge Fine-tuning - baking knowledge into the weights of the network LLM Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep... RLHF, Adam, SGD, etc.
  • 22. Current RAG Stack for building a QA System Data Ingestion / Parsing: Doc → Chunks → Vector Database. Data Querying: Vector Database → Chunks → LLM. 5 Lines of Code in LlamaIndex!
  • 23. Current RAG Stack (Data Ingestion/Parsing) Vector Database Doc Chunk Chunk Chunk Chunk Process: ● Split up document(s) into even chunks. ● Each chunk is a piece of raw text. ● Generate embedding for each chunk (e.g. OpenAI embeddings, sentence_transformer) ● Store each chunk into a vector database
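The ingestion steps above can be sketched in plain Python. This is an illustration under stated assumptions, not LlamaIndex code: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model (such as the OpenAI or sentence_transformer embeddings the slide mentions), and `vector_store` is an in-memory list standing in for a vector database.

```python
import hashlib
import math


def split_into_chunks(text: str, chunk_size: int = 256) -> list[str]:
    """Split text into even chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Stand-in embedding (hashed bag of words); NOT a real embedding model."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


# In-memory list standing in for a vector database collection.
vector_store: list[dict] = []


def ingest(doc_id: str, text: str, chunk_size: int = 256) -> int:
    """Chunk a document, embed each chunk, store it; return the chunk count."""
    chunks = split_into_chunks(text, chunk_size)
    for i, chunk in enumerate(chunks):
        vector_store.append(
            {"id": f"{doc_id}-{i}", "text": chunk, "embedding": toy_embed(chunk)}
        )
    return len(chunks)


n_chunks = ingest("essay", "word " * 600, chunk_size=256)
print(n_chunks, len(vector_store))
```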
  • 24. Current RAG Stack (Querying) Vector Database Chunk Chunk Chunk LLM Process: ● Find top-k most similar chunks from vector database collection ● Plug into LLM response synthesis module
  • 25. Current RAG Stack (Querying) Vector Database Chunk Chunk Chunk LLM Process: ● Find top-k most similar chunks from vector database collection ● Plug into LLM response synthesis module Retrieval Synthesis
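The two query-time stages named above, retrieval and synthesis, might look like the following sketch. The three-dimensional vectors and chunk texts are made-up illustrative data, and `top_k` and `synthesize_prompt` are hypothetical helper names; a real system would use a vector database query and an LLM call.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec: list[float], store: list[dict], k: int = 2) -> list[dict]:
    """Retrieval: return the k stored chunks most similar to the query vector."""
    return sorted(store, key=lambda c: cosine(query_vec, c["embedding"]), reverse=True)[:k]


def synthesize_prompt(query: str, chunks: list[dict]) -> str:
    """Synthesis: plug the retrieved chunks into the prompt sent to the LLM."""
    context = "\n".join(c["text"] for c in chunks)
    return (
        f"Here is the context:\n{context}\n\n"
        f"Given the context, answer the following question: {query}"
    )


# Made-up chunks and embeddings, for illustration only.
store = [
    {"text": "Tesla lists supply chain risk.", "embedding": [1.0, 0.0, 0.0]},
    {"text": "The author worked at YC.", "embedding": [0.0, 1.0, 0.0]},
    {"text": "Tesla lists regulatory risk.", "embedding": [0.9, 0.1, 0.0]},
]
hits = top_k([1.0, 0.0, 0.0], store, k=2)
prompt = synthesize_prompt("What are the main risk factors for Tesla?", hits)
print(prompt)
```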
  • 30. RAG Data Parsing & Ingestion Data Querying Index Data Data Parsing + Ingestion Retrieval LLM + Prompts Response
  • 31. Naive RAG PyPDF Sentence Splitting Chunk Size 256 Simple QA Prompt Dense Retrieval Top-k = 5 Index Data Data Parsing + Ingestion Retrieval LLM + Prompts Response
  • 32. Easy to Prototype, Hard to Productionize Naive RAG approaches tend to work well for simple questions over a simple, small set of documents. ● “What are the main risk factors for Tesla?” (over Tesla 2021 10K) ● “What did the author do during his time at YC?” (Paul Graham essay)
  • 33. Easy to Prototype, Hard to Productionize But productionizing RAG over more questions and a larger set of data is hard! Failure Modes: ● Response Quality: Bad Retrieval, Bad Response Generation ● Hard to Improve: Too many parameters to tune ● Systems: Latency, Cost, Security
  • 35. Challenges with Naive RAG (Response Quality) ● Bad Retrieval ○ Low Precision: Not all chunks in retrieved set are relevant ■ Hallucination + Lost in the Middle Problems ○ Low Recall: Not all relevant chunks are retrieved. ■ Lacks enough context for LLM to synthesize an answer ○ Outdated information: The data is redundant or out of date.
  • 36. Challenges with Naive RAG (Response Quality) ● Bad Retrieval ○ Low Precision: Not all chunks in retrieved set are relevant ■ Hallucination + Lost in the Middle Problems ○ Low Recall: Not all relevant chunks are retrieved. ■ Lacks enough context for LLM to synthesize an answer ○ Outdated information: The data is redundant or out of date. ● Bad Response Generation ○ Hallucination: Model makes up an answer that isn’t in the context. ○ Irrelevance: Model makes up an answer that doesn’t answer the question. ○ Toxicity/Bias: Model makes up an answer that’s harmful/offensive.
  • 37. Difference with Traditional Software Traditional software is defined by a set of programmatic rules. Given an input, you can easily reason about the expected output. (Diagram: Data → Extract → Transform → Load → Response)
  • 38. Difference with Traditional Software AI-powered software is defined by a black-box set of parameters. It is really hard to reason about what the function space looks like. The model parameters are tuned, the surrounding parameters (prompt templates) are not. Index Data Data Parsing + Ingestion Retrieval LLM + Prompts Response
  • 39. Difference with Traditional Software If one component of the system is a black-box, all components of the system become black boxes. The more components, the more parameters you have to tune. Index Data Data Parsing + Ingestion Retrieval LLM + Prompts Response
  • 40. Difference with Traditional Software If one component of the system is a black-box, all components of the system become black boxes. Every parameter affects the performance of the end system. Index Data Data Parsing + Ingestion Retrieval LLM + Prompts Response RAG
  • 41. There Are Too Many Parameters Every parameter affects the performance of the entire RAG pipeline. Which parameters should a user tune? There are too many options! Index Data Data Parsing + Ingestion Retrieval LLM + Prompts Response Which PDF parser should I use? How do I chunk my documents? How do I process embedded tables and charts? Which embedding model should I use? What retrieval parameters should I use? Dense retrieval or sparse? Which LLM should I use?
  • 42. Mapping Pain Points to Solutions
  • 43. Solution Categorize by pain point, and establish best practices
  • 44. Solution Categorize by pain point, and establish best practices “Seven Failure Points When Engineering a Retrieval Augmented Generation System”, Barnett et al.
  • 45. Solution Categorize by pain point, and establish best practices “12 RAG Pain Points and Proposed Solutions”, by Wenqi Glantz
  • 46. Pain Points Response Quality Related 1. Context Missing in the Knowledge Base 2. Context Missing in the Initial Retrieval Pass 3. Context Missing After Reranking 4. Context Not Extracted 5. Output is in Wrong Format 6. Output has Incorrect Level of Specificity 7. Output is Incomplete
  • 47. Pain Points Scalability 8. Can't Scale to Larger Data Volumes 11. Rate-Limit Errors Security 12. LLM Security Use Case Specific 9. Ability to QA Tabular Data 10. Ability to Parse PDFs
  • 49. Let’s figure out solutions
  • 50. 1. Context Missing in the Knowledge Base Clean your data: Pick a good document parser (more on this later!) Add in Metadata: inject global context to each chunk Keep your data updated: Set up a recurring data ingestion pipeline. Upsert documents to prevent duplicates.
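One way to read "upsert documents to prevent duplicates" is to key each document by a stable ID and a content hash, so that a recurring ingestion job only re-processes documents that changed. The sketch below assumes that reading; `docstore`, `upsert`, and the sample document are hypothetical.

```python
import hashlib

# In-memory stand-in for a document store keyed by document ID.
docstore: dict[str, dict] = {}


def upsert(doc_id: str, text: str, metadata: dict) -> str:
    """Insert or update a document; skip the work if the content is unchanged."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    existing = docstore.get(doc_id)
    if existing is not None and existing["hash"] == digest:
        return "skipped"
    status = "updated" if existing is not None else "inserted"
    # Metadata travels with the document so it can be attached to every chunk.
    docstore[doc_id] = {"text": text, "hash": digest, "metadata": metadata}
    return status


meta = {"company": "Tesla", "doc_type": "10-K"}
first = upsert("tesla-10k", "Risk factors: supply chain.", meta)
second = upsert("tesla-10k", "Risk factors: supply chain.", meta)
third = upsert("tesla-10k", "Risk factors: supply chain and regulation.", meta)
print(first, second, third, len(docstore))
```

Running the same ingestion twice leaves a single entry per document rather than duplicate chunks.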
  • 51. 2. Context Missing in the Initial Retrieval Pass Solution: Hyperparameter tuning for chunk size and top-k Solution: Reranking Source: ColBERT
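A small sketch of the reranking idea: retrieve a wider candidate set first, then re-score it with a second, more precise scorer and keep the best few. Here `overlap_score` is a toy word-overlap function standing in for a real reranking model (it is not ColBERT), and the candidate texts are made up.

```python
def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words that appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def rerank(query: str, candidates: list[str], top_n: int = 2) -> list[str]:
    """Re-score first-pass candidates and keep the top_n best."""
    return sorted(candidates, key=lambda c: overlap_score(query, c), reverse=True)[:top_n]


# Pretend these came back from a first-pass retrieval with a large top-k.
candidates = [
    "uber revenue grew in 2021",
    "lyft drivers and riders",
    "uber revenue growth 2021 was strong",
    "weather in toronto",
]
best = rerank("uber revenue growth 2021", candidates, top_n=2)
print(best)
```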
  • 52. 3. Context Missing After Reranking Solution: try out fancier retrieval methods (small-to-big, auto-merging, auto-retrieval, ensembling, …) Solution: fine-tune your embedding models to task-specific data
  • 53. 4. Context is there, but not extracted by the LLM The context is there, but the LLM doesn’t understand it. “Lost in the middle” Problems. https://x.com/GregKamradt/status/1722386725635580292?s=20
  • 54. 4. Context is there, but not extracted by the LLM Solution: Prompt Compression (LongLLMLingua) Solution: LongContextReorder LongLLMLingua by Jiang et al.
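The idea behind reordering for long contexts is that models tend to use the start and end of the prompt better than the middle, so the most relevant chunks are moved to the two ends. The function below is a sketch of that idea only; it is not the LlamaIndex `LongContextReorder` implementation, and `reorder_for_long_context` is a hypothetical name.

```python
def reorder_for_long_context(chunks_by_relevance: list[str]) -> list[str]:
    """Place the most relevant chunks at both ends, the least relevant in the middle.

    The input must be sorted from most to least relevant.
    """
    front, back = [], []
    for i, chunk in enumerate(chunks_by_relevance):
        if i % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    # Reverse the back half so the second-most-relevant chunk ends up last.
    return front + back[::-1]


ranked = ["A", "B", "C", "D", "E"]  # "A" is the most relevant chunk
reordered = reorder_for_long_context(ranked)
print(reordered)
```

With five chunks ranked A to E, the two best chunks (A and B) land at the first and last positions and the weakest chunk (E) lands in the middle.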
  • 56. 5. Output is in Wrong Format A lot of use cases require outputting the answer in JSON format. Solutions: Better text prompting/output parsing Use OpenAI function calling + JSON mode Use token-level prompting (LMQL, Guidance) Source: Guidance
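The first option on this slide, output parsing, can be sketched with the standard library alone: pull the JSON object out of the model's text and fail loudly if required fields are missing, so the caller can retry. The raw model output and the key names below are made up, and `parse_json_answer` is a hypothetical helper.

```python
import json
import re


def parse_json_answer(llm_text: str, required_keys: set[str]) -> dict:
    """Extract a JSON object from model text and check that required keys exist."""
    match = re.search(r"\{.*\}", llm_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    data = json.loads(match.group(0))  # raises json.JSONDecodeError if malformed
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


# Made-up model output with chatter around the JSON payload.
raw = 'Sure! Here is the answer:\n{"answer": "short stories", "source": "essay"}'
parsed = parse_json_answer(raw, {"answer", "source"})
print(parsed)
```

Function calling, JSON mode, and token-level prompting aim to make this kind of after-the-fact repair unnecessary by constraining the output at generation time.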
  • 57. 7. Incomplete Answer What if you have a complex multi-part question? Naive RAG is primarily good for answering simple questions about specific facts.
  • 58. 7. Incomplete Answer Solution: Add Agentic Reasoning (Query → RAG, possibly with agents → Response). Options range from simple (lower cost, lower latency) to advanced (higher cost, higher latency): Routing, One-Shot Query Planning, Tool Use, ReAct, Dynamic Planning + Execution.
  • 59. 8. Scaling your Data Pipeline Pain points: ● Processing thousands/millions of docs is slow ● How do we efficiently handle document updates?
  • 60. 8. Scaling your Data Pipeline Pain points: ● Processing thousands/millions of docs is slow ● How do we efficiently handle document updates? Reference Production Ingestion Stack ● Parallelize document processing ● HuggingFace TEI ● RabbitMQ Message Queue ● AWS EKS clusters https://github.com/run-llama/llamaindex_aws_ingestion
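The reference stack spreads work across a message queue and a cluster. The sketch below shows only the first bullet, parallelizing per-document processing, on a single machine with a thread pool. `process_document` is a placeholder for the real parse, chunk, and embed steps, and the documents are synthetic.

```python
from concurrent.futures import ThreadPoolExecutor


def process_document(doc: str) -> dict:
    """Placeholder for parsing, chunking, and embedding one document."""
    return {"chars": len(doc), "words": len(doc.split())}


# Synthetic documents, for illustration only.
docs = [f"document number {i} " * 10 for i in range(100)]

# Threads suit I/O-bound steps such as calling an embedding service;
# CPU-bound parsing would more likely use processes or separate workers.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(process_document, docs))

print(len(results), results[0])
```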
  • 61. 10. Proper RAG over Complex Documents
  • 62. How do we model complex docs with embedded tables? RAG with naive chunking + retrieval → leads to hallucinations! Embedded Table Advanced Retrieval: Embedded Tables
  • 63. Advanced Retrieval: Embedded Tables Instead: model data hierarchically. Index tables/figures by their summaries. The only missing component: how do I parse out the tables from the data?
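A sketch of "index tables by their summaries": the short summary is what gets matched against the query, while the full table is what gets handed to the LLM once a summary matches. `TableNode`, `retrieve_table`, and the placeholder tables are hypothetical; a real system would embed the summaries rather than count shared words.

```python
from dataclasses import dataclass


@dataclass
class TableNode:
    summary: str            # short text that is indexed and matched
    table: list[list[str]]  # full table returned when the summary matches


def retrieve_table(query: str, nodes: list[TableNode]) -> TableNode:
    """Match the query against summaries, return the node behind the best one."""
    query_words = set(query.lower().split())

    def score(node: TableNode) -> int:
        return len(query_words & set(node.summary.lower().split()))

    return max(nodes, key=score)


# Placeholder tables; the cell values are intentionally not real data.
nodes = [
    TableNode("quarterly revenue by region", [["Region", "Q1"], ["EMEA", "placeholder"]]),
    TableNode("employee headcount by department", [["Dept", "Count"], ["Eng", "placeholder"]]),
]
hit = retrieve_table("revenue by region", nodes)
print(hit.summary)
```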
  • 64. Most PDF Parsing is Inadequate Extracts into a messy format that is impossible to pass down into more advanced ingestion/retrieval algorithms.
  • 65. Introducing LlamaParse A genAI-native parser designed to let you build RAG over complex documents https://github.com/run-llama/llama_parse
  • 66. Introducing LlamaParse Capabilities ✅ Extracts tables / charts ✅ Input natural language parsing instructions ✅JSON mode ✅Image Extraction ✅Support for ~10+ document types (.pdf, .pptx, .docx, .xml)
  • 68. LlamaParse Results The best parser at table extraction == the only parser for advanced RAG Expanded: https://drive.google.com/file/d/1fyQAg7nOtChQzhF2Ai7HEeKYYqdeWsdt/view?usp=sharing
  • 71. What’s next for RAG: Agents?
  • 73. From RAG to Agents Agents? RAG Query Response Agents? Agents?
  • 74. From RAG to Agents Agent Definition: Using LLMs for automated reasoning and tool selection. RAG is just one Tool: Agents can decide to use RAG with other tools. (Diagram: Query → RAG → Response, with "Agents?" marked at several points in the flow.)
  • 75. From Simple to Advanced Agents Ranging from simple (lower cost, lower latency) to advanced (higher cost, higher latency): Routing, One-Shot Query Planning, Tool Use, ReAct, Dynamic Planning + Execution.
  • 76. Routing Simplest form of agentic reasoning. Given user query and set of choices, output subset of choices to route query to.
  • 77. Routing Use Case: Joint QA and Summarization Guide
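Routing can be sketched as a selector that picks from a set of described choices and a dispatcher that sends the query to whatever was picked. In this sketch `fake_llm_selector` is a keyword rule standing in for the LLM call, and the two engines are stub functions for the joint QA and summarization use case; all names are hypothetical.

```python
def fake_llm_selector(query: str, choices: dict[str, str]) -> list[str]:
    """Stand-in for an LLM that reads the choice descriptions and picks a subset."""
    if "summar" in query.lower():
        return ["summary"]
    return ["qa"]


def route(query: str, engines: dict, choices: dict[str, str]) -> list[str]:
    """Ask the selector which choices apply, then run the query on each."""
    selected = fake_llm_selector(query, choices)
    return [engines[name](query) for name in selected]


choices = {
    "qa": "Answers specific factual questions over the document",
    "summary": "Summarizes the whole document",
}
engines = {
    "qa": lambda q: f"[vector QA] {q}",
    "summary": lambda q: f"[summary] {q}",
}

summary_result = route("Summarize the essay", engines, choices)
qa_result = route("What did the author do at YC?", engines, choices)
print(summary_result, qa_result)
```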
  • 78. Query Planning Break down query into parallelizable sub-queries. Each sub-query can be executed against any set of RAG pipelines. Example: "Compare revenue growth of Uber and Lyft in 2021" is split into "Describe revenue growth of Uber in 2021" (run against the Uber 10-K, top-2 retrieval returning chunks 4 and 8) and "Describe revenue growth of Lyft in 2021" (run against the Lyft 10-K, top-2 retrieval returning chunks 4 and 8).
  • 79. Query Planning Example: Compare revenue of Uber and Lyft in 2021 (Query Planning Guide). The query "Compare revenue growth of Uber and Lyft in 2021" becomes two sub-queries, "Describe revenue growth of Uber in 2021" over the Uber 10-K and "Describe revenue growth of Lyft in 2021" over the Lyft 10-K, each with top-2 retrieval (chunks 4 and 8).
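The Uber/Lyft example can be sketched as plan, fan out, combine. In this sketch `plan` hard-codes the sub-query wording from the slide where a real system would ask an LLM to write it, and `run_pipeline` is a placeholder for a per-document RAG pipeline; both names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def plan(query: str, sources: list[str]) -> list[tuple[str, str]]:
    """Stand-in planner: one sub-query per source document."""
    return [(src, f"Describe revenue growth of {src} in 2021") for src in sources]


def run_pipeline(source: str, sub_query: str) -> str:
    """Placeholder for a RAG pipeline over one 10-K (top-2 retrieval + synthesis)."""
    return f"<answer from {source} 10-K for: {sub_query}>"


def answer(query: str, sources: list[str]) -> str:
    sub_queries = plan(query, sources)
    # The sub-queries are independent, so they can run in parallel.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda sq: run_pipeline(*sq), sub_queries))
    # A real system would pass the partial answers to an LLM to write the comparison.
    return "\n".join(partials)


combined = answer("Compare revenue growth of Uber and Lyft in 2021", ["Uber", "Lyft"])
print(combined)
```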
  • 80. Tool Use Use an LLM to call an API Infer the parameters of that API
  • 81. Tool Use In normal RAG you just pass through the query. But what if you used the LLM to infer all the parameters for the API interface? A key capability in many QA use cases (auto-retrieval, text-to-SQL, and more)
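A sketch of tool use as parameter inference: instead of passing the raw query straight through, the LLM fills in the arguments of an API, and the caller validates them before making the call. `search_filings` is a hypothetical API and `fake_llm_infer_args` is a canned stand-in for the LLM.

```python
import json


def search_filings(company: str, year: int, top_k: int = 2) -> str:
    """Hypothetical API that the agent is allowed to call."""
    return f"search(company={company}, year={year}, top_k={top_k})"


def fake_llm_infer_args(query: str) -> str:
    """Stand-in for an LLM that returns the tool arguments as JSON."""
    return json.dumps({"company": "Uber", "year": 2021})


def call_tool(query: str) -> str:
    """Infer arguments from the query, validate them, then call the API."""
    args = json.loads(fake_llm_infer_args(query))
    allowed = {"company", "year", "top_k"}
    unknown = set(args) - allowed
    if unknown:
        raise ValueError(f"unexpected arguments: {sorted(unknown)}")
    return search_filings(**args)


result = call_tool("What was Uber's revenue growth in 2021?")
print(result)
```

Auto-retrieval and text-to-SQL follow the same pattern, with metadata filters or a SQL statement as the inferred parameters.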
  • 82. This is cool but ● How can an agent tackle sequential multi-part problems? ● How can an agent maintain state over time?
  • 83. This is cool but ● How can an agent tackle sequential multi-part problems? ○ Let’s make it loop ● How can an agent maintain state over time? ○ Let’s add basic memory
  • 84. Data Agents - Core Components Agent Reasoning Loop ● ReAct Agent (any LLM) ● OpenAI Agent (only OAI) Tools Query Engine Tools (RAG pipeline) LlamaHub Tools (30+ tools to external services)
  • 85. ReAct: Reasoning + Acting with LLMs Source: https://react-lm.github.io/
  • 86. ReAct: Reasoning + Acting with LLMs Add a loop around query decomposition + tool use
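A sketch of the loop-plus-memory idea: at each step the model sees everything so far, emits a thought and an action, the action's result is appended as an observation, and the loop repeats until the model finishes or a step limit is hit. `scripted_llm` is a hard-coded stand-in for a real LLM, and `lookup` is a placeholder tool; neither comes from LlamaIndex.

```python
def scripted_llm(memory: list[str]) -> dict:
    """Stand-in for an LLM: picks the next step based on what is in memory."""
    observations = [m for m in memory if m.startswith("Observation:")]
    if len(observations) == 0:
        return {"thought": "I need Uber's numbers.", "action": "lookup", "input": "Uber"}
    if len(observations) == 1:
        return {"thought": "Now I need Lyft's.", "action": "lookup", "input": "Lyft"}
    return {"thought": "I have both.", "action": "finish", "input": "Compared Uber and Lyft."}


# Placeholder tool; a real agent would call a RAG query engine here.
TOOLS = {"lookup": lambda company: f"{company} 10-K summary (placeholder)"}


def react(question: str, max_steps: int = 5) -> tuple[str, list[str]]:
    """Reason, act, observe, and repeat until finished or out of steps."""
    memory = [f"Question: {question}"]  # basic memory: the running trace
    for _ in range(max_steps):
        step = scripted_llm(memory)
        memory.append(f"Thought: {step['thought']}")
        if step["action"] == "finish":
            return step["input"], memory
        observation = TOOLS[step["action"]](step["input"])
        memory.append(f"Observation: {observation}")
    return "stopped: step limit reached", memory


final_answer, trace = react("Compare Uber and Lyft")
print(final_answer)
print("\n".join(trace))
```

The step limit matters in practice because each iteration is another LLM call, which is where the higher cost and latency of this approach come from.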
  • 87. ReAct: Reasoning + Acting with LLMs Superset of query planning + routing capabilities. ReAct + RAG Guide
  • 88. Can we make this even better? ● Stop being so short-sighted - plan ahead at each step ● Parallelize execution where we can
  • 89. LLMCompiler Kim et al. 2023 An agent compiler for parallel multi-function planning + execution.
  • 90. LLMCompiler Plan out steps beforehand, and replan as necessary LLMCompiler Agent
  • 91. Tree-based Planning Tree of Thoughts (Yao et al. 2023) Reasoning via Planning (Hao et al. 2023) Language Agent Tree Search (Zhou et al. 2023)
  • 92. Additional Requirements ● Observability: see the full trace of the agent ○ Observability Guide ● Control: Be able to guide the intermediate steps of an agent step-by-step ○ Lower-Level Agent API ● Customizability: Define your own agentic logic around any set of tools. ○ Custom Agent Guide ○ Custom Agent with Query Pipeline Guide
  • 93. Additional Requirements Possible through our query pipeline syntax Query Pipeline Guide
  • 94. What’s next for RAG: Long Contexts?
  • 96. Our Position 1. Frameworks are valuable whether or not RAG lives or dies 2. Certain RAG concepts will go away, but others will remain and evolve
  • 97. Long Context LLMs will Solve the Following 1. Developers will worry less about tuning chunking algorithms 2. Developers will need to spend less time tuning retrieval and chain-of-thought over single documents 3. Summarization will be easier 4. Personalized memory will be better and easier to build
  • 98. Some Challenges Remain 1. 10M tokens is not enough for large document corpuses (hundreds of MB, GB) 2. Embedding models are lagging behind in context length 3. Cost and Latency 4. A KV Cache takes up a significant amount of GPU memory, and has sequential dependencies
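A back-of-the-envelope check of the first point. The figure of about 4 bytes of English text per token is a rough rule of thumb assumed here, not a number from the talk, and actual ratios vary by tokenizer and language.

```python
BYTES_PER_TOKEN = 4          # assumption: rough rule of thumb for English text
CONTEXT_TOKENS = 10_000_000  # the 10M-token window mentioned on the slide


def windows_needed(corpus_bytes: int) -> float:
    """How many full context windows a corpus of this size would fill."""
    return corpus_bytes / BYTES_PER_TOKEN / CONTEXT_TOKENS


one_gb = windows_needed(10**9)        # a 1 GB corpus
hundred_mb = windows_needed(10**8)    # a 100 MB corpus
print(one_gb, hundred_mb)
```

Under that assumption a 1 GB corpus is on the order of 250M tokens, so some form of retrieval is still needed to decide what goes into the window.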
  • 99. New RAG Architectures 1. Small to Big Retrieval over Documents 2. Intelligent Routing for Latency/Cost Tradeoffs 3. Retrieval Augmented KV Caching
  • 100. Small to Big Retrieval over Documents
  • 101. Intelligent Routing for Latency/Cost Tradeoffs
  • 103. www.ServerlessToronto.org Reducing the gap between IT and Business needs