This document discusses using Apache Airflow to build and deploy applications powered by large language models (LLMs). It outlines how Airflow helps with ingesting data from various sources, running data pipelines on a schedule or ad hoc, handling retries and dependencies, and monitoring models at scale. The document then presents a real use case: an LLM answers questions by drawing on documentation that has been ingested through an embedding model into a vector database queried by prompts. Airflow is well suited to orchestrating the data pipelines and feedback loops involved in such an application.


Building and deploying LLM applications with Apache Airflow

Julian LaNeve, Senior Product Manager @ Astronomer
Kaxil Naik, Apache Airflow Committer & PMC Member; Director of Eng @ Astronomer
Agenda

■ Why Airflow should be at the centre of LLMOps
■ Real use case & reference architecture
■ Next steps: community collaboration


Generative AI: A Creative New World

A powerful new class of large language models is making it possible for machines to write, code, draw, and create with credible and sometimes superhuman results.
Normally, for ML, you need to:

Ingest Data → Train Model → Prediction

…but now you can hit a pre-trained model instead of training your own:

Ingest (Less) Data → Pre-trained Model → Prediction
Going from “Idea to Production” with LLM apps involves solving a lot of data engineering problems:

■ Ingestion from several sources
■ Day 2 operations on data pipelines
■ Data preparation
■ Data privacy
■ Data freshness
■ Model deployment & monitoring
■ Scaling models
■ Experimentation & fine-tuning
■ Feedback loops
Typical Architecture for a Q&A Use Case Using an LLM

Document Loading → Splitting → Storage → Retrieval → Output

■ Document Loading: ingest from URLs, PDFs, databases, and legacy data stores
■ Splitting: break documents into smaller splits
■ Storage: embed the splits and store them in a vectorstore
■ Retrieval: find the splits relevant to a user’s <Question> via a vector query
■ Output: combine the relevant splits into a prompt and call the LLM to produce the <Answer>

Source: https://python.langchain.com/docs/use_cases/question_answering/
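To make the flow concrete, here is a minimal sketch of the loading → splitting → storage → retrieval → output pipeline in LangChain. It assumes the langchain, openai, beautifulsoup4, and faiss-cpu packages and an OPENAI_API_KEY in the environment; the URL and the in-memory FAISS store are stand-ins, and any other loader or vectorstore would slot in the same way:

```python
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import WebBaseLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Loading: pull raw documents (a single URL as a stand-in for URLs/PDFs/databases).
docs = WebBaseLoader("https://airflow.apache.org/docs/").load()

# Splitting: break documents into overlapping chunks ("splits").
splits = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

# Storage: embed the splits and index them in a vectorstore.
vectorstore = FAISS.from_documents(splits, OpenAIEmbeddings())

# Retrieval + Output: fetch the splits relevant to a question and let the LLM answer.
qa = RetrievalQA.from_chain_type(llm=ChatOpenAI(), retriever=vectorstore.as_retriever())
print(qa.run("How do I define a DAG?"))
```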
Airflow is a Natural Fit…

■ Python Native: the language of data scientists and ML engineers.
■ Common Interface: between Data Engineering, Data Science, ML Engineering, and Operations.
■ Document Parsing: decorator and pythonic interfaces for standard LLM tools.
■ Monitoring & Alerting: built-in features for logging, monitoring, and alerting to external systems.
■ Extensible: standardize custom operators and templates for common DS tasks across the organization.
■ Ingestion: extract and load data into vector databases and other destinations.
■ Pluggable Compute: GPUs, Kubernetes, EC2, VMs, etc.
■ Data Agnostic: but data aware.
■ Day 2 Ops: handle retries, dependencies, and all other day 2 ops associated with data pipelines.
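As a small illustration of the Day 2 features called out above, here is a hedged sketch of the retry and alerting knobs available on any task (the task body is a placeholder, and email alerting assumes SMTP is configured for the deployment):

```python
from datetime import timedelta

from airflow.decorators import task


@task(
    retries=3,                         # re-run automatically on flaky libraries/APIs
    retry_delay=timedelta(minutes=5),  # back off between attempts
    email_on_failure=True,             # alert via Airflow's email integration
)
def call_external_service():
    # Placeholder for a call to an LLM API or other external system.
    ...
```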
Let’s Talk About a Real Use Case

Problem Statement:

We have customers, employees, and community members who ask questions about our product, with answers that exist across several sources of documentation.

How do we provide an easy interface for folks to get their questions answered without adding further strain to the team?
Data Ingestion, Processing, and Embedding

Sources: GitHub issues, Docs (.md files), and Slack messages
🦜🔗 LangChain: pre-process and split into chunks → embed chunks → write to Weaviate

■ Airflow gives a framework to load data from APIs & other sources into LangChain
■ LangChain helps pre-process and split documents into smaller chunks depending on content type
■ After content is split into chunks, each chunk is embedded into vectors (semantic representations)
■ Those vectors are written to Weaviate for later retrieval
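A minimal sketch of this ingestion pipeline as an Airflow DAG, assuming the apache-airflow, langchain, openai, and weaviate-client (v3) packages; the document contents, the local Weaviate URL, and the "DocChunk" class name are hypothetical stand-ins for Ask Astro’s real sources and schema:

```python
from datetime import datetime

import weaviate
from airflow.decorators import dag, task
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter


@dag(schedule="@daily", start_date=datetime(2023, 10, 1), catchup=False)
def ingest_docs():
    @task
    def extract() -> list[str]:
        # Stand-in for loading GitHub issues, .md docs, and Slack messages.
        return ["# Example doc\nAirflow is a platform for orchestrating pipelines."]

    @task
    def split(docs: list[str]) -> list[str]:
        # LangChain pre-processes and splits each document into overlapping chunks.
        splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
        return [chunk for doc in docs for chunk in splitter.split_text(doc)]

    @task
    def embed_and_write(chunks: list[str]) -> None:
        # Embed each chunk into a vector, then write text + vector to Weaviate.
        vectors = OpenAIEmbeddings().embed_documents(chunks)
        client = weaviate.Client("http://localhost:8080")  # assumed local instance
        with client.batch as batch:
            for text, vector in zip(chunks, vectors):
                batch.add_data_object(
                    {"content": text}, class_name="DocChunk", vector=vector
                )

    embed_and_write(split(extract()))


ingest_docs()
```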
Prompt Orchestration and Answering

A user asks a question through the web app or the Slack bot; both use the same API. The original prompt is reworded three ways (🦜🔗 LangChain) to get more related documents, the vector DB is searched with each prompt, and the retrieved docs are combined for a final LLM call that produces the answer.

■ Original prompt gets reworded 3x using gpt-3.5-turbo
■ Answer is generated by combining docs from each prompt and making a gpt-4 call
■ State is stored in Firestore and prompt tracing is done through LangSmith
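A hedged sketch of that answering flow, reusing the Weaviate instance and "DocChunk" class assumed above (near-text search additionally assumes a vectorizer module is enabled on the Weaviate side); the rewording and answering prompts are illustrative, not Ask Astro’s actual prompts:

```python
import weaviate
from langchain.chat_models import ChatOpenAI

fast_llm = ChatOpenAI(model_name="gpt-3.5-turbo")  # used for the 3 rewordings
strong_llm = ChatOpenAI(model_name="gpt-4")        # used for the final answer
client = weaviate.Client("http://localhost:8080")  # assumed local instance


def answer(question: str) -> str:
    # 1. Reword the original prompt 3x to widen retrieval coverage.
    rewordings = [
        fast_llm.predict(f"Rephrase this question (variant {i + 1}): {question}")
        for i in range(3)
    ]
    # 2. Search the vector DB with the original question and each rewording.
    docs: list[str] = []
    for query in [question, *rewordings]:
        result = (
            client.query.get("DocChunk", ["content"])
            .with_near_text({"concepts": [query]})
            .with_limit(4)
            .do()
        )
        docs += [d["content"] for d in result["data"]["Get"]["DocChunk"]]
    # 3. Combine the retrieved docs and make one gpt-4 call for the final answer.
    context = "\n\n".join(dict.fromkeys(docs))  # de-duplicate while keeping order
    return strong_llm.predict(
        f"Answer the question using only this context:\n{context}\n\nQ: {question}"
    )
```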
LLM & Product Feedback Loops

■ When a user rates an answer, the feedback gets stored in Firestore and LangSmith for later use
■ On a schedule, Airflow DAGs fetch new runs (input, output, and user feedback) and process them asynchronously, using 🦜🔗 LangChain to classify each Q&A according to helpfulness, relevance, and publicness
■ If an answer is good, it gets written to the vector DB (Weaviate) and can be used as a source for future answers
■ Good answers are also marked to show on the Ask Astro homepage, where the UI displays the most recent good prompts
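A sketch of what the scheduled feedback DAG can look like, assuming the langsmith and langchain packages; the "ask-astro" LangSmith project name and the YES/NO grading prompt are assumptions, and the final write-back of good answers to Weaviate is elided:

```python
from datetime import datetime

from airflow.decorators import dag, task
from langchain.chat_models import ChatOpenAI
from langsmith import Client


@dag(schedule="@hourly", start_date=datetime(2023, 10, 1), catchup=False)
def feedback_loop():
    @task
    def fetch_new_runs() -> list[dict]:
        # Pull recently traced runs (input and output) from LangSmith.
        client = Client()
        return [
            {"question": run.inputs.get("question"), "answer": run.outputs.get("answer")}
            for run in client.list_runs(project_name="ask-astro")  # assumed project
            if run.inputs and run.outputs
        ]

    @task
    def classify(runs: list[dict]) -> list[dict]:
        # Grade each Q&A on helpfulness, relevance, and publicness with an LLM.
        llm = ChatOpenAI(model_name="gpt-3.5-turbo")
        good = []
        for run in runs:
            verdict = llm.predict(
                "Is this answer helpful, relevant, and safe to show publicly? "
                f"Reply YES or NO.\nQ: {run['question']}\nA: {run['answer']}"
            )
            if verdict.strip().upper().startswith("YES"):
                good.append(run)
        return good  # good answers would then be written back to Weaviate

    classify(fetch_new_runs())


feedback_loop()
```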


Running this in production meant:

■ Experimenting with different sources of data to ingest
■ Running the pipelines on a schedule and ad-hoc
■ Running the same workloads with variable chunking strategies
■ Needing to retry tasks due to finicky Python libraries and unreliable external services
■ Giving different parts of the workload variable compute
■ Creating standard interfaces to interact with external systems
…all of which is what Airflow’s great at!
ask.astronomer.io

github.com/astronomer/ask-astro
a16z’s Emerging LLM App Stack

Gray boxes show key components of the stack, with leading tools / systems listed. Arrows show the flow of data through the stack: contextual data provided by app developers to condition LLM outputs; prompts and few-shot examples that are sent to the LLM; queries submitted by users; and output returned to users.

■ Data Pipelines: Databricks, Airflow, Unstructured, etc.
■ Embedding Model: OpenAI, Cohere, Hugging Face
■ Vector Database: Pinecone, Weaviate, Chroma, pgvector
■ Playground: OpenAI, nat.dev, Humanloop
■ Orchestration: Python/DIY, LangChain, LlamaIndex, ChatGPT
■ APIs/Plugins: Serp, Wolfram, Zapier, etc.
■ LLM Cache: Redis, SQLite, GPTCache
■ Logging/LLMOps: Weights & Biases, MLflow, PromptLayer, Helicone
■ Validation: Guardrails, Rebuff, Guidance, LMQL
■ LLM APIs and Hosting: Proprietary API (OpenAI, Anthropic); Open API (Hugging Face, Replicate); Cloud Provider (AWS, GCP, Azure, Coreweave); Opinionated Cloud (Databricks, Anyscale, Mosaic, Modal, Runpod)
■ App Hosting: Vercel, Steamship, Streamlit, Modal
AskAstro has a few parts of this… (the same stack diagram as above)
…but there’s even more to consider. Airflow is foundational to best practices for all of this.

Data Governance
■ How do you account for private data?
■ How do you provide transparency into data lineage?

Fine-Tuning
■ Does it improve results?
■ How much does it cost?

Feedback Loops
■ Semantic cache for correct responses
■ Ranking sources by accuracy and weighting them accordingly
■ Prompt clustering: what are people asking?
Thanks to the AskAstro team: Philippe Gagnon and Michael Gregory
Community Collaboration: Providers, Interfaces, and Patterns & Use Cases

Providers
■ What are all the providers the ecosystem needs? (e.g. pgvector)

Interfaces
■ What’s the interface that feels right for LLMOps?
Patterns & Use Cases
What are the best practices for building pipelines for LLM apps?

■ Do you use one task to ingest and write?
■ Can you use dynamic task mapping to break it out? (see the sketch below)
■ Do you write to disk?
■ Can you store embedding values in XComs?
■ How do you reconcile Airflow orchestration with prompt orchestration?
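On the dynamic task mapping question, a small illustrative sketch: a list of sources computed at runtime fans out into one mapped task instance per source, each with its own retries, logs, and (optionally) compute. The source names are hypothetical:

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2023, 10, 1), catchup=False)
def mapped_ingest():
    @task
    def list_sources() -> list[str]:
        # Could just as easily be read from a config or an external API.
        return ["github_issues", "docs_md", "slack_messages"]

    @task
    def ingest(source: str) -> str:
        # Each source runs as its own task instance rather than one big task.
        print(f"Ingesting {source}")
        return source

    # .expand() maps ingest() over whatever list_sources() returns at runtime.
    ingest.expand(source=list_sources())


mapped_ingest()
```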
Let’s do this all in the open source!
