Generative AI

POC – Readout
& Demo

1
Table of Contents

1 Gen AI – Build Option POC - Evaluation Summary

2 Unstructured Data Use Case – POC Deep Dive & Demo

3 Structured Data Use Case – POC Deep Dive & Demo

2
Table of Contents

1 Gen AI – Build Option POC - Evaluation Summary

2 Unstructured Data Use Case – POC Deep Dive & Demo

3 Structured Data Use Case – POC Deep Dive & Demo

3
Objective of the POC
Recap on why we set out to do the POC

1. The EA team had compiled an initial list of Gen AI use cases from business teams across sales, marketing, customer service, product engineering and IT (a couple).

2. To support those use cases, the IT team needed to ramp up quickly on Gen AI technology: learn about the offerings available from the three cloud providers, how to set up custom Gen AI development solutions, and the effort, skills, approach and best practices required.

3. Alongside the POC, the team has also been reviewing marketplace updates, which will form part of the input for the build vs. buy considerations.

4
Broad categories of Generative AI Use Cases
Most of the initial Gen AI use cases fall into these categories

5
POC was specifically about Information Retrieval use case
We tested LLMs across various technology platforms:

• Product Manuals – Q&A against unstructured data

• BA&R Finance Data in Caspian – Q&A against structured data
6
Comparison of Gen AI Deployment Approaches
Build v/s Buy deployment options are being considered

Buy Options tend to have lower overall TCO & faster Time to Market than Build Options

7
Key Takeaways from the Gen AI POC

Information Retrieval on Structured Data
• Defer Build Option decision to 2024 Q2
• GPT 3.5/4 model combination delivered reasonable accuracy (4/8 questions)
• High degree of pre-work required to improve accuracy of generated SQL → field descriptions, curated business context & other metadata
• Additional intelligence layer will be required during scale-out to derive the relevant domain tables to address questions
• Evaluate CoPilots & other Buy Options:
  • Seek.ai
  • Snowflake CoPilot (in private preview)
  • PowerBI CoPilot (in private preview)
  • ThoughtSpot Sage – GA in Dec
  • ChatGPT Enterprise (once a Snowflake endpoint is available)
• Target POC for at least Seek.ai & Snowflake CoPilot

Information Retrieval on Unstructured Data
• Defer Build Option decision to 2024 Q1
• GPT 3.5, 4 & PaLM models delivered good accuracy (11-12/15 questions)
• Building blocks from POC can be used for scale-out
• Evaluate M365 CoPilot – potential to address 80% of current business use cases
• Build option still relevant for the Competitor data use case
• Evaluate domain-based Buy Options for niche high-value use cases as required

Next Steps … Evaluate Buy Options including CoPilots before deciding on path forward

8
Generative AI Technology Evaluation as of Nov 2023
Solution selection depends on more than just the choice of a Large Language Model

Information Retrieval against Structured Data

Build Option
• GPT 3.5/4 performed best (4/8)
• Significant business context, rules curation & dictionary required
• Sizable prompt engineering required to improve accuracy
• High prompt tuning effort + query cost

Buy Option
• Lot of CoPilot options being released into private preview
• Hold a lot of promise for faster time to market with lower deployment effort
• License fee considerations vs. TCO for the Build option need vetting

Information Retrieval against Unstructured Data

Build Option
• Azure GPT 3.5/4 & Google PaLM performed best (11-12/15)
• PaLM & GPT 3.5 model cost per query comparable; PaLM cheapest in overall end-to-end pipeline cost
• Azure Cognitive Search, ChromaDB (cloud-specific embeddings) & Gen AI App Builder are the preferred retrievers

Buy Option
• Microsoft 365 CoPilot showing a lot of promise for Q&A and summarization against SharePoint
• Q&A and summarization capabilities embedded in the Office 365 suite
• Minimum 300-seat license commitment ($30/user/month, or $9K/month)

9
Gen AI Tech Eval – Build – Information Retrieval on Unstructured Data

8 combinations were evaluated for each cloud provider; the top 2 options for each cloud provider are showcased below.

• Overall, GPT 3.5 & PaLM were the best performing considering the combination of cost & performance
• GPT 4 had the best pipeline performance
• Titan & Claude pipeline performance was skewed because of AWS Textract limitations

** All analysis and recommendations are based on a short 6 weeks POC using PDF product manuals data in varied formats and validation being done on sample of 10-15 business
questions. A full-fledged implementation and business validation is recommended for a closely defined use case before setting up for enterprise grade scale.

10
Gen AI Tech Eval – Build – Information Retrieval on Structured Data
• GPT 3.5/4 yielded reasonable accuracy (4/8) on a combination of performance + cost for complex query generation
• All models work well with simple queries
• Models except GPT struggled with the Text-to-SQL step for complex queries
• Significant business context & prompt engineering required to improve accuracy
• Underlying data nuances contributed to inaccurate query generation

* 3/4 = one of the questions produces relevant output with a reproducibility factor of 2/3

** All analysis and recommendations are based on a short 4 weeks POC leveraging raw data directly from data warehouse and validation being done on sample of 8-10 business questions.
A full-fledged implementation and business validation is recommended for a closely defined use case before setting up for enterprise grade scale.

11
Table of Contents

1 Gen AI – Build Option POC - Evaluation Summary

2 Unstructured Data Use Case – POC Deep Dive & Demo

3 Structured Data Use Case – POC Deep Dive & Demo

12
Conceptual Architecture

● Knowledge base creation/update is an occasional, event-triggered step initiated whenever a file is added or updated in the data store.

● The Retrieval & Generation pipeline runs whenever a request is triggered from the UI.

● The User Interface is based on Streamlit and is designed as a conversational interface.
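As an illustration of this flow, below is a minimal sketch of the two pipelines, assuming ChromaDB as the vector store and an Azure OpenAI GPT-3.5 deployment as the generator; the collection name, endpoint and deployment name are placeholders rather than the POC's actual configuration.

```python
# Minimal sketch of the conceptual flow (assumed components: ChromaDB vector
# store + Azure OpenAI GPT-3.5; all names/endpoints are placeholders).
import chromadb
from openai import AzureOpenAI

client = chromadb.PersistentClient(path="./knowledge_base")
collection = client.get_or_create_collection("product_manuals")

def update_knowledge_base(doc_id: str, chunks: list[str]) -> None:
    """Occasional trigger: re-index chunks whenever a file in the data store changes."""
    collection.upsert(
        ids=[f"{doc_id}-{i}" for i in range(len(chunks))],
        documents=chunks,
        metadatas=[{"source": doc_id}] * len(chunks),
    )

llm = AzureOpenAI(api_key="...", api_version="2023-07-01-preview",
                  azure_endpoint="https://<your-resource>.openai.azure.com")

def answer(question: str, top_k: int = 4) -> str:
    """Retrieval & generation pipeline: runs for every question sent from the Streamlit UI."""
    hits = collection.query(query_texts=[question], n_results=top_k)
    context = "\n\n".join(hits["documents"][0])
    prompt = (
        "Answer strictly from the manual excerpts below. If the answer is not present, "
        "reply: 'Sorry, Manual does not have the information you are looking for'.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )
    resp = llm.chat.completions.create(
        model="gpt-35-turbo",  # Azure deployment name (placeholder)
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```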

13
Prompt Template

14
Architecture - Azure

15
Architecture - GCP

16
Architecture - AWS

17
Pipeline Combinations for POC

18
UI Walkthrough

19
Retriever & Model performance evaluation approach

| Retriever Response Category | Retriever Score | Model Response Category* | Model Score | Full Pipeline Score |
|---|---|---|---|---|
| Answer not present in Manual | 1 | Model gives default message | 1 | 1 |
| Answer not present in Manual | 1 | Model hallucinates answer | 0 | 0 |
| Answer present in Manual – retriever able to retrieve | 1 | Model able to generate correct answer | 1 | 1 |
| Answer present in Manual – retriever able to retrieve | 1 | Model not able to generate correct answer, or gives default message | 0 | 0 |
| Answer present in Manual – retriever able to retrieve | 1 | Model hallucinates answer | 0 | 0 |
| Answer present in Manual – retriever unable to retrieve | 0 | Model gives default message | 1 | 0 |

* Default message: 'Sorry, Manual does not have the information you are looking for'
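The scoring rules above could be encoded roughly as follows; this is an illustrative sketch rather than the POC's evaluation code, and the reading of the retriever score as 1 when the answer is not in the manual is an assumption.

```python
DEFAULT_MSG = "Sorry, Manual does not have the information you are looking for"

def score_question(answer_in_manual: bool, retriever_found_it: bool,
                   model_correct: bool, model_response: str) -> dict:
    """Return retriever / model / full-pipeline scores per the evaluation matrix above."""
    gave_default = model_response.strip() == DEFAULT_MSG
    if not answer_in_manual:
        retriever = 1                      # assumed reading: retriever is not penalized here
        model = 1 if gave_default else 0   # default message expected; anything else is a hallucination
        pipeline = model
    elif retriever_found_it:
        retriever = 1
        model = 1 if model_correct else 0  # wrong answer, default message, or hallucination scores 0
        pipeline = model
    else:
        retriever = 0
        model = 1 if gave_default else 0   # model behaves correctly given empty context
        pipeline = 0                       # but the pipeline still fails to answer
    return {"retriever": retriever, "model": model, "pipeline": pipeline}
```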

20
Cost Metrics
LLM API Cost (Closed Out-of-Box Models)

| Cloud | Model | Pricing | Cost for 1000 Queries |
|---|---|---|---|
| AWS | Titan | Per 1000 tokens: Input $0.0013, Output $0.0017 | $2.15 |
| AWS | Anthropic | Per 1000 tokens: Input $0.01102, Output $0.03268 | $27.36 |
| Azure | GPT-3.5 | Per 1000 tokens: Input $0.0023, Output $0.0010 | $3.25 |
| Azure | GPT-4 | Per 1000 tokens: Input $0.0450, Output $0.0300 | $75.00 |
| GCP | PaLM (text-bison) | $0.0005/1000 characters ($0.002 per 1000 tokens) | $3.00 |

Retriever API Cost (Cloud Retrieval Services)

| Cloud | Retriever | Pricing | Cost for 1000 Queries |
|---|---|---|---|
| AWS | Kendra | $1.125/hr | $27.00 |
| Azure | Cognitive Search | $0.34/hr | $8.20 |
| GCP | Gen AI App Builder | $12/1000 queries | $12.00 |

Retriever Deployment Cost (Bring Your Own Vector DB)

| Cloud | Retriever | Machine Type | Machine Cost (On-Demand) | Cost for 1000 Queries |
|---|---|---|---|---|
| AWS | ChromaDB | r5.xlarge | $0.26/hr | $6.400 |
| Azure | ChromaDB | Standard D4 v3 | $0.376 | $9.024 |
| GCP | ChromaDB | e2-standard-4 | $0.14/hr | $3.360 |

LLM Deployment Cost (Open Source Models)

| Cloud | Model | Machine Type | Machine Cost (On-Demand) | Cost for 1000 Queries |
|---|---|---|---|---|
| AWS | LLaMA | g5.xlarge | $1.10/hr | $26.56 |
| GCP | LLaMA | g2-standard-8 | $0.87/hr | $20.88 |
| Azure | LLaMA | Standard NC16as T4 | $1.96/hr | $47.00 |

* All calculations assume 1.5k tokens (or 6k characters) for both input and output.
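The per-query figures above follow from straightforward token arithmetic; below is a small sketch of that calculation. Prices are taken from the tables and the 1.5k-token assumption from the footnote; the exact table values may reflect a slightly different input/output split, so treat the output as approximate.

```python
# Illustrative per-query cost arithmetic using the published per-1k-token prices.
# Token counts follow the slide's 1.5k-token assumption for input and output.
def llm_cost_per_1000_queries(input_price_per_1k: float, output_price_per_1k: float,
                              input_tokens: int = 1500, output_tokens: int = 1500) -> float:
    per_query = (input_tokens / 1000) * input_price_per_1k \
              + (output_tokens / 1000) * output_price_per_1k
    return round(per_query * 1000, 2)

# Example: Azure GPT-3.5 at $0.0023 input / $0.0010 output per 1k tokens
print(llm_cost_per_1000_queries(0.0023, 0.0010))  # ~ $4.95 per 1,000 queries with these assumptions
```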
21
Evaluation Summary - Azure

Data Extraction & Ingestion (one-time activity) | Retrieval Augmented Generation (RAG) | Recommended pipeline

| S No. | Cloud | Data Extraction | Vector Embedding | Vector Database / Retrieval Warehouse | Retrieval System | LLM Model | Retrieval Performance | Model Performance | Full Pipeline Performance | Cost per Query ($) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Azure | – | – | Cognitive Search | Cognitive Search | GPT-3.5 | 12/15 | 11/15 | 11/15 | 0.0160 |
| 2 | Azure | – | – | Cognitive Search | Cognitive Search | GPT-4 | 12/15 | 13/15 | 12/15 | 0.0877 |
| 3 | Azure | – | – | Cognitive Search | Cognitive Search | Llama2-7B | 12/15 | 1/15 | 1/15 | 0.0598 |
| 4 | Azure | Azure Form Recognizer | HuggingFace all-MiniLM-L6-v2 | ChromaDB (HuggingFace Embedding) | ChromaDB (HuggingFace Embedding) | GPT-3.5 | 11/15 | 12/15 | 11/15 | 0.0168 |
| 5 | Azure | Azure Form Recognizer | HuggingFace all-MiniLM-L6-v2 | ChromaDB (HuggingFace Embedding) | ChromaDB (HuggingFace Embedding) | GPT-4 | 11/15 | 11/15 | 10/15 | 0.0885 |
| 6 | Azure | Azure Form Recognizer | HuggingFace all-MiniLM-L6-v2 | ChromaDB (HuggingFace Embedding) | ChromaDB (HuggingFace Embedding) | Llama2-7B | 11/15 | 3/15 | 3/15 | 0.0606 |
| 7 | Azure | Azure Form Recognizer | Azure OpenAI text-embedding-ada-002 | ChromaDB (OpenAI Embedding) | ChromaDB (OpenAI Embedding) | GPT-3.5 | 13/15 | 10/15 | 10/15 | 0.0169 |
| 8 | Azure | Azure Form Recognizer | Azure OpenAI text-embedding-ada-002 | ChromaDB (OpenAI Embedding) | ChromaDB (OpenAI Embedding) | GPT-4 | 13/15 | 12/15 | 12/15 | 0.0887 |
| 9 | Azure | Azure Form Recognizer | Azure OpenAI text-embedding-ada-002 | ChromaDB (OpenAI Embedding) | ChromaDB (OpenAI Embedding) | Llama2-7B | 13/15 | 1/15 | 1/15 | 0.0607 |

● Azure Cognitive Search and ChromaDB (OpenAI Embedding) are the preferred retrievers because of their precision and consistent performance.
● Azure OpenAI GPT-4 outperforms the other models because of its concise and accurate responses. However, GPT-3.5 is nearly on par with GPT-4, and its cost-effectiveness makes it a viable and preferable choice.
● Llama produces poor responses due to limitations in context length; we could only pass 1 chunk to the model, hence retriever performance for Llama is marked as 1.
● LLaMA has an additional overhead of deployment and maintenance.
● ChromaDB also has deployment overhead but is comparatively simpler to manage.
** All analysis and recommendations are based on a short 6 weeks POC using PDF product manuals data in varied formats and validation being done on sample 10-15 business questions.
A full-fledged implementation and business validation is recommended for a closely defined use case before setting up for enterprise grade scale.

22
Evaluation Summary - GCP

Data Extraction & Ingestion (one-time activity) | Retrieval Augmented Generation (RAG) | Recommended pipeline

| S No. | Cloud | Data Extraction | Vector Embedding | Vector Database / Retrieval Warehouse | Retrieval System | LLM Model | Retrieval Performance | Model Performance | Full Pipeline Performance | Cost per Query ($) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GCP | – | – | Gen AI App Builder | Gen AI App Builder | PaLM | 11/15 | 8/15 | 8/15 | 0.017 |
| 2 | GCP | – | – | Gen AI App Builder | Gen AI App Builder | Llama2-7B | 11/15 | 10/15 | 8/15 | 0.035 |
| 3 | GCP | Document AI | HuggingFace all-MiniLM-L6-v2 | ChromaDB (HuggingFace Embedding) | ChromaDB (HuggingFace Embedding) | PaLM | 12/15 | 9/15 | 9/15 | 0.007 |
| 4 | GCP | Document AI | HuggingFace all-MiniLM-L6-v2 | ChromaDB (HuggingFace Embedding) | ChromaDB (HuggingFace Embedding) | Llama2-7B | 12/15 | 9/15 | 8/15 | 0.025 |
| 5 | GCP | Document AI | PaLM embedding | ChromaDB (PaLM Embedding) | ChromaDB (PaLM Embedding) | PaLM | 13/15 | 11/15 | 11/15 | 0.007 |
| 6 | GCP | Document AI | PaLM embedding | ChromaDB (PaLM Embedding) | ChromaDB (PaLM Embedding) | Llama2-7B | 13/15 | 7/15 | 7/15 | 0.026 |

● Gen AI App Builder and ChromaDB (PaLM Embedding) are the preferred retrievers because of their precision and consistent performance.
● PaLM performs better because of the quality and tone of its responses. LLaMA sometimes produces garbage responses, which additionally require post-processing.
● LLaMA has an additional overhead of deployment and maintenance.
● ChromaDB also has deployment overhead but is comparatively simpler to manage.
● The costs of running ChromaDB & UI services on GCP are lower than on Azure by a factor of 3-5.

** All analysis and recommendations are based on a short 6 weeks POC using PDF product manuals data in varied formats and validation being done on sample 10-15 business questions.
A full-fledged implementation and business validation is recommended for a closely defined use case before setting up for enterprise grade scale.

23
Evaluation Summary - AWS

Data Extraction & Ingestion (one-time activity) | Retrieval Augmented Generation (RAG) | Recommended pipeline

| S No. | Cloud | Data Extraction | Vector Embedding | Vector Database / Retrieval Warehouse | Retrieval System | LLM Model | Retrieval Performance | Model Performance | Full Pipeline Performance | Cost per Query ($) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | AWS | – | – | Kendra | Kendra | Titan | 7/15 | 3/15 | 3/15 | 0.035 |
| 2 | AWS | – | – | Kendra | Kendra | Anthropic | 7/15 | 7/15 | 3/15 | 0.060 |
| 3 | AWS | – | – | Kendra | Kendra | Llama2-7B | 7/15 | 3/15 | 4/15 | 0.060 |
| 4 | AWS | AWS Textract | HuggingFace all-MiniLM-L6-v2 | ChromaDB (HuggingFace Embedding) | ChromaDB (HuggingFace Embedding) | Titan | 9/15 | 4/15 | 4/15 | 0.014 |
| 5 | AWS | AWS Textract | HuggingFace all-MiniLM-L6-v2 | ChromaDB (HuggingFace Embedding) | ChromaDB (HuggingFace Embedding) | Anthropic | 9/15 | 8/15 | 2/15 | 0.039 |
| 6 | AWS | AWS Textract | HuggingFace all-MiniLM-L6-v2 | ChromaDB (HuggingFace Embedding) | ChromaDB (HuggingFace Embedding) | Llama2-7B | 9/15 | 7/15 | 5/15 | 0.038 |
| 7 | AWS | AWS Textract | Titan embedding | ChromaDB (Titan Embedding) | ChromaDB (Titan Embedding) | Titan | 9/15 | 3/15 | 2/15 | 0.014 |
| 8 | AWS | AWS Textract | Titan embedding | ChromaDB (Titan Embedding) | ChromaDB (Titan Embedding) | Anthropic | 9/15 | 3/15 | 2/15 | 0.039 |
| 9 | AWS | AWS Textract | Titan embedding | ChromaDB (Titan Embedding) | ChromaDB (Titan Embedding) | Llama2-7B | 9/15 | 4/15 | 4/15 | 0.038 |

● AWS Textract appears to have parsing limitations for vertical text orientations and multi-column PDFs, which affected parsing quality and overall pipeline performance results.
● The best-performing pipeline uses HuggingFace embeddings, ChromaDB as the retriever and Titan as the LLM.
● The Kendra retriever performs comparably to ChromaDB; the pipeline with Kendra and the Titan LLM could be improved by increasing the top-K retrieved chunks.
● The higher numbers for Anthropic do not indicate significantly better performance; they result from the underlying evaluation assumptions and the model's tendency to hallucinate.
● LLaMA has an additional overhead of deployment and maintenance.
● ChromaDB also has deployment overhead but is comparatively simpler to manage.

** All analysis and recommendations are based on a short 6 weeks POC using PDF product manuals data in varied formats and validation being done on sample 10-15 business questions.
A full-fledged implementation and business validation is recommended for a closely defined use case before setting up for enterprise grade scale.

24
Use Case Heat Map & LLM Model Ranking

Model Performance
* Model Performance rankings are subjective and can be improved with refinements.
** LLaMA in a customized deployment requires an extra level of quality control and post-processing to mitigate toxic, harmful, and biased content.

Model Cost
* Model pricing was calculated assuming the use case involves 1.5k tokens per API call.
** LLaMA is deployed in a VM and accessed via endpoints. The cost of LLaMA is calculated per day and then extrapolated to a per-query figure.

• Overall, GPT 3.5 (11/15) & PaLM (11/15) were the best performing considering the combination of cost & performance
• GPT 4 had the best overall pipeline performance (12/15)
• Titan & Claude overall pipeline performance was skewed because of AWS Textract limitations

All analysis and recommendations are based on a short 6 weeks POC using PDF product manuals data in varied formats and validation being done on sample of 10-15 business questions.
A full-fledged implementation and business validation is recommended for a closely defined use case before setting up for enterprise grade scale.

25
26
Improving the Performance of RAG System

27
Roadmap for SBD

28
Table of Contents

1 Gen AI – Build Option POC - Evaluation Summary

2 Unstructured Data Use Case – POC Deep Dive & Demo

3 Structured Data Use Case – POC Deep Dive & Demo

29
Conceptual Architecture

● Table-to-Insight generation is a combination of Table-to-Text and Table-to-Chart generation.

● Table-to-Chart generation involves code generation, which is executed on the table to generate the required chart.
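To illustrate the Table-to-Chart step, below is a hedged sketch of the generate-then-execute pattern; the prompt wording, the pandas/matplotlib choice and the `llm_generate` callable are assumptions for illustration, not the POC's exact implementation.

```python
# Sketch of Table-to-Chart generation: ask an LLM for plotting code, then execute
# it against the retrieved table. Prompt wording, model, and libraries are assumed.
import pandas as pd
import matplotlib.pyplot as plt

def table_to_chart(df: pd.DataFrame, question: str, llm_generate) -> None:
    """llm_generate is any callable that sends a prompt to the chosen LLM and returns text."""
    prompt = (
        "You are given a pandas DataFrame named `df` with columns "
        f"{list(df.columns)}. Write matplotlib code (no imports, no explanations) "
        f"that best visualizes the answer to: {question}"
    )
    code = llm_generate(prompt)
    # The generated code is executed on the table to produce the chart;
    # in a real deployment this step would need sandboxing and validation.
    exec(code, {"df": df, "plt": plt, "pd": pd})
    plt.show()
```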

30
Prompt Template

31
Architecture – Azure

Insights Pro App Modules (Python)

1. Text-to-SQL converter – analyzes the user's natural-language query, finds the relevant DB/table name and generates a SQL query using the LLM.

2. SQL executor – executes the SQL on the client's DB and stores the resulting dataset in cloud storage (Blob).

3. Data processor – generates insights in natural language from the structured data using the LLM and responds back to the UI.
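A minimal sketch of how these three modules could hang together is shown below; the SQLAlchemy connection, the `llm_generate` callable and the prompt wording are illustrative stand-ins for the actual cloud services and are not the POC's code.

```python
# Illustrative skeleton of the three Insights Pro modules; LLM call, warehouse
# connection, and persistence are stand-ins, not the POC's actual services.
import pandas as pd
from sqlalchemy import create_engine, text

def text_to_sql(user_query: str, data_dictionary: str, llm_generate) -> str:
    """Module 1: turn a natural-language question into SQL using the LLM plus metadata."""
    prompt = (
        f"Data dictionary:\n{data_dictionary}\n\n"
        f"Write a single ANSI SQL query answering: {user_query}\n"
        "Return only the SQL."
    )
    return llm_generate(prompt)

def execute_sql(sql: str, connection_string: str) -> pd.DataFrame:
    """Module 2: run the generated SQL on the warehouse; the result can then be persisted to Blob/S3/GCS."""
    engine = create_engine(connection_string)
    with engine.connect() as conn:
        return pd.read_sql(text(sql), conn)

def generate_insight(df: pd.DataFrame, user_query: str, llm_generate) -> str:
    """Module 3: summarize the result set in natural language for the UI."""
    prompt = (
        f"Question: {user_query}\n"
        f"Result table (CSV):\n{df.head(50).to_csv(index=False)}\n"
        "Summarize the key insight in 2-3 sentences."
    )
    return llm_generate(prompt)
```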

32
Architecture – AWS

Insights Pro App Modules (Python)

1. Text-to-SQL converter – analyzes the user's natural-language query, finds the relevant DB/table name and generates a SQL query using the LLM.

2. SQL executor – executes the SQL on the client's DB and stores the resulting dataset in cloud storage (S3).

3. Data processor – generates insights in natural language from the structured data using the LLM and responds back to the UI.

33
Architecture – GCP

Insights Pro App Modules (Python)

1. Text-to-SQL converter – analyzes the user's natural-language query, finds the relevant DB/table name and generates a SQL query using the LLM.

2. SQL executor – executes the SQL on the client's DB and stores the resulting dataset in cloud storage (GCS).

3. Data processor – generates insights in natural language from the structured data using the LLM and responds back to the UI.

34
Pipeline combinations for POC

Note

* Claude-v2 is used from the Anthropic model family because of its superior code generation capabilities.

** text-bison-32k is used from the PaLM model family because of its higher token size and better code generation capabilities.

All analysis and recommendations are based on a short 4 weeks POC leveraging raw data directly from data warehouse and validation being done on sample of 8-10 business questions.
A full-fledged implementation and business validation is recommended for a closely defined use case before setting up for enterprise grade scale.

35
Benchmarking

* 3/4 = one of the questions produces relevant output with a reproducibility factor of 2/3

All analysis and recommendations are based on a short 4 weeks POC leveraging raw data directly from data warehouse and validation being done on sample of 8-10 business questions.
A full-fledged implementation and business validation is recommended for a closely defined use case before setting up for enterprise grade scale.

36
LLM Model Evaluation Summary

| S No. | Cloud | Experiment No. (Cloud) | Track 1 (Query to SQL) | Track 2(a) (Table to Chart Type) | Track 2(b) (Chart Type Code & Plot) | Track 3 (Summary) | Accuracy | Avg. cost per query | Comments |
|---|---|---|---|---|---|---|---|---|---|
| 1 | AWS | 1 | Anthropic | Titan | Anthropic | Titan | 0/8 | ~$0.14 | Performance on simple queries is good with clean data and a more descriptive data dictionary |
| 2 | AWS | 2 | Anthropic | Anthropic | Anthropic | Anthropic | 0/8 | ~$0.24 | Performance on simple queries is good with clean data and a more descriptive data dictionary |
| 3 | Azure | 1 | Azure OpenAI GPT-3.5 | Azure OpenAI GPT-3.5 | Azure OpenAI GPT-3.5 | Azure OpenAI GPT-3.5 | 0/8 | ~$0.07 | Performance on simple queries is good with clean data and a more descriptive data dictionary |
| 4 | Azure | 2 | Azure OpenAI GPT-4 | Azure OpenAI GPT-3.5 | Azure OpenAI GPT-4 | Azure OpenAI GPT-3.5 | 4/8 | ~$0.54 | Performing relatively better |
| 5 | GCP | 1 | text-bison | text-bison | text-bison | text-bison | 1/8 | ~$0.03 | Performs in the case of simpler queries |
| 6 | GCP | 2 | code-bison | text-bison | code-bison | text-bison | 1/8 | ~$0.03 | code-bison performs similarly to text-bison |

All analysis and recommendations are based on a short 4 weeks POC leveraging raw data directly from data warehouse and validation being done on sample of 8-10 business questions.
A full-fledged implementation and business validation is recommended for a closely defined use case before setting up for enterprise grade scale.

37
Query to Insights – Evaluation Summary

• GPT 3.5 / 4 was the only combination that yielded reasonable performance (4/8), combined with cost, for questions translating to complex queries
• DQ issues contributed to inaccurate query generation
• All models work well with simple queries (1 table)
• All models except GPT struggled on complex SQL query generation
• Significant business context & prompt engineering will be required to improve the accuracy of the other models on the Text-to-SQL task
• An additional intelligence layer will be required at scale-out to filter in the relevant tables for a question before prompting, alongside the metadata & business context in the input prompts (an illustrative sketch follows below)
• Cost per query can be considerable because of the token size from the inclusion of instructions, data dictionary, guidelines & business context

** All analysis and recommendations are based on a short 4 weeks POC leveraging raw data directly from data warehouse and validation being done on sample of 8-10 business questions.
A full-fledged implementation and business validation is recommended for a closely defined use case before setting up for enterprise grade scale.
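One way such an intelligence layer could look is an extra LLM pass that narrows the data dictionary down to the tables relevant to the question before the Text-to-SQL prompt is assembled, which also keeps the prompt token count (and cost per query) down. The sketch below is illustrative only; the function names, prompt wording and JSON convention are assumptions, not the POC's implementation.

```python
# Hedged sketch of the "additional intelligence layer": pre-select relevant tables
# so that only their metadata is packed into the Text-to-SQL prompt.
import json

def select_relevant_tables(question: str, table_summaries: dict[str, str], llm_generate) -> list[str]:
    """table_summaries maps table name -> one-line business description."""
    catalog = "\n".join(f"- {name}: {desc}" for name, desc in table_summaries.items())
    prompt = (
        f"Available tables:\n{catalog}\n\n"
        f"Question: {question}\n"
        "Return a JSON list of the table names needed to answer the question."
    )
    return json.loads(llm_generate(prompt))

def build_sql_prompt(question: str, data_dictionary: dict[str, str], tables: list[str]) -> str:
    """Only the chosen tables' column descriptions go into the final Text-to-SQL prompt."""
    context = "\n\n".join(data_dictionary[t] for t in tables)
    return f"{context}\n\nWrite SQL to answer: {question}"
```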

38
Augmented Prompt for Text-to-SQL Step

39
40
Improving LLM Performance for Querying on Structured Data

41
Summary of LLM Learnings - Recommendations for SBD

❑ Large Language Models offer a multitude of capabilities for natural language and code generation. Contextualizing LLMs for a specific use case needs initial effort but offers possible long-term benefits

❑ It is important to define a focused use case to test the potential of how LLMs can be used to generate business impact

❑ The various services offered by the different cloud providers are at different levels of maturity but are rapidly evolving

❑ The choice of the right service should be based on current requirements, foreseeable benefits and cost implications

❑ The Data Dictionary must be descriptive, with column names and descriptions. Collibra should be updated

❑ Data should be clean. It is not ideal to carry out data cleaning and processing through LLMs.

All analysis and recommendations are based on a short 4 weeks POC leveraging raw data directly from data warehouse and validation being done on sample 8-10 business questions.
A full-fledged implementation and business validation is recommended for a closely defined use case before setting up for enterprise grade scale.

42
Known Limitations of all LLM based Services

❑ Quality of suggestions: Suggestion quality may depend on the volume of training data available for a given language

❑ Security and privacy concerns: Since the models are trained on publicly available data, the models could inadvertently make
suggestions that contain security vulnerabilities or were meant to be private

❑ Don’t fully understand context: While the AI has been trained to understand context, it may not be as capable as a human
developer in fully understanding the high-level objectives of a complex project

❑ Dependent on comments and naming: The AI can provide more accurate suggestions when given detailed comments and
descriptive variable names

❑ Lack of creative problem solving: Unlike a human developer, the tool cannot produce innovative solutions or creatively solve
problems.

❑ Inefficient for large data and context: The models may not be optimized for navigating and understanding large codebases. They are most effective at suggesting code for small tasks.

Therefore, human developers need to learn to use the power of LLMs in a creative way using a combination of services and models to
best suit their requirements

All analysis and recommendations are based on a short 4 weeks POC leveraging raw data directly from data warehouse and validation being done on sample 8-10 business questions.
A full-fledged implementation and business validation is recommended for a closely defined use case before setting up for enterprise grade scale.

43
Roadmap for SBD

44
Appendix

Use Case 1 – Information Retrieval on Unstructured Data

45
Enterprise Grade Setup - Evaluating Other Solutions & Products

| Features | Customized Solution using LLMs | Search – MS 365 CoPilot | Glean |
|---|---|---|---|
| Interface for Enterprise Users | Interactive conversational agent for enterprise use | Pre-built interface using MS 365 | Pre-built interface for enterprise search |
| Data Types | Multiple data sources for unstructured data, including Data Lakes etc. | Enables search on MS products like SharePoint documents, Word, Excel, PowerPoint, etc. | Enables search on enterprise data |
| Learning through Feedback | Feedback loop can be set up to enhance model prompts, inputs and outputs | User feedback is not incorporated | User feedback is not incorporated |
| Speed of Setup | Customized setup takes longer; expedite using accelerators | Immediate. No setup required. | Immediate. No setup required. |
| Integration with Custom Built Business Applications | Can be integrated seamlessly with other applications | Not possible | Not possible |
| Potential Applications | Conversational agent for users to summarize information | Individual productivity enhancement tool | Enterprise search |
| Licensing Costs | Costs depend on usage / number of tokens | Per user, included in MS license | Per user / enterprise license |
46
Appendix

Use Case 2 – Information Retrieval on Structured Data

47
Enterprise Grade Setup – Evaluating Other Options

48
Architecture for Various Options

49
Enterprise Grade Setup – Evaluating Other Solutions & Products
| Features | Customized Solution using LLMs | ChatGPT – Advanced Data Analysis | Power BI CoPilot | ThoughtSpot Sage |
|---|---|---|---|---|
| Interface for Enterprise Users | Interactive conversational agent for enterprise use | Interactive conversational agent for personal use | Power Users: Power BI dashboard and narrative insights; General Users: static reports with visuals and insights | Pre-built sophisticated user interface |
| Data Size | Multiple data sources and tables | A single document up to 512MB | Multiple data sources and tables prepared into relevant views | Can handle curated data supported in the ThoughtSpot data model |
| Learning through Feedback | Feedback loop can be set up to enhance model prompts, inputs and outputs | Feedback is not used for enterprise-grade model fine-tuning | No automated way of capturing feedback and learning from it | Limited learning through feedback |
| Speed of Setup | Customized setup takes longer; expedite using accelerators | Immediate. No setup required. | Fast. Requires backend data preparation, connection with Power BI, and dashboard creation. | Fast. Depends on the level of detail required in the data dictionary and business synonyms. |
| Integration with Custom Built Business Applications | Can be integrated seamlessly with other applications | Not possible | Not possible through the interface | Not possible |
| Types of Insights | Deep-dive analysis through conversations | Deep-dive analysis through conversations | Narrative summarization of visuals and data | Deep-dive analysis through conversations |
| Scalability | Multiple users, same data source, concurrent access to insights | Multiple users, independent data sources, concurrent access to insights | Multiple users, same data source, independent access to insights | Multiple users, same data source, concurrent access to insights |
| Licensing Costs | Costs depend on usage / number of tokens | Per user | Per user | Per user, or power users can generate reports |
| Whitebox Approach | Full control over each component of the solution | User is interacting directly with the LLM | Limited control over insights and outputs | NA |

50
Custom Solution using LLMs

51
Snowflake Cortex

52
PowerBI CoPilot

53
ThoughtSpot Sage

54
Snowflake CoPilot Snapshot

55
Model Specific Fine-tuning: AWS Claude Model

| Prompt Element | Description |
|---|---|
| "\n\nHuman:" formatting | Two new lines followed by "Human:" |
| Task context | Give Claude context about the role it should take on or what goals and overarching tasks you want it to undertake with the prompt |
| Tone context (optional) | If important to the interaction, tell Claude what tone it should use. |
| Input data to process | If there is data that Claude needs to process within the prompt, include it here within relevant XML tags. Feel free to include multiple pieces of data but be sure to enclose each in its own set of XML tags. |
| Provide examples | Provide Claude with at least one example of an ideal response that it can emulate. Encase this in <example></example> XML tags. Feel free to provide multiple examples. If you do provide multiple examples, give Claude context about what it is an example of, and enclose each example in its own set of XML tags. |
| Detailed task description and rules | Expand on the specific tasks you want Claude to do, as well as any rules that Claude might have to follow. This is also where you can give Claude an "out" if it doesn't have an answer or doesn't know. |
| Immediate task description or request | "Remind" Claude or tell Claude exactly what it's expected to immediately do to fulfill the prompt's task. |
| Precognition (thinking step by step) (optional) | For tasks with multiple steps, it's good to tell Claude to think step by step before giving an answer. Sometimes, you might have to even say "Before you give your answer..." just to make sure Claude does this first. |
| Output formatting (optional) | If there is a specific way you want Claude's response formatted, clearly tell Claude what that format is. |
| "\n\nAssistant:" formatting | Two new lines followed by "Assistant:". This is also where you could speak on Claude's behalf to help it start its response. |
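Putting these elements together, a prompt could be assembled along the following lines; the task text, XML tag names and example content are placeholders, and only the element ordering and the "\n\nHuman:" / "\n\nAssistant:" framing come from the guidance above.

```python
# Illustrative assembly of a Claude prompt following the element order above.
# The task wording, tags, and example content are placeholders.
def build_claude_prompt(question: str, table_schema: str, example: str) -> str:
    return (
        "\n\nHuman: "                                                   # "\n\nHuman:" formatting
        "You are a data analyst who writes SQL for business users. "    # task context
        "Keep the tone concise and professional. "                      # tone context (optional)
        f"<schema>{table_schema}</schema> "                             # input data in XML tags
        f"<example>{example}</example> "                                # example of an ideal response
        "Write one SQL query per request; if the schema cannot answer "
        "the question, say you don't know. "                            # detailed task rules + an "out"
        f"Now answer this request: {question} "                         # immediate task request
        "Think step by step before answering. "                         # precognition
        "Return only the SQL inside <sql></sql> tags."                  # output formatting
        "\n\nAssistant:"                                                # "\n\nAssistant:" formatting
    )
```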

56
