
How to use Llama 2 with an API on AWS to power your AI apps

Step 0: Log in or sign up for an AWS account

1. Go to https://aws.amazon.com/ and log in or sign up for an account

2. If you sign up for a new account, you will automatically be given Free Tier access, which does provide some SageMaker credits, but keep an eye on your usage since, depending on the instance type you select, the bill can get absurdly high
Part I — Hosting the Model

Step 1: Go to AWS SageMaker

Once you are in your AWS dashboard, search for SageMaker in the search bar and click on it.

AWS SageMaker is AWS's solution for deploying and hosting machine learning models.
Step 2: Set up a domain on AWS SageMaker

1. Click on Domains on the left sidebar

2. Click on Create a Domain

3. Make sure the Quick Setup box is selected


4. Fill out the form with a domain name of your choosing and the rest of the options filled out as you see in the screenshot.

If you are new to this, choose Create a new role in the Execution role category. Otherwise, pick a role that you may have created before.

5. Click Submit on the form to create your domain

6. When the domain is finished being created, you will be shown a confirmation screen

Note down the user name you see here, as it will be needed to deploy our model in the next step.

If your domain runs into an error while being created, it is likely due to user permissions or VPC configuration. Both topics are complex in and of themselves, so if you would like some help, feel free to book a call with us at www.woyera.com
Step 3: Start a SageMaker Studio Session

1. Click on the Studio link in the left sidebar once your domain is finished being created

2. Select the domain name and the user profile you noted down previously and click Open Studio

This will take you to a JupyterLab studio session that looks like this
Step 4: Select the Llama-2-7b-chat model

We are going to deploy the chat-optimized, 7-billion-parameter version of the Llama 2 model.

There is a more powerful 70B model, which is much more robust, but for demo purposes it would be too costly, so we will go with the smaller model.

1. Click on Models, notebooks, solutions in the left sidebar under the SageMaker JumpStart tab

2. Search for the Llama 2 model in the search bar. We are looking for the 7b chat model. Click on the model

If you do not see this model, you may need to shut down and restart your studio session

3. This will take you to the model page. You can change the deployment settings to best suit your use case, but we will just proceed with the default SageMaker settings and Deploy the model as is

The 70B version needs a powerful server, so your deployment might error out if your account does not have access to it. In this case, submit a request to AWS Service Quotas.

4. Wait 5–10 minutes for the deployment to finish and the confirmation screen to be shown

Note down the model's endpoint name, since you will need it to use the model with an API.
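If you lose track of the endpoint name later, you can also list your endpoints programmatically. A minimal sketch using boto3, assuming your AWS credentials are configured locally:

import boto3

# Print every SageMaker endpoint in the current region along with its status
sagemaker_client = boto3.client('sagemaker')
for endpoint in sagemaker_client.list_endpoints()['Endpoints']:
    print(endpoint['EndpointName'], endpoint['EndpointStatus'])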

And with that, you are now done with Part I of hosting the model.
Have a beverage or snack of your choice to celebrate!
Part II — Use the model with an API

Step 1: Go to AWS Lambda to create a Lambda Function

A Lambda function will be used to call your LLM model's endpoint

1. Search for the Lambda service in the AWS console search bar and click on the Lambda service

2. Click on Create function

3. Enter a function name of your choice, choose Python 3.10 as the runtime and x86_64 as the architecture, then click on Create function
Step 2: Specify your model's endpoint

Enter the LLM model's endpoint name from the last step of Part I as an environment variable

1. Click on the Configuration tab in your newly created function

2. Click on Environment variables and click on Edit

3. Click on Add environment variable on the next screen

4. Enter ENDPOINT_NAME as the key and your model's endpoint name as the value. Click Save

You can actually use any key you wish, but it will need to match what we write in our code to call the model later.
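If you prefer to script this step rather than click through the console, here is a hedged sketch with boto3; the function name is a placeholder for whatever you named yours:

import boto3

# Set the ENDPOINT_NAME environment variable on the Lambda function.
# Note: this call replaces the function's entire environment, so include
# any other variables the function already has.
lambda_client = boto3.client('lambda')
lambda_client.update_function_configuration(
    FunctionName='your-function-name',  # placeholder
    Environment={'Variables': {'ENDPOINT_NAME': 'your-endpoint-name'}}
)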
Step 3: Write the code that will call the Llama model

1. Go back to the Code tab and copy and paste the following code
there

import os
import json

import boto3

# Grab the endpoint name from the environment variable set in Step 2
ENDPOINT_NAME = os.environ['ENDPOINT_NAME']

# Client for calling SageMaker runtime endpoints
runtime = boto3.client('runtime.sagemaker')

def lambda_handler(event, context):
    # Forward the incoming request body straight to the model endpoint
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType='application/json',
        Body=event['body'],
        CustomAttributes='accept_eula=true'
    )

    # Decode the model's JSON response and return it to the caller
    result = json.loads(response['Body'].read().decode())

    return {
        'statusCode': 200,
        'body': json.dumps(result)
    }

2. Click Deploy once the code is in place

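Before wiring up API Gateway, you can try the function from the Lambda console's Test tab. Because the handler reads event['body'] as a string, the test event needs the request JSON serialized inside a body field. A sample test event, assuming your endpoint is up:

{
  "body": "{\"inputs\": [[{\"role\": \"user\", \"content\": \"Hello!\"}]], \"parameters\": {\"max_new_tokens\": 64}}"
}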
Step 4: Connect your new Lambda function to AWS API Gateway

1. Go to your Lambda function's home screen and click Add Trigger
2. Select the API Gateway menu item in the Add trigger dialog

3. Fill out the API Gateway dialog as you see in the screenshot and click on Add
4. After the API endpoint has been successfully created, you can
view the API URL under the Configuration tab
and Triggers sidebar
Step 5: Test your brand spanking new LLM API

1. Make a POST request to your API URL with the following JSON body

{
  "inputs": [
    [
      {"role": "system", "content": "You are chat bot who writes songs"},
      {"role": "user", "content": "Write a rap about Barbie"}
    ]
  ],
  "parameters": {"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.6}
}

The inputs key stores an array in which each element is one conversation between you and the chat bot.

Each conversation is represented by nested JSON objects containing role, which signifies the "person" in the conversation, and content, which is the prompt.

The parameters key controls generation: max_new_tokens caps the length of the reply, while top_p and temperature control how random the sampling is.

2. Check the response status code and the response JSON from the API. The status code should be 200 and the response JSON will look like the following
[{"generation": {"role": "assistant",
"content": " Yo, listen up, I got a story to tell
'Bout a doll that's fly, and she's doin' well
Barbie, Barbie, she's the queen of the scene
From Malibu to Kenya, she's always lookin' supreme
She's got the style, she's got the grace
She's got the swag, she's got the pace
She's got the cars, she's got the cash
She's livin' large, she's got the flash
She's got the perfect body, she's got the perfect face
She's got the perfect hair, she's got the perfect pace
She's got the perfect life, she's got the perfect vibe
She's got it all, she's got the perfect ride
She's got Ken by her side, he's her main man
He's got the skills, he's got the plan
They're livin' large, they're got the fame
They're the perfect couple, they're got the game
So listen up,"}}]
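To carry the conversation forward, send the whole history back: append the model's reply under the assistant role, then add your follow-up as a new user message. A sketch of the request body for a second turn, with the assistant content abbreviated:

{
  "inputs": [
    [
      {"role": "system", "content": "You are chat bot who writes songs"},
      {"role": "user", "content": "Write a rap about Barbie"},
      {"role": "assistant", "content": " Yo, listen up, I got a story to tell..."},
      {"role": "user", "content": "Now write a verse about Ken"}
    ]
  ],
  "parameters": {"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.6}
}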

You can use the following Python code to test the API. Replace the value of api_url with the API URL that you created in Step 4

import requests

api_url = 'https://spip03jtgd.execute-api.us-east-1.amazonaws.com/default/call-bloom-llm'

json_body = {
    "inputs": [
        [
            {"role": "system", "content": "You are chat bot who writes songs"},
            {"role": "user", "content": "Write a rap about Barbie"}
        ]
    ],
    "parameters": {"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.6}
}

r = requests.post(api_url, json=json_body)

print(r.json())
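Given the response shape shown above, pulling out just the generated text is a one-liner:

# The response is a list with one entry per conversation in "inputs"
print(r.json()[0]['generation']['content'])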

Potential Errors

You might receive a few errors in this scenario:

1. Permissions: if your Lambda function's execution role does not have permission to invoke the SageMaker endpoint (the sagemaker:InvokeEndpoint action), then you will not be able to call the endpoint. AWS permissions can get confusing, so schedule a call with us if you would like us to walk you through them. A minimal policy sketch follows this list.

2. Timeout: depending on your prompt and parameters, you may receive a timed-out error. Unlike permissions, this is an easy fix. Click on Configuration, then General configuration, then Edit, and raise the timeout from its 3-second default. A scripted version is also sketched below.
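For the permissions fix, the execution role needs a policy that allows the invoke action. A minimal sketch; in practice you would scope Resource down to your endpoint's ARN rather than using a wildcard:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sagemaker:InvokeEndpoint",
      "Resource": "*"
    }
  ]
}

For the timeout fix, the same change can be made from code with boto3 (the function name below is a placeholder for yours):

import boto3

# Raise the Lambda timeout to 60 seconds so slow generations can finish
boto3.client('lambda').update_function_configuration(
    FunctionName='your-function-name',  # placeholder
    Timeout=60
)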
