How To Use Llama 2 With An API On AWS To Power Your AI Apps
Once you are in your AWS Dashboard, search for SageMaker in the
search bar and click the result to open Amazon SageMaker.
If you are new to this, choose Create a new role in the Execution
role category. Otherwise, pick a role that you created before.
Note down the user name you see here, as you will need it to deploy
the model in the next step.
2. Select the domain name and the user profile you selected
previously and click Open Studio
This will take you to a JupyterLab-based Studio session.
Step 4: Select the Llama-2-7b-chat model
If you do not see this model, you may need to shut down and
restart your Studio session.
3. This will take you to the model page. You can change the
deployment settings to suit your use case, but we will proceed
with the default SageMaker settings and deploy the model as is.
The 70B version needs a much larger instance, so your deployment might
fail if your account does not have quota for that instance type. In
that case, request a quota increase through AWS Service Quotas.
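Before filing a request, it can help to see what your account is currently allowed. The sketch below pages through the SageMaker quotas using the Service Quotas API and keeps the ones whose name mentions a given instance type. The helper name find_quotas is illustrative and not part of the original walkthrough, and the sketch assumes that quota names contain the instance type string, which is worth confirming in the Service Quotas console.

```python
def find_quotas(client, name_fragment, service_code="sagemaker"):
    """Return (QuotaName, Value) pairs whose name contains name_fragment.

    `client` is expected to behave like boto3.client("service-quotas").
    """
    matches = []
    kwargs = {"ServiceCode": service_code}
    while True:
        page = client.list_service_quotas(**kwargs)
        for quota in page.get("Quotas", []):
            if name_fragment in quota["QuotaName"]:
                matches.append((quota["QuotaName"], quota["Value"]))
        # Keep following NextToken until the last page is reached.
        token = page.get("NextToken")
        if not token:
            return matches
        kwargs["NextToken"] = token
```

With AWS credentials configured, you could pass boto3.client("service-quotas") as the client and use the instance type shown in the model's deployment settings as the fragment.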
Note down the model’s Endpoint name since you will need it to
use the model with an API.
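If you would rather confirm from code that the endpoint is up before moving on, a minimal sketch along these lines may help. It assumes a client that behaves like boto3.client("sagemaker"); the helper name endpoint_is_ready is made up for illustration.

```python
def endpoint_is_ready(sm_client, endpoint_name):
    """Return True when the endpoint reports the InService status.

    `sm_client` is expected to behave like boto3.client("sagemaker").
    """
    description = sm_client.describe_endpoint(EndpointName=endpoint_name)
    return description["EndpointStatus"] == "InService"
```

Pass the endpoint name you noted above; a status such as Creating or Failed means the endpoint is not ready to serve requests yet.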
And with that, you are now done with Part I of hosting the model.
Have a beverage or snack of your choice to celebrate!
Part II — Use the model with an API
Enter the LLM model’s endpoint name from the last step of Part
I as an environment variable
You can use any name you wish for the key, but it must match the
name that the code reads when we write the function later.
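Because a mismatched key only shows up at runtime, it may be worth reading the variable through a small helper that fails with a readable message. This is a sketch, and the key ENDPOINT_NAME is an assumed example rather than a required name.

```python
import os


def get_endpoint_name(key="ENDPOINT_NAME", environ=os.environ):
    """Read the endpoint name from the environment, failing with a clear message."""
    value = environ.get(key, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {key!r} is not set; "
            "add it to the Lambda function's environment variables."
        )
    return value
```

Passing a plain dict as environ makes the helper easy to try locally without touching the real environment.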
Step 3: Write the code that will call the Llama model
1. Go back to the Code tab and copy and paste the following code
there
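import os
import json
import boto3
# The key "ENDPOINT_NAME" is an assumption; it must match the environment variable key you set above.
ENDPOINT_NAME = os.environ["ENDPOINT_NAME"]
runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    # Forward the request body to the SageMaker endpoint; Llama 2 requires accepting the EULA.
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME, ContentType="application/json",
        Body=event["body"], CustomAttributes="accept_eula=true")
    result = json.loads(response["Body"].read().decode())
    return {
        "statusCode": 200,
        "body": json.dumps(result)
    }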
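To try the handler logic on your own machine before wiring up the API, one option is to build it around an injected client so a stub can stand in for SageMaker. The sketch below is a variation on the code above, not the original function: make_handler and StubRuntime are hypothetical names, the 400 and 502 responses are an added design choice, and the accept_eula=true custom attribute is assumed to be what the Llama 2 endpoint expects.

```python
import io
import json


def make_handler(runtime, endpoint_name):
    """Build a Lambda-style handler around a SageMaker runtime client.

    `runtime` is expected to behave like boto3.client("sagemaker-runtime").
    """
    def handler(event, context):
        body = event.get("body")
        if not body:
            return {"statusCode": 400, "body": json.dumps({"error": "missing request body"})}
        try:
            response = runtime.invoke_endpoint(
                EndpointName=endpoint_name,
                ContentType="application/json",
                Body=body,
                CustomAttributes="accept_eula=true",
            )
            result = json.loads(response["Body"].read().decode())
        except Exception as exc:
            # Report endpoint failures to the caller instead of crashing.
            return {"statusCode": 502, "body": json.dumps({"error": str(exc)})}
        return {"statusCode": 200, "body": json.dumps(result)}
    return handler


class StubRuntime:
    """Stand-in for the real client that returns a canned reply."""
    def invoke_endpoint(self, **kwargs):
        payload = [{"generation": {"role": "assistant", "content": "stub reply"}}]
        return {"Body": io.BytesIO(json.dumps(payload).encode())}
```

Calling make_handler(StubRuntime(), "my-endpoint")({"body": "{}"}, None) exercises the same code path the Lambda would take, without any AWS calls.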
3. Fill out the API Gateway dialog as follows and click on Add
4. After the API endpoint has been successfully created, you can
view the API URL under the Configuration tab, in the Triggers
sidebar
Step 5: Test your brand spanking new LLM API
{
"inputs": [
[
{"role": "system", "content": "You are chat bot who writes songs"},
{"role": "user", "content": "Write a rap about Barbie"}
]
],
"parameters": {"max_new_tokens":256, "top_p":0.9, "temperature":0.6}
}
2. Check the response status code and the response JSON from
the API. The status code should be 200, and the response JSON should
look like the following:
[{"generation": {"role": "assistant",
"content": " Yo, listen up, I got a story to tell
'Bout a doll that's fly, and she's doin' well
Barbie, Barbie, she's the queen of the scene
From Malibu to Kenya, she's always lookin' supreme
She's got the style, she's got the grace
She's got the swag, she's got the pace
She's got the cars, she's got the cash
She's livin' large, she's got the flash
She's got the perfect body, she's got the perfect face
She's got the perfect hair, she's got the perfect pace
She's got the perfect life, she's got the perfect vibe
She's got it all, she's got the perfect ride
She's got Ken by her side, he's her main man
He's got the skills, he's got the plan
They're livin' large, they're got the fame
They're the perfect couple, they're got the game
So listen up,"}}]
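In an app you usually want only the generated text rather than the whole structure. Assuming the response has the shape shown above (a list with one item holding a generation object), a small helper like this would pull it out; the name extract_reply is illustrative.

```python
def extract_reply(response_json):
    """Return the assistant's text from a response shaped like the example above."""
    if not isinstance(response_json, list) or not response_json:
        raise ValueError("expected a non-empty list of generations")
    generation = response_json[0]["generation"]
    # The model output starts with a space, so trim surrounding whitespace.
    return generation["content"].strip()
```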
You can use the following Python code to test the API. Replace the
value of api_url with the API URL that you created in Step 4.
import requests
api_url = 'https://spip03jtgd.execute-api.us-east-1.amazonaws.com/default/call-bloom-llm'
json_body = {
"inputs": [
[
{"role": "system", "content": "You are chat bot who writes songs"},
{"role": "user", "content": "Write a rap about Barbie"}
]
],
"parameters": {"max_new_tokens":256, "top_p":0.9, "temperature":0.6}
}
r = requests.post(api_url, json=json_body)
print(r.json())
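If you would rather not install requests, roughly the same call can be made with the standard library. The sketch below separates building the request from sending it, so the payload can be inspected first; build_request and call_api are hypothetical helper names, and the 60 second timeout is an arbitrary choice.

```python
import json
import urllib.request


def build_request(api_url, system_prompt, user_prompt, max_new_tokens=256):
    """Build a POST request carrying the chat payload used in this tutorial."""
    payload = {
        "inputs": [[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ]],
        "parameters": {"max_new_tokens": max_new_tokens, "top_p": 0.9, "temperature": 0.6},
    }
    return urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(request, timeout=60):
    """Send the request and decode the JSON reply; urlopen raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would be call_api(build_request(api_url, "You are chat bot who writes songs", "Write a rap about Barbie")), with api_url set to your own API URL.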
Potential Errors