Labsheet9

The document outlines a Python script for setting up an environment to utilize OpenAI's API for document loading and question-answering retrieval chains. It includes steps for installing necessary libraries, reading and writing text files, splitting documents into smaller chunks, creating embeddings, and querying the system with specific questions about natural disasters and FEMA. The script demonstrates the integration of various components from the LangChain library to facilitate these tasks.


import os

# Set the OpenAI API key


os.environ['OPENAI_API_KEY'] = ' '
#####################################
api_key = os.getenv('OPENAI_API_KEY')

print(api_key) # Check if the key is successfully retrieved


########################
!pip install chromadb
!pip install langchain
!pip install langchain_community
!pip install langchain_openai
!pip install tiktoken

#############################################
# document loading and QA retrieval chains
# pip install chromadb
# pip install tiktoken
import chromadb

from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA

# Read the file using the correct encoding


with open("/content/input_text.txt", "r", encoding="utf-8") as f:
text = f.read()

# Write the text back to a new file, ensuring it's in UTF-8 encoding
with open("input_text_utf8.txt", "w", encoding="utf-8") as f:
f.write(text)

# Load the UTF-8 copy we just wrote
loader = TextLoader("input_text_utf8.txt", encoding="utf-8")
document = loader.load()

print(document)

# Split the document into smaller chunks that are semantically related.
# The recursive splitter tries separators in order: first at the paragraph
# level ("\n\n"), then at the line/sentence level ("\n"), and finally at the
# word level (" ").
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(document)
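# (Optional sanity check, not part of the original lab steps: inspect how many
# chunks the splitter produced and preview the first few.)
print("Number of chunks:", len(texts))
for i, chunk in enumerate(texts[:3]):
    print(i, len(chunk.page_content), chunk.page_content[:60])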

#print(texts[0])

#print(texts[1])
# Create embeddings for the chunks. Each chunk is mapped into a vector space as
# a list of floating-point numbers, and by comparing the vectors of two chunks
# we can measure how semantically related they are. OpenAI's embedding models
# return one vector per passage.
embeddings = OpenAIEmbeddings()
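# (Illustration only, not part of the original lab sheet. Assuming the document
# produced at least two chunks, we can embed two of them and compute the cosine
# similarity of the resulting vectors to see how "relatedness" is measured.)
import numpy as np

v1 = np.array(embeddings.embed_query(texts[0].page_content))
v2 = np.array(embeddings.embed_query(texts[1].page_content))
cosine = float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))
print("Cosine similarity between chunk 0 and chunk 1:", cosine)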

store = Chroma.from_documents(texts, embeddings, collection_name='input_text')
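# (Optional illustration, not in the original lab sheet: query the vector store
# directly to see which chunks would be retrieved for a question, before wiring
# up the QA chain.)
hits = store.similarity_search("What is a natural disaster?", k=2)
for hit in hits:
    print(hit.page_content[:100])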
llm = OpenAI(temperature=0)
chain = RetrievalQA.from_chain_type(llm, retriever=store.as_retriever())

#Start querying
question1 = "What is a natural disaster?"
result = chain.invoke({"query": question1})
print(result)

question2 = "List all the natural hazards"
result = chain.invoke({"query": question2})
print(result)

question3 = "What is FEMA?"
result = chain.invoke({"query": question3})
print(result)
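# (Optional extension, a sketch rather than part of the original lab sheet:
# RetrievalQA can also return the retrieved chunks alongside the answer by
# passing return_source_documents=True; the result dict then carries a
# "source_documents" list. The name chain_with_sources is illustrative.)
chain_with_sources = RetrievalQA.from_chain_type(
    llm,
    retriever=store.as_retriever(),
    return_source_documents=True,
)
result = chain_with_sources.invoke({"query": "What is FEMA?"})
print(result["result"])
for doc in result["source_documents"]:
    print(doc.page_content[:80])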
