I Built an AWS Well-Architected Chatbot with ChatGPT. Here's How I Approached It
Tips and guidance for building a ChatGPT chatbot.
- Data Collection
- Creating Text Embeddings
- Prompt Engineering
- Creating the Chat Interface
I used Selenium and BeautifulSoup to methodically scrape content from the entire Well-Architected Framework site. To make the extraction comprehensive, I worked through every section on the main page and also followed and scraped every link in the sidebar. The result was the complete content, compiled into a CSV file along with each page's title and URL for easy reference and citation.
```python
from selenium import webdriver
from bs4 import BeautifulSoup

# Selenium renders the page so dynamically loaded content is available
browser = webdriver.Chrome()


def get_page_info(url):
    browser.get(url)
    html = browser.page_source
    # Have soup parse the website
    soup = BeautifulSoup(html, "html.parser")
    # Get title
    title = soup.find("title").string
    main_article = soup.find(id="main-col-body")  # main text of article
    # Get text sections
    text_sections = main_article.findAll("p")
    text_list = []
    for list_item in text_sections:
        text_list.append(list_item.text)
    # Get info in tables
    tables = main_article.findAll("table")
    for table in tables:
        # Add all ths and tds
        ths = table.findAll("th")
        tds = table.findAll("td")
        for th in ths:
            text_list.append(th.text)
        for td in tds:
            text_list.append(td.text)
    json_obj = {}
    json_obj["url"] = url
    json_obj["title"] = title
    json_obj["sections"] = text_list
    return json_obj
```
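The returned dictionaries can then be compiled into the CSV the rest of the pipeline reads. A minimal sketch using the stdlib `csv` module (the exact column layout is my assumption; the app later loads a file named `min_aws_wa.csv`):

```python
import csv


def pages_to_csv(pages, out_path="min_aws_wa.csv"):
    # `pages` is a list of dicts as returned by get_page_info()
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["url", "title", "sections"])
        writer.writeheader()
        for page in pages:
            # Join each page's list of text sections into a single cell
            writer.writerow({**page, "sections": " ".join(page["sections"])})
```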
To create text embeddings for each document, I used OpenAI's embeddings API. Embeddings are commonly used for:
- Search (where results are ranked by relevance to a query string)
- Clustering (where text strings are grouped by similarity)
- Recommendations (where items with related text strings are recommended)
- Anomaly detection (where outliers with little relatedness are identified)
- Diversity measurement (where similarity distributions are analyzed)
- Classification (where text strings are classified by their most similar label)
For example, given a query such as "How do I design VPC architectures with security components?", we get back a list of the documents whose text is relevant to the query.
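The ranking itself is simple: OpenAI embeddings come back unit-normalized, so cosine similarity reduces to a dot product, and documents can be sorted by their score against the query embedding. A sketch of that step (function names are my own):

```python
def vector_similarity(x, y):
    # Dot product; equals cosine similarity for unit-normalized vectors
    return sum(a * b for a, b in zip(x, y))


def order_docs_by_similarity(query_embedding, document_embeddings):
    # document_embeddings maps a document id to its embedding vector;
    # return ids ranked from most to least relevant to the query
    return sorted(
        document_embeddings,
        key=lambda doc_id: vector_similarity(query_embedding, document_embeddings[doc_id]),
        reverse=True,
    )
```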
```python
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": "You are an AWS Certified Solutions Architect. Your role is to help customers understand best practices on building on AWS. Return your response in markdown, so you can bold and highlight important steps for customers.",
        },
        {
            "role": "system",
            "content": f"Use the following context from the AWS Well-Architected Framework to answer the user's query.\nContext:\n{context}",
        },
        {"role": "user", "content": f"{query}"},
    ],
)
```
When the user submits a query, the get_answer_from_chatgpt() function is called to get a response from ChatGPT along with the referenced documents.
```python
import streamlit as st

import utils  # helper module with the ChatGPT call


def app() -> None:
    """
    Purpose:
        Controls the app flow
    Args:
        N/A
    Returns:
        N/A
    """
    # Spin up the sidebar (sidebar, load_data_frame, and load_embeddings
    # are helpers defined elsewhere in the app)
    sidebar()
    # Load questions
    query = st.text_input("Query:")
    df = load_data_frame("min_aws_wa.csv")
    document_embeddings = load_embeddings("document_embeddings.pkl")
    if st.button("Submit Query"):
        with st.spinner("Generating..."):
            answer, docs = utils.get_answer_from_chatgpt(
                query,
                df,
                document_embeddings,
            )
        st.markdown(answer)
        st.subheader("Resources")
        for doc in docs:
            st.write(doc)
```
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.