'Hey Computer, Talk to Me in Polish': Building a Polish-language Speaking Chatbot
Use Amazon SageMaker, Hugging Face, TRURL 2, and Streamlit to build a foreign language chatbot.
- Pre-installed tools:
  - Most recent AWS CLI.
  - AWS CDK in version 2.104.0 or higher.
  - Python 3.10 or higher.
  - Node.js v21.x or higher.
- Configured profile in the installed AWS CLI with credentials for your AWS IAM user account.
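The version requirements above can be checked before starting. The following is a minimal sketch, not part of the original repository: the helper names (`parse_version`, `meets_minimum`, `check_tools`) are hypothetical, and it assumes each tool is on the `PATH` under the names `cdk`, `node`, and `python3` and prints its version when called with `--version`. The AWS CLI is left out because the prerequisites only ask for the most recent release.

```python
import re
import shutil
import subprocess

# Minimum versions taken from the prerequisites list above.
# The executable names are assumptions about a typical installation.
MINIMUMS = {
    "cdk": (2, 104, 0),
    "node": (21, 0, 0),
    "python3": (3, 10, 0),
}


def parse_version(text):
    """Return the first dotted version found in text as a 3-tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def meets_minimum(version_output, minimum):
    """Check whether the version printed by a tool is at least the minimum."""
    version = parse_version(version_output)
    return version is not None and version >= minimum


def check_tools(minimums=MINIMUMS):
    """Run `<tool> --version` for each tool and report whether it is new enough."""
    report = {}
    for tool, minimum in minimums.items():
        if shutil.which(tool) is None:
            report[tool] = False  # tool is not installed or not on PATH
            continue
        result = subprocess.run([tool, "--version"], capture_output=True, text=True)
        # Some tools print the version to stderr, so both streams are inspected.
        report[tool] = meets_minimum(result.stdout + result.stderr, minimum)
    return report


if __name__ == "__main__":
    for tool, ok in check_tools().items():
        print(f"{tool}: {'OK' if ok else 'missing or too old'}")
```

Running the script prints one line per tool. The comparison relies on Python's tuple ordering, so `(2, 110, 1)` counts as newer than `(2, 104, 0)` without any string tricks.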
# Do those in the repository root after checking out.
# Ideally, you should do them in a single terminal session.
# Node.js v21 is not yet supported inside JSII, but it works for that example - so "shush", please.
export JSII_SILENCE_WARNING_UNTESTED_NODE_VERSION=true
make
source ./.env/bin/activate
cd ./infrastructure
npm install
# Or `export AWS_PROFILE=<YOUR_PROFILE_FROM_AWS_CLI>`
export AWS_DEFAULT_PROFILE=<YOUR_PROFILE_FROM_AWS_CLI>
export AWS_USERNAME=<YOUR_IAM_USERNAME>
cdk bootstrap
npm run package
npm run deploy-shared-infrastructure
# Answer a few AWS CDK CLI wizard questions and wait for completion.
# Now, you can push code from this repository to the created AWS CodeCommit git repository remote.
# Here you can find an official guide on how to configure your local `git` for AWS CodeCommit:
# https://docs.aws.amazon.com/codecommit/latest/userguide/setting-up.html
git remote add aws <HTTPS_REPOSITORY_URL_PRESENT_IN_THE_CDK_OUTPUTS_FROM_PREVIOUS_COMMAND>
git push aws main
npm run deploy
# Again, answer a few AWS CDK CLI wizard questions and wait for completion.
# Those steps should be invoked in the *System terminal* inside *Amazon SageMaker Studio*:
cd deploying-trurl-2-on-amazon-sagemaker
./install-amazon-code-whisperer-in-sagemaker-studio.sh
./install-project-dependencies-in-sagemaker-studio.sh
Go to the `trurl-2` directory and explore the capabilities of the TRURL 2 model, which you will deploy from the SageMaker Studio notebook as an Amazon SageMaker Endpoint. CodeWhisperer will be our AI-powered coding companion throughout the process. Execute the notebook cells one by one (`CTRL/CMD + ENTER`). Remember to stop before executing the clean-up section and invoking the cell with `predictor.delete_endpoint()`, as we will need the running endpoint for the next section.
# If you use the previously used terminal session, you can skip this line:
source ./.env/bin/activate
cd trurl-2
streamlit run chatbot-app.st.py
conda activate studio
cd deploying-trurl-2-on-amazon-sagemaker/trurl-2
./run-streamlit-in-sagemaker-studio.sh chatbot-app.st.py
The endpoint name is the `endpoint_name` value set inside the Jupyter notebook when you invoked the `huggingface_model.deploy(...)` operation.
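To illustrate how an endpoint name is enough to reach a deployed model, here is a minimal sketch that calls an endpoint through the `sagemaker-runtime` client from `boto3`. It is not the code from the repository. The helper `invoke_by_name`, the stub class, and the `max_new_tokens` default are hypothetical, and the payload and response shapes (`{"inputs": ...}` in, `[{"generated_text": ...}]` out) are an assumption based on the Hugging Face text generation container.

```python
import io
import json


def invoke_by_name(endpoint_name, prompt, client=None, max_new_tokens=256):
    """Send a prompt to a SageMaker endpoint identified only by its name."""
    if client is None:
        import boto3  # imported lazily so the helper can be tried without AWS access

        client = boto3.client("sagemaker-runtime")
    # Assumed request shape for the Hugging Face text generation container.
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    # The real client returns a streaming body; read() yields the JSON bytes.
    result = json.loads(response["Body"].read())
    return result[0]["generated_text"]


class FakeRuntimeClient:
    """Offline stand-in for the boto3 client, used only to try the helper locally."""

    def __init__(self):
        self.calls = []

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        self.calls.append((EndpointName, ContentType, json.loads(Body)))
        answer = json.dumps([{"generated_text": "Cześć!"}]).encode("utf-8")
        return {"Body": io.BytesIO(answer)}


if __name__ == "__main__":
    # "my-trurl-2-endpoint" is a placeholder, not the name used in the notebook.
    print(invoke_by_name("my-trurl-2-endpoint", "Powiedz cześć.", client=FakeRuntimeClient()))
```

Passing the client as a parameter keeps the helper testable offline. With real credentials, omitting `client` creates the `boto3` client from the active AWS profile.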
# Generate a new response only if the last message did not come from the assistant.
if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            response = talk_with_trurl2(predictor, st.session_state.messages)
            placeholder = st.empty()
            full_response = ''
            # Re-render the growing text to get a typing effect.
            for item in response:
                full_response += item
                placeholder.markdown(full_response)
            placeholder.markdown(full_response)
    # Store the answer so that it becomes part of the conversation history.
    message = {"role": "assistant", "content": full_response}
    st.session_state.messages.append(message)
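The rendering loop above can be studied in isolation, without Streamlit. The sketch below is only an illustration: `RecordingPlaceholder` is a hypothetical stand-in for the object returned by `st.empty()`, and `render_incrementally` repeats the same accumulate-and-redraw steps while recording every intermediate render.

```python
class RecordingPlaceholder:
    """Stand-in for the object returned by st.empty(); it records every render."""

    def __init__(self):
        self.renders = []

    def markdown(self, text):
        self.renders.append(text)


def render_incrementally(chunks, placeholder):
    """Append each chunk to the running text and redraw the placeholder."""
    full_response = ""
    for chunk in chunks:
        full_response += chunk
        placeholder.markdown(full_response)
    # Final redraw, mirroring the last placeholder.markdown(...) call in the app.
    placeholder.markdown(full_response)
    return full_response


if __name__ == "__main__":
    placeholder = RecordingPlaceholder()
    render_incrementally(["Cz", "eść", "!"], placeholder)
    print(placeholder.renders)
```

Note that iterating over a plain Python string yields single characters, so the same loop also works when the response arrives as one complete string rather than as a stream of chunks.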
def talk_with_trurl2(endpoint, dict_message):
    llama2_prompt = build_llama2_prompt(dict_message)
    output = call_sagemaker_endpoint(endpoint, llama2_prompt)
    return output
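The body of `build_llama2_prompt` is not shown here. As a rough sketch of what such a helper might do, the version below converts a list of chat messages into a single prompt string, assuming that TRURL 2 accepts the standard Llama 2 chat template (`[INST]`, `<<SYS>>`, and `</s><s>` markers). The actual function in the repository may differ.

```python
def build_llama2_prompt(messages):
    """Flatten chat messages into one prompt using the Llama 2 chat template."""
    start_prompt = "<s>[INST] "
    end_prompt = " [/INST]"
    conversation = []
    for index, message in enumerate(messages):
        role = message["role"]
        content = message["content"]
        if role == "system" and index == 0:
            # An optional system message is wrapped in <<SYS>> markers.
            conversation.append(f"<<SYS>>\n{content}\n<</SYS>>\n\n")
        elif role == "user":
            conversation.append(content.strip())
        else:
            # An assistant turn closes the current instruction and opens the next one.
            conversation.append(f" [/INST] {content.strip()}</s><s>[INST] ")
    return start_prompt + "".join(conversation) + end_prompt


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "Cześć!"},
        {"role": "assistant", "content": "Dzień dobry!"},
        {"role": "user", "content": "Zagrajmy w 20 pytań."},
    ]
    print(build_llama2_prompt(history))
```

The prompt always ends with ` [/INST]`, which signals to the model that it is its turn to answer the most recent user message.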
"assistant"
) to visualize that in the conversation flow. For those who do not speak Polish, the main part of the prompt sets a friendly conversational tone and asks the chatbot to play a game of 20 questions, where a player specifies the category to guess as an entry point to the conversation.
# Those steps should be invoked locally from the repository root:
export STUDIO_STACK_NAME="Environment-SageMakerStudio"
export EFS_ID=$(aws cloudformation describe-stacks --stack-name "${STUDIO_STACK_NAME}" --query "Stacks[0].Outputs[?OutputKey=='SharedAmazonSageMakerStudioDomainEFS'].OutputValue" --output text)
export DOMAIN_ID=$(aws cloudformation describe-stacks --stack-name "${STUDIO_STACK_NAME}" --query "Stacks[0].Outputs[?OutputKey=='SharedAmazonSageMakerStudioDomainId'].OutputValue" --output text)
(cd infrastructure && cdk destroy "${STUDIO_STACK_NAME}")
./clean-up-after-sagemaker-studio.sh "${EFS_ID}" "${DOMAIN_ID}"
(cd infrastructure && cdk destroy --all)
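The two `--query` expressions above pick a single output value out of the `describe-stacks` result. For readers who prefer to script this in Python, the sketch below does the same lookup on the response dictionary. The helper name `find_stack_output` and the sample values are hypothetical; the response shape (`Stacks[0].Outputs[]` with `OutputKey` and `OutputValue`) follows the CloudFormation `DescribeStacks` API.

```python
def find_stack_output(describe_stacks_response, output_key):
    """Return the OutputValue for output_key from the first stack, or None."""
    stacks = describe_stacks_response.get("Stacks", [])
    if not stacks:
        return None
    for output in stacks[0].get("Outputs", []):
        if output.get("OutputKey") == output_key:
            return output.get("OutputValue")
    return None


if __name__ == "__main__":
    # Hypothetical response; real identifiers will differ.
    sample = {
        "Stacks": [
            {
                "StackName": "Environment-SageMakerStudio",
                "Outputs": [
                    {"OutputKey": "SharedAmazonSageMakerStudioDomainEFS", "OutputValue": "fs-EXAMPLE"},
                    {"OutputKey": "SharedAmazonSageMakerStudioDomainId", "OutputValue": "d-EXAMPLE"},
                ],
            }
        ]
    }
    print(find_stack_output(sample, "SharedAmazonSageMakerStudioDomainEFS"))
    print(find_stack_output(sample, "SharedAmazonSageMakerStudioDomainId"))
```

With `boto3` installed, the response can be obtained from `boto3.client("cloudformation").describe_stacks(StackName="Environment-SageMakerStudio")` and passed to the helper unchanged. As in the shell version, both values must be read before the stack is destroyed.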
The main costs come from the Amazon SageMaker Endpoint (`ml.g5.2xlarge`) and the compute for the kernel used by the Amazon SageMaker Studio notebook (1x `ml.t3.medium`). Assuming that all infrastructure is set up in `eu-west-1`, the total cost of using the cloud resources from this code sample for 8 hours will be lower than $15 (here you can find a detailed calculation). Everything else you created with the infrastructure as code (via AWS CDK) has a much lower cost, especially within the discussed time constraints.

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.