As part of the AWS Game Builder Challenge, I have built a simple application to play a spelling game. For that, I utilise AWS Serverless and GenAI services. CDK is used as the IAC tool to implement this.
Players can select a language to play the game. As of now, only English (US) and Dutch are the available languages.
When a player selects a language, there will be a maximum 5 words generated. This includes an audio, brief meaning of the word and the number of characters for each word.
Then the player needs to fill the word in the text box given.
There is an indicator next to the text box how many characters have been entered in the text box and how many characters are required for the word. This indicator will be red until the required number of characters are filled, then it turns into green.
There is a timer that starts as soon as the words are generated. This is based on the number of words generated.
When the remaining time is less than 30 seconds, the background of the page as well as the background of the timer turns into red.
Players can submit the game any time. However, if the player was unable to submit it until the timer runs out, the game will be automatically submitted.
Then, the answers are evaluated and based on the number of correct answers, there is a pop up visible.
If all the answers are correct, there will be a "Confetti" effect appearing on the page.
Clicking on the 'show results' button on the pop up, the player can see the correct/incorrect answers and the correct word (in case of an incorrect answer).
Implementation
There are 2 main parts of this application. * Backend - where words are generated and APIs are available. * Frontend - Vue.js application for players to interact.
There are two repositories available with the complete source code:
Backend is implemented using AWS CDK and can be deployed as a generic CDK application. The base URL of the API Gateway is required for the frontend to work.
In the frontend, add the API Gateway base url to the VITE_API_BASE_URL in the env file. Install necessary dependencies and then run the application using npm run dev in the dev mode. Or you can build the frontend app using npm run build.
I have used AWS Amplify web hosting to deploy the built frontend and configured a custom domain, automated deployment upon Github push etc. (Those are not included as IAC in the provided Github repositories).
Backend
There are 2 main components of the backend.
Words generator component - to generate and save words in the database.
API component - to serve frontend.
Backend words generator component
Here is the high level overview of the words generator component and the steps within the state machine. Within Step Functions, there will be many AWS services called as is explained below.
Words Generator component high level structure
Words Generator State Machine
1. Words Generator Step Functions state machine is responsible for generating words.
2. This Step function execution takes the language code as an input. ex: en-US, nl-NL, etc.
3. As the first step, Bedrock InvokeModel is being called to generate 5 words with descriptions for each word, based on the language.
4. Here, the Anthropic Claude 3 Haiku model is being used which gives better balance of accuracy and the pricing in this scenario.
5. Here is the prompt I used to generate the words:
"Generate 5 unique words that have a random number of characters more than 4 and less than 10 in Dutch language. For each word, provide a brief description of its meaning in English with more than a couple of words. Produce output only in a minified JSON array with the keys word and description. Word must always be in lowercase."
6. Here, the response is a JSON string.
7. Within the step's Result selector, the response will be converted to an array using the intrinsic function - StringToJson.
8. Then there is a map state where each word is an input to each map.
9. Within the map state, there are branches based on the language.
10. In each branch, there is a step to synthesise the word using Polly using StartSpeechSynthesisTask.
11. StartSpeechSynthesisTask is an async operation. So, there is an immediate step to check if the synthesis task is completed using Polly's GetSpeechSynthesisTask.
12. Great thing about the StartSpeechSynthesisTask api is that it not only synthesises, but automatically saves the mp3 into the given S3 bucket.
13. If the speech synthesis task is not finished based on the status of the GetSpeechSynthesisTask, it waits and retries the status check.
14. Once synthesis is done, the execution continues to save word to DynamoDB step.
15. In this step, we use DynamoDB's PutItem api to save the generated data to the table. One record consists of below data:
16. Since this synthesis and save to db task runs on a map state, after a single execution, there will be a maximum 5 new words available in the ddb for the given language.
17. There are two EventBridge Schedules running every 5 minutes to call this State Machine with the different language codes - English and Dutch.
Backend API component
There are two APIs available in the backend.
1. POST /questions - To generate questions using Step Functions to appear on the frontend. 2. POST /answers - To validate the answers submitted by the player.
Backend API component
Generate questions API
1. Generate questions API accept one argument. Which is the language code.
2. This /questions api has a proxy integration to a Lambda Function which will start an execution in Questions Generator State Machine synchronously using start_sync_execution SDK call.
3. This Questions Generator State Machine is in type - Express.
4. A sample input is as follows:
5. Here, "iterate" is a hard coded array to start a map execution within the state machine.
6. Within the state machine, first the map state is executed based on the "iterate" array from the input.
7. Inside the map state, first, it fetches max 50 records from DynamoDB. Here, DynamoDB scan is used. However, in order to fetch some random data, a random ExclusiveStartKey is in use with the help of the intrinsic function UUID(). Also, FilterExpression is used to filter the records applicable only for the given language code.
8. Next, the number of item counts returned from the previous step is checked.
9. If the count is more than 0, then in the next step, single random record is selected from them. This step is a Pass state with transformation using Parameters, which uses intrinsic functions - ArrayGetItem, MathRandom and ArrayLength.
10. Then, the selected record is being sent to the Generate Pre signed URL step. This is a Lambda function, which generates a pre-signed URL for the s3file path of the record. So, from the frontend the mp3 file can be played using this pre-signed url. Also there is a transformation of data within this Lambda function. The expiry of the pre-signed url is set to minimum because it is only required within the session of the game.
11. This is the last step within the map state which outputs the record in below format.
12. Once all the map steps are completed, there is a final aggregation Lambda function - Get Unique Results Lambda function. Since each map step is independent, there is a possibility of selecting the same record in more than one map state. This Lambda function simply removes such duplications.
13. And the response is returned to the frontend as the response to the /questions end point.
Validate Answers API
1. POST /answers API is responsible for validating the answers submitted in the frontend.
2. This API endpoint has a proxy Lambda function which accepts the payload in below format:
3. Within the Lambda function, it does a DynamoDB's batch_get_item SDK call to fetch words per ids and match with the word provided in the API.
4. Then it returns the response in below format:
5. Based on the correct flag, frontend will calculate the results.
Frontend
For the frontend, I have used a simple single page application built with Vue.js. I have very limited knowledge on frontend technologies. Because of that, I used Amazon Q Developer on VSCode to implement the frontend application.
Almost 95% of the frontend application was built by Amazon Q Developer. I have asked different questions and in most of the cases, Amazon Q was able to analyse the code and generate the code as per my requirements.
Here are some "versions" of the application that was implemented and fine tuned step by step using Amazon Q.
And below are some of the questions I got help from Amazon Q:
This is to add the timer functionality to auto-submit the form after the set time expires:
Question 1 I asked from Amazon Q
Answer 1 from Amazon Q
Another example where I needed to show the number of correct answers after verifying the answers:
Question:
Answer:
And this gave this nice popup with the number of correct answers:
Also, when all the answers are correct, I need to have a "confetti" effect on the screen. Amazon Q provided it with quite accurately.
Result is:
Lesson learnt / Feedback on AWS Services
Below are some of the lessons I learnt while I was working on this project, also, some feedback on some of the services I used here.
1. Bedrock does not always return JSON. Within the prompt I used, I stated - "Produce output only in a minified JSON array with the keys word and description". However, once in a while, Bedrock returns data in different formats. This could have been more accurate if I add the beginning of the response in the prompt, so Bedrock can continue from there. However, this will increase the request token count for each API call. So, to avoid additional cost and also, the error rate is acceptable (since this is anyway a background job), I kept this prompt as it is.
2. Amazon Polly's StartSpeechSynthesisTask doesn't support S3 path. We can provide the OutputS3BucketName where the generated audio will be stored. However, we cannot specify a path to save the object. Instead, I have used the OutputS3KeyPrefix parameter to provide the path with the language code so, the audio is saved in s3://bucket_name/language_code/file_name.mp3
However, one minor issue with that is, Polly always adds a dot (.) between the file name and the prefix. So, all the files generated in the sub path start with a dot.
3. Cost of Polly. Polly has Generative and Neural text-to-speech engines apart from Standard. However, the cost of them is quite high compared to Standard. Also, those are available only for a limited number of languages. https://aws.amazon.com/polly/pricing/
4. Selecting random items from the DynamoDB table is hard. There is no straightforward way to achieve this. That's why I had to use a ExclusiveStartKey and fetch maximum 50 items and select one random item.
5. I initially used direct integrations to start the Step Functions express workflow to generate questions directly from API Gateway. However, the VTL is complex to build specially to get the response in a specific format. So I stuck to the Lambda proxy option which was more simple.
Next steps
There can be many improvements to this game. Here are some of them:
1. ently there is no history of the games played. User login with a history and keep the progress can be a nice feature. For that we can use AWS Cognito along with Amplify UI library.
2. Different Levels - At the moment, there are no levels to choose. In future, a language and level can be options to start a game.
3. Add more languages - This can be easily done by extending the state machine and adding a new schedule job for each new language. These new languages must be supported by Polly.
4. Implement caching - Currently, there is no caching in place but will be handing in future for example using one pre-signed urls in many games.
Conclusion
Building this spelling game was an incredible learning experience in GenAI and showed me just how quickly Amazon Q Developer can support development. With my very little knowledge of frontend technologies, I was really surprised by how far you can go in such a short time.