logo
Menu
Super Mario Bros: The LLM Levels - Generate levels with a prompt

Super Mario Bros: The LLM Levels - Generate levels with a prompt

Learn how you can generate your own Super Mario levels with just a prompt

Banjo Obayomi
Amazon Employee
Published Apr 30, 2024
I’ve had lots of fun seeing LLMs play games, but for my next experiment, I wanted to test how they fare on creating entertaining experiences. Researchers from the IT University of Copenhagen created MarioGPT to build Super Mario levels from a prompt.
However, their model was constrained as it was a fine tuned GPT2 model and could only understand basic prompts such as “many pipes, many enemies”.
Their model used a tokenizer which represented elements in a level such as a Goomba or a pipe. Using it we can craft a prompt to an LLM and tell it to generate a level using the symbols from the tokenizer, and then have the simulator they developed generate a playable level!!
Design a level with blocks arranged in a pyramid-like shape, with coins scattered around the base
Design a level with blocks arranged in a pyramid-like shape, with coins scattered around the base
Let’s explore how it works.

Crafting the Prompt

When working with LLMs, its ideal to give the model a persona and constraints. Since we are making Mario levels, I opted to let the model know it’s an esteemed level designer in Super Mario Maker.
As an esteemed level designer renowned for creating some of the top 100 levels in Super Mario Maker, you are tasked with crafting a playable section for the original Super Mario on NES. Your extensive experience and creativity are key to designing levels that are not only challenging but also immensely enjoyable.
Next, from the research we can see which are the valid tokens their model generated. With that we can provide instructions for our LLM
Use the following symbols to represent different game elements, ensuring each level is a masterpiece of design:
<symbols>
- = "Sky"
X = "Unbreakable Block"
E = "Enemy"
o = "Coin"
S = "Breakable Block"
? = "Question Block"
[] = "Pipe"
<> = "End of Pipe"
</symbols>
From here, we need to give guidelines and examples for the model to follow. For this we need the output returned as an array of characters that represent a level.
<example><input>Design a level with blocks arranged in a pyramid-like shape, with coins scattered around the base and Goombas guarding the top.</input><output>
['--------------------------------------------------', '--------------------------------------------------', '--------------------------------------------------', '--------------------------------------------------', '--------------------------------------------------', '--------------------------------------------------',
'----------------EEE-------------------------------',
'--------------ooooooo-----------------------------',
'------------ooo?S?Sooo----------------------------',
'-----------oooSSSSSoooo---------------------------',
'----------oooSSSXSSSoooo--------------------------',
'---------oooSSSXXXSSSoooo-------------------------',
'--------oooSSSXXEXXSSSoooo------------------------', 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX]
</output>
</example>
And finally, I instruct the model about the output format and remind it of its goals.
Generate the level section as a 2D array, where each row is represented as a string of characters. The level section should be 14 rows tall and 50 columns wide. Only return the 2D array of characters.
Remember, your creations should challenge players but remain fair. Use your expertise to weave together obstacles and rewards, encouraging exploration and skillful play. Always ensure that Mario has a clear and navigable route to finish the level, and provide ample block tiles for Mario to walk on.
Once the system prompt is crafted, the LLM can then take a request from a user and use that to generate a level.

Creating the Level

By using Amazon Bedrock, I can test out different models with the provided prompts to get a text representation of a level. Due to the nature of how the simulator was setup each “row” of a level needs 50 characters exactly. Due to how the models work 50 isn’t guaranteed in each row, so I was able to use Amazon Q Developer to help generate helper function to trim or pad the text.
I also wanted to embed the prompt into the image, to know how the level was generated. I used this as an opportunity to test out the new /dev feature which can generate a plan for implementing a feature request, and then insert code.
I first asked
I would like to add the text of 'prompt' to the img generated in the generate function
Q provided a plan, but it was too complex as it wanted to change multiple files that weren't relevant to what I was using. I was able to follow up and provide more context:
Lets not change the classes, lets make this as light weight as possible I'd like a new function called add_prompt_to_image function, and then I can call it in app.py like this:
img = convert_level_to_png(cleaned_level, mario_lm.tokenizer)[0]
new_img = add_prompt_to_image(img,prompt)
By providing context and constraints the plan Q provided was great! Q selected the file that made the most sense for this utility function utils.py. It then was able to update the imports and create my new function. I still had to make small edits, but this feature saved me time, and i was able to do it right inside Visual Studio Code.
Amazon Q /dev feature
Now each level has a "watermark" of the prompt used to generate it.
Design a level with pillars at least 3 blocks high to jump on
Design a level with pillars at least 3 blocks high to jump on
With the updates, I was ready to test how well the models generated levels.

Model Behavior

I tested out different models, each had its own quirks. Claude Haiku's levels were mostly unplayable, while Sonnet produced decent levels most of the time. Opus was the most consistent in creating levels and even made some enjoyable ones with an abundance of Goombas.
Create a fun level, lets have lots of goombas. Make sure there is a path for Mario to take
Create a fun level, lets have lots of goombas. Make sure there is a path for Mario to take
Interestingly, some models struggled to follow the basic instruction of "pipes should follow this format," often offsetting the top from the rest of the pipe or making it upside down.
Offset Pipe
In contrast, the new Llama3 models exhibited less variability in their output. The 70B model, in particular, generated almost identical levels every time at a temperature of 0.5. Even increasing the temperature to 1 yielded the same result. This consistency might be attributed to the prompt structure not aligning with Llama 3's expectations.
The latest Cohere Command R Plus model was good at making levels, but Command R was not following instructions and generated overly long levels taking more than a minute to return text.
Initially, I only included one example in my prompt, which resulted in lower-quality output and many levels that were impossible to complete. However, by simply adding two more examples, I noticed a significant improvement in the quality of the generated levels. This observation underscores the importance of providing good data to enhance the performance of LLMs.

The LLM Levels

You can access a playable demo, where you can adjust settings such as the system prompt, model and temperate to create your own playable level.
As a big advocate for video games, I'm excited about the ability of generative AI to democratize the process of building experiences. With this approach, you don't need a research team to build an AI to create levels, all you need is a prompt and some creativity.
I'm excited to see what types of levels you can build, and I encourage you to share them with the community.
So, what are you waiting for? Give it a try, and let's see what incredible levels you can come up with!
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.

5 Comments