How I Taught an LLM to Play DOOM (And Why It Matters)
Learn how I built a bot to play DOOM powered by Amazon Bedrock
Banjo Obayomi
Amazon Employee
Published Sep 3, 2024
"But can it run DOOM?" This classic meme has been a benchmark for computing power for decades. Today, we're not just running DOOM - we're getting AI to play it!
Recently, a groundbreaking paper introduced GameNGen, the first game engine powered entirely by a neural model. This impressive system can interactively simulate DOOM at over 20 frames per second, showcasing the potential of AI in game development and simulation.
While my project, DOOM-Bedrock, may not be as revolutionary, it draws inspiration from GameNGen and my previous experiments with Pokémon Red. I set out to see if we could get a Large Language Model (LLM) to play DOOM. Let's dive into how I built it, and learn why it matters.
DOOM-Bedrock combines AI, classic gaming, and cloud computing to create an AI-powered DOOM player. Let's break down the key components and explore how they interact to make this possible.
The core of DOOM-Bedrock is built on effective prompt engineering. We create a detailed system prompt that defines the AI's role as a DOOM player and guides its decision-making process.
The prompt includes:
- Instructions on what to observe in the game state (health, ammo, enemies, obstacles)
- A list of valid actions
- The expected format for the AI's response
- A list of valid actions
- The expected format for the AI's response
Here's an example of our system prompt:
This prompt essentially serves as a configuration file for the LLM, defining its behavior without traditional programming. The next step involves using this prompt with a capable language model.
With the prompt prepared, we use Amazon Bedrock to access Claude 3.5, a powerful multimodal language model that can understand text and images. For each game state, the process is as follows:
1. Capture the current game screen and convert it to a base64 string.
2. Send this image along with our prompt to Claude 3.5 via Amazon Bedrock.
3. Receive and parse the AI's response, which includes a strategy explanation and a sequence of 10 actions.
Here's the code for interacting with Claude via Amazon Bedrock:
This step is analogous to an API call that returns the next set of actions based on the current game state. However, the response isn't directly compatible with our game engine, which leads us to the next step.
Claude's response comes in a structured JSON format, which needs to be translated into actions the game can understand. We parse the JSON to extract the action sequence, then map these text-based actions to numerical action indices used by the game environment.
Here's how we perform this translation:
This translation step completes the loop, allowing the AI's decisions to be executed in the game. By integrating these three components - prompt engineering, AI decision-making via Amazon Bedrock, and action translation - we've created a system that can play DOOM using large language models.
This approach demonstrates a novel application of LLMs in gaming, opening up new possibilities for AI-driven gameplay and game testing.
While developing DOOM-Bedrock, I encountered technical hurdles that affected the LLM's gameplay performance.
The main challenge was balancing real-time gameplay with using API calls to Amazon Bedrock. To address this, I implemented a system where the LLM sends 10 commands at once, each executed for 30 frames.
This approach allows for "pseudo" real-time gameplay and spaces out API calls, but it results in delayed reactions, sometimes causing the AI to run into walls or make other suboptimal movements.
These challenges highlight the current limitations of using LLMs in real-time gaming scenarios, but they also point to exciting areas for future improvement and research.
There's never been a better time to be a builder. LLMs are opening up new possibilities, allowing us to create and play in ways we never thought possible before. DOOM-Bedrock is just one example of how we can combine games with AI to create unique experiences.
Why does this matter? Because projects like this:
- Push the boundaries of AI applications, showing us new ways to use LLMs beyond text generation.
- Bridge the gap between gaming and AI, potentially leading to more immersive and dynamic gaming experiences.
- Encourage us to think creatively about applying powerful models to various domains, from entertainment to problem-solving in other industries.
- Highlight both the potential and current limitations of AI in real-time, interactive environments, guiding future research and development.
While there's certainly room for improvement, DOOM-Bedrock and similar projects showcase the expanding potential of LLMs in gaming and beyond. They challenge us to think outside the box and explore novel applications of these powerful tools.
While there's certainly room for improvement, projects like this showcase the potential of LLMs in gaming and beyond. They challenge us to think creatively about how we can apply these powerful models to various domains.
Want to join the exploration? You can run DOOM-Bedrock by following the instructions in the GitHub repository. Experiment, modify, and see what new insights you can uncover.
Happy gaming, and happy building!
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.