
How I Taught an LLM to Play DOOM (And Why It Matters)
Learn how I built a DOOM-playing bot powered by Amazon Bedrock
The bot's behavior is driven by a system prompt that spells out, among other things:

- A list of valid actions
- The expected format for the AI's response

The prompt itself (truncated) looks like this:
```text
Follow these instructions carefully:

1. Examine the provided game screen image closely. Pay attention to:
   - Your current health and ammo
   - The presence and location of enemies
   - Obstacles or items in the environment
   - Any text or UI elements visible
   - If you are facing a wall, consider turning around to avoid running into it

2. Based on your analysis, determine a sequence of 10 appropriate actions to take. Valid actions are:
   - NO_OP (No operation, stay still)
   - ATTACK
   - MOVE_FORWARD
   - MOVE_FORWARD ATTACK
   - TURN_RIGHT
   - TURN_RIGHT ATTACK
   - TURN_RIGHT MOVE_FORWARD
   - TURN_RIGHT MOVE_FORWARD ATTACK
   - TURN_LEFT
   - TURN_LEFT ATTACK
   - TURN_LEFT MOVE_FORWARD
   - TURN_LEFT MOVE_FORWARD ATTACK

3. Provide your response in the following JSON format:
{
  "explanation": "A brief explanation of your overall strategy for this sequence of actions",
  "actions": [
    "ACTION_1",
    "ACTION_2",
    "ACTION_3",
    "ACTION_4",
    "ACTION_5",
    "ACTION_6",
    "ACTION_7",
    "ACTION_8",
    "ACTION_9",
    "ACTION_10"
  ]
}
...
```
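
The game loop later calls a `convert_llm_result_to_actions` helper to turn this JSON back into actions the environment can execute. That helper isn't shown here, but a minimal sketch might look like the following; the `ACTION_MAP` indices are placeholders for however your VizDoom action space is laid out:

```python
import json

# Hypothetical mapping from the prompt's action names to environment action
# indices; the real values depend on how the VizDoom action space is defined.
ACTION_MAP = {
    "NO_OP": 0,
    "ATTACK": 1,
    "MOVE_FORWARD": 2,
    "MOVE_FORWARD ATTACK": 3,
    "TURN_RIGHT": 4,
    "TURN_RIGHT ATTACK": 5,
    "TURN_RIGHT MOVE_FORWARD": 6,
    "TURN_RIGHT MOVE_FORWARD ATTACK": 7,
    "TURN_LEFT": 8,
    "TURN_LEFT ATTACK": 9,
    "TURN_LEFT MOVE_FORWARD": 10,
    "TURN_LEFT MOVE_FORWARD ATTACK": 11,
}


def convert_llm_result_to_actions(llm_result):
    """Parse the model's JSON reply and map action names to env actions."""
    # The model occasionally wraps its JSON in prose, so slice out the
    # outermost braces before parsing.
    start, end = llm_result.find("{"), llm_result.rfind("}") + 1
    parsed = json.loads(llm_result[start:end])
    # Fall back to NO_OP for anything outside the allowed action list.
    return [ACTION_MAP.get(a, 0) for a in parsed.get("actions", [])]
```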
With the prompt defined, each frame of the game is sent to Claude 3.5 Sonnet through Bedrock's `invoke_model` API:

```python
import json

import boto3

# Bedrock runtime client; assumes AWS credentials and region are configured.
bedrock_runtime = boto3.client("bedrock-runtime")


def call_claude(system_prompt, prompt, base64_string):
    """Send the current game frame and prompt to Claude via Amazon Bedrock."""
    prompt_config = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 4096,
        "system": system_prompt,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/png",
                            "data": base64_string,
                        },
                    },
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }

    body = json.dumps(prompt_config)
    modelId = "anthropic.claude-3-5-sonnet-20240620-v1:0"
    accept = "application/json"
    contentType = "application/json"

    response = bedrock_runtime.invoke_model(
        body=body, modelId=modelId, accept=accept, contentType=contentType
    )
    response_body = json.loads(response.get("body").read())
    results = response_body.get("content")[0].get("text")
    return results
```
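
The game loop below also relies on an `rgb_to_base64` helper to turn the raw screen buffer into the base64-encoded PNG that `call_claude` expects. It isn't shown in the snippet above, but a minimal version using Pillow might look like this:

```python
import base64
import io

import numpy as np
from PIL import Image


def rgb_to_base64(img):
    """Encode an HxWx3 RGB numpy array as a base64 PNG string."""
    buffer = io.BytesIO()
    Image.fromarray(img.astype(np.uint8)).save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8")
```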
The main loop ties it all together: capture a frame, ask the model for its next ten actions, then play them out in the environment:

```python
# env, prompt, frames_per_action, previous_actions, done, steps, and
# total_reward are initialized earlier (not shown); numpy is imported as np.
while not done:
    # Get the current game state and image
    state = env.game.get_state()
    img = np.transpose(state.screen_buffer, [1, 2, 0])
    base64_string = rgb_to_base64(img)

    # Call the LLM to get the next 10 actions
    system_prompt = str(generate_system_prompt(previous_actions))
    llm_result = call_claude(system_prompt, prompt, base64_string)
    print(f"LLM Result: {llm_result}")

    previous_actions = llm_result
    current_actions = convert_llm_result_to_actions(llm_result)
    print(f"New actions: {current_actions}")

    # Execute the 10 actions, repeating each for frames_per_action
    for action in current_actions:
        for _ in range(frames_per_action):
            if done:
                break
            state, reward, done, truncated, info = env.step(action)
            env.render()
            steps += 1
            total_reward += reward
        if done:
            break
```
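
One last piece the loop references but the snippet doesn't define is `generate_system_prompt`. The exact implementation isn't shown here, but a plausible sketch is a function that wraps the base instructions and appends the model's previous response, giving Claude a small amount of memory between calls:

```python
# Hypothetical sketch: the real generate_system_prompt may differ.
# SYSTEM_PROMPT is the instruction text shown earlier (abbreviated here).
SYSTEM_PROMPT = "Follow these instructions carefully: ..."


def generate_system_prompt(previous_actions):
    """Combine the base instructions with the model's previous response."""
    if not previous_actions:
        return SYSTEM_PROMPT
    return (
        f"{SYSTEM_PROMPT}\n\n"
        f"For context, your previous response was:\n{previous_actions}"
    )
```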
So why does teaching an LLM to play DOOM matter? Experiments like this:

- Push the boundaries of AI applications, showing us new ways to use LLMs beyond text generation.
- Bridge the gap between gaming and AI, potentially leading to more immersive and dynamic gaming experiences.
- Encourage us to think creatively about applying powerful models to various domains, from entertainment to problem-solving in other industries.
- Highlight both the potential and current limitations of AI in real-time, interactive environments, guiding future research and development.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.