
How I Taught an LLM to Play DOOM (And Why It Matters)
Learn how I built a DOOM-playing bot powered by Amazon Bedrock
The bot's behavior is driven by a system prompt that spells out, among other things:

- A list of valid actions
- The expected format for the AI's response

The prompt itself (truncated) looks like this:
```text
Follow these instructions carefully:

1. Examine the provided game screen image closely. Pay attention to:
   - Your current health and ammo
   - The presence and location of enemies
   - Obstacles or items in the environment
   - Any text or UI elements visible
   - If you are facing a wall, consider turning around to avoid running into it

2. Based on your analysis, determine a sequence of 10 appropriate actions to take. Valid actions are:
   - NO_OP (No operation, stay still)
   - ATTACK
   - MOVE_FORWARD
   - MOVE_FORWARD ATTACK
   - TURN_RIGHT
   - TURN_RIGHT ATTACK
   - TURN_RIGHT MOVE_FORWARD
   - TURN_RIGHT MOVE_FORWARD ATTACK
   - TURN_LEFT
   - TURN_LEFT ATTACK
   - TURN_LEFT MOVE_FORWARD
   - TURN_LEFT MOVE_FORWARD ATTACK

3. Provide your response in the following JSON format:
{
  "explanation": "A brief explanation of your overall strategy for this sequence of actions",
  "actions": [
    "ACTION_1",
    "ACTION_2",
    "ACTION_3",
    "ACTION_4",
    "ACTION_5",
    "ACTION_6",
    "ACTION_7",
    "ACTION_8",
    "ACTION_9",
    "ACTION_10"
  ]
}
...
```
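
The game loop later calls a `convert_llm_result_to_actions` helper to turn this JSON back into actions the environment can execute. That helper isn't shown here, but a minimal sketch might look like the following; the `ACTION_MAP` indices are placeholders for however your VizDoom action space is laid out:

```python
import json

# Hypothetical mapping from the prompt's action names to environment action
# indices; the real values depend on how the VizDoom action space is defined.
ACTION_MAP = {
    "NO_OP": 0,
    "ATTACK": 1,
    "MOVE_FORWARD": 2,
    "MOVE_FORWARD ATTACK": 3,
    "TURN_RIGHT": 4,
    "TURN_RIGHT ATTACK": 5,
    "TURN_RIGHT MOVE_FORWARD": 6,
    "TURN_RIGHT MOVE_FORWARD ATTACK": 7,
    "TURN_LEFT": 8,
    "TURN_LEFT ATTACK": 9,
    "TURN_LEFT MOVE_FORWARD": 10,
    "TURN_LEFT MOVE_FORWARD ATTACK": 11,
}


def convert_llm_result_to_actions(llm_result):
    """Parse the model's JSON reply and map action names to env actions."""
    # The model occasionally wraps its JSON in prose, so slice out the
    # outermost braces before parsing.
    start, end = llm_result.find("{"), llm_result.rfind("}") + 1
    parsed = json.loads(llm_result[start:end])
    # Fall back to NO_OP for anything outside the allowed action list.
    return [ACTION_MAP.get(a, 0) for a in parsed.get("actions", [])]
```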
With the prompt defined, each frame of the game is sent to Claude 3.5 Sonnet through Bedrock's `invoke_model` API:

```python
import json

import boto3

# Bedrock runtime client; assumes AWS credentials and region are configured.
bedrock_runtime = boto3.client("bedrock-runtime")


def call_claude(system_prompt, prompt, base64_string):
    """Send the current game frame and prompt to Claude via Amazon Bedrock."""
    prompt_config = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 4096,
        "system": system_prompt,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/png",
                            "data": base64_string,
                        },
                    },
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }

    body = json.dumps(prompt_config)
    modelId = "anthropic.claude-3-5-sonnet-20240620-v1:0"
    accept = "application/json"
    contentType = "application/json"

    response = bedrock_runtime.invoke_model(
        body=body, modelId=modelId, accept=accept, contentType=contentType
    )
    response_body = json.loads(response.get("body").read())
    results = response_body.get("content")[0].get("text")
    return results
```
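
The game loop below also relies on an `rgb_to_base64` helper to turn the raw screen buffer into the base64-encoded PNG that `call_claude` expects. It isn't shown in the snippet above, but a minimal version using Pillow might look like this:

```python
import base64
import io

import numpy as np
from PIL import Image


def rgb_to_base64(img):
    """Encode an HxWx3 RGB numpy array as a base64 PNG string."""
    buffer = io.BytesIO()
    Image.fromarray(img.astype(np.uint8)).save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8")
```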
The main loop ties it all together: capture a frame, ask the model for its next ten actions, then play them out in the environment:

```python
# env, prompt, frames_per_action, previous_actions, done, steps, and
# total_reward are initialized earlier (not shown); numpy is imported as np.
while not done:
    # Get the current game state and image
    state = env.game.get_state()
    img = np.transpose(state.screen_buffer, [1, 2, 0])
    base64_string = rgb_to_base64(img)

    # Call the LLM to get the next 10 actions
    system_prompt = str(generate_system_prompt(previous_actions))
    llm_result = call_claude(system_prompt, prompt, base64_string)
    print(f"LLM Result: {llm_result}")

    previous_actions = llm_result
    current_actions = convert_llm_result_to_actions(llm_result)
    print(f"New actions: {current_actions}")

    # Execute the 10 actions, repeating each for frames_per_action
    for action in current_actions:
        for _ in range(frames_per_action):
            if done:
                break
            state, reward, done, truncated, info = env.step(action)
            env.render()
            steps += 1
            total_reward += reward
        if done:
            break
```
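
One last piece the loop references but the snippet doesn't define is `generate_system_prompt`. The exact implementation isn't shown here, but a plausible sketch is a function that wraps the base instructions and appends the model's previous response, giving Claude a small amount of memory between calls:

```python
# Hypothetical sketch: the real generate_system_prompt may differ.
# SYSTEM_PROMPT is the instruction text shown earlier (abbreviated here).
SYSTEM_PROMPT = "Follow these instructions carefully: ..."


def generate_system_prompt(previous_actions):
    """Combine the base instructions with the model's previous response."""
    if not previous_actions:
        return SYSTEM_PROMPT
    return (
        f"{SYSTEM_PROMPT}\n\n"
        f"For context, your previous response was:\n{previous_actions}"
    )
```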
So why does teaching an LLM to play DOOM matter? Experiments like this:

- Push the boundaries of AI applications, showing us new ways to use LLMs beyond text generation.
- Bridge the gap between gaming and AI, potentially leading to more immersive and dynamic gaming experiences.
- Encourage us to think creatively about applying powerful models to various domains, from entertainment to problem-solving in other industries.
- Highlight both the potential and current limitations of AI in real-time, interactive environments, guiding future research and development.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.