How We Built The Model Brawl League: A Chat Bot Arena for LLMs

Benchmarking Large Language Models (LLMs) has become increasingly popular in the AI community. My previous experiment using Street Fighter as a benchmark provided valuable insights into how models can compete in a controlled environment.

However, we faced limitations in controlling the game mechanics and relied on workarounds like pixel analysis to identify characters. To address these challenges and create a more tailored experience, we decided to re:invent the chat bot arena from the ground up. Thus, the Model Brawl League was born.

Image not found

Model Brawl League

The Model Brawl League is a 2.5D Fighting Game built using Unity. It allows players to pit two LLMs powered by Amazon Bedrock against each other in combat. Moves are executed in real-time, and players can observe each LLM's "thought process" behind move selection.

Now, let's dive into the technical details of how we brought this exciting concept to life.

How It Works

The Model Brawl League is built on Unity, leveraging the Universal Fighting Engine (UFE) to create a robust and flexible fighting game framework. Unity's powerful capabilities allowed us to create visually appealing environments, while UFE provided essential fighting game mechanics such as hit detection, combo systems, and character movement.

One of the key advantages of using Unity is its ability to compile our game to WebGL, allowing us to run the entire game directly in a web browser. By compiling to WebGL, we are able to control the game using JavaScript.

We are able to get the LLMs actions by sending the prompt using a Lambda Function that calls Amazon Bedrock to get a JSON response of the next moves, which are executed in real time.

Image not found

Model Brawl League Architecture

Game State to Prompt

One of the key challenges in creating the Model Brawl League was translating the game state into a format that LLMs could understand and respond to. Fortunately, the UFE provides a global configuration of game state information, which greatly simplified this process.

By leveraging UFE's built-in event system, we were able to access variable data such as current HP, character positions, and other relevant game state information. This data is then transformed into a natural language prompt that provides context and asks the LLM to make a decision about the next moves. Because we have the full state we didn't need to do any image processing.

What's particularly exciting about this process is that it didn't require much knowledge of C#. Instead, we utilized Amazon Q Developer, a powerful coding assistant integrated into our IDE. This AI-powered tool helped us write the necessary code to interact with UFE's global configuration, making the development process more efficient and accessible.

Here's an example of how we set up event listeners in UFE to capture game state changes:

Image not found

Amazon Q Developer Writing Code

Once the game state data is captured in the global configuration, we access this data using JavaScript and construct our prompt. This prompt is then sent to a Lambda function, which acts as an intermediary between our game and Amazon Bedrock.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
 prompt = `Your name is {name} and you are a fighter in a fighting combat game. You are allowed to do the following regular moves which generates energy:
    - Rush Forward: Get closer to your opponent very quickly,
    - Rush backward: Get away from your opponent very quickly - this is a defensive move,
    - Get closer: Move at a normal pace towards your opponent,
    ....

Your current life is {current_life}% and your opponent life is {oppenent_life}%.\n\nIf your life reaches 0 % then you die.

You have {current_energy} % of energy for special attacks. Your opponent has {opponent_energy} % of energy for special attacks. You generate energy when you hit your opponent. {energy_recommendation}

You are currently {distance}. Keep in mind that when you hit your opponent, he will be pushed backward and most likely be un-reachable, so make sure to get closer after a couple of hits.

Your fighting style is : {fighting_style}.

Your persona is : {persona}.

Using the following JSON structure to respond: 

{"explanation": "I selected the following moves because it will quickly disable my opponent and allow me to win this battle in a few moves.", "moves": ["Move Forward", "High Kick", "Medium Kick", "Move Forward"]}

Your answer must only contain the JSON list of at least 10 moves that you should do in order to win the game along with a very short explanation of why you chose those moves. The explanation should be less than two sentences. I need to parse your response as a JSON so don't add anything else to your answer than the JSON list.`

Amazon Bedrock Integration

With our game state converted into a suitable prompt, we leverage Amazon Bedrock to communicate with various LLMs.

This serverless approach allows us to efficiently manage API calls, implement any necessary pre-processing or post-processing of the prompts and responses, and maintain a smooth game flow even when dealing with varying response times from different models. Using the Converse API we are able to use the same code to process each request.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
def generate_conversation(model_id, system_prompts, messages):
    """
    Sends messages to a model.
    Args:
        model_id (str): The model ID to use.
        system_prompts (JSON) : The system prompts for the model to use.
        messages (JSON) : The messages to send to the model.

    Returns:
        response (JSON): The conversation that the model generated.

    """

    print(f"Generating message with model {model_id}")

    # Inference parameters to use.
    temperature = 0.7

    # Base inference parameters to use.
    inference_config = {"temperature": temperature}

    # Send the message.
    response = bedrock_runtime.converse(
        modelId=model_id,
        messages=messages,
        system=system_prompts,
        inferenceConfig=inference_config,
    )

    # Log token usage.
    token_usage = response["usage"]
    print(f"Input tokens: {token_usage['inputTokens']}")
    print(f"Output tokens: {token_usage['outputTokens']}")
    print(f"Total tokens: {token_usage['totalTokens']}")
    print(f"Stop reason: {response['stopReason']}")

    text_response = response["output"]["message"]["content"][0]["text"]

    return text_response

Our Lambda function handles the following tasks:

Receives the game state prompt from the JavaScript front-end
Formats the prompt if necessary
Calls the appropriate LLM through Amazon Bedrock's API
Processes the LLM's response
Returns the processed response back to the game

This architecture enables us to easily switch between different models, facilitating matchups between LLMs while keeping the core game logic separate from the AI interaction logic.

Now that we have our AI responses, the next challenge was to translate these decisions into actual game moves.

JavaScript LLM Controller

To make the Model Brawl League accessible and easy to use, we implemented an LLM "controller" which allows moves to be executed directly in the browser using JavaScript. This approach eliminates the need for complex server-side processing and enables real-time gameplay with minimal latency.

Our JavaScript execution engine interprets the LLM's response, translates it into game commands, and applies those commands to the characters in the Unity game.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
async startGameLoop() {
    while (this.gameInProgress) {
        this.moves = [];
        await this.getNextMoves();
        if (!this.gameInProgress) {
            return;
        }

        if (this.moves.length > 0) {
            for (var i = 0; i < this.moves.length; i++) {
                this._currentMove = this.moves[i];
                let actions = this.getActions(this.moves[i]);

                switch (this._currentMove) {
                    case 'Rush Forward':
                    case 'Get closer':
                    case 'Jump Forward':
                        ++this.totalOffensiveMove;
                        break;

                    ....

                    case 'Fireball':
                        ++this.totalAttemptSpecialAttacks;
                        break;
                }

Final Round, Fight!

The heart of the Model Brawl League is its game loop, which continuously cycles through the process of capturing the game state, generating prompts, receiving LLM responses, and executing moves. This loop continues until a winner is determined.

We've also implemented a robust logging system that captures each step of the process, allowing for post-game analysis and providing valuable data on how effective the LLM was and the cost.

On average the smaller models do better, due to faster response time, but still collecting data to get more comprehensive results.

Image not found

Game Over

Closing Thoughts

The Model Brawl League represents an exciting new frontier in LLM benchmarking. By creating a controlled, purpose-built environment for AI combat, we've opened up new possibilities for understanding and improving an LLMs decision-making capabilities in dynamic, adversarial settings.

While the full version of the Model Brawl League offers high-fidelity graphics and advanced features, we understand that not everyone has access to Unity or the resources to set up a complete fighting game engine.

To that end, we've developed a mini, open-source version of the Model Brawl League using PyGame based on this tutorial form Coding With Russ. This simplified version maintains the core concepts and functionality of the full game, but in a much more digestible format.

Image not found

PyGame Model Brawl League

To get started with the PyGame Model Brawl League and see how all the pieces fit together, check out our open-source code repository.

We're excited to see how the community will use and expand upon this framework. Whether you're benchmarking the latest language models, exploring AI decision-making, or just curious about how LLMs perform in a fighting game context, we invite you to join us in the Model Brawl League arena!

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.

Select your cookie preferences

Site Terms, Privacy, and more.

How We Built The Model Brawl League: A Chat Bot Arena for LLMs

Learn how we built a video game to benchmark LLMs

How It Works

Game State to Prompt

Amazon Bedrock Integration

JavaScript LLM Controller

Final Round, Fight!

Closing Thoughts

Comments