MyAcrobot - Physics-Driven AI Olympics

"MyAcrobot" is a 2 dimensional physics-based AI Olympics game where players step into the role of coach Ed Burns and team up with AI gymnast Trent Dimas to compete in the high bar event.

Published Jan 14, 2025

Welcome, Coach Ed Burch!

Your mission is to guide high bar AI gymnast Trent Dimas to Olympic glory, just like the duo did in '92. As Trent executes a complete rotation, your coaching will determine the perfect moment to let go. Can the duo secure gold, silver, or bronze on the leaderboard? Let’s find out!


Game Storyline: A Tribute to Team Collaboration

MyAcrobot is a physics-based simulation game inspired by the teamwork of U.S. Olympic gymnast Trent Dimas and Coach Ed Burch during their 1992 Olympic gold medal win. Before the event, Coach Ed Burch confidently predicted that "our high bar will stand out." His prediction proved accurate as Dimas scored an impressive 9.875, breaking the tie between gymnasts Grigory Misutin and Andreas Wecker, who both scored 9.837. This partnership between coach and athlete proved to be exactly what was needed to secure the gold medal.
Check out MyAcrobot at this link: https://myacrobot.weebette.ai/. It runs seamlessly on both mobile and desktop!

Two Core Components: Simulated Environment and RL Model

1. Physics-Driven Simulated Environment
The front end, powered by Bootstrap 5, ensures seamless gameplay on both mobile and desktop, adapting to touch controls or mouse interactions based on the user's device. The gymnastics environment is built with Matter.js, delivering a physics-driven 2D grid that supports realistic, real-time interactions between the AI, the player, and the pendulum.
Players earn points by landing pendulum links in target zones after dismounting, aligning them during momentum buildup, and completing full rotations over the bar with Trent. Successful landings in the goal quadrant contribute to their score.
Customizable options—such as pendulum link properties, air friction, length, and width—enhance replayability and encourage experimentation. These mechanics foster collaboration between the AI and the player, enabling them to refine strategies, overcome challenges, and help Trent achieve full bar rotations, dismounts, and target landings.
[Screenshot: Mobile play]

Gameplay Overview

  1. Core Gameplay
  • The game is set on a 2D grid with x and y coordinates, where Trent (the AI gymnast) is represented as a pendulum, with his hands attached to a fixed point—the bar—at the center of the grid.
  • Trent's goal is to perform a full rotation over the bar.
  • The player's goal is to tell Trent when to dismount so he lands in the target goal zone before the time-until-landing countdown expires, to give him the extra push he needs to rotate over the bar, and to help align his torso and legs.
  • Players can also disable the AI and take full control of Trent: toggling “Switch Off AI” from the menu gives the coach complete control over Trent’s movements.

Scoring, Difficulty, and Customizing the Environment

Scoring System

Points are awarded for:
  • Landing at least one pendulum link in the target zone.
  • Earning alignment points for precise positioning.
  • Completing an over-the-bar motion (full rotation not required) for bonus points.

Personalization Options

  • Username: Choose a clever username to represent yourself or generate a random one using the open-source Chance.js library.
  • Difficulty Levels: Easy (1 goal), Medium (2 goals), Hard (3 goals), Expert (4 goals).
  • Environmental Customization: Players can tailor gameplay to their preferences and create unique challenges for themselves and the AI by adjusting:
    • Number of pendulum links
    • Air friction level
    • Pendulum length
    • Pendulum width

Leaderboard

After successfully landing within the required quadrant goal, players can compete for a spot on the leaderboard.
[Screenshot: Leaderboard]
  2. Reinforcement Learning AI
Reinforcement learning AI was chosen for this project to handle the continuous and dynamic nature of a gymnastics high bar environment. The agent learns in real time from a six-dimensional observation of pendulum positions and velocities, allowing it to determine its precise location on the x-y grid and make decisions for completing a full rotation.
The game builds on Gymnasium's Acrobot environment while mirroring the Matter.js front-end simulation to deliver realistic, synchronized physics. Both environments operate on 2D grids, ensuring seamless integration and consistency. The agent's neural network is a Deep Q-Network (DQN), which processes state data through computational layers to determine the best action for the current state. It’s all about deciding the smartest move based on what’s happening at the moment!
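The post doesn't include the network code, but a minimal PyTorch sketch of a DQN matching Gymnasium's Acrobot interface (a 6-dimensional observation and 3 discrete torque actions) might look like this; the hidden-layer sizes are assumptions for illustration:

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Minimal Deep Q-Network: maps Acrobot's 6-dim state to one
    Q-value per discrete action (apply -1, 0, or +1 torque)."""
    def __init__(self, state_dim: int = 6, n_actions: int = 3, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Greedy action for one observation:
# state = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
# action = policy_net(state).argmax(dim=1).item()
```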
[Diagram: Agent brain]
Reward System Overview: How the Agent Excels with Math
The reward system leverages precise mathematical computations to evaluate the gymnast's performance. A method calculates the gymnast's foot position (y-coordinate) using pendulum angles and link lengths. This calculation is critical for determining if the gymnast completes a full rotation over the bar and successfully lands in the target goal zones. The resulting y-coordinate serves as the basis for assigning rewards.
Rewards are structured to guide the agent toward mastering advanced mechanics, as sketched in the code after this list:
  • High Reward: Achieved by flipping above the bar (y > 0.0).
  • Moderate Reward: Granted for nearing the bar (-1.5 < y ≤ 0.0).
  • Penalty: Issued for remaining below the bar (y ≤ -1.5).
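The exact reward magnitudes aren't published with the game, so the values in this sketch are placeholders; the foot-height calculation follows Gymnasium's Acrobot geometry, where the tip height relative to the bar is -l1·cos(θ1) - l2·cos(θ1 + θ2):

```python
import numpy as np

def foot_y(obs: np.ndarray, l1: float = 1.0, l2: float = 1.0) -> float:
    """Foot (pendulum tip) height with the bar at y = 0.
    obs = [cos t1, sin t1, cos t2, sin t2, w1, w2]."""
    t1 = np.arctan2(obs[1], obs[0])
    t2 = np.arctan2(obs[3], obs[2])
    return -l1 * np.cos(t1) - l2 * np.cos(t1 + t2)

def reward(obs: np.ndarray) -> float:
    y = foot_y(obs)
    if y > 0.0:       # flipped above the bar: high reward
        return 10.0   # placeholder magnitude
    if y > -1.5:      # nearing the bar: moderate reward
        return 1.0    # placeholder magnitude
    return -1.0       # below the bar: penalty (placeholder)
```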
The Agent’s Training Process: Dual-Network Setup
The agent's training process employs two neural networks: an online network and a target network. This dual-network architecture enhances stability: the online network predicts Q-values for the current state, while the periodically synced target network supplies the temporal-difference (TD) target, i.e., the immediate reward plus the discounted value of the next state. These Q-values are integral to the training process, feeding into the epsilon-greedy strategy to balance exploration and exploitation effectively.
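A condensed PyTorch sketch of one such training step, reusing the DQN class from the earlier sketch (the discount factor, learning rate, and loss choice are assumptions for illustration, not values from the project):

```python
import torch
import torch.nn.functional as F

GAMMA = 0.99  # assumed discount factor

policy_net = DQN()  # online network (see the earlier sketch)
target_net = DQN()
target_net.load_state_dict(policy_net.state_dict())
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)  # assumed lr

def train_step(states, actions, rewards, next_states, dones):
    """One TD update: the online net scores the actions actually taken;
    the frozen target net supplies the bootstrapped future value."""
    q = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        td_target = rewards + GAMMA * next_q * (1.0 - dones)
    loss = F.mse_loss(q, td_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Every N steps, sync the target network to keep targets stable:
# target_net.load_state_dict(policy_net.state_dict())
```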
[Diagram: Optimizer]
Balancing Exploration and Exploitation: Epsilon-Greedy Strategy
An essential component of the agent's training process is the epsilon-greedy strategy, which balances exploration (trying new actions) and exploitation (choosing the best-known actions). To refine this balance over time, epsilon decay is implemented, gradually shifting the agent's behavior toward exploitation as it becomes more confident in its learned strategies.
The challenge lies in finding the perfect balance between the rate of decay and the level of risk-taking. This balance ensures the agent continues to explore enough to discover optimal strategies while leveraging its existing knowledge to maximize performance.
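In code, the strategy reduces to a coin flip against epsilon plus a gradual decay; the schedule constants below are illustrative, not the project's tuned values:

```python
import random
import torch

EPS_START, EPS_END, EPS_DECAY = 1.0, 0.01, 0.995  # assumed schedule

epsilon = EPS_START

def select_action(state: torch.Tensor, policy_net, n_actions: int = 3) -> int:
    """Explore with probability epsilon; otherwise exploit the
    best-known action. state has shape [1, 6]."""
    if random.random() < epsilon:
        return random.randrange(n_actions)             # explore
    with torch.no_grad():
        return policy_net(state).argmax(dim=1).item()  # exploit

# After each episode, drift toward exploitation:
# epsilon = max(EPS_END, epsilon * EPS_DECAY)
```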
[Chart: Epsilon decay]
The Adam optimizer, used here through PyTorch (torch.optim.Adam), efficiently adjusts weights and biases during training by adapting the learning rate for each parameter, which helped keep training stable and improved model performance.
This screenshot showcases the agent training in the environment. The current state is the 6-dimensional observation vector of pendulum positions and velocities, and the episode reward of 191 indicates the agent has successfully learned to complete the rotation.
[Screenshot: Agent training]
Amazon EC2 Overview
Purpose: Hosting the website, backend, Docker containers, AI computations, SQLite database, and AI training.
Contributions:
  • Scalable virtual machine for backend operations.
  • Hosted both front-end and back-end with WebSocket support for real-time updates.
  • Facilitated reinforcement learning model training and Docker hosting.
  • Easy setup with SSH, SSL certificates, and Python integration for security.
  • Enabled cross-platform access for desktop and mobile players.
  • Reliable, high-availability hosting for website and backend operations.
Amazon Elastic Container Registry (ECR)
Hosted Docker repositories for version control and release management.
Amazon S3 - Cloud Object Storage
Stored trained reinforcement learning models for version control and selecting the best model during training.
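A typical pattern for this kind of checkpointing looks like the sketch below; the bucket name and key scheme are illustrative placeholders, not the project's actual values:

```python
import boto3
import torch

s3 = boto3.client("s3")

def save_checkpoint(policy_net, episode: int) -> None:
    """Save a versioned model checkpoint locally, then upload it to S3.
    Bucket and key names are illustrative placeholders."""
    path = f"dqn_ep{episode:05d}.pt"
    torch.save(policy_net.state_dict(), path)
    s3.upload_file(path, "myacrobot-models", f"checkpoints/{path}")
```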
Amazon Q Developer
Purpose: My Copilot on this journey
Contributions of Amazon Q
  • AI Optimization: Fine-tuned AI models under various environmental conditions for realistic and engaging gameplay.
  • Rapid Prototyping: Streamlined development and reduced time required to optimize AI decision-making and adaptability.
  • Code Assistance: Provided creative coding solutions, including alternatives for complex tasks, and suggested unique features like confetti animations for wins.
  • Efficient Debugging: Helped navigate large codebases (e.g., 1,000+ line JS files), simplifying debugging and customization for mobile-friendly, responsive designs with custom controls.
  • Enhanced Workflow:
    • Supported front-end/back-end integration and WebSocket setup for fast state exchanges.
    • Enabled quick fixes for tasks like storage upgrades, port management, and cross-browser compatibility.
  • User-Friendly Tools:
    • Allowed intuitive interaction through features like code highlighting in VS Code and keyboard shortcuts for efficient communication.
    • Worked alongside Jupyter Notebook for better model training integration.
Amazon Q also assisted with the development and creative design of the front-end UI and the back-end Python code. At one point, I thought I needed to convert from PyTorch to TensorFlow and use TensorFlow.js, but Amazon Q helped me set up the optimizer correctly.
I loved how Amazon Q offered alternative ways to solve coding problems—it’s true that there’s more than one way to write code. It felt like art, and Amazon Q had the style. It even suggested fun features like adding confetti animations when winning.
It helped me traverse 1,000+ line JavaScript files to locate necessary changes. Building a custom simulated environment in Matter.js was challenging, but Amazon Q provided valuable assistance. Over two months, it helped me write nearly 3,000 lines of code to create a responsive, mobile-friendly app with touch and keyboard controls, custom screen sizes, and a leaderboard.
Even though Amazon Q’s responses are non-deterministic like any LLM, it provided consistent outputs for simple tasks such as setting up Python environments, installing requirements, running SSH commands, and managing Docker containers.
Amazon Q loves to talk code, and its ability to highlight and ask questions made it easy to direct conversations and keep it on track. I even set up a keyboard shortcut (Ctrl + Shift + Q) for quick access in VS Code, which made coding faster and more efficient.
Amazon Q also integrated with Jupyter Notebook, though there’s room for improvement, as it couldn’t read the entire file structure. For this single-page application, Amazon Q provided solutions to make it work on mobile and ensured compatibility across different browsers.
Development Journey of MyAcrobot
1. Getting Started: The Inspiration
I began exploring AI two years ago, and six months ago, I delved into reinforcement learning (RL), captivated by its mathematical depth and potential for innovation. After initial setbacks, I succeeded with a cart-pole RL model, which inspired me to build the Acrobot—a model designed to perform a 360-degree rotation against the forces of physics. Using Amazon Q Developer, Jupyter notebooks, and AWS services, I tackled challenges like building the AI, building the front end, debugging, version control, and AI model management. Inspired by my daughter’s love for gymnastics, I transformed the Acrobot into a dynamic game, replicating the physics and creativity of a gymnast’s high bar routine.
Challenges
Developing a dynamic, physics-based RL model, integrating it with the front end, and hosting it on the web brought significant hurdles:
  • Implementing a custom reward system, replay buffer, and fine-tuning a Deep Q-Network (DQN).
  • Understanding advanced concepts like geometry, non-linear equations, and reward systems.
  • Resolving the lack of clarity in Gymnasium documentation through extensive problem-solving.
  • Synchronizing the front-end with AI using WebSocket connections for real-time updates.
  • Ensuring cross-platform compatibility with responsive design for desktop and mobile devices.
  • Balancing AI independence with pauses for player coaching feedback.
  • Making a game work across multiple browsers and devices.
  • Knowing when to disable touch controls for mobile devices during screen overlays.
Lessons Learned
This project reinforced the value of persistence, adaptability, and a growth mindset. Overcoming challenges often involved trial and error, leading to breakthroughs. Gaining deeper insights into reinforcement learning’s mathematical foundations was both rewarding and essential to building a capable AI, and it sharpened my math skills along the way.
Closing
The journey was both technically and personally fulfilling. Amazon Q Developer simplified complex tasks, streamlined model adjustments, and added creative touches like confetti animations. AWS provided a robust infrastructure for hosting, scaling, and versioning, enabling me to turn an idea into a functional, public-facing application. Together, Amazon Q’s problem-solving capabilities and AWS’s scalability transformed MyAcrobot into a rewarding project of innovation and growth.
 
