Browser Automation for the AI Era — Introducing nova‑act‑mcp


Enable AI assistants to control browsers with natural language via MCP. Eliminate testing scripts and accelerate your development workflow.

Published Apr 18, 2025
TL;DR: nova-act-mcp bridges Amazon Nova Act's browser automation with the Model Context Protocol (MCP), the USB-C of AI tools. Your AI assistant can now spin up a browser, interact with your local web apps, and report its findings, with no automation scripts required: just plain-English instructions. Perfect for rapid dev-loop testing and beyond.

The Vision

Imagine asking your AI assistant:
"Open http://localhost:5173, click 'Generate Token', then tell me if the operation succeeded."
And having it actually do this—spawning a real browser, navigating your app, and reporting back with results. That's what nova-act-mcp enables today.
I built this because I believe we're at the beginning of a fundamental shift in how we interact with development tools. AI assistants shouldn't just write tests—they should be able to execute them for us.

Why This Matters for Developers

Web testing currently sits at two extremes:
  1. Manual clicking — High effort, breaks flow, catches issues too late
  2. Automated test suites — Upfront investment, maintenance overhead, fixed test paths
nova-act-mcp introduces a middle path: conversational testing that follows your natural workflow during development.
For example, you can simply ask: "Check if the sign-up form validates email addresses correctly."
No more context switching to manually verify every change. No boilerplate test code for quick validations.

Real-World Use Cases

🔄 Dev Loop Acceleration

Implement → Verify → Refine, all without leaving your conversation with the assistant.

🐞 Bug Investigation

"Go to this page, try these 3 steps, and show me what happens."

🏗️ Local Deployment Verification

"Check if my backend is responding on localhost:3000 and if the API returns the correct data structure."

🔌 Integration Testing

"Try placing an order with product ID 12345 and verify the confirmation screen shows the right details."

🚨 Alert Triage

"Is the critical alert from Staging a false positive? Check if the service is actually responding."

Quick Start 🚀

```bash
# Clone & install
git clone https://github.com/madtank/nova-act-mcp.git
cd nova-act-mcp
uv sync

# Nova Act API key (grab from https://nova.amazon.com/act)
export NOVA_ACT_API_KEY="your-key-here"

# Run the MCP server
uv run nova_mcp.py
```
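Before wiring the server into an assistant, it can help to confirm the two prerequisites the Quick Start relies on: the NOVA_ACT_API_KEY environment variable and the uv executable on your PATH. The sketch below is a hypothetical preflight helper (preflight_problems is not part of nova-act-mcp); it only checks that a key is present, not that the key is valid.

```python
import os
import shutil
from typing import List, Mapping, Optional


def preflight_problems(env: Mapping[str, str], uv_path: Optional[str]) -> List[str]:
    """Return a list of problems; an empty list means the prerequisites look fine.

    Hypothetical helper for illustration; it is not part of nova-act-mcp.
    """
    problems: List[str] = []
    key = env.get("NOVA_ACT_API_KEY", "").strip()
    if not key:
        problems.append("NOVA_ACT_API_KEY is not set (get a key at https://nova.amazon.com/act)")
    elif key == "your-key-here":
        # The Quick Start uses this placeholder; it will never authenticate.
        problems.append("NOVA_ACT_API_KEY still holds the placeholder value")
    if uv_path is None:
        problems.append("uv was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = preflight_problems(os.environ, shutil.which("uv"))
    for issue in issues:
        print(f"- {issue}")
    print("Fix the issues above first." if issues else "Ready: uv run nova_mcp.py")
```

Taking the environment and the uv path as arguments (rather than reading them inside the function) keeps the check easy to test without touching your real shell.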

Configure Claude Desktop

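Add the server to Claude Desktop's claude_desktop_config.json, replacing /path/to/nova-act-mcp with the absolute path to your clone:

```jsonc
{
  "mcpServers": {
    "nova-browser": {
      "command": "uv",
      "args": ["--directory", "/path/to/nova-act-mcp", "run", "nova_mcp.py"],
      "transport": "stdio",
      "env": { "NOVA_ACT_API_KEY": "your-key-here" }
    }
  }
}
```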
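If your config file already lists other MCP servers, a small script can merge the nova-browser entry without disturbing them. This is a sketch: add_nova_server is a hypothetical helper, and it assumes the config file has already been parsed into a Python dict (for example with json.load).

```python
import copy
import json
from typing import Any, Dict


def add_nova_server(config: Dict[str, Any], repo_dir: str, api_key: str) -> Dict[str, Any]:
    """Return a copy of `config` with a "nova-browser" MCP server entry added."""
    updated = copy.deepcopy(config)  # never mutate the caller's config
    servers = updated.setdefault("mcpServers", {})
    servers["nova-browser"] = {
        "command": "uv",
        "args": ["--directory", repo_dir, "run", "nova_mcp.py"],
        "transport": "stdio",
        "env": {"NOVA_ACT_API_KEY": api_key},
    }
    return updated


if __name__ == "__main__":
    existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
    merged = add_nova_server(existing, "/path/to/nova-act-mcp", "your-key-here")
    print(json.dumps(merged, indent=2))
```

Returning a copy makes it easy to diff the old and new config before writing anything back to disk.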

How It Works Under the Hood

  1. The MCP server receives JSON-RPC tool calls from your AI assistant.
  2. The Nova Act SDK launches a Chromium browser with the specified profile.
  3. The agent performs browser actions based on your natural-language instructions.
  4. The agent's thinking is extracted and returned so you can see its reasoning.
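MCP messages are JSON-RPC 2.0, and an assistant invokes a server tool with a "tools/call" request whose params carry the tool name and its arguments. The sketch below shows that routing step with a stubbed browser. The tool name browser_session, its instruction argument, and stub_browser_tool are hypothetical; the real nova-act-mcp tools may be named and shaped differently, and a real server also handles the initialize handshake and tools/list, which are omitted here.

```python
import json
from typing import Any, Callable, Dict

ToolHandler = Callable[[Dict[str, Any]], str]


def stub_browser_tool(arguments: Dict[str, Any]) -> str:
    # Hypothetical stand-in: a real server would hand the instruction to a Nova Act browser.
    return f"[stub] would run in the browser: {arguments.get('instruction', '')}"


# Hypothetical tool registry; the real nova-act-mcp tool names may differ.
TOOLS: Dict[str, ToolHandler] = {"browser_session": stub_browser_tool}


def handle_request(raw: str) -> Dict[str, Any]:
    """Route one JSON-RPC 2.0 "tools/call" request to a registered tool."""
    request = json.loads(raw)
    request_id = request.get("id")
    if request.get("method") != "tools/call":
        return {"jsonrpc": "2.0", "id": request_id,
                "error": {"code": -32601, "message": "Method not found"}}
    params = request.get("params", {})
    name = params.get("name")
    tool = TOOLS.get(name)
    if tool is None:
        return {"jsonrpc": "2.0", "id": request_id,
                "error": {"code": -32602, "message": f"Unknown tool: {name}"}}
    text = tool(params.get("arguments", {}))
    # MCP tool results carry a list of content items; plain text is the simplest kind.
    return {"jsonrpc": "2.0", "id": request_id,
            "result": {"content": [{"type": "text", "text": text}], "isError": False}}


if __name__ == "__main__":
    call = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "browser_session",
            "arguments": {"instruction": "Open http://localhost:5173 and click 'Generate Token'"},
        },
    }
    print(json.dumps(handle_request(json.dumps(call)), indent=2))
```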
Each session maintains its own isolated profile directory for cookies and storage persistence.
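One way to picture per-session isolation is a registry that hands each new session its own empty directory and returns that same directory on every later call, which is what lets cookies and storage survive between actions. This is a sketch: ProfileRegistry is hypothetical, and it assumes the path would then be given to the browser as its user-data directory; exactly how nova-act-mcp does this is not shown here.

```python
import shutil
import tempfile
import uuid
from pathlib import Path
from typing import Dict, Optional, Tuple


class ProfileRegistry:
    """Hypothetical helper: one isolated browser-profile directory per session."""

    def __init__(self, root: Optional[Path] = None) -> None:
        self.root = root or Path(tempfile.mkdtemp(prefix="nova-profiles-"))
        self.sessions: Dict[str, Path] = {}

    def start(self) -> Tuple[str, Path]:
        # A fresh, empty directory means no cookies leak in from other sessions.
        session_id = uuid.uuid4().hex
        profile_dir = self.root / session_id
        profile_dir.mkdir(parents=True)
        self.sessions[session_id] = profile_dir
        return session_id, profile_dir

    def profile_for(self, session_id: str) -> Path:
        # Reusing the same directory keeps login state between actions in one session.
        return self.sessions[session_id]

    def end(self, session_id: str) -> None:
        profile_dir = self.sessions.pop(session_id)
        shutil.rmtree(profile_dir, ignore_errors=True)
```

Deleting the directory when a session ends is one design choice; keeping it instead would let a later session pick up an existing login, at the cost of less isolation.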

Current Capabilities and Limitations

What Works Today

Browser Control: Start, navigate, and end browsing sessions
UI Interaction: Click elements, fill in forms, and operate other page controls
Action Transparency: See the agent's reasoning at each step
Cookie Persistence: Maintain login state between actions

What's Coming

In-page persistence: Maintain form state between execute calls
Screenshot support: Visual verification for UI feedback
Structured data extraction: Return specific page elements as structured data
File upload/download: Handle file system interactions

Note on Current Interaction Model

The current agent primarily reports its internal reasoning and execution steps. It doesn't always return structured data about what it found (such as the generated token values), but it reliably performs the requested actions. That makes it best suited to workflows where carrying out the task matters more than programmatic access to the results.
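Until structured extraction lands, the returned reasoning is the main thing a client can inspect. The sketch below pulls reasoning strings out of log text and does a crude phrase check on them. It assumes, as an unverified format, that reasoning appears as think("...") calls in the agent's output; extract_thinking and mentions are hypothetical helpers, and the pattern would need adjusting if the real output differs.

```python
import re
from typing import List

# Assumed log format: each reasoning step appears as think("...") in the agent's output.
THINK_PATTERN = re.compile(r'think\("((?:[^"\\]|\\.)*)"\)')


def extract_thinking(log_text: str) -> List[str]:
    """Return each reasoning string found in the log, in order."""
    return [m.group(1) for m in THINK_PATTERN.finditer(log_text)]


def mentions(log_text: str, phrase: str) -> bool:
    """Crude, case-insensitive check: does any reasoning step mention `phrase`?"""
    return any(phrase.lower() in step.lower() for step in extract_thinking(log_text))


if __name__ == "__main__":
    sample = (
        'step 1> think("I am on the home page. I should click Generate Token.");\n'
        'step 2> think("A token was generated.");\n'
    )
    for step in extract_thinking(sample):
        print("-", step)
    print("token generated?", mentions(sample, "token was generated"))
```

A phrase match like this is only a heuristic: it tells you what the agent believed it saw, not a verified value from the page.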

Beyond Testing: The Future of Browser Automation

While nova-act-mcp excels at development-loop testing today, I see this technology evolving into something much more powerful:
  • Automated incident response — Verify and triage alerts before waking on-call engineers
  • Dynamic content management — Update content based on automated visual verification
  • Cross-system workflows — Chain together interactions across multiple services
  • Accessibility verification — Validate that UI changes maintain accessibility standards
This is just the beginning of AI agents as active participants in our development process, rather than just code-generating assistants.

Troubleshooting 🩺

  • API key invalid → regenerate it at nova.amazon.com/act.
  • Agent can't reach localhost → confirm the port is correct and check whether the app enforces HTTPS.
  • Wrong element clicked → be more specific in your prompts (e.g., "click the blue 'Submit' button in the form").
  • Timeouts → try simpler instructions, or enable verbose logs with export NOVA_MCP_DEBUG=1.
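For a sense of what a debug switch like NOVA_MCP_DEBUG typically does, here is a hypothetical handling of it: truthy values select DEBUG-level logging. How nova-act-mcp actually interprets the variable is not shown here, and configure_logging is an illustrative name. Logs go to stderr because, with the stdio transport, stdout carries the MCP protocol messages themselves.

```python
import logging
import os
import sys
from typing import Mapping

TRUTHY = {"1", "true", "yes", "on"}


def configure_logging(env: Mapping[str, str]) -> int:
    """Choose a log level from NOVA_MCP_DEBUG (hypothetical handling) and apply it."""
    debug = env.get("NOVA_MCP_DEBUG", "").strip().lower() in TRUTHY
    level = logging.DEBUG if debug else logging.INFO
    # stdout is reserved for MCP messages over stdio, so log to stderr instead.
    logging.basicConfig(level=level, stream=sys.stderr, force=True)
    return level


if __name__ == "__main__":
    configure_logging(os.environ)
    logging.getLogger("nova-mcp").debug("verbose logging enabled")
```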

Join the Exploration

This project is in its early stages, with plenty to discover and improve. PRs and issues are welcome at github.com/madtank/nova-act-mcp.
Let's build a future where our AI tools don't just suggest code—they help us validate it too.

License & Credits

MIT License. Built on Amazon's Nova Act SDK and the Model Context Protocol standard.
Ship faster — let the AI press the buttons.
 
