
Browser Automation for the AI Era — Introducing nova‑act‑mcp
Enable AI assistants to control browsers with natural language via MCP. Eliminate testing scripts and accelerate your development workflow.
Published Apr 18, 2025
TL;DR: nova-act-mcp bridges Amazon Nova Act's browser automation with the Model Context Protocol (MCP), the USB-C for AI tools. Your AI assistant can now spin up a browser, interact with your local web apps, and report findings, with no automation scripts required. Just plain English instructions. Perfect for rapid dev-loop testing and beyond.
Imagine asking your AI assistant:
"Open http://localhost:5173, click 'Generate Token', then tell me if the operation succeeded."
And having it actually do this: spawning a real browser, navigating your app, and reporting back with results. That's what nova-act-mcp enables today.
I built this because I believe we're at the beginning of a fundamental shift in how we interact with development tools. AI assistants shouldn't just write tests; they should be able to execute them for us.
Web testing currently sits at two extremes:
- Manual clicking — High effort, breaks flow, catches issues too late
- Automated test suites — Upfront investment, maintenance overhead, fixed test paths
nova-act-mcp introduces a middle path: conversational testing that follows your natural workflow during development.
"Check if the sign-up form validates email addresses correctly."
No more context switching to manually verify every change. No boilerplate test code for quick validations.
Implement → Verify → Refine, all without leaving your conversation with the assistant.
"Go to this page, try these 3 steps, and show me what happens."
"Check if my backend is responding on localhost:3000 and if the API returns correct data structure."
"Try placing an order with product ID 12345 and verify the confirmation screen shows the right details."
"Is the critical alert from Staging a false positive? Check if the service is actually responding."
```bash
# Clone & install
git clone https://github.com/madtank/nova-act-mcp.git
cd nova-act-mcp
uv sync
# Nova Act API key (grab from https://nova.amazon.com/act)
export NOVA_ACT_API_KEY="your-key-here"
# Run the MCP server
uv run nova_mcp.py
```
```jsonc
"mcpServers": {
  "nova-browser": {
    "command": "uv",
    "args": ["--directory", "/path/to/nova-act-mcp", "run", "nova_mcp.py"],
    "transport": "stdio",
    "env": { "NOVA_ACT_API_KEY": "your-key-here" }
  }
}
```
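If you would rather generate this entry than hand-edit it, a small script can build the same structure. This is a minimal sketch: `build_server_entry` is a hypothetical helper, not part of nova-act-mcp, and where the resulting fragment belongs depends on your MCP client's configuration file.

```python
import json
import os


def build_server_entry(repo_dir: str, api_key: str) -> dict:
    """Return an mcpServers fragment mirroring the config shown above."""
    return {
        "mcpServers": {
            "nova-browser": {
                "command": "uv",
                "args": ["--directory", repo_dir, "run", "nova_mcp.py"],
                "transport": "stdio",
                "env": {"NOVA_ACT_API_KEY": api_key},
            }
        }
    }


if __name__ == "__main__":
    # Falls back to the placeholder when the variable is not exported.
    key = os.environ.get("NOVA_ACT_API_KEY", "your-key-here")
    print(json.dumps(build_server_entry("/path/to/nova-act-mcp", key), indent=2))
```

Building the entry from a dictionary means the JSON serializer handles quoting and commas for you, and a repeated server name cannot slip in unnoticed.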
- MCP server receives JSON commands from your AI assistant
- Nova Act SDK creates a Chromium browser with the specified profile
- Agent thinking gets extracted and returned so you can see its reasoning
- Browser actions are performed based on natural language instructions
Each session maintains its own isolated profile directory for cookies and storage persistence.
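The flow above can be sketched from the client's side. MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method. In this sketch the tool name `browser_session` and its argument names are hypothetical placeholders (check the repository for the names the server actually exposes), and `profile_dir_for` only illustrates the idea of one profile directory per session; it is not the project's implementation.

```python
import json
from pathlib import Path


def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


def profile_dir_for(base: Path, session_id: str) -> Path:
    """Illustrative only: one directory per session for cookies and storage."""
    path = base / session_id
    path.mkdir(parents=True, exist_ok=True)
    return path


# Hypothetical tool and argument names, for illustration.
message = tool_call_request(
    1,
    "browser_session",
    {"action": "execute", "instruction": "Click 'Generate Token'"},
)
print(message)
```

In practice your AI assistant's MCP client builds and sends these messages for you; the sketch is only meant to show what travels over the stdio transport.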
✅ Browser Control: Start, navigate, end browsing sessions
✅ UI Interaction: Click elements, fill forms, interact with UI
✅ Action Transparency: See the agent's reasoning at each step
✅ Cookie Persistence: Maintain login state between actions
⏳ In-page persistence: Maintain form state between execute calls
⏳ Screenshot support: Visual verification for UI feedback
⏳ Structured data extraction: Return specific page elements as structured data
⏳ File upload/download: Handle file system interactions
The current agent primarily operates by showing its internal reasoning and execution steps. While it doesn't always return structured data about what it found (like the generated token values), it reliably performs the actions requested. This makes it ideal for workflows where the execution of tasks matters more than programmatic access to the results.
While nova-act-mcp excels at development-loop testing today, I see this technology evolving into something much more powerful:
- Automated incident response: verify and triage alerts before waking on-call engineers
- Dynamic content management: update content based on automated visual verification
- Cross-system workflows: chain together interactions across multiple services
- Accessibility verification: validate that UI changes maintain accessibility standards
This is just the beginning of AI agents as active participants in our development process, rather than just code-generating assistants.
- API key invalid → regenerate at nova.amazon.com/act.
- Agent can't hit localhost → ensure port is correct / check for HTTPS enforcement.
- Wrong element clicked → be more specific in prompts (e.g., "click the blue 'Submit' button in the form").
- Timeouts → consider simpler instructions, or enable verbose logs with `export NOVA_MCP_DEBUG=1`.
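Two of these problems, a missing key and an unreachable local port, can be ruled out before involving the agent at all. A minimal sketch using only the standard library: `is_port_open` and `preflight` are hypothetical helper names, and the default port 5173 is simply the one from the example prompt earlier in this post. It checks only that the key is set, not that it is valid.

```python
import os
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(port: int = 5173) -> list[str]:
    """Collect human-readable problems; an empty list means the basics look fine."""
    problems = []
    if not os.environ.get("NOVA_ACT_API_KEY"):
        problems.append("NOVA_ACT_API_KEY is not set")
    # "localhost" lets the resolver try both IPv6 and IPv4 loopback addresses.
    if not is_port_open("localhost", port):
        problems.append(f"nothing is listening on localhost:{port}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARN:", problem)
```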
This project is in its early stages, with plenty to discover and improve. PRs and issues welcome at github.com/madtank/nova-act-mcp.
Let's build a future where our AI tools don't just suggest code—they help us validate it too.
MIT License. Built on Amazon's Nova Act SDK and the Model Context Protocol standard.
Ship faster — let the AI press the buttons.