Voice-Controlled Humanoid Robots Using Amazon Nova Sonic and AWS IoT

This project uses Amazon Nova Sonic and AWS IoT for real-time, hands-free voice control of humanoid robots. By integrating AI speech-to-speech streaming with tool calling, developers can create intuitive and responsive robotic systems. The setup includes AWS IoT-enabled robots and AWS cloud infrastructure for robust and scalable voice command execution.

Published Apr 18, 2025
Last Modified May 14, 2025

Seamless Speech-to-Speech Control for Humanoid Robots with Amazon Nova Sonic and AWS IoT

Introduction

Previously, hands-free voice control was complicated because it was difficult to determine the break points in continuous speech at which to extract voice commands or user intentions. In the era of large language models (LLMs), we can handle text commands using tools or function calling. However, supporting voice commands requires developers to chain multiple models: one to convert speech to text, the LLM to process the text, and another to convert the response back to speech. This pipeline makes streaming input challenging and leads to significant delays, as well as errors caused by incorrect stopping points.
The ideal solution is an LLM that supports speech-to-speech streaming with integrated tool use, eliminating the need for developers to manage voice commands. Amazon Nova Sonic is the model we've been searching for!

Demo

Overview

Amazon Nova Robotic Architecture
The system comprises two main components: AWS IoT-enabled robots and the AWS cloud with real-time AI speech-to-speech streaming and tool calling.
Amazon Nova Sonic Tool Use
In this blog post, I will focus on the AWS cloud. The AWS IoT-enabled robots are simply Python pub/sub services that subscribe to an AWS IoT topic. When a new message arrives, the service places it in a buffer queue and forwards it as a trigger API call.
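The buffering on the robot side can be sketched roughly as follows. This is a minimal illustration, not the project's actual client: the class name `CommandBuffer`, the callback shape of `on_message`, and the `trigger` callable are all hypothetical, and the MQTT subscription itself is left out so the queueing logic stands on its own.

```python
import queue
import threading
from typing import Callable


class CommandBuffer:
    """Buffers incoming messages and forwards them one at a time, in order."""

    def __init__(self, trigger: Callable[[str], None], maxsize: int = 32) -> None:
        # `trigger` stands in for the robot's trigger API call (hypothetical).
        self._queue = queue.Queue(maxsize=maxsize)
        self._trigger = trigger
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def on_message(self, topic: str, payload: bytes) -> None:
        """Assumed subscription callback: only enqueues, so it returns quickly."""
        self._queue.put(payload.decode("utf-8"))

    def _drain(self) -> None:
        # Worker loop: take one command at a time and forward it.
        while True:
            command = self._queue.get()
            if command is None:  # sentinel pushed by close()
                break
            self._trigger(command)

    def close(self) -> None:
        """Flush everything already queued, then stop the worker."""
        self._queue.put(None)
        self._worker.join()
```

Keeping the subscription callback limited to an enqueue means a slow robot action does not block message delivery; the worker thread drains the queue at whatever pace the robot can handle.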
The AWS cloud setup is built using AWS CDK. It includes the AWS IoT Thing construct (the ThingWithCert L3 construct) and the aws-apprunner-alpha Service. The client certificate and robot client code must be manually deployed to each Raspberry Pi.
If the keep_alive_interval_sec parameter isn't explicitly set in the sample code, the client might appear to hang when, in reality, it is simply taking a long time to reconnect because of the default keep-alive interval.
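As a rough sketch of setting the keep-alive explicitly, assuming the robot client uses the MQTT5 builder from the AWS IoT Device SDK v2 for Python (the source does not say which client it uses): the helper names below are hypothetical, and the 30-second value is an illustrative assumption rather than a recommendation from the original sample.

```python
def build_mqtt5_options(endpoint: str, client_id: str,
                        keep_alive_interval_sec: int = 30) -> dict:
    """Collect keyword arguments for the MQTT5 client builder.

    The 30-second default here is an assumed example value.
    """
    if keep_alive_interval_sec <= 0:
        raise ValueError("keep_alive_interval_sec must be a positive number of seconds")
    return {
        "endpoint": endpoint,
        "client_id": client_id,
        "keep_alive_interval_sec": keep_alive_interval_sec,
    }


def create_client(cert_path: str, key_path: str, **options):
    """Build an MQTT5 client with mutual TLS using the given options."""
    # Imported lazily so build_mqtt5_options() works without the SDK installed.
    from awsiot import mqtt5_client_builder

    return mqtt5_client_builder.mtls_from_path(
        cert_filepath=cert_path,
        pri_key_filepath=key_path,
        **options,
    )
```

A shorter interval lets the client notice a dead connection sooner, at the cost of slightly more keep-alive traffic.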

How to Use Amazon Nova Sonic to Control a Robot

The web application is built using Express.js and Node.js, with WebSocket providing real-time voice input and speech output in the web browser. We adapted the sample code in three places: the system prompt, the tool schema that describes the robot actions, and the code that starts the streaming session.
The key is to set "toolChoice" to "any", which ensures that at least one tool is invoked each time. The model still decides which tool to call, but it must call one. With the default "auto" setting, the model may not reliably select a tool for an action.
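The shape of such a tool configuration can be sketched as below. This is an illustration only: the web application in this project is written in Node.js, so the Python here just builds the JSON payload, the tool name `robot_action` and its parameters are hypothetical, and the field layout follows the Amazon Bedrock tool-use convention as I understand it, so it should be checked against the Nova Sonic documentation.

```python
import json


def tool_spec(name: str, description: str, properties: dict, required: list) -> dict:
    """Describe one tool; the input schema is passed as a JSON string (assumed)."""
    schema = {"type": "object", "properties": properties, "required": required}
    return {
        "toolSpec": {
            "name": name,
            "description": description,
            "inputSchema": {"json": json.dumps(schema)},
        }
    }


def tool_configuration(tools: list, force_tool: bool = True) -> dict:
    """Build the tool configuration, forcing a tool call unless told otherwise."""
    # "any": the model must call some tool; "auto": the model may call none.
    choice = {"any": {}} if force_tool else {"auto": {}}
    return {"tools": tools, "toolChoice": choice}


# Hypothetical robot tool, used only to show the structure.
robot_tool = tool_spec(
    name="robot_action",
    description="Make the robot perform a named action.",
    properties={"action": {"type": "string", "description": "Action to perform"}},
    required=["action"],
)
config = tool_configuration([robot_tool])
```

With `force_tool=True`, every spoken request resolves to a tool call, which suits a robot that should always act on a command; a conversational assistant that sometimes only needs to answer would more likely keep the "auto" behavior.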

Conclusion

The integration of Amazon Nova Sonic with AWS IoT for seamless speech-to-speech control of humanoid robots represents a significant advancement in hands-free voice command technology. By leveraging real-time AI speech-to-speech streaming and tool calling, developers can now create more intuitive and responsive robotic systems. The combination of AWS IoT-enabled robots and the AWS cloud infrastructure ensures robust and scalable solutions, making it easier to deploy and manage these systems.
This approach not only simplifies the process of voice command extraction but also enhances the accuracy and efficiency of robotic actions. As we continue to explore the capabilities of large language models and real-time AI, the potential for innovative applications in robotics and beyond is immense. Amazon Nova Sonic is a promising step towards a future where voice commands can seamlessly control complex systems, paving the way for more advanced and user-friendly technologies.

Contributors

AWS Educate Student Ambassador