Why would I connect my robots to the cloud?
This is a crucial question when deciding whether to use the cloud at all for robotics development, let alone how much to use it. This post is all about the why, rather than the how.

The main benefits of connecting robots to the cloud are:
- Scaling Up: scaling software complexity, fleet size, or number of locations is difficult to do without a central system, which is a great fit for the cloud.
- High Power Computing: Machine Learning, GenAI, and Simulation all have high demands for computational power that are hard to meet with personal computers. The cloud offers powerful servers on demand for these resource-intensive tasks.
- Offloading Computation: more complex robot behaviours can be achieved on simple robots by offloading the complex parts to other servers, such as those in the cloud. You can even have the full robotics stack in the cloud, with just a thin client on every robot to read sensors and move motors.

There are also challenges to weigh against those benefits:
- New Skills: delving into the world of cloud computing means developing a whole new skillset, which is an investment of time, effort, and money.
- Costs: new users of cloud technologies find it difficult to predict cloud computing costs, which is daunting in itself. However, used correctly, the cloud can reduce costs because you pay for resources only while they're in use.
- Integration: integrating cloud technologies with your robotics stack is work. It often means taking a longer route to get to the same outcome during development, and the more mature a robotics stack, the more difficult it is to integrate cloud technologies.
- Connectivity: the availability of the cloud is a key consideration when deciding how much to use the cloud. Robots that cannot consistently access the internet should be less reliant on the cloud. Even if the internet is available, sending messages to the cloud is slightly slower than communicating locally, so the round trip time must be justified.
- Security: sending messages between robot and cloud means exposing those messages to the internet. This inherently carries more risk, but it can still be secure if security is factored into the design.
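
The cost point above comes down to simple arithmetic. The following is a minimal sketch, assuming made-up prices: `HOURLY_RATE` and `FIXED_MONTHLY_COST` are hypothetical figures, not taken from any real price list. It compares paying per hour of use against a fixed monthly cost for an always-on machine.

```python
def on_demand_cost(hourly_rate: float, hours_used: float) -> float:
    """Cost of a cloud server that is billed only for the hours it runs."""
    return hourly_rate * hours_used


def break_even_hours(hourly_rate: float, fixed_monthly_cost: float) -> float:
    """Hours per month at which on-demand cost equals the fixed monthly cost."""
    return fixed_monthly_cost / hourly_rate


# Hypothetical figures, for illustration only.
HOURLY_RATE = 2.0           # price per hour of an on-demand server
FIXED_MONTHLY_COST = 400.0  # monthly cost of an equivalent always-on machine

if __name__ == "__main__":
    for hours in (50, 150, 400):
        cost = on_demand_cost(HOURLY_RATE, hours)
        cheaper = "on-demand" if cost < FIXED_MONTHLY_COST else "always-on"
        print(f"{hours} h/month: on-demand costs {cost:.2f}, cheaper: {cheaper}")
    limit = break_even_hours(HOURLY_RATE, FIXED_MONTHLY_COST)
    print(f"break-even at {limit:.0f} h/month")
```

With these assumed numbers, on-demand is cheaper for any workload that runs for less than the break-even number of hours per month. Real pricing has more dimensions (storage, data transfer, reserved capacity), which is part of why costs are hard to predict.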

Scaling Up covers several areas where the cloud can help:
- Logging and Data Storage: the ability to stream, record, and store data and logs from a robot is vital for debugging and monitoring.
  - Without the cloud, this means increased storage capacity per robot and a server cluster that must be expanded with the number of robots and number of users that need to access the data.
  - With the cloud, this process is much simpler, with services such as Amazon S3 and Amazon CloudWatch offering to store and provide access to users with fine-grained permissions in a way that will scale with the number of robots automatically.
  - One resource is this video, showing how to store and access logs in CloudWatch from ROS2 robots.
- Monitoring & Co-ordination: monitoring and co-ordinating gets more and more complicated as the number of robots grows.
  - With the cloud, enabling auto-scaling to more servers or more powerful servers is made simple (see Amazon EC2), and it's possible to use serverless technologies to make parallel execution easier to manage (see AWS Lambda and AWS Step Functions).
  - Without the cloud, this is a much more difficult engineering problem, as you will need to design solutions that can be scaled up, then monitor server use to determine when to scale to more servers.
  - Some resources include running simulations on EC2, co-ordinating robot fleets with Lambda, and ordering smoothies with Step Functions.
- Deployment and OTA Updates: setting up new robots to become part of the fleet and updating existing robots Over-The-Air (OTA) is a necessary part of scaling up. A manual update process becomes increasingly impractical as the number of robots grows.
  - With the cloud, services such as AWS IoT Greengrass can make this setup and update process secure and automatic, with extra features available such as rolling back failed updates and storing per-robot configuration centrally.
  - Without the cloud, you will need to build the OTA update and initial deployment processes from scratch, making them as lean as possible so that adding new robots to the fleet is simple.
  - Some resources include Greengrass Concepts and Components and deploying Docker Compose in Greengrass.
- Dashboards: Once data from the robots is in the cloud, it becomes simple to manage and gain insights from that data.
  - With the cloud, services such as AWS IoT SiteWise allow users to build dashboards from their industrial data, allowing a view at-a-glance into fleet health and operational capacity.
  - Without the cloud, these dashboards must be built manually along with the controls to access data from the entire fleet in one location.
  - Some resources include building a fleet overview with IoT Fleet Indexing, and this video showing how to set up SiteWise to show battery measurements from a fleet of robots.
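
To give a feel for what a fleet dashboard has to compute, whether a managed service does it or you build it by hand, here is a minimal sketch of a battery overview. Everything in it is an assumption for illustration: the `BatteryReading` type, the `fleet_battery_summary` function, and the 20% low-battery threshold are hypothetical names and values, not part of any AWS service.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class BatteryReading:
    robot_id: str
    timestamp: float  # seconds since epoch
    percent: float    # battery charge, 0-100


def fleet_battery_summary(readings, low_threshold=20.0):
    """Summarise the most recent battery reading of every robot."""
    latest = {}
    for reading in readings:
        current = latest.get(reading.robot_id)
        # Keep only the newest reading per robot.
        if current is None or reading.timestamp > current.timestamp:
            latest[reading.robot_id] = reading
    if not latest:
        return {"robots": 0, "average_percent": None, "low_battery": []}
    return {
        "robots": len(latest),
        "average_percent": mean(r.percent for r in latest.values()),
        "low_battery": sorted(
            r.robot_id for r in latest.values() if r.percent < low_threshold
        ),
    }
```

The hard part in practice is not this aggregation but getting every robot's readings into one place with the right access controls, which is the work the cloud services take on.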

High Power Computing covers three kinds of workload:
- For machine learning, the cloud can store huge data sets and use powerful machines to crunch through the data to train ML models.
- For GenAI, the cloud can refine an existing Foundation Model (FM) and run inference with it. FMs are very large networks, so performing inference locally is difficult: it commonly requires multiple graphics cards working together to compute in a reasonable time.
- For simulation, the cloud can offer virtual machines with large memory capacity and high-power processors. A simple enough simulation can run on a personal computer with high enough specs, but the cloud makes it easier to get hold of the right hardware to run the simulation at an effective speed.
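
For Offloading Computation, a robot that needs more computational power than it has on board has two options:
- Upgrade the robot with more powerful hardware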
- Send the required data to more powerful hardware, then get the result back

Sending the data elsewhere makes sense when upgrading is not a good option, for example:
- You are unable to upgrade the hardware: you already have the best hardware available, don't have the power for more compute, or wouldn't be able to dissipate the extra heat.
- You have to stick to a budget, either for the current development, or a per-robot budget to keep future designs affordable.
- You don't want to have to support more complex hardware or multiple compute devices.
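
Whether offloading is worthwhile also depends on the round trip time mentioned under Connectivity. As a rough sketch only, the check below compares running a task on the robot against sending it away and waiting for the result. The function name, its parameters, and the example timings are all hypothetical; real values would have to be measured on your own robot and network.

```python
def should_offload(
    local_compute_s: float,
    remote_compute_s: float,
    round_trip_s: float,
    deadline_s: float,
) -> bool:
    """Return True if offloading meets the deadline and is faster than local compute.

    All arguments are durations in seconds.
    """
    remote_total_s = round_trip_s + remote_compute_s
    return remote_total_s <= deadline_s and remote_total_s < local_compute_s


if __name__ == "__main__":
    # Heavy task: slow on the robot, fast on a big server.
    print(should_offload(local_compute_s=2.0, remote_compute_s=0.2,
                         round_trip_s=0.1, deadline_s=0.5))
    # Light task: the round trip costs more than computing locally.
    print(should_offload(local_compute_s=0.05, remote_compute_s=0.01,
                         round_trip_s=0.1, deadline_s=0.5))
```

This only looks at timing. Budget, power, heat, and how often the robot loses its connection all belong in the same decision.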