The Benefits of On-Device AI for Streaming: A Real-World Case Study
Learn how a streaming company successfully expanded into new markets by partnering with Locaal to implement cost-effective, AI-powered content localization for their streamers.
Tony Vu
Amazon Employee
Published Nov 19, 2024
This blog post was co-authored by Roy Shilkrot and Saul Lustgarten, co-founders at Locaal, an APN Partner.
In the past year, artificial intelligence (AI) has evolved from a differentiator to a must-have feature as consumers have become accustomed to chatbots and other useful products and features. As a result, companies now need to have an AI strategy. In this case study, we'll look at how a streaming company partnered with Locaal, an Amazon Partner Network (APN) partner, to enter a new market by using AI to localize their content. This partnership allowed the company to offer transcription and translation capabilities to streamers, enabling them to reach new audiences while reducing costs. By leveraging on-device AI, the streaming company expanded revenue opportunities and significantly reduced their AI expenses.
The streaming company wanted to enter a new market and offer transcription and translation so their streamers could reach the new market's audience. The latest advances in AI model architectures make AI ideal for performing transcription and translation. The problem with most existing solutions, however, is that being cloud-based, they have the following shortcomings:
- Latency - Data has to travel to the cloud and back, which makes these solutions subpar for real-time applications like streaming
- Privacy - Data is shared with third-party AI providers
- Generic - Models are trained on general-purpose data, so they may not be as accurate as models trained on domain-specific data
- Expensive - Last but not least, they can be cost-prohibitive when transcribing and translating large volumes of data
On-device AI addresses these problems by running inference at the edge. This means that AI models are executed directly on the user's device, rather than sending data to the cloud for processing. This enables use cases like transcription and translation to happen in real time, without the delay of a network round trip. Since the data doesn't leave the user's device, it stays private. With open source models, you can apply custom training to specialize the model so it outperforms generic ones, which is difficult or impossible with most cloud-based or closed source models. And since all inference happens on the user's hardware, there is no cloud spending on inference, which leads to significant cost savings.
The problem is that making AI work on-device is a heavy lift. Companies need to identify the right models, compress and optimize them for on-device performance, and ensure that these models run effectively across different hardware platforms. Additionally, building the necessary infrastructure to deliver, secure, and monitor the AI models can be challenging, and many companies lack the expertise or appetite to take on this task.
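To make "compress and optimize" a little more concrete: one widely used compression technique is quantization, which stores model weights as small integers instead of 32-bit floats. The post does not say which techniques Locaal used, so the following is only a minimal sketch of symmetric int8 quantization on a plain list of weights. The function names and sample values are hypothetical and chosen for illustration.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map float weights to integers in [-127, 127] using one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # The scale maps the largest magnitude onto 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the quantized integers."""
    return [q * scale for q in quantized]


# Illustrative values only, not taken from any real model.
weights = [0.5, -1.27, 0.0, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now fits in one byte instead of four, at the cost of a rounding error of at most half the scale per weight. Real on-device pipelines typically rely on a framework's own quantization tooling rather than hand-written code like this.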
To address these challenges, the streaming company partnered with Locaal, a leader in on-device AI and an Amazon partner. By leveraging AWS's extensive technology stack, Locaal successfully helped the streaming company transition their transcription and translation workloads on-device, ensuring high performance and ease of use.
Locaal identified the optimal models for supporting the specific languages needed by the streaming company. These models were fine-tuned on GPU-powered Amazon EC2 instances, while Amazon S3 was used for storing the training data. The data preparation and preprocessing were handled with Amazon SageMaker through Jupyter notebooks. To generate synthetic data and conduct automatic evaluation, Amazon Bedrock was used. For baseline evaluations, the Amazon Translate and Amazon Transcribe services were leveraged. Amazon Interactive Video Service (IVS), a fully managed live streaming service that empowers developers to build low-latency and real-time streaming applications, was used for low-latency streaming.
Once the model was complete, Locaal integrated it into their SDK. Locaal also built a plugin for OBS that leveraged the SDK to provide transcription and translation natively within OBS, so streamers could access it seamlessly. To date, the OBS plugin has attracted around 40,000 streamers across major platforms such as YouTube, Twitch and Kick, enhancing accessibility for multilingual audiences. Since the model is also available through Locaal’s native SDK, the company can easily add it to its own broadcasting software as well, so its streamers don’t have to rely on OBS to perform the transcription or translation.
The results of bringing transcription and translation on-device were impressive.
Better quality than general models. Compared to baseline pretrained automatic speech recognition (ASR) models, the quality of transcription and translation improved significantly: Locaal's solution achieved a 33% reduction in error, as measured by Word Error Rate (WER). This is thanks to Locaal’s specialization of the transcription and translation models, which are optimized both for the edge and for the specific task at hand.
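For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model's output, divided by the number of words in the reference. The sketch below shows one straightforward way to compute it, plus a helper for a relative reduction between two WER values. It assumes the 33% figure is a relative reduction, which the post does not state. The function names and the sample sentences are hypothetical, not data from the case study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Made-up example: one dropped word out of six reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Under that assumption, a baseline WER of 0.15 falling to 0.10 would count as a reduction of roughly 33%. Production evaluations usually also normalize punctuation and numerals before scoring, which this sketch leaves out.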
Faster than cloud AI. The on-device solution also eliminated the network round-trip latency, allowing streamers to provide a smoother experience without the lag associated with sending data to the cloud for processing.
Guaranteed privacy. Privacy was another key advantage, as streamers' content was not shared with third-party cloud providers, thus ensuring that their data remained secure. This was an important differentiator for the streaming provider.
Significant cost reductions. In addition to improving quality and privacy, the streaming company saved millions per year by running their AI workloads on-device rather than relying on cloud services.
See the end result in the following video:
As we explored in this case study, on-device AI holds great potential for companies that want to deliver superior products and services while keeping costs low. Beyond transcription and translation, streaming companies could improve their user experience with on-device AI in many ways. For example, they can use on-device AI to ensure their safety and moderation standards are met, help their creators look better on screen with advanced filters, and provide narration of streams to viewers with text-to-speech, among other possibilities. With the latest advancements in GenAI, the only limitation is the imagination of product and engineering teams.
Beyond streaming, many industries can benefit from on-device AI, particularly sectors like healthcare, finance, legal, and defense, where privacy is a significant concern. By keeping data on the user’s device, companies can mitigate the risk of data leakage and avoid having customers' information used for model training. Furthermore, on-device AI is a powerful solution for mission-critical applications that require real-time processing or offline functionality, especially where unreliable connectivity makes cloud-based AI unsuitable. While on-device AI has historically been harder to implement than cloud-based AI, companies like Locaal are democratizing access to it by making it seamless to integrate.
If you’d like to explore how on-device AI could benefit your business, feel free to get in touch with Locaal for a consultation.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.