
Leaving no language behind with Amazon SageMaker Serverless Inference 🌍💬
Host a high-quality translation model at scale using Amazon SageMaker Serverless Inference.
💬 “Language was just difference. A thousand different ways of seeing, of moving through the world. No; a thousand worlds within one. And translation – a necessary endeavour, however futile, to move between them.” ― R. F. Kuang, Babel, or the Necessity of Violence: An Arcane History of the Oxford Translators' Revolution
In this post, we'll host a translation model from the 🤗 Hub with Amazon SageMaker using Serverless Inference (SI). The model, NLLB-200, covers 200 languages, from Acehnese to Zulu.
There are an estimated 7,164 languages in use today. However, this number is rapidly declining: 40% of all living languages are endangered, while the top 25 by number of speakers account for more than half (!) of the world's population.
If SI meets the needs of your prod environment then, by all means, go for it; otherwise, you'll find that alternative deployment options like Real-Time Inference (RTI) are probably a better fit. If you're looking for guidance, the diagram below from the Model Hosting FAQs is a good place to start.
With RTI we provision capacity through attributes like `instance_type` and instance count, while for SI we work with attributes like memory size or the number of concurrent requests the model should be able to handle. We start by fetching the execution role with `get_execution_role` (we'll need it in a second to create the model endpoint) and then define the model with the `HuggingFaceModel` class.
🚩 Memory size for SI endpoints ranges between 1024 MB (1 GB) and 6144 MB (6 GB) in 1 GB increments. SageMaker will automatically assign compute resources in line with the memory you select.
With the model defined, we call `.deploy` to kick off the deployment.
🎯 The full list of supported languages and their codes is available in the special tokens map and in the model card metadata (`language_details`).
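A request body for the endpoint might be built like this; the helper name is hypothetical, and the `src_lang`/`tgt_lang` parameters take the FLORES-200 codes from the model card:

```python
import json

def build_payload(text: str, src_lang: str, tgt_lang: str) -> dict:
    """Request body for the translation task; language codes follow
    the model card metadata (e.g. eng_Latn, zul_Latn)."""
    return {
        "inputs": text,
        "parameters": {"src_lang": src_lang, "tgt_lang": tgt_lang},
    }

payload = build_payload("Language was just difference.", "eng_Latn", "zul_Latn")
body = json.dumps(payload)  # what actually goes over the wire
# predictor.predict(payload)  # with the predictor returned by .deploy
```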
💡 Pro Tip: you can turn the Python snippet above into a locustfile and call it a stress test. If you didn't understand a bit of what I just said, check the AWS ML Blog post Best practices for load testing Amazon SageMaker real-time inference endpoints, which, despite the name, also works with SI endpoints.
So far we've picked the maximum memory size (6 GB) in the SI configuration without so much as a "how do you do". Do we really need all this RAM? Is this the optimal choice? How can we tell? We can answer these questions by benchmarking different endpoint configurations with the SageMaker Serverless Inference Toolkit (see References), restricting the search to the 4-6 GB range to save time. The benchmark takes a while to run (about 1 hour ⌛) and stores its reports under `result_save_path`. As it turns out, 5 GB offers essentially the same level of performance (average latency +18 ms) at a fraction of the price (-20% when compared to 6 GB).
- (NLLB Team et al., 2022) No Language Left Behind: Scaling Human-Centered Machine Translation
- Amazon SageMaker Examples - includes a section on Serverless Inference
- SageMaker Serverless Inference Toolkit - a tool to benchmark SageMaker serverless endpoint configurations and help find the best one
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.