Observability, monitoring, and layered guardrails for GenAI

Building out a full-stack solution for FMOps

Randy D
Amazon Employee
Published Jun 19, 2024
As more AWS customers move from idea to proof of concept to production deployments, the idea of "MLOps for GenAI", sometimes called FMOps or LLMOps, is gaining traction. It's a broad problem space, but some of the important elements are experiment tracking, monitoring, observability, model evaluation, and security guardrails. With today's launch of managed MLflow in SageMaker, it's a good time to look at options for covering these requirements.
The diagram below shows one possible implementation option.
I'll briefly describe some of the important choices in this picture.

Experiment tracking

We use MLflow to capture experiments via its LangChain integration.
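As a rough sketch, enabling that integration can be as simple as pointing MLflow at a tracking server and turning on LangChain autologging. The tracking URI and experiment name below are placeholders, and the `autolog` call assumes a recent MLflow version:

```python
def enable_langchain_tracking(tracking_uri: str, experiment: str) -> dict:
    """Point MLflow at a tracking server and turn on LangChain autologging.
    Safe to call when mlflow is not installed; returns the config either way."""
    config = {"tracking_uri": tracking_uri, "experiment": experiment}
    try:
        import mlflow  # pip install mlflow

        mlflow.set_tracking_uri(tracking_uri)
        mlflow.set_experiment(experiment)
        mlflow.langchain.autolog()  # record chain runs as MLflow traces
    except Exception:
        pass  # mlflow/langchain unavailable, or tracking server unreachable

    return config

# Placeholder URI; with managed MLflow in SageMaker you would pass the
# tracking server ARN or URL from your SageMaker setup instead.
cfg = enable_langchain_tracking("http://localhost:5000", "genai-experiments")
```

After this call, chain invocations are captured as runs/traces under the named experiment without further instrumentation.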

Prompt engineering UI

MLflow also provides a prompt engineering UI.


Model evaluation

MLflow provides an evaluation capability that uses another LLM as a judge.
For cases where you have ground truth data available, we recommend using Bedrock model evaluation. Alternatively, you can use MLflow evaluation with your own data.
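A minimal sketch of evaluating stored outputs against ground truth with MLflow's LLM-as-a-judge metrics follows. The sample row is illustrative, and `answer_similarity` assumes MLflow's GenAI metrics module with a judge model configured; without one, the call is skipped:

```python
# Illustrative evaluation rows: question, reference answer, model output.
EVAL_ROWS = [
    {
        "inputs": "What is Amazon S3?",
        "ground_truth": "An object storage service.",
        "predictions": "S3 is AWS's object storage service.",
    },
]

def run_mlflow_eval(rows):
    """Score predictions against ground truth with mlflow.evaluate.
    Returns the metrics dict, or None if mlflow/pandas or a judge model
    is unavailable."""
    try:
        import mlflow
        import pandas as pd

        results = mlflow.evaluate(
            data=pd.DataFrame(rows),
            targets="ground_truth",
            predictions="predictions",
            model_type="question-answering",
            # LLM-as-a-judge metric; needs a configured judge model.
            extra_metrics=[mlflow.metrics.genai.answer_similarity()],
        )
        return results.metrics
    except Exception:
        return None  # dependencies missing or no judge model configured

metrics = run_mlflow_eval(EVAL_ROWS)
```

The same dataset shape also works as input to Bedrock model evaluation jobs once landed in S3.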

Observability and monitoring

We use the open-source OpenLLMetry framework to collect traces and forward them to AWS X-Ray via the OpenTelemetry Collector.
We also use Prometheus and Grafana for more general metrics collection.
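Wiring OpenLLMetry to a local OpenTelemetry Collector can be sketched as follows; the collector endpoint and app name are assumptions, and the collector itself is what forwards spans on to AWS X-Ray:

```python
import os

def init_llm_tracing(collector_endpoint: str, app_name: str = "genai-app") -> str:
    """Initialize OpenLLMetry (Traceloop SDK) against a local OpenTelemetry
    Collector. Endpoint and app name here are placeholders."""
    # The SDK reads its export target from this environment variable.
    os.environ["TRACELOOP_BASE_URL"] = collector_endpoint
    try:
        from traceloop.sdk import Traceloop  # pip install traceloop-sdk

        Traceloop.init(app_name=app_name, disable_batch=True)
    except Exception:
        pass  # SDK not installed or collector unreachable; env var is still set

    return os.environ["TRACELOOP_BASE_URL"]

# 4318 is the conventional OTLP/HTTP port for a local collector.
otlp_endpoint = init_llm_tracing("http://localhost:4318")
```

Once initialized, the SDK auto-instruments common LLM clients and frameworks, so spans flow to the collector without per-call changes.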

Security and guardrails

We use the Amazon Comprehend Moderation Chain to check for toxicity, prompt safety, and PII.
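A sketch of wrapping an LLM with that moderation chain is below. It assumes `langchain-experimental` and `boto3` with AWS credentials at runtime; the double wrapping (before and after the model) is one possible arrangement, not the article's exact wiring:

```python
def build_moderated_chain(llm):
    """Wrap an LLM so that both the incoming prompt and the outgoing
    completion pass through Amazon Comprehend moderation (toxicity,
    prompt safety, PII). Requires langchain-experimental, boto3, and
    AWS credentials when actually invoked."""
    from langchain_experimental.comprehend_moderation import (
        AmazonComprehendModerationChain,
    )

    moderation = AmazonComprehendModerationChain()
    # Check the prompt before the model sees it, and the answer afterward.
    return moderation | llm | moderation
```

By default the chain raises an error when a check fails, so unsafe prompts never reach the model and unsafe completions never reach the user.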
If you are using agents (allowing the LLM to invoke tools), I recommend that you look at this example of fine-grained access control using Amazon Verified Permissions.

Embedding analysis

This solution captures incoming queries in S3 for future analysis. See this blog for an example of using this information for drift detection and other analysis.
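Capturing each query to S3 can be sketched like this; the bucket prefix and the date-partitioned key layout are assumptions chosen to make later batch analysis easy, not the solution's exact schema:

```python
import datetime
import hashlib
import json

def query_record_key(prefix: str, query: str, now=None) -> str:
    """Build a date-partitioned S3 key for a captured query
    (partitioning layout is an assumption)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    digest = hashlib.sha256(query.encode()).hexdigest()[:12]
    return f"{prefix}/year={now:%Y}/month={now:%m}/day={now:%d}/{digest}.json"

def capture_query(bucket: str, query: str) -> str:
    """Write one query record to S3. Requires boto3 and AWS credentials."""
    key = query_record_key("queries", query)
    import boto3

    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps({"query": query}).encode(),
    )
    return key

key = query_record_key("queries", "What is FMOps?")
```

Date partitioning keeps drift analysis cheap: an embedding job can read only the days it needs rather than scanning the whole bucket.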

Ground Truth

If you want to use SageMaker Ground Truth to evaluate model output, the application stores the prompts and outputs (via an Amazon Data Firehose delivery stream) in an S3 bucket.
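Sending a prompt/output pair through Firehose can be sketched as follows; the record shape and stream name are assumptions, not the application's exact schema:

```python
import json

def build_eval_record(prompt: str, completion: str) -> bytes:
    """Newline-delimited JSON record for a Firehose delivery stream
    (record shape is an assumption). The trailing newline keeps records
    separable once Firehose concatenates them into S3 objects."""
    return (json.dumps({"prompt": prompt, "completion": completion}) + "\n").encode()

def send_for_labeling(stream_name: str, prompt: str, completion: str) -> None:
    """Put one record on the Firehose stream whose destination is the
    S3 bucket read by SageMaker Ground Truth. Requires boto3 and AWS
    credentials at runtime."""
    import boto3

    boto3.client("firehose").put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": build_eval_record(prompt, completion)},
    )

record = build_eval_record("What is FMOps?", "MLOps practices applied to foundation models.")
```

Firehose batches and delivers these records to S3 on its own schedule, so the application path stays fast and the labeling pipeline reads from the bucket asynchronously.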


It's important to cover the FMOps bases if you are moving to production with a GenAI workload. Hopefully this article gives you some ideas on how to address observability, trust and safety, and related concerns. I'd keep an eye on MLflow's GenAI capabilities, as they are growing rapidly and SageMaker just launched a managed MLflow capability.

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.