Best Practices When Working With Events, Schema Registry, and Amazon EventBridge

Developers are embracing event-driven architectures to scale their applications. In this post, you will learn about the schema registry, the important role it plays in an event-driven world, and best practices to guide your implementation.

Daniel Wirjo
Amazon Employee
Published Aug 30, 2023
Last Modified Mar 14, 2024

Overview

The schema registry is a critical component of event-driven architectures, which are increasingly adopted to create scalable systems, enhance flexibility, and decouple components for developer agility. In this post, you will learn about the schema registry, the important role it plays in an event-driven world, and best practices to guide your implementation. Let's dive in!

What Is a Schema Registry?

In a request-response architecture, you may be familiar with the concept of Application Programming Interfaces (APIs). APIs form a contract of communication between services. In an event-driven architecture, this contract is facilitated by the schema registry. The schema registry is a central collection of schemas, including their version history. Schemas describe the structure of events, including fields, values, and formats. To illustrate, we will use an e-commerce store use case. For example, an OrderCreated event may include the status of the order as a string and a list of product-ids as an array.
A diagram of the EventBridge Schema Registry enforcing a contract between producers and consumers

Why Is It Important?

At its core, event-driven architectures consist of producer services generating events, and consumer services reacting to those events. Producers and consumers are decoupled by a service such as Amazon EventBridge, a serverless event bus. By decoupling, developers can move fast: they can build, deploy and scale applications independently. Developers can subscribe to events they are interested in, emit events for extensibility, and avoid writing integration code.
An illustrative architecture of producers and consumers interacting with the EventBridge Schema Registry and code bindings
However, as business requirements evolve, producers and consumers can fall out of sync, leading to reliability challenges. For example, the OrderCreated event can introduce a new field such as the total cost of the order. In a growing business and an increasingly complex application, it can be challenging for teams to understand what events are available and what they mean. The schema registry plays an important role in reliability, allowing producers and consumers to enforce a contract. It also supports event discovery, helping teams understand events and the applications they can build on top of them.

Best Practices

Understand Event Discovery and Design Principles

An event is a data representation of something that happened. While this sounds simple, event discovery and design is a nuanced and complex topic that extends well beyond this post. We cover some best practices here, but implementation varies widely depending on your needs, so take the time to do your research. Review concepts such as event storming and Domain-Driven Design (DDD).

Develop a Naming Convention for Event Types

Events in Amazon EventBridge have a standard structure, including the source and detail-type. These fields provide important context: where the event came from and what the event is about. This context is used by consumers, for collaboration across teams, by monitoring and observability tools, and for tracing across services. As such, names that are easy to understand are beneficial. Names are typically difficult to change, so it's worth setting a standard up front.
For example, consider naming conventions to distinguish event types:
  • Notification and Delta Events: If the event is communicating only the detail relevant to state changes, consider a <Noun><PastTenseVerb> format. For example, OrderCreated.
  • Fact Events: If the event is communicating full state changes (also known as event-carried state transfer), consider using an alternative <Event>Fact format. For example, OrderCreatedFact.
  • Domain Events: In addition to events used for communication across services (outside events), there are events that are only used within a service (inside events). Here, consider a namespace prefix. For example, Order@OrderCreated. Note that events can be produced by different services as the system evolves, and the source field in the event can be used to identify the producer.

Catalog and Document Events for Shared Understanding

In addition to a schema registry, consider documentation of business-level definitions to develop a shared understanding across teams. Consider tools such as EventCatalog, AsyncAPI, and Contextive. A key concept is the bounded context: the same term can have different meanings within a service (inside events) versus in communication across services (outside events). These documentation tools can help you understand the producers and consumers relevant to an event, including their team members, assisting discovery across your organization.

Incorporate a Standard Event Metadata for Context Awareness

While there is freedom to publish any data into event detail, consider augmenting event data with metadata that provides additional context for consumers. This is a pattern increasingly adopted by the community.
{
  "version": "1.0",
  "id": "0d079340-135a-c8c6-95c2-41fb8f496c53",
  "detail-type": "OrderCreated",
  "source": "com.orders",
  "account": "123451235123",
  "time": "2023-09-01T18:41:53Z",
  "region": "ap-southeast-2",
  "detail": {
    "metadata": { ... }, /* Metadata for additional context */
    "data": { ... }      /* Content about the event */
  }
}
For example, consider the following metadata:
  • event-id: (or idempotency-key) With a unique identifier (such as a UUID), consumers can identify duplicates. This is important as EventBridge guarantees at-least-once delivery, so consumers need to be idempotent.
  • sequence-id: With a sequence identifier in a new field (or as part of the event-id), consumers can handle ordering. If an event arrives out of order, the consumer can decide what to do, such as waiting or saving the event for later.
  • tenant-id: For a SaaS application, the tenant context will enable consumer services to handle events relevant to the tenant boundary.
  • data-classification: For security, a data classification tag can identify whether the event contains sensitive data such as Personally Identifiable Information (PII). This enables security policies to be implemented for certain classifications.
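To illustrate, the detail payload of an OrderCreated event carrying these fields might look like the following sketch (expressed as a Python dictionary; the field names and values are suggestions from this post rather than an EventBridge standard):

# Illustrative "detail" payload combining the suggested metadata fields.
detail = {
    "metadata": {
        "event-id": "0d079340-135a-c8c6-95c2-41fb8f496c53",  # unique identifier / idempotency key
        "sequence-id": 42,                                    # ordering hint for consumers
        "tenant-id": "tenant-1234",                           # SaaS tenant boundary
        "data-classification": "pii"                          # drives security handling
    },
    "data": {
        "order-id": "123456789",
        "status": "Pending"
    }
}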

Implement a Tool for Consistency and Abstraction When Publishing and Consuming Events

For producers, implement a consistent way to enforce the standard metadata outlined above. To achieve this, develop a custom utility such as PublishEvent() to initialize events, and distribute it across teams using a package manager such as AWS CodeArtifact. The utility can additionally abstract implementation details, enforce security on sensitive data, and perform validation. Producers are isolated from details such as calling the EventBridge PutEvents() API with the AWS SDK, and only need to concern themselves with the event they are publishing (a sketch follows the diagram below).
A diagram illustrating use of the custom utility distributed by AWS CodeArtifact being used to publish events
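A minimal sketch of such a utility, assuming Python with boto3. The publish_event name, the default source, the event bus name, and the metadata defaults are illustrative assumptions rather than a published standard:

import json
import uuid
from datetime import datetime, timezone

import boto3

events = boto3.client("events")

def publish_event(detail_type, data, metadata=None, source="com.orders", bus_name="default"):
    # Enforce the standard metadata (for example, a unique event-id) in one place,
    # so individual producer teams do not need to remember it.
    metadata = {
        "event-id": str(uuid.uuid4()),
        "published-at": datetime.now(timezone.utc).isoformat(),
        **(metadata or {}),
    }
    events.put_events(
        Entries=[{
            "EventBusName": bus_name,
            "Source": source,
            "DetailType": detail_type,
            "Detail": json.dumps({"metadata": metadata, "data": data}),
        }]
    )

# Producers only describe the event itself:
publish_event("OrderCreated", {"order-id": "123456789", "status": "Pending"})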
For consumers, consider conforming payloads to a standard such as CloudEvents. To achieve this, use the EventBridge input transformer to transform data prior to consumption. See CloudEvents input transformer for an example.
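As a sketch of this approach (not the exact sample linked above), a rule target can be configured with an input transformer that maps the EventBridge envelope onto CloudEvents-style attributes. The rule name, target ARN, and template below are illustrative assumptions:

import boto3

events = boto3.client("events")

events.put_targets(
    Rule="order-created-rule",  # assumed rule name
    Targets=[{
        "Id": "order-consumer",
        "Arn": "arn:aws:lambda:ap-southeast-2:123456789012:function:OrderConsumer",  # placeholder
        "InputTransformer": {
            # Extract fields from the EventBridge envelope...
            "InputPathsMap": {
                "id": "$.id",
                "source": "$.source",
                "type": "$.detail-type",
                "time": "$.time",
                "data": "$.detail"
            },
            # ...and rebuild them as a CloudEvents-shaped payload for the consumer.
            "InputTemplate": '{"specversion": "1.0", "id": <id>, "source": <source>, '
                             '"type": <type>, "time": <time>, "data": <data>}'
        }
    }]
)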

Use Code Bindings and Developer Tools for Agility

The EventBridge schema registry generates code bindings to accelerate development. Code bindings can be accessed via the EventBridge console, the Schema Registry API, and in popular developer IDEs with the AWS Toolkit. See this video to learn more.
AWS Serverless Application Model (SAM) can be used to easily build serverless producers and consumers; see the example on GitHub. To adopt best practices, consider tools such as Lambda Powertools, cfn-lint, and serverless-rules.

Implement Validation Across Your Producers and Consumers

EventBridge schemas support open standards such as OpenAPI and JSON Schema. Producers and consumers can cache and use the schema to validate the event structure, including data types. If an event does not conform to the schema, an exception can be thrown. Depending on the use case and the frequency of schema changes, validation can be done at implementation time, at build time in the CI/CD pipeline prior to deployment, or at runtime. Teams can use observability tools such as AWS X-Ray and Amazon CloudWatch to monitor for issues.
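For example, a consumer could validate incoming events at runtime against a JSON Schema exported from the registry and cached locally. A minimal sketch assuming the jsonschema Python library and an assumed local file path:

import json

from jsonschema import ValidationError, validate  # pip install jsonschema

# Assumed: the OrderCreated schema exported from the registry and cached with the consumer.
with open("schemas/OrderCreated.json") as schema_file:
    order_created_schema = json.load(schema_file)

def handle_event(event):
    try:
        # Validate the event detail, including field presence and data types.
        validate(instance=event["detail"], schema=order_created_schema)
    except ValidationError as error:
        # Surface schema drift early, for example by raising an alarm or sending to a DLQ.
        raise RuntimeError(f"OrderCreated event failed validation: {error.message}")
    # ... continue processing the valid event ...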

Handle Schema Evolution With Versioning and Schema Discovery

Businesses are rarely static, and events change over time (schema evolution). With schema discovery enabled, teams do not have to maintain their own schemas, which increases developer productivity and reduces the risk of errors. Schemas for all AWS sources are automatically updated in the AWS event schema registry, while SaaS partner and custom schemas are automatically generated in the discovered schema registry.
A diagram of the EventBridge Schema Registry with schema versions and the interaction with a producer and consumer
With schema versions, producers and consumers can decide how to handle compatibility. For example, a new version can be backwards compatible, and consumers can process it as is. For major and breaking changes, however, producers and consumers can wait until the application is updated before enforcing the new version.
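For example, schema discovery can be enabled on a custom event bus programmatically. A sketch using boto3, with a placeholder event bus ARN (the same can be done in the console or with infrastructure as code):

import boto3

schemas = boto3.client("schemas")

# Capture new and changed event shapes as schema versions in the discovered schema registry.
schemas.create_discoverer(
    SourceArn="arn:aws:events:ap-southeast-2:123456789012:event-bus/orders",  # placeholder
    Description="Discover schemas for the orders event bus"
)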

Start Small With Sparse Event Payloads

A common question when building event-driven architectures is: how much data should an event contain? While there is no one-size-fits-all answer, one approach is to start small with sparse event payloads. As opposed to describing the full state, sparse events contain minimal detail: often just the event identifier and the fields needed for filtering, which EventBridge can match with event patterns.
/* Sparse event example */
{
  "detail-type": "OrderCreated",
  "source": "com.orders",
  "detail": {
    "metadata": {
      "event-id": "1a2b3c4d5e6f7g8h9i"
    },
    "data": {
      "order-id": "123456789",
      "fulfillment-type": "fulfilled-by-amazon" /* Data for filtering */
    }
  }
}
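Even a sparse payload like this can be routed precisely, because the filtering fields can be matched by an event pattern. A sketch using boto3, with an illustrative rule name and bus:

import json

import boto3

events = boto3.client("events")

# Route only OrderCreated events fulfilled by Amazon to the interested consumer.
events.put_rule(
    Name="order-created-fulfilled-by-amazon",  # assumed rule name
    EventBusName="orders",                     # assumed custom event bus
    EventPattern=json.dumps({
        "source": ["com.orders"],
        "detail-type": ["OrderCreated"],
        "detail": {
            "data": {"fulfillment-type": ["fulfilled-by-amazon"]}
        }
    })
)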
The benefit of sparse events is reduced coupling between producer and consumer. However, if consumers require additional data, they will need to retrieve it from a service or database. This can increase processing load, costs, and complexity through duplicate integration code. To address this, the event payload can expand over time to include additional data that is commonly used by consumers.
/* Full state description example */
{
  "detail-type": "OrderCreated",
  "source": "com.orders",
  "detail": {
    "metadata": {
      "event-id": "1a2b3c4d5e6f7g8h9i"
    },
    "data": {
      "order-id": "123456789",
      "fulfillment-type": "fulfilled-by-amazon",
      /* Provide data commonly used by consumers */
      "total": 100,
      "status": "Pending",
      "product-ids": ["ABC-123", "DEF-345", "GHI-678"]
    }
  }
}

How to Avoid Common Challenges and Pitfalls

Avoid Using Directed Commands as Events

Coming from request-response architectures, it can be natural to mistake directed commands for events. For example, Send Email may look like an event. However, in an event-driven architecture, such an action is typically performed by a consumer such as an EmailNotificationService, which subscribes to an observed business domain event such as OrderCreated.
Illustration of observed events versus directed commands

Be Aware of Complexity on Calculating the Current State

Notification or delta events communicate only the details relevant to state changes. To compute the current state, each consumer needs to process past events in the correct order. Typically, the Event Sourcing and Command Query Responsibility Segregation (CQRS) patterns are used. Events that are lost, duplicated, or processed out of order can lead to an incorrect state. This can have a significant adverse impact, especially if there are downstream systems with flow-on effects.
To assist, EventBridge provides an archive to replay events. See an example Amazon EventBridge implementation.
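To illustrate, a replay of a time window can be started against an existing archive. A sketch using boto3, with placeholder archive and event bus ARNs:

from datetime import datetime, timezone

import boto3

events = boto3.client("events")

# Replay a window of archived events back onto the event bus so consumers can rebuild state.
events.start_replay(
    ReplayName="orders-replay-2023-09-01",
    EventSourceArn="arn:aws:events:ap-southeast-2:123456789012:archive/orders-archive",  # placeholder
    EventStartTime=datetime(2023, 9, 1, tzinfo=timezone.utc),
    EventEndTime=datetime(2023, 9, 2, tzinfo=timezone.utc),
    Destination={"Arn": "arn:aws:events:ap-southeast-2:123456789012:event-bus/orders"}  # placeholder
)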

Be Aware of Complexity on Inferring State Changes

Fact events communicate the full state (event-carried state transfer). Depending on the use case, consumers may need to infer the reason for a state change. To support this, it is not uncommon for these events to incorporate the reason, and even the full before and after states. This increases the payload size of each event, raising costs and load on the system, especially for frequent events. In addition, consumers need application logic to infer the state changes, which can be duplicative and add complexity.

Be Aware of Complexity on Calculating Relationships and Joins

When transitioning to an event-driven architecture, it is common to generate events from an existing relational database, where data is normalised. For example, events can be generated from the Order and Product tables using Change Data Capture (CDC) streams. However, a consumer such as EmailNotificationService may require details from both: the order and the relevant products ordered. Unlike a relational database, consumers may not be optimised for relational data and complex joins, and performing these calculations can introduce performance bottlenecks and increase costs.
To resolve the challenge, consider adopting the transactional outbox pattern. Here, you create a dedicated outbox table in your database as a buffer prior to publishing events. The outbox can contain denormalised data that is isolated and purpose-built for consumers. A database transaction ensures that the producer's internal state (in the Order and Product tables) and the data destined for consumers (in the outbox table) stay consistent. To avoid performance bottlenecks, avoid data that is too large or too frequently updated, and only include data that is commonly used by consumers.
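A minimal sketch of the write side of the pattern, using sqlite3 purely for illustration; the table and column names are assumptions. The order row and the denormalised outbox row are committed in a single transaction, and a separate relay (for example, CDC or a poller) later publishes the outbox rows to EventBridge:

import json
import sqlite3

conn = sqlite3.connect("orders.db")
conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id TEXT, status TEXT)")
conn.execute("CREATE TABLE IF NOT EXISTS outbox (event_type TEXT, payload TEXT)")

order = {"order-id": "123456789", "status": "Pending"}
products = [{"product-id": "ABC-123", "name": "Widget"}]

# One transaction keeps the internal state and the outbox row consistent:
# either both are written, or neither is.
with conn:
    conn.execute(
        "INSERT INTO orders (order_id, status) VALUES (?, ?)",
        (order["order-id"], order["status"]),
    )
    conn.execute(
        "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
        ("OrderCreated", json.dumps({**order, "products": products})),  # denormalised, consumer-shaped
    )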
For events generated by non-relational databases, such as with Amazon DynamoDB Streams, data may already be denormalised. However, you can implement the same pattern with EventBridge Pipes to avoid the inconsistencies associated with dual writes to the database and the event bus.

Conclusion

In this post, we covered the EventBridge schema registry and, more importantly, its critical role in facilitating a contract between producers and consumers in an event-driven architecture. We learned best practices for leveraging the capabilities of the registry, as well as for designing and structuring events for scalability, performance, and security.
Remember that each event represents an extension point for your application, so the goal is to design events in a way that is easy for consumers to use. Consumers will process key events, build applications on top of them, and ultimately deliver value for your customers. First and foremost, be attentive to their needs. From there, balance considerations across various factors, including architectural coupling between producers and consumers, how to handle data-model relationships, development effort and infrastructure, and how business requirements and teams may evolve in the future.
If you are building event-driven architectures, connect with me. I'd be interested in your feedback and insights. For additional resources, visit Serverless Land.
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
