S3 Express One Zone and Directory Bucket
At re:Invent 2023, AWS launched S3 Express One Zone, a "reinvented" high-speed object storage class. Let's explore!
Published Dec 8, 2023
This advanced storage class stands out with three key features:
- Single-digit millisecond first byte latency for compute-intensive and latency-sensitive applications
- Consistent performance eliminates tail latencies, driving down query times
- Data access speeds up to 10x faster and request costs up to 50% lower than S3 Standard.
This storage class was designed not just for low latency but also for consistency: by eliminating tail latencies, it helps analytics workloads that are especially sensitive to such delays (a topic we'll dive into shortly). Together with data access speeds up to 10 times faster than S3 Standard and request costs up to 50% lower, this is a notably different performance profile from what you're used to getting with S3.
One fundamental difference between Express One Zone and the traditional storage classes lies in the introduction of a new bucket type within S3. Until now, S3 buckets have always been, well, just S3 buckets. However, with Express One Zone, AWS introduced a new category called a "directory bucket" which differs in three key aspects from the buckets you've been using for years.
Firstly, directory buckets enable you to store data in a single availability zone that you specifically select. Secondly, they follow a different request scaling model compared to traditional buckets. And thirdly, their authentication works based on sessions rather than on a per-request basis.
Let's take a closer look into each of these differences to understand the significance of this new feature.
When you upload an object to a traditional (regional) S3 bucket, it's distributed across multiple availability zones. These AZs are essentially independent data centers, each separated by miles yet connected by high-speed and highly redundant networks. While this distribution offers resilience and availability benefits, the distance between these zones adds a degree of latency to your requests.
-Traditional S3 Infrastructure-
However, with a directory bucket, you have the power to specify a single availability zone for storing all data within that bucket. This eliminates the latency associated with spreading data across multiple AZs, providing your applications with lower-latency storage.
-One Zone Storage Infrastructure-
Moving to this new single-zone storage model involves two primary architectural considerations. Firstly, placement is now a factor you control. With regional buckets, AWS handles object placement across availability zones for you; with directory buckets, you can place your data in the same zone as your compute resources, reducing network distance and improving application speed.
Secondly, S3 Express One Zone offers a different durability model, an essential aspect for builders to understand. To explain, let's step back and look at how AWS ensures durability in S3. AWS employs various techniques to achieve 11 nines (99.999999999%) of data durability in S3 Standard and the other regional storage classes, which spread data across multiple zones. These techniques involve integrity checks through checksums at various stages of data handling, storing data redundantly across devices to tolerate failures, and periodic audits to maintain data correctness over time.
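To make the placement choice concrete, here is a minimal sketch of how the request parameters for a directory bucket might be assembled. It assumes the directory bucket naming format `base-name--zone-id--x-s3` and the `CreateBucketConfiguration` shape from the AWS documentation; the base name `training-data` and the zone ID `use1-az5` are placeholders, and the helper names are my own.

```python
import re

# Suffix that marks a directory bucket name (assumed from the AWS naming rules).
DIRECTORY_SUFFIX = "--x-s3"


def directory_bucket_name(base_name: str, zone_id: str) -> str:
    """Compose a directory bucket name: base-name--zone-id--x-s3."""
    if not re.fullmatch(r"[a-z0-9][a-z0-9-]*[a-z0-9]", base_name):
        raise ValueError("base name must use lowercase letters, digits, and hyphens")
    return f"{base_name}--{zone_id}{DIRECTORY_SUFFIX}"


def create_bucket_params(base_name: str, zone_id: str) -> dict:
    """Build the keyword arguments for a CreateBucket call on a directory bucket."""
    return {
        "Bucket": directory_bucket_name(base_name, zone_id),
        "CreateBucketConfiguration": {
            # The zone you pick here is where all objects in the bucket live.
            "Location": {"Type": "AvailabilityZone", "Name": zone_id},
            "Bucket": {
                "Type": "Directory",
                "DataRedundancy": "SingleAvailabilityZone",
            },
        },
    }


params = create_bucket_params("training-data", "use1-az5")
print(params["Bucket"])  # training-data--use1-az5--x-s3
```

With boto3 installed and credentials configured, these parameters would presumably be passed as `boto3.client("s3").create_bucket(**params)`; pick the zone ID that matches the availability zone of your compute instances.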
However, it's important to note that while S3 Express One Zone maintains the same standards for durability within a single zone, it doesn't safeguard against the complete loss or damage of an entire availability zone. While such occurrences are highly unlikely, it's a factor worth considering for your architecture.
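To get a feel for what "11 nines" means, the arithmetic can be sketched as follows. This treats the figure as an annual per-object loss probability of 1 in 10^11, which is a simplification, and it deliberately says nothing about the loss of a whole availability zone, which is exactly the case a single-zone class does not cover.

```python
from fractions import Fraction

# 11 nines of durability, read as an annual per-object loss probability:
# 1 - 0.99999999999 = 1 / 10**11 (simplifying assumption).
ANNUAL_LOSS_PROBABILITY = Fraction(1, 10**11)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year for a given object count."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_per_expected_loss(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


# With 10 million objects, expect to lose one object every 10,000 years.
print(years_per_expected_loss(10_000_000))  # 10000
```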
The way requests scale for directory buckets differs significantly from regional buckets. With regional buckets, each prefix starts at a baseline of 5,500 read requests per second, and under load AWS dynamically adds transaction capacity behind the scenes to support your growing usage.
This gradual scaling process (the orange and yellow lines in the original figure) ensures elasticity, allowing you to scale up to hundreds of thousands of transactions per second as your application grows. However, in certain scenarios, such as scaling up rapidly with a large GPU cluster, customers might outpace the rate at which AWS provisions capacity.
Recognizing this, AWS observed that in transactional analytics and machine learning workloads, customers often prefer instant scalability to make immediate use of their compute resources. With directory buckets, AWS takes a different approach. Instead of gradually adding capacity over time, directory buckets instantly provision hundreds of thousands of transactions per second upon creation. This instant scalability empowers users to immediately scale up their compute resources and efficiently handle substantial workloads. It's a distinct scaling model designed to meet these specific workload demands efficiently.
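For comparison, here is a rough sketch of the capacity planning that the regional model implies: given the 5,500 reads per second per prefix baseline mentioned above, how many prefixes would a workload need to spread across to reach a target read rate before any dynamic scaling kicks in? The function name and the 200,000 target are illustrative assumptions.

```python
import math

# Baseline read rate per prefix for regional buckets, taken from the text above.
BASELINE_READS_PER_PREFIX = 5_500


def prefixes_for_read_rate(target_reads_per_second: int,
                           per_prefix: int = BASELINE_READS_PER_PREFIX) -> int:
    """Minimum number of prefixes needed to reach a target read rate at baseline."""
    if target_reads_per_second <= 0:
        raise ValueError("target rate must be positive")
    return math.ceil(target_reads_per_second / per_prefix)


# A GPU cluster that wants 200,000 reads per second from the first minute:
print(prefixes_for_read_rate(200_000))  # 37
```

With a directory bucket this kind of up-front prefix planning should matter much less, since the capacity is provisioned at creation time.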
The third significant difference involves the security model. With regional buckets, every single request requires authentication, offering high granularity and expressive policies in IAM and bucket policies. However, this authentication process incurs some latency, which accumulates over numerous requests.
To address this, AWS introduced a new CreateSession API for directory buckets. This API lets you authenticate once per session and spread the authentication latency over subsequent requests by obtaining a short-lived token that grants access to the entire bucket. Sessions operate in one of two modes, ReadOnly or ReadWrite, so you can limit what a session is allowed to do.
While the AWS SDKs abstract most of this complexity, bucket policies for directory buckets are written differently because the data path authenticates differently. A simple policy for a directory bucket, for example, allows specific principals to create sessions.
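The idea of amortizing authentication over many requests can be sketched with a small token cache. This is not how the AWS SDKs implement it internally; `SessionCache`, `Session`, and the `create_session` callable are hypothetical names, and the 300-second lifetime in the demo is an assumption about how long a session token stays valid.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Session:
    token: str
    expires_at: float  # epoch seconds


class SessionCache:
    """Reuse one session token per bucket until it is close to expiring."""

    def __init__(self, create_session: Callable[[str], Session],
                 refresh_margin: float = 30.0,
                 clock: Callable[[], float] = time.time):
        self._create_session = create_session  # stand-in for a CreateSession call
        self._refresh_margin = refresh_margin
        self._clock = clock
        self._sessions: Dict[str, Session] = {}

    def token_for(self, bucket: str) -> str:
        session = self._sessions.get(bucket)
        remaining = None if session is None else session.expires_at - self._clock()
        if remaining is None or remaining <= self._refresh_margin:
            # Pay the authentication cost once, then reuse the token.
            session = self._create_session(bucket)
            self._sessions[bucket] = session
        return session.token


# Demo with a fake session factory and a controllable clock.
calls = []
now = [1000.0]


def fake_create_session(bucket: str) -> Session:
    calls.append(bucket)
    return Session(token=f"token-{len(calls)}", expires_at=now[0] + 300)


cache = SessionCache(fake_create_session, clock=lambda: now[0])
first = cache.token_for("demo--use1-az5--x-s3")
second = cache.token_for("demo--use1-az5--x-s3")  # served from the cache
now[0] += 280                                      # 20 seconds left: refresh
third = cache.token_for("demo--use1-az5--x-s3")
```

In this demo the first two lookups share one token and only the third triggers a new session, which is the latency saving the session model is after.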
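As an illustration of such a policy, the sketch below builds a bucket policy document that lets one principal create sessions on a directory bucket. It assumes the `s3express:CreateSession` action, the `s3express:SessionMode` condition key, and the `arn:aws:s3express:region:account-id:bucket/name` ARN format from the AWS documentation; the account ID, role name, and bucket name are placeholders.

```python
import json

# Session modes the CreateSession API accepts, to the best of my knowledge.
SESSION_MODES = ("ReadOnly", "ReadWrite")


def create_session_policy(region: str, account_id: str, bucket: str,
                          principal_arn: str, session_mode: str = "ReadOnly") -> dict:
    """Build a bucket policy allowing one principal to create sessions."""
    if session_mode not in SESSION_MODES:
        raise ValueError(f"session_mode must be one of {SESSION_MODES}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCreateSession",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3express:CreateSession",
                "Resource": f"arn:aws:s3express:{region}:{account_id}:bucket/{bucket}",
                # Restrict the sessions this principal may open to one mode.
                "Condition": {
                    "StringEquals": {"s3express:SessionMode": session_mode}
                },
            }
        ],
    }


policy = create_session_policy(
    region="us-east-1",
    account_id="111122223333",                              # placeholder account
    bucket="training-data--use1-az5--x-s3",                  # placeholder bucket
    principal_arn="arn:aws:iam::111122223333:role/Trainer",  # placeholder role
)
print(json.dumps(policy, indent=2))
```

Note that the policy grants access at the bucket level: there is no per-object statement, which matches the session-based model described above.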
Regarding performance improvements, AWS reports substantial enhancements across several services: up to 5.8x faster loading times for Amazon SageMaker, about 2.1x quicker query times for Amazon Athena, and up to a 4x performance boost for Amazon EMR. These improvements showcase the tangible benefits of leveraging S3 Express One Zone for various AWS services.
Let's take a look at the scenarios where S3 Express One Zone and directory buckets can play a pivotal role: [*]
- Machine learning and AI training: faster data access speeds up dataset processing, model training, and development.
- Interactive data analytics: large data volumes can be processed with minimal latency, so queries return quickly.
- Compute-intensive HPC workloads: scalable, high-performance storage sits close to the compute resources that use it.
- Financial modeling: low-latency operations allow faster modeling at finer granularity and larger scale.
- Real-time advertising: personalized content can be delivered with minimal latency, which shortens ad deployment.
- Media content workloads: VFX, rendering, and transcoding jobs can meet tighter timelines with storage that scales alongside compute demand.
Understanding where S3 Express One Zone is useful and how it works is important for making the most of it. Knowing how the infrastructure actually works also helps us prevent problems before they happen. Remember, while this service excels in speed, it involves trade-offs in resilience. It's our responsibility to make up for that by designing the system accordingly.
Before deciding whether to use directory buckets, consider the limitations that might rule them out for your particular use case:
- Objects within Directory buckets cannot have tags applied to them. Consequently, attempts to copy an object with a tag to a Directory bucket will result in a 501 Not Implemented response.
- Directory buckets become inactive after remaining idle without request activity for 3 months. During this inactive state, the buckets are inaccessible for both read and write operations. Reactivation occurs upon access request, which might take a few minutes, leading to 503 slowdown responses for read and write requests.
- Only Server Side Encryption with S3 Managed keys (SSE-S3) is supported for Directory buckets. Other encryption methods like SSE-KMS and SSE-C are not compatible.
- Several essential S3 features such as Multi-Factor Authentication, S3 Versioning, Replication, Inventory reports, and S3 event notifications are not supported in conjunction with Directory buckets.
- The authorization model differs for Directory buckets, lacking object-level authorization; instead, authorization must occur at the bucket level.
- Currently, Directory buckets are supported only in specific regions: us-east-1, us-west-2, ap-northeast-1, and eu-north-1.
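The limitations above can be turned into a simple pre-flight check before committing a workload to a directory bucket. This is only a sketch: `WorkloadRequirements` and the feature labels are hypothetical names of my own, and the region and feature lists are copied from the list above, so they reflect the state at the time of writing.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Taken from the limitations listed above (as of the time of writing).
SUPPORTED_REGIONS = {"us-east-1", "us-west-2", "ap-northeast-1", "eu-north-1"}
UNSUPPORTED_FEATURES = {
    "object-tags", "mfa", "versioning", "replication",
    "inventory", "event-notifications", "object-level-authorization",
}


@dataclass
class WorkloadRequirements:
    region: str
    encryption: str = "SSE-S3"
    features: Set[str] = field(default_factory=set)


def directory_bucket_blockers(req: WorkloadRequirements) -> List[str]:
    """Return the reasons a workload cannot use a directory bucket (empty if none)."""
    blockers = []
    if req.region not in SUPPORTED_REGIONS:
        blockers.append(f"region {req.region} is not supported")
    if req.encryption != "SSE-S3":
        blockers.append(f"encryption {req.encryption} is not supported (SSE-S3 only)")
    for feature in sorted(req.features & UNSUPPORTED_FEATURES):
        blockers.append(f"feature {feature} is not supported")
    return blockers


analytics = WorkloadRequirements(region="us-east-1")
archive = WorkloadRequirements(region="eu-west-1", encryption="SSE-KMS",
                               features={"versioning", "lifecycle"})
print(directory_bucket_blockers(analytics))
for reason in directory_bucket_blockers(archive):
    print(reason)
```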
If you have any questions, please don't hesitate to contact me.
- https://aws.amazon.com/blogs/storage/tag/amazon-s3-express-one-zone/
- https://youtu.be/IGQtG-7kbbM
- https://aws.amazon.com/s3/storage-classes/express-one-zone/integrations/