logo
Menu
Safe List Updates with DynamoDB

Safe List Updates with DynamoDB

Several solutions to safely modify a list attribute on a DynamoDB document.

Robert Zhu
Amazon Employee
Published Jun 16, 2023
Last Modified Apr 19, 2024
Full code if you want to follow along.
Amazon DynamoDB is one of the most versatile and popular services on AWS. In seconds, we can deploy a highly available, dynamically scaling key-document store with global replication, transactions, and more! However, if we modify a list attribute on a document, we need to take extra steps to achieve correctness and concurrency. Below, I'll describe the problem and offer several solutions.
Suppose we insert the following document, using the JavaScript AWS-SDK and the DynamoDB DocumentClient:
In the DynamoDB console, here's what the document looks like:
DDB Console after running put operation
By default, the DocumentClient has marshalled the JavaScript array as a DynamoDB List type. How would we remove the value "frylock" from the "friends" List attribute? Here's the doc on the List's remove operation. Since we need to specify the index of the element to remove, we need to read the document and find the index:
But this implementation has a race condition; there is a small window of time between reading the document, finding the index, and sending the update request, during which the document could be updated by another source, thus causing the operation "remove element at index X" to produce an undesired result. Luckily, there are several solutions to this common problem.

1. Condition Expressions on the list contents

DynamoDB supports a handy feature called a Condition Expressions, which lets us specify a condition that must be met in order for the operation to execute. In this case we want to build a rule that says, "only execute this operation if the target value is in the list":
A Condition Expression is a predicate that prevents execution if it evaluates to false. When we use Condition Expressions, we also need to handle the error case where the condition expression is not met. Here's the updated function:
But this technique only ensures that updates to the list attribute are safe. How can we ensure we only apply updates when the document has not changed?

2. Condition Expression on a version attribute

Borrowing from databases that employ multi-version concurrency control, we can introduce a "version" attribute at the root of our document. We can use the version field to set a condition expression that aborts the update when any other update has occurred. During the put operation, we can include an initial version property like so:
Let's update the condition expression and add error handling:
Notice that the Update Expression also increments the version attribute. The two drawbacks to this approach:
  1. We need to add a version attribute to every document/table for which we want to enforce this pattern.
  2. We need to create a wrapper layer that ensures all updates respect the version attribute and somehow ensure that direct update operations are avoided.

3. Use the Set data type

In practice, a friends list would store of a list of unique foreign keys. If we know the entries are unique, we can marshal the friends field as the DynamoDB Set data type instead of a List. Compared to lists, sets have a few differences:
  1. All values must be of the same type (string, bool, number)
  2. All values must be unique
  3. To remove element(s) from a set, use the DELETE operation, specifying a set of values
  4. A Set cannot be empty
Sounds perfect for storing a list of related document keys. However, we saw that the DocumentClient serializes JavaScript arrays as Lists, so we need to override that behavior with a custom marshaller.
In the console, our new document looks almost identical, except for the "StringSet" type on the friends attribute.
Storing the friends field as a String Set
Now for the DELETE operation:
Working with Sets from JavaScript has two gotchas. First: a set attribute on a document does not deserialize into a JavaScript array. Let's see what it actually returns:
From the above experiment, a DynamoDB Set deserializes into an object with its array of items stored under the values property. If we want a Set to deserialize into an array, we need to add an unmarshalling step where we assign the values property instead of the deserialized set object itself. Remember how sets cannot be empty? If we try to remove all elements from a set, the console will stop us:
Sets cannot be empty
However, if we remove the last element from a set in code, the attribute will be deleted from the document. This means the unmarshalling step we mentioned in gotcha #1 will need to account for the case where the property is undefined. Here's a helper function that covers both cases:
You'll still get an error if you try to store an empty array as a set, so here's the helper function going the other way:

4. Global Write Lock

The final solution is avoid the Transactional Memory problem by preventing concurrent writes. We can avoid concurrent writes by requiring any writer to obtain a distributed write lock (using a distributed lock service, such as etcd or zookeeper).
Since there are many implementations of the global-write-lock pattern, I'll omit sample code and directly discuss the tradeoffs.
This technique has two significant drawbacks: 1) a distributed lock service adds extra complexity and latency. 2) A global write lock reduces write throughout. If you’re already using a distributed lock service and you don’t need high write throughput, this solution is worth considering.

What About Transactions?

DynamoDB also supports multi-document transactions, and this sounds like a promising solution. But, as my colleague Danilo puts it:
Items are not locked during a transaction. DynamoDB transactions provide serializable isolation. If an item is modified outside of a transaction while the transaction is in progress, the transaction is canceled and an exception is thrown with details about which item or items caused the exception.
For this use case, transactions will essentially act like a slower version of condition expressions.
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.

Comments