AWS open source newsletter, #205
A roundup of the latest open source news, projects, and events that every open source developer should know about.
- Introducing kro: Kube Resource Orchestrator looks at this new experimental open source project from AWS that simplifies and empowers the use of custom APIs and resources with Kubernetes [hands on]
- How to build custom nodes workflow with ComfyUI on Amazon EKS builds on some earlier posts and sample code, and shows how you can deploy the ComfyUI project on Amazon EKS [hands on]
- Enhancing Developer Productivity: Finch’s Support for Development Containers and the Finch Daemon looks at some additional capabilities of Finch that now allow Finch users to use it with VS Code dev containers, as well as the Finch daemon, which will help improve compatibility as you look to migrate off Docker [hands on]
- Diving Deeper into Projen: Exploring Advanced Features is a follow-up to a previous post (also featured in an earlier edition of this newsletter) that looks at some of the more advanced capabilities that help improve developer experience and productivity
- OCSF Joins the Linux Foundation: Accelerating the Standardization of Cybersecurity Data shares more details about the recent move to the Linux Foundation for the Open Cybersecurity Schema Framework (OCSF), and what it means going forward
- Amazon Aurora PostgreSQL Limitless Database is now generally available looks at a new serverless horizontal scaling (sharding) capability of Amazon Aurora called Aurora PostgreSQL Limitless Database, where you can scale beyond the existing Aurora limits for write throughput and storage by distributing a database workload over multiple Aurora writer instances while maintaining the ability to use it as a single database [hands on]
- Disposition strategy and planning for migrating Kubernetes clusters provides some great insights into a perhaps all too little discussed topic: how you manage your infrastructure resources as you update and migrate your Kubernetes clusters
- Amazon EKS now supports Amazon Application Recovery Controller demonstrates how you can use the ARC zonal shift and zonal autoshift capabilities to prepare for and recover from AWS Region or Availability Zone (AZ) impairments [hands on]
- Amazon EKS enhances Kubernetes control plane observability looks at some new control plane observability enhancements for Amazon EKS clusters [hands on]
- Run high-availability long-running clusters with Amazon EMR instance fleets helps you learn how to launch a high availability instance fleet cluster using the newly redesigned Amazon EMR console [hands on]
- MultiXacts in PostgreSQL: usage, side effects, and monitoring dives deep into the inner workings of MultiXacts, a special structure that PostgreSQL relies upon when multiple transactions attempt to lock the same row simultaneously [hands on]
- Benchmark Amazon RDS for PostgreSQL with Dedicated Log Volumes guides you through the process of benchmarking the performance of Amazon RDS for PostgreSQL using the Dedicated Log Volume (DLV) feature that is available in Amazon RDS. If you are a heavy user of PostgreSQL, this is a must-read post [hands on]
- Load vector embeddings up to 67x faster with pgvector and Amazon Aurora provides another great benchmarking post, this time comparing the index build and query times of pgvector 0.7.0 with those of the prior version [hands on]
- Optimize Amazon Aurora PostgreSQL auto scaling performance with automated cache pre-warming is a great post that shows how to improve query performance of a new replica by warming up the cache using pg_prewarm [hands on]
- Visualize vector embeddings stored in Amazon Aurora PostgreSQL and explore semantic similarities explores how you can gain valuable insights into your data, uncover hidden patterns, and make informed decisions using principal component analysis (PCA) [hands on]
- Use Amazon ElastiCache as a cache for Amazon Keyspaces (for Apache Cassandra) shows you how to use Amazon ElastiCache as a write-through cache for an application that uses Amazon Keyspaces (for Apache Cassandra) tables [hands on]
- Best practices for running Apache Cassandra with Amazon EBS covers how Cassandra interacts with the OS’s file system and page cache when reading from disk, using various Linux tools like iostat, xfsdist, xfsslower, cachestat, and biolatency, to get insights into different layers of disk I/O performance [hands on]
- Achieve a high-speed InnoDB purge on Amazon RDS for MySQL and Amazon Aurora MySQL looks at how you can improve InnoDB purge efficiency, through a combination of workload optimisation, database capacity planning, and configurations [hands on]
- Stream real-time data into Apache Iceberg tables in Amazon S3 using Amazon Data Firehose is a ready-to-go solution that shows you how to set up Firehose streams to deliver streaming records into Apache Iceberg tables in Amazon S3 [hands on]
- Apache HBase online migration to Amazon EMR uses some real-world migration cases to introduce the process of migrating HBase to HBase on Amazon EMR using HBase snapshots and replication, as well as the deployment mode of HBase on Amazon S3 [hands on]
- Build fullstack AI apps in minutes with the new Amplify AI Kit dives into what is in the Amplify AI Kit and how it can simplify building secure full-stack AI applications with Amplify and Amazon Bedrock [hands on]
- Run Apache XTable in AWS Lambda for background conversion of open table formats looks at how this emerging open source project helps facilitate seamless conversions between open table formats (OTFs), providing hands-on examples and code [hands on]
- Amazon FSx for Lustre increases throughput to GPU instances by up to 12x provides some nice details on how Amazon FSx for Lustre now supports 12 times higher (up to 1200 Gbps) per-client throughput compared to the previous FSx for Lustre version
- Amazon Relational Database Service (RDS) for PostgreSQL now supports pgvector 0.8.0, an open-source extension for PostgreSQL for storing and efficiently querying vector embeddings in your database, letting you use retrieval-augmented generation (RAG) when building your generative AI applications. The pgvector 0.8.0 release includes improvements to the PostgreSQL query planner’s selection of indexes when filters are present, which can deliver better query performance and improve search result quality. It also includes a variety of improvements to how pgvector filters data using conditions in WHERE clauses and joins, which can improve query performance and usability. Additionally, iterative index scans help prevent ‘overfiltering’, ensuring that enough results are generated to satisfy the conditions of a query. If an initial index scan doesn't satisfy the query conditions, pgvector will continue to search the index until it hits a configurable threshold. This release also has performance improvements for searching and building HNSW indexes. pgvector 0.8.0 is available on database instances in Amazon RDS running PostgreSQL 17.1 and higher, 16.5 and higher, 15.9 and higher, 14.14 and higher, and 13.17 and higher in all applicable AWS Regions.
- Amazon Relational Database Service (RDS) for PostgreSQL now supports the latest minor versions 17.2, 16.6, 15.10, 14.15, 13.18, and 12.22. We recommend that you upgrade to the latest minor versions to fix known security vulnerabilities in prior versions of PostgreSQL, and to benefit from the bug fixes added by the PostgreSQL community. You can use automatic minor version upgrades to automatically upgrade your databases to more recent minor versions during scheduled maintenance windows. Additionally, starting with PostgreSQL major version 18, Amazon RDS for PostgreSQL will deprecate the plcoffee and plls PostgreSQL extensions. We recommend that you stop using CoffeeScript and LiveScript in your applications, ensuring you have an upgrade path for the future.
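To make the 'overfiltering' point in the pgvector 0.8.0 item above a little more concrete, here is a minimal conceptual sketch in Python. It is not pgvector's implementation: the function name, the toy data, and the `batch_size` and `max_scan` parameters are all hypothetical, with `max_scan` standing in for the configurable threshold mentioned in the announcement. A plain sorted list stands in for the ordering that a real vector index would provide.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical row shape for this sketch: (id, embedding, metadata).
Item = Tuple[int, Sequence[float], Dict[str, str]]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def iterative_filtered_search(
    items: List[Item],
    query: Sequence[float],
    predicate: Callable[[Dict[str, str]], bool],
    k: int,
    batch_size: int = 3,
    max_scan: int = 20,
) -> List[int]:
    """Return up to k ids of the nearest items that pass the filter.

    Candidates are examined in batches, nearest first. Scanning continues
    until k matches are found or max_scan candidates have been examined
    (max_scan is an assumed name for the configurable threshold).
    """
    # Sorting stands in for the nearest-first order an index would return.
    ranked = sorted(items, key=lambda item: squared_distance(item[1], query))
    limit = min(max_scan, len(ranked))
    results: List[int] = []
    scanned = 0
    while len(results) < k and scanned < limit:
        batch = ranked[scanned:min(scanned + batch_size, limit)]
        scanned += len(batch)
        for item_id, _vector, metadata in batch:
            if predicate(metadata):
                results.append(item_id)
                if len(results) == k:
                    break
    return results


def is_red(metadata: Dict[str, str]) -> bool:
    """Filter condition, comparable to a WHERE clause on a metadata column."""
    return metadata["color"] == "red"


# Toy data: twelve 1-D embeddings; every fourth row is "red".
ITEMS: List[Item] = [
    (i, [float(i)], {"color": "red" if i % 4 == 0 else "blue"})
    for i in range(12)
]

# A single pass over the first batch finds only one match (overfiltering).
single_pass = iterative_filtered_search(ITEMS, [0.0], is_red, k=3, max_scan=3)
# Continuing to scan finds all three requested matches.
iterative = iterative_filtered_search(ITEMS, [0.0], is_red, k=3)
print(single_pass, iterative)
```

In this toy setup, capping the scan at the first batch returns fewer rows than requested because the filter discards most of the nearest candidates, while letting the scan continue fills the result set. The threshold bounds how much extra work a very selective filter can trigger.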
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.