Cartoon Stock Talks About Using Semantic Search & Vectors | S02 EP38 | Lets Talk About Data

In this show we talk about how CartoonStock went from a standard lexical search to semantic search. It started with a Python server running on an EC2 instance and a model from Hugging Face, and moved to a setup with Amazon Bedrock and Titan v2 embeddings, which is much more efficient. Throughout, the vectors have been stored in an RDS Postgres database with the pgvector extension.

Ibrahim Emara
Amazon Employee
Published Oct 30, 2024
Founded in 1997, CartoonStock is a platform featuring 750,000 cartoons from artists around the world. In 2018, the company was acquired by former New Yorker cartoon editor Bob Mankoff, who brought new enthusiasm and a focus on exploring AI and new technologies. Prior to the acquisition, CartoonStock's search was limited to exact keyword matching, which resulted in a poor customer experience. The team then implemented a lexical search with AND/OR logic, which was an improvement but still had limitations.
To address these limitations, CartoonStock explored semantic search. They initially used an EC2 instance to host a Python API that vectorised search queries using models such as InstructorXL and then queried a Postgres database with pgvector to find relevant results. However, this setup had performance and maintenance challenges. CartoonStock then transitioned to Amazon Bedrock for the vectorisation, with a Lambda function orchestrating the search flow against their Postgres database. They also implemented HNSW indexing to improve query performance.
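The Bedrock-based search flow described above can be sketched roughly as follows. This is a minimal illustration, not CartoonStock's actual code: the `cartoons` table, its `id`, `title` and `embedding` columns, the index name, and the function names are all hypothetical. It assumes a boto3 `bedrock-runtime` client and a psycopg-style cursor (with `%s` placeholders) are passed in by the caller, for example from a Lambda handler.

```python
import json

# Titan Text Embeddings V2 model identifier on Amazon Bedrock.
MODEL_ID = "amazon.titan-embed-text-v2:0"

# Hypothetical schema: a `cartoons` table with a pgvector `embedding` column.
# One-off DDL to build an HNSW index for cosine distance.
HNSW_INDEX_SQL = (
    "CREATE INDEX IF NOT EXISTS cartoons_embedding_hnsw "
    "ON cartoons USING hnsw (embedding vector_cosine_ops);"
)

# `<=>` is pgvector's cosine distance operator; smaller means more similar.
SEARCH_SQL = (
    "SELECT id, title FROM cartoons "
    "ORDER BY embedding <=> %s::vector LIMIT %s;"
)


def embed_query(client, text, dimensions=1024):
    """Vectorise `text` with Titan v2 via a bedrock-runtime client."""
    body = json.dumps(
        {"inputText": text, "dimensions": dimensions, "normalize": True}
    )
    response = client.invoke_model(
        modelId=MODEL_ID,
        body=body,
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())["embedding"]


def to_pgvector_literal(vector):
    """Format a sequence of numbers as a pgvector text literal, e.g. [0.1,0.2]."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


def semantic_search(client, cursor, query, limit=10):
    """Embed the query, then return the nearest rows by cosine distance."""
    vector = embed_query(client, query)
    cursor.execute(SEARCH_SQL, (to_pgvector_literal(vector), limit))
    return cursor.fetchall()
```

In a Lambda function, `client` would typically be created once outside the handler with `boto3.client("bedrock-runtime")`, and `cursor` would come from a database connection to the RDS instance. Passing both in keeps the functions easy to test without AWS access.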
CartoonStock's latest iteration uses AI to generate a detailed description for each cartoon, which is then vectorised and stored in Postgres alongside the existing metadata. This allows for more nuanced and contextual search, as users can now search using natural language descriptions rather than just keywords. The team is also exploring a chatbot assistant to further enhance the search experience. CartoonStock's journey showcases the benefits of adopting managed cloud services and using the latest AI/ML capabilities to continuously improve the platform.
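The ingestion side of this step, storing each description's vector next to the existing row, might look something like the sketch below. It assumes the descriptions have already been generated, and that `embed` is any callable that turns text into a list of floats (for instance a wrapper around an embedding model). The `cartoons` table and the `ai_description` and `description_embedding` columns are hypothetical names, and the cursor is assumed to be psycopg-style.

```python
# Hypothetical columns on a hypothetical `cartoons` table.
UPDATE_SQL = (
    "UPDATE cartoons "
    "SET ai_description = %s, description_embedding = %s::vector "
    "WHERE id = %s;"
)


def build_update_params(cartoon_id, description, embed):
    """Embed the description and return parameters for UPDATE_SQL."""
    vector = embed(description)
    # pgvector accepts a text literal such as [0.1,0.2,0.3].
    literal = "[" + ",".join(repr(float(x)) for x in vector) + "]"
    return (description, literal, cartoon_id)


def store_descriptions(cursor, rows, embed):
    """Write (cartoon_id, description) pairs with their vectors; return the count."""
    count = 0
    for cartoon_id, description in rows:
        cursor.execute(
            UPDATE_SQL, build_update_params(cartoon_id, description, embed)
        )
        count += 1
    return count
```

Keeping the description text and its vector in the same row means a single query can filter on existing metadata and rank by similarity at the same time.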
Highlights:
- Transitioned from exact keyword matching to semantic search capabilities
- Used an EC2 instance with a Python API for initial vectorisation, then switched to Bedrock
- Leveraged pgvector and HNSW indexing in Postgres to optimize search performance
- Employed AI to generate detailed cartoon descriptions for more contextual search
- Exploring chatbot assistant to further enhance the search experience
- Small team (2 people) able to rapidly iterate and adopt new technologies
- Benefited from an enthusiastic leadership team driving innovation

Hosts of the show 🎤

Ibrahim Emara, RDS Specialist Solutions Architect @ AWS

Guests

Chris Elkins, Technical Director @ CartoonStock
Luke Henderson, Software Engineer @ CartoonStock
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
