Enhancing IVS Customer Success with Generative AI: Leveraging Amazon Q with Slack
A journey on how an internal chatbot helps us tap into our vast knowledge base with Amazon Q and Slack
Sasha
Amazon Employee
Published Jun 26, 2024
Customers often discover unique uses for any technology, leading to integration-related inquiries. On the IVS Customer Success team, we work directly with Amazon IVS customers to find solutions to their questions, which results in a significant buildup of internal knowledge. To manage this knowledge, we capture interactions in customer engagement documentation and use a ticketing system to track customer details and provide continuity. These artifacts often contain remedy information in the form of a fix or workaround to a problem. The challenge is providing rapid access to these gems to our front-line customer success personnel and to cross-team members for knowledge sharing.
In its default form, much of this knowledge is too specific to be included in the public user guides. Instead, it is stored internally and made accessible to frontline team members, who mentally index this data. While searching is possible, it requires combing through many internal repositories and documents with uneven search capabilities. And like any pre-GenAI search, it requires that you know what you’re looking for. To unite these repositories and make searching more “intelligent,” why not leverage generative AI that taps into this knowledge layer?
Parallel to our internal efforts to build a bespoke generative AI tool, this presented an opportunity to explore Amazon Q and evaluate its retrieval-augmented generation (RAG) and inference capabilities. Given that our team already uses Slack daily, my first thought was to use Slack as the interface for interactivity.
Here is a high-level view of the application architecture:
- An existing support Slack bot is expanded to work with Amazon Q, facilitating lookup of past conversations and access to a large knowledge base for IVS questions.
- An Amazon Q native retrieval augmentation pipeline creates embeddings and runs inference on 2,000+ redacted/sanitized tickets, documents, and public user guides.
- The bot provides attribution to both internal and external source document URLs.
First, I extracted ticket data from our internal tools. Each file was tagged with a unique identifier (UUID), allowing us to trace content back to the original artifact or customer interaction. The next step was sanitizing and redacting any sensitive information in these records.
Following the initial extraction of the knowledge data, the next step is sanitization and redaction. This involves filtering out sensitive data matching more than 40 pattern permutations, including personally identifiable information, confidential details, and temporal references.
Some example regexes:
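The actual internal patterns aren’t shareable, so here is a minimal sketch in Go of what such substitutions might look like; the patterns and placeholders below are illustrative assumptions, not our real rule set:

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative redaction rules; the real set spans 40+ pattern permutations.
// More specific patterns (like ARNs) run before broader ones (like bare
// 12-digit account IDs) so they match intact.
var redactions = []struct {
	pattern     *regexp.Regexp
	placeholder string
}{
	{regexp.MustCompile(`arn:aws:ivs:[\w-]+:\d{12}:channel/\w+`), "[CHANNEL_ARN]"}, // IVS channel ARNs
	{regexp.MustCompile(`https?://[\w.-]+\.amazon\.com\S*`), "[INTERNAL_URL]"},     // Amazon-internal domains
	{regexp.MustCompile(`[\w.+-]+@[\w-]+\.[\w.-]+`), "[EMAIL]"},                    // email addresses
	{regexp.MustCompile(`\b\d{12}\b`), "[ACCOUNT_ID]"},                             // bare AWS account IDs
	{regexp.MustCompile(`\b\d{4}-\d{2}-\d{2}\b`), "[DATE]"},                        // temporal references
}

// redact applies every rule in order, replacing matches with placeholders.
func redact(text string) string {
	for _, r := range redactions {
		text = r.pattern.ReplaceAllString(text, r.placeholder)
	}
	return text
}

func main() {
	fmt.Println(redact("On 2024-03-14, jdoe@example.com hit an error on arn:aws:ivs:us-west-2:123456789012:channel/abc123XYZ"))
	// Output: On [DATE], [EMAIL] hit an error on [CHANNEL_ARN]
}
```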
The goal is to seek out identifiers, stream/channel details, Amazon-internal domains, confidential data, and other sensitive elements, and replace them with placeholders.
By doing so, we effectively prevent the accidental leakage of such information during the retrieval augmentation operations that follow. Once redacted, the data is ready to be ingested into Amazon Q’s retrieval augmentation pipeline to generate embeddings.
Amazon Q can ingest data artifacts via integrated services, simplifying the more involved “prompt engineering” work typically required when implementing a custom retrieval augmentation (RAG) orchestration.
The flow to create an app goes something like this:
Amazon Q console view in the first steps of creating an application:
The next step of ingestion involves selecting a retriever.
Amazon Q allows you to choose between native or existing retrievers, each offering different functionality (see screenshot below). Additionally, you need to specify index provisioning. For our purposes, we use the native retriever with Starter index provisioning, as it meets the criteria for our proof-of-concept needs and development.
To build a retriever, first you must connect data sources.
Amazon Q integrates with a number of AWS and partner services for retrieving large data sets, providing a wide selection of integrations:
The setup process involves specifying your S3 bucket details and defining the parameters for how Amazon Q should process and retrieve information from your dataset. In our case, we set up a couple of S3 buckets containing pre-processed ticket artifacts, web crawler data, and Quip documents.
We also specify sync scope and sync mode as part of S3 ingestion. After pointing to a bucket containing the artifacts, we have the retriever sync and index the sources, which can take tens of minutes depending on dataset size. Both the S3 sync and the web crawler (which, in our case, crawls the IVS public user guides) can be scheduled to run at specific intervals. We set the crawler to run weekly and the S3 sync to run upon detecting changes in the bucket.
Retrieval is computationally expensive and will incur costs. An initial full sync is advised when indexing new buckets or crawling user guides. For any subsequent runs, however, the “New, Modified, or Deleted Content Sync” mode is preferred. This mode helps avoid unexpected costs that can arise from repeatedly executing full retrieval at frequent intervals.
Once the setup is complete, running the inference step is straightforward. At this point we add our Slack integration.
Note: Amazon Q provides a web UI that users can interact with; however, we won’t cover that topic here.
So once we’ve put this together, how do we get Slack to interact with our Q application?
One challenge was concurrency. Running inference can take up to 20 seconds, while Slack times out after 3,000 ms, so we needed a mechanism to queue requests. For this we chose Amazon Simple Queue Service (SQS), which allows us to buffer incoming messages and process them asynchronously. This is also useful when dealing with high volumes of incoming queries.
Example of queuing a request in Go using a FIFO queue:
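The snippet itself is internal; here’s a minimal sketch of the same idea using the aws-sdk-go-v2 SQS client, where the queue URL, channel ID, and thread timestamp are placeholder assumptions:

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/sqs"
)

// enqueueSlackEvent buffers an incoming Slack message on a FIFO queue so the
// bot can acknowledge Slack within its 3-second limit and run the slow
// Amazon Q inference asynchronously.
func enqueueSlackEvent(ctx context.Context, client *sqs.Client, queueURL, channelID, threadTS, text string) error {
	_, err := client.SendMessage(ctx, &sqs.SendMessageInput{
		QueueUrl:    aws.String(queueURL),
		MessageBody: aws.String(text),
		// FIFO queues require a message group ID; grouping by channel keeps
		// per-channel ordering while allowing cross-channel parallelism.
		MessageGroupId:         aws.String(channelID),
		MessageDeduplicationId: aws.String(channelID + "-" + threadTS),
	})
	return err
}

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	client := sqs.NewFromConfig(cfg)
	if err := enqueueSlackEvent(context.TODO(), client,
		"https://sqs.us-west-2.amazonaws.com/123456789012/ivs-bot-queue.fifo", // placeholder
		"C0123456789", "1718822400.000100",
		"How do I enable auto-record to S3?"); err != nil {
		log.Fatal(err)
	}
}
```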
The request flow starts with receiving an event from the SQS queue, which contains the user’s message and metadata such as the channel ID and thread timestamp. The bot then makes an API request to Amazon Q by creating a `ChatSyncInput` object with the relevant application ID, user ID, and message. The request is sent using the `qbusiness.ChatSync` method, and the response is logged and processed. Here’s an example of what the API request might look like:
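A sketch of that call, assuming the aws-sdk-go-v2 `qbusiness` client; the application ID and user ID are placeholders:

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/qbusiness"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	client := qbusiness.NewFromConfig(cfg)

	// Build the ChatSync request from the SQS event's message and metadata.
	out, err := client.ChatSync(context.TODO(), &qbusiness.ChatSyncInput{
		ApplicationId: aws.String("YOUR_Q_APPLICATION_ID"), // placeholder
		UserId:        aws.String("user@example.com"),      // placeholder
		UserMessage:   aws.String("How do I enable auto-record to S3 for an IVS channel?"),
	})
	if err != nil {
		log.Fatal(err)
	}

	// Log the inferred answer and the number of source attributions returned.
	log.Printf("answer: %s (%d attributions)",
		aws.ToString(out.SystemMessage), len(out.SourceAttributions))
}
```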
When it comes to sending the Amazon Q response back to Slack, the process involves capturing the inferred response and any associated attributions. A function named `sendMessageToSlack` handles this by taking the user’s input, the generated response message, the thread timestamp, and the channel ID. Typically, the response from Amazon Q includes the inferred answer along with a couple of source attributions, which are formatted and included in the message sent back to Slack.
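Our helper is internal; a minimal sketch of the same shape using the community slack-go/slack client (the token, IDs, and attribution formatting are assumptions) might look like this:

```go
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/slack-go/slack"
)

// sendMessageToSlack posts the Amazon Q answer back into the originating
// thread, appending numbered source attributions.
func sendMessageToSlack(api *slack.Client, channelID, threadTS, answer string, attributions []string) error {
	text := answer
	for i, url := range attributions {
		text += fmt.Sprintf("\n[%d] %s", i+1, url)
	}
	// MsgOptionTS keeps the reply inside the original Slack thread.
	_, _, err := api.PostMessage(channelID,
		slack.MsgOptionText(text, false),
		slack.MsgOptionTS(threadTS))
	return err
}

func main() {
	api := slack.New(os.Getenv("SLACK_BOT_TOKEN"))
	err := sendMessageToSlack(api, "C0123456789", "1718822400.000100",
		"You can enable auto-record to S3 in the channel configuration.",
		[]string{"https://docs.aws.amazon.com/ivs/latest/userguide/record-to-s3.html"})
	if err != nil {
		log.Fatal(err)
	}
}
```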
Additionally, to provide proper attribution for the information retrieved, I needed to regenerate internal URLs based on the UUID so that a user can jump to the internal source ticket or document.
This involves extracting specific patterns from the titles of the response attributions. For instance, titles matching the pattern `([a-zA-Z0-9-]+)_sanitized(\.json|_claude\.txt)` are transformed into internal URLs like https://subdomain.amazon.com/{alphanumericName}.
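A sketch of that transformation in Go; the subdomain remains the placeholder used above:

```go
package main

import (
	"fmt"
	"regexp"
)

// titleRe captures the UUID-tagged artifact name from an attribution title.
var titleRe = regexp.MustCompile(`([a-zA-Z0-9-]+)_sanitized(\.json|_claude\.txt)`)

// internalURL rebuilds the internal link for an attribution title, returning
// false when the title doesn't match the sanitized-artifact pattern.
func internalURL(title string) (string, bool) {
	m := titleRe.FindStringSubmatch(title)
	if m == nil {
		return "", false
	}
	return "https://subdomain.amazon.com/" + m[1], true
}

func main() {
	url, ok := internalURL("1a2b3c4d-ticket_sanitized.json")
	fmt.Println(url, ok) // https://subdomain.amazon.com/1a2b3c4d-ticket true
}
```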
So far, it’s been useful; I use the bot weekly, and at times daily, to look up information that I think may be floating in the knowledge base but whose keywords I don’t know. Not only does the bot locate the information, it also provides reasonably phrased language that I can quickly repurpose for my correspondence. Additionally, it offers attributions and links to the sources where it found the information, helping me dig into specifics if needed. The bot also provides snippets of the original articles or the language used in the correspondence.
Comparing the ClaudeV3-based retrieval tool we’ve built internally for wider customer use with this initial (and still-standing) Amazon Q implementation, we’ve found that Amazon Q’s tone is well suited to our internal needs, letting us quickly look up information with more specifics.
Here are a couple of simple examples:
Next, I plan to set up automation to provide Support with proposed responses upon support request creation. An IVS colleague can then review these proposals and, with some adjustments, provide speedy and informed replies to the inquirer, likely saving tens of minutes of search and writing time.
## Special Thanks to Mike Gaffney aka "Gaffo", an L8 Principal Engineer and Amazon/Twitch colleague (no big deal!), for helping me revise this post!
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.