
Bedrock Knowledge Bases: Structured Data Retrieval

Using Generative AI to query your relational datasets

Lester
Amazon Employee
Published Apr 6, 2025
Last Modified Apr 11, 2025

Intro

Amazon Bedrock Knowledge Bases now supports connecting to a structured data store, letting users retrieve structured data from your data sources through natural language querying. Using natural-language-to-SQL (NL2SQL) processing, Bedrock Knowledge Bases transforms a natural language question into a SQL query, executes the query on the query engine, and returns the results as a summarized narrative response.
Flow of topics:
  • How Bedrock Structured Data Retrieval Works (NL2SQL)
  • Sample Data and Architecture Diagram
  • Step 1: Setting Up Redshift Serverless and Loading Sample Data
  • Step 2: Setting Up Bedrock Knowledge Base with Redshift
  • Step 3: Syncing your Bedrock Knowledge Base with Redshift
  • Step 4: Test the Bedrock Knowledge Base
  • Step 5: Using AWS Python SDK for Bedrock (boto3)
  • Wrapping Up

How Bedrock Structured Data Retrieval Works (NL2SQL)

Here’s a sample flow:
[Image: NL2SQL flow]
When you create a Bedrock Knowledge Base with a structured data store, you’ll first select Redshift as the query engine, and you’ll have the option to select where your metadata is stored: 1/ Redshift, or 2/ Glue Data Catalog. The metadata storage is important because it is where Bedrock retrieves the relevant table schemas needed to answer the user’s questions.

Sample Data and Architecture Diagram

For this blog, we’ll be using the sample data (tickit) provided within Redshift. The tickit database tracks sales activity for the fictional TICKIT website, where users buy and sell tickets online for sporting events, shows, and concerts. It contains information on each sales transaction, listing details, event details, and more. Here’s the database schema:
[Image: Schema for the tickit dataset]
For this blog, I’ll be using Redshift Serverless as the query engine with the data stored within Redshift. This is what the architecture looks like:
[Image: Architecture diagram for this tutorial]

Tutorial

Step 1: Setting Up Redshift Serverless and Loading Sample Data

  1. Select the us-east-1 region; all resources in this tutorial are provisioned there.
  2. Create a Redshift Serverless workgroup by following the steps here: https://docs.aws.amazon.com/redshift/latest/gsg/new-user-serverless.html (a scripted alternative using boto3 is sketched after this list).
  3. Follow the steps to load the sample tickit data into Redshift. Alternatively, you can load it from S3 to Redshift.
    • [Image: Redshift Query Editor]
  4. Explore the different tables in this dataset.
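If you prefer to script Step 1, here is a minimal boto3 sketch. The namespace and workgroup names are placeholders, and it assumes your credentials allow Redshift Serverless administration; the console walkthrough above achieves the same result.

    import boto3

    # Hypothetical resource names for illustration -- adjust to your environment.
    rs = boto3.client("redshift-serverless", region_name="us-east-1")

    # A namespace holds your databases; a workgroup provides the compute.
    rs.create_namespace(namespaceName="bedrock-kb-ns", dbName="dev")
    rs.create_workgroup(
        workgroupName="bedrock-kb-wg",
        namespaceName="bedrock-kb-ns",
        baseCapacity=8,  # RPUs; the minimum capacity
    )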

Step 2: Setting Up Bedrock Knowledge Base with Redshift

  1. In the AWS console, search for Amazon Bedrock and click on Knowledge Bases.
  2. Select Create > Knowledge Base with Structured Data Store
  3. Give the knowledge base a name. For the Query Engine, select Redshift. As of this blog’s writing date, only Redshift is supported as the querying engine.
  4. Under IAM Permissions, create a new service role. Alternatively, you can use an existing role with the necessary permissions stated here: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-prereq-permissions-general.html
  5. Click Next. For the Query Engine, select Redshift Serverless. Select the Redshift workgroup that you have created.
  6. For Authentication, select IAM role. The Bedrock Knowledge Base will use the IAM role from step 4 to access Redshift.
  7. For the default metadata storage, select Redshift. Then select the database name; if you are using the sample data, select the sample_data_dev database.
  8. For the optional Query Configurations, there are three configurations you can provide:
    • Description: Include descriptions for tables or columns to improve the accuracy of SQL generation. This is helpful if your tables/columns use abbreviated names in the database and need further explanation for Bedrock to identify the correct tables/columns when generating the SQL query.
    • Inclusion/Exclusion: Specify table names and column names to include or exclude for SQL generation.
    • Curated Queries: Include example questions and SQL queries that correspond to them to improve the accuracy of the SQL generation.
  9. Review and create the knowledge base.
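For completeness, the same knowledge base can be created programmatically with the bedrock-agent client. The following is a minimal sketch under assumptions: the role and workgroup ARNs are placeholders, and you should verify the exact configuration shape against the current CreateKnowledgeBase API reference.

    import boto3

    bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

    # All ARNs and names below are placeholders for illustration.
    response = bedrock_agent.create_knowledge_base(
        name="tickit-structured-kb",
        roleArn="arn:aws:iam::111122223333:role/service-role/BedrockKBRole",
        knowledgeBaseConfiguration={
            "type": "SQL",
            "sqlKnowledgeBaseConfiguration": {
                "type": "REDSHIFT",
                "redshiftConfiguration": {
                    "queryEngineConfiguration": {
                        "type": "SERVERLESS",
                        "serverlessConfiguration": {
                            "workgroupArn": "arn:aws:redshift-serverless:us-east-1:111122223333:workgroup/bedrock-kb-wg",
                            "authConfiguration": {"type": "IAM"},
                        },
                    },
                    "storageConfigurations": [
                        {
                            "type": "REDSHIFT",
                            "redshiftConfiguration": {"databaseName": "sample_data_dev"},
                        }
                    ],
                    # The optional query configurations (descriptions,
                    # inclusions/exclusions, curated queries) go here; one
                    # curated query is shown as an example.
                    "queryGenerationConfiguration": {
                        "generationContext": {
                            "curatedQueries": [
                                {
                                    "naturalLanguage": "What is the total sales?",
                                    "sql": "SELECT SUM(pricepaid) FROM tickit.sales;",
                                }
                            ]
                        }
                    },
                },
            },
        },
    )
    print(response["knowledgeBase"]["knowledgeBaseId"])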

Step 3: Syncing your Bedrock Knowledge Base with Redshift

  1. In the AWS console, search for Redshift and open the Redshift Query Editor for the workgroup you set up earlier.
  2. Run the following command in the Redshift Query Editor. It uses CREATE USER to create a database user that authenticates through IAM; replace ${service-role} with the name of the Amazon Bedrock Knowledge Bases service role you created. If you sync your data store before doing this step, the user is created for you, but the sync fails because the user hasn't been granted permission to access your data store.
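    -- Creates an IAM-authenticated database user for the Bedrock service role
    -- (syntax per the Bedrock Knowledge Bases prerequisites documentation).
    CREATE USER "IAMR:${service-role}" WITH PASSWORD DISABLE;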
  3. Grant the Bedrock database user permission to run SELECT statements in your data warehouse by running a GRANT command. You can specify which schemas or tables to grant Bedrock access to; for example, using the sample data schema:
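    -- Example grant: adjust the schema (and optionally table) names to match
    -- what Bedrock should query; "tickit" is the sample data schema.
    GRANT SELECT ON ALL TABLES IN SCHEMA tickit TO "IAMR:${service-role}";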
  4. Back in the AWS console for Bedrock, select the Knowledge Base you created earlier. Under the Query Engine section, select the Redshift database and click Sync.
  5. A green banner should appear stating that the sync with the data source has completed successfully.

Step 4: Test the Bedrock Knowledge Base

  1. In the Bedrock knowledge base you created, click the Test button. This opens a playground where you can type sample prompts and view the results from the Knowledge Base.
  2. Select a foundation model, e.g. Anthropic > Claude 3.5 Sonnet v2. Click Apply.
  3. Enter the prompt “What is the total sales?”. Once the response is returned, click “Show Details” and expand the source chunk 1 section. You will be able to view the SQL query that was generated.
    • Bedrock transforms the user’s initial prompt into a SQL query by understanding the available table schemas using advanced natural language processing. It then runs the SQL query in Redshift, retrieves the data, and replies to the user with a summarized narrative response.
    • Copy the SQL query, paste it into the Redshift Query Editor, and execute it. The results should match what Bedrock responded with.
    • Back in the Bedrock Knowledge Base test section, click the configuration button. You can choose between three ways to return the results:
      • Retrieval and Response generation: Query the data sources and generate a summarized response
      • Retrieve the data only: Return only the data that was retrieved from the SQL query
      • Generate SQL query only: Return only the SQL query that was generated with no data or summarized response.
  4. Run a few sample prompts such as the ones below and view the generated SQL queries (an illustrative example follows the list). You will notice that Bedrock is capable of understanding the schemas of different tables to perform JOINs, GROUP BY and ORDER BY clauses, etc. It is also able to perform transformations on columns, e.g. extracting the hour from a datetime column when asked which time of day generates the most transactions.
Which event has the highest sales?
On average, do holidays generate more sales?
Which day of the week usually generates the most sales?
Which time of day usually generates the most ticket transactions?
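For illustration, here is the kind of query Bedrock might generate for the first prompt; your generated SQL will likely differ.

    -- Illustrative only: join ticket sales to events and rank events by revenue.
    SELECT e.eventname, SUM(s.pricepaid) AS total_sales
    FROM tickit.sales s
    JOIN tickit.event e ON s.eventid = e.eventid
    GROUP BY e.eventname
    ORDER BY total_sales DESC
    LIMIT 1;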

Step 5: Using AWS Python SDK for Bedrock (boto3)

The next step is to use the AWS Python SDK (boto3) to call the Bedrock APIs:
  1. Import the Python boto3 library. Ensure that you've configured boto3 with the necessary credentials to call Bedrock APIs: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html
  2. Create a bedrock-agent-runtime client
  3. Use the Bedrock retrieve_and_generate function. Provide the following key information in the function parameters:
    • input: Enter the user’s prompt here
    • knowledgeBaseId: In the Bedrock AWS console, select your knowledge base to view its knowledge base ID.
    • modelArn: Enter the ARN of the foundation model you intend to use, or a cross-region inference profile ARN.
    • type: “KNOWLEDGE_BASE”
  4. You can break down the response into three parts depending on your response requirements:
    1. Retrieval and Response generation
    2. Retrieve the data only
    3. Generate SQL query only
Full Sample Code:
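Below is a minimal end-to-end sketch. The knowledge base ID and model ARN are placeholders, and the way the generated SQL and retrieved rows surface inside the citations is an assumption to verify against your SDK version.

    import boto3

    # Placeholders -- replace with your knowledge base ID and model ARN
    # (or a cross-region inference profile ARN).
    KB_ID = "YOURKBID12"
    MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0"

    client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

    response = client.retrieve_and_generate(
        input={"text": "What is the total sales?"},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": KB_ID,
                "modelArn": MODEL_ARN,
            },
        },
    )

    # 1/ Retrieval and response generation: the summarized narrative answer.
    print(response["output"]["text"])

    # 2/ Retrieved data and 3/ generated SQL: both surface in the citations.
    # The exact structure is an assumption -- print a citation to confirm the
    # paths for your SDK version.
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            print(ref.get("location", {}))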

Wrapping Up

With this new feature, Bedrock Knowledge Bases can connect not only to a vector database (for unstructured data, e.g. PDF, TXT, and DOCX files) but also to your structured data stores (relational data). You can now build generative AI applications that access and incorporate contextual information from a variety of structured and unstructured data sources.

Authors and Acknowledgements

Lester Sim - AWS Assoc. Solutions Architect for SMB and ISV.
Paul Villena - AWS Sr. Specialist Solutions Architect for Analytics (OpenSearch/Redshift). Thank you for your guidance on this blog post and for sharing your experience in the Analytics and Generative AI field.

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
