Deploy LangChain 🦜🔗 applications on AWS with LangServe 🦜️🏓

EDITS

26-06-2024: Refactored code to use langchain-aws and added Claude 3.5 Sonnet
18-04-2024: Added Claude 3 Opus and removed Claude 1 (EOL)
12-04-2024: Updated instructions to reflect the langchain package split

Introduction

Organizations must move quickly when building generative AI applications. The rapid pace of innovation and heavy competitive pressure mean that accelerating the journey from prototype to production is no longer an ancillary concern, but an actual imperative. One key aspect of this involves choosing the right tools and frameworks that will enable faster iteration cycles and quick experimentation by both developers and external users.

LangChain is an open source framework that simplifies the entire development lifecycle of generative AI applications. By making it quick to connect large language models (LLMs) to different data sources and tools, LangChain has emerged as the de facto standard for developing everything from quick prototypes to full generative AI products and features. LangChain has recently entered the deployment space with the release of LangServe, a library that turns LangChain chains and runnables into production-ready REST APIs with just a few lines of code.

In this post, I’m going to show you how to build, test, ship and deploy LangChain-based applications with LangServe on Amazon Elastic Container Service (ECS) and AWS Fargate in a quick, secure and reliable way, without the need to manage infrastructure.

Solution Overview

Objectives 🎯

In the following sections, we're going to build, test and deploy a simple LangChain application, powered by Anthropic’s Claude on Amazon Bedrock, that assumes a role before replying to a user’s request.

Role prompting can be used to control the style, tone, depth and even the correctness of the generated text. Using this technique, LLMs can play a wide variety of roles with different personalities simply by feeding in the right input (prompt).
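As a rough illustration of role prompting, the sketch below fills a single template with different roles. The template wording and the `build_prompt` helper are assumptions made for this example, not the prompt used later in the walkthrough.

```python
# Hypothetical role-prompt template; the wording is illustrative only.
ROLE_TEMPLATE = "You are {role}. Stay in character while answering.\n\nUser: {input}"


def build_prompt(role: str, user_input: str) -> str:
    """Return the text sent to the model for a given role and request."""
    return ROLE_TEMPLATE.format(role=role, input=user_input)


if __name__ == "__main__":
    question = "Explain what a load balancer does."
    # Same question, two different roles: only the role slot changes.
    for role in ("a patient kindergarten teacher", "a terse network engineer"):
        print(build_prompt(role, question))
        print("---")
```

The user's request stays the same in both prompts; only the role changes, which is what lets one application serve several "personalities".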

👨‍💻 All code and documentation for this walkthrough is available on GitHub.

Prerequisites ✅

Before we get started, take some time to perform the following prerequisite actions:

1/ If you’re using your own workstation, make sure these tools are installed and properly configured:

👇 The Development Environment Setup section below explains how to provision the development environment using AWS CloudFormation.

Conda (preferred) or Python (version >=3.9)
Docker
AWS Copilot CLI
Any code editor or IDE, e.g. VS Code

2/ Enable access to Anthropic’s Claude models via Amazon Bedrock

💡 For more information on how to request model access, please refer to the Amazon Bedrock User Guide (Set up > Model access)

Steps 👣

This walkthrough includes two separate tracks:

Fast Lane 🏍️ - deploy an existing LangServe application
Deep Dive 🤿 - build, test and deploy a LangServe application from scratch

You can start with the Fast Lane track, delete the application, then carry on with the Deep Dive track, and vice versa. Either track will produce similar results.

Here's a high-level overview of what we're going to do in each one:

Fast Lane 🏍️

Setting up the development environment
- Deploy the CloudFormation stack
- Log in to the Code Server IDE
- Access the instance via EC2 Instance Connect
Deploying an existing LangServe application

Deep Dive 🤿

Setting up the development environment
- Deploy the CloudFormation stack
- Log in to the Code Server IDE
- Access the instance via EC2 Instance Connect
Building and testing LangChain applications
- Bootstrap a new LangServe project using the LangChain CLI
- Add LangChain templates to the project
- Deploy and test the application locally with LangServe
Deploying LangServe applications
- Deploy the application to Amazon ECS and AWS Fargate using AWS Copilot
Securing and operationalizing LangServe applications
- Add security features like Basic Authentication and security groups
- (optional) Connect the application to LangSmith for monitoring and logging

Walkthrough 🌀🚶‍♂️

Setting up the development environment

Let's start by setting up the development environment with Conda, code-server, Docker and AWS Copilot pre-installed.

Deploy the CloudFormation stack

1/ Open the CloudFormation console, click on Create stack > With new resources (standard), select Specify template > Upload a template file in the Create stack section and upload the template file (infra/cloudformation/deploy.yml).

2/ In the Specify stack details section, fill in the Parameters and click Next:

Use the Region selector in the navigation bar at the top to select the region you want to use to deploy the resources (the default region is us-east-1 / N. Virginia).
For Stack name, use the default value (langserve-aws) or change it to something else.
For CodeServerPassword, enter a secure password and store it somewhere safe - this password will be used later to access code-server.
For all the remaining parameters, use the default values.

3/ In the Configure stack options section, leave everything unchanged and click Next.

4/ Review everything, check the acknowledgement boxes and click Create stack.

🕓 The stack should take ~5 minutes to deploy. When it's finished, it will show the status CREATE_COMPLETE.

Log in to the Code Server IDE

1/ On the AWS CloudFormation console, select the stack and open the Outputs tab.

2/ Point your web browser to the URL shown in CodeServerUrl.

3/ Use the password you set in Deploy the CloudFormation stack (step 2) to log in.

💡 At this point, you can open a new Terminal session from code-server and/or access the instance directly using EC2 Instance Connect.

Access the instance via EC2 Instance Connect

1/ On the AWS CloudFormation console, select the stack and open the Outputs tab.

2/ Point your web browser to the URL shown in InstanceConnectUrl or start a new Terminal session on your workstation and run the command below (don’t forget to replace <InstanceId>)

Deploying existing LangServe applications

❗ Feel free to skip this section if you’re interested in building the application from scratch.

In this section, we’ll be deploying the full application using AWS Copilot, an open source command line interface (CLI) that makes it simple for developers to build, release, and operate production-ready containerized applications. Behind the scenes, AWS Copilot uses Amazon ECS and AWS Fargate, which allow us to run applications without managing infrastructure while relying on up-to-date, patched compute capacity for our workloads.

0/ If you’re using your own workstation and haven’t done so already, now is the right time to install the AWS Copilot CLI - you can do so through Homebrew or by downloading the binaries directly. You can check if the CLI is installed correctly by running copilot --help

1/ Clone the project repository

2/ Initialize the application

3/ Add user credentials and the LangChain API key as secrets

4/ Deploy the application

🕓 The deployment should take ~10 minutes. AWS Copilot will return the service URL (COPILOT_LB_DNS) once it's done.

5/ Point your browser to the service playground (<COPILOT_LB_DNS>/claude-chat/playground) to test the service.

☝️ Use the credentials specified in step 3 to log in (the default username/password is bedrock/bedrock).

6/ Finally, don’t forget to clean up the resources when you’re done!

💡 Keep on reading if you want to learn more about the application and how to build one yourself.

Building and testing LangChain applications 🏗️

In this section, we’re going to build the LangServe application from scratch. If you have a project already, just go back to the Fast Lane section (Deploying existing LangServe applications).

1/ We'll start by creating a new Conda environment to keep our project isolated

2/ Once the environment is activated, you can install the LangChain CLI by running

3/ Create a new directory to hold the application files

4/ Bootstrap a new LangServe project by issuing the command (skip the package installation for now)

This will create the basic structure of a LangServe REST API without external packages. You can use the --package option to add existing packages (commonly known as templates) upon creation, add them afterwards with the langchain app add command or create new ones with the langchain template new command. The tree structure for the project should look as follows (pycache files/folders are not shown).

5/ Create a new template named claude-chat

This command will add the template files to the packages folder

6/ LangChain uses Poetry as the dependency manager. Make sure the local claude-chat service is added as a dependency by issuing the command

Right now, the package contains only sample code. We’ll need to edit the chain.py file under the claude_chat template to do something useful. This file exposes a Runnable, the basic building block that allows us to define custom chains as well as invoke them in a standard way.

7/ Remove the sample code and copy+paste the following sections directly to the chain.py file in order:

> Parameters: the inference parameters which control the degree of randomness and the length of Claude’s responses are configured via environment variables

> Models: the application will use Amazon Bedrock’s native integration with LangChain via Bedrock Chat to call Anthropic’s Claude model. By default, the API uses Claude 3 (Sonnet), but other available models can be selected at runtime.

> Prompts: the prompt template is intentionally simple as we’re just assigning a role and injecting the user’s input.

> Chains: the chain is defined declaratively using the LangChain Expression Language (LCEL) - in this case, we’re just chaining (piping) together the prompt template and the chat model.
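To make the four sections above more concrete, here is a dependency-free sketch with the same shape: parameters read from environment variables, a role prompt, a stand-in model and a chain built with the `|` operator. The `Step` class only imitates the piping idea behind LCEL; it is not LangChain's `Runnable`. The environment variable names, their defaults and the `fake_model` function are assumptions for illustration.

```python
import os
from typing import Any, Callable

# Parameters: hypothetical variable names and defaults, read once at import time.
TEMPERATURE = float(os.getenv("EXAMPLE_TEMPERATURE", "0.1"))
MAX_TOKENS = int(os.getenv("EXAMPLE_MAX_TOKENS", "1024"))


class Step:
    """Toy stand-in for a runnable: wraps a function and supports `a | b`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # Piping returns a new step that feeds this step's output into `other`.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


def format_prompt(inputs: dict) -> str:
    # Prompts: assign a role, then inject the user's input.
    return f"You are {inputs['role']}.\n\n{inputs['input']}"


def fake_model(prompt: str) -> str:
    # Models: placeholder for the real chat model call; it only echoes its settings.
    return f"[temperature={TEMPERATURE}, max_tokens={MAX_TOKENS}] {prompt}"


# Chains: declared by piping the prompt step into the model step.
chain = Step(format_prompt) | Step(fake_model)

if __name__ == "__main__":
    print(chain.invoke({"role": "a pirate", "input": "What is Amazon ECS?"}))
```

In the real chain.py, the prompt template and the Bedrock chat model take the place of the two `Step` objects, and the composition reads the same way.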

At this point, the complete file should look something like this

8/ Next, you need to update the template dependencies by running the following commands

Notice that LangChain uses Boto3, the AWS SDK for Python, to call Amazon Bedrock. If you’re using your own workstation, you will probably need to select a default AWS Region and set up AWS credentials before making any requests.

💡 For more information on how to do this, please refer to the AWS Boto3 documentation (Developer Guide > Credentials).

9/ Once chain.py is updated, you’ll need to install the package

10/ To use the claude-chat template, replace the ‘dummy’ API route in the server code (app/server.py)

with a new API route that directs requests to the chain

11/ Since LangServe is automatically installed with the LangChain CLI, we can start the application right away

💡 Use the --host and --port flags to change the listen address and port if necessary.

12/ You can check that the server is up and running by navigating to LangServeUrl, which is available in the Outputs section of the CloudFormation stack, or http://127.0.0.1:8000 if you’re running it locally (unless you specified a different port).

By default, LangServe will redirect you to the self-generated OpenAPI docs endpoint:

LangServe automatically creates a playground under <LangServeUrl>/claude-chat/playground so you can quickly test the service.

I encourage you to try different models (Configure > Configurable > Model), prompts and roles (Try it > Inputs > Input/Role). Notice that a prompt that works well with one model may not work as well with another model or model version. Even when the role and inference parameters remain unchanged, the same prompt can yield different results. This is to be expected, and it’s one of the reasons why it’s important to test your application against different prompts and prompt templates before going into production.

While this is beyond the scope of this post, keep in mind that you can have more control over the model output simply by tweaking the inference parameters and applying some prompt engineering best practices.

💡 For some tips on how to construct effective prompts for Claude models, check out Anthropic's Prompt Engineering section and the Bedrock Prompt Engineering Guidelines.

Deploying LangServe applications 🚀

Now that we’ve tested the API locally, it’s time to deploy the application for the first time on AWS. For this part, we will be using AWS Copilot to create the application (bedrock-chat-app), bootstrap an environment (dev) and deploy a service (claude-chat).

1/ One of the nicest features of AWS Copilot is that we can deploy a project to AWS with a single command

AWS Copilot will ask for an environment name once the initial setup is complete. ✋ Read the next step before entering anything!

At first glance, the command may seem complex, but let’s break it down:

copilot init is used to create a new Amazon ECS or AWS App Runner application
--app allows us to choose a name for the application (bedrock-chat-app), while --name is used to define the new service (claude-chat)
--type specifies the type of service that will be created
In this case, you are creating an Internet-facing Load Balanced Web Service; if you want a service without a public endpoint, you can create a Backend Service, which is beyond the scope of this post
--dockerfile references a local path to the Dockerfile, which is auto-generated by the LangChain CLI
--deploy asks copilot to deploy the service to a new or existing environment
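Putting the flags above together, the command line can be assembled as in the sketch below, which only prints the command rather than running it. The application, service and type values come from the text; the Dockerfile path is an assumption (a Dockerfile at the project root).

```python
import shlex

# Values for --app, --name and --type come from the walkthrough;
# the Dockerfile location is an assumption.
command = [
    "copilot", "init",
    "--app", "bedrock-chat-app",
    "--name", "claude-chat",
    "--type", "Load Balanced Web Service",
    "--dockerfile", "./Dockerfile",
    "--deploy",
]

# shlex.join quotes arguments that contain spaces so the line can be pasted into a shell.
print(shlex.join(command))
```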

As far as AWS Copilot is concerned, an application is just a collection of different environments and services working together. Service and environment definitions are provided declaratively via manifest files, which are converted into AWS CloudFormation templates.

2/ ⚠️ Before proceeding, you will need to edit the claude-chat service manifest file (copilot/claude-chat/manifest.yml) in the IDE.

By default, the LangServe application will redirect requests to root (/) to the /docs path with a 307 status code (Temporary Redirect). Since the default health check in the Load Balanced Web Service manifest (http.healthcheck) expects a 200 status code, this will prevent the service from entering a healthy state.

You can either change the health check path (which is commented out in the sample service manifest)

or add 307 to the list of allowed HTTP status codes for healthy targets

Both options will work since http.healthcheck accepts either targets (strings) or full health check configurations (maps).
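The effect of the two options can be sketched in a few lines. The `is_healthy` helper below is hypothetical and only imitates how a list of success codes is matched against a response status; it is not Copilot or load balancer code.

```python
from typing import Set


def parse_success_codes(spec: str) -> Set[int]:
    """Expand a matcher such as "200,307" or "200-299" into a set of status codes."""
    codes: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            codes.update(range(int(low), int(high) + 1))
        else:
            codes.add(int(part))
    return codes


def is_healthy(status: int, success_codes: str = "200") -> bool:
    """Return True if the response status counts as healthy."""
    return status in parse_success_codes(success_codes)


if __name__ == "__main__":
    # The root path answers 307 (redirect to /docs): unhealthy with the default matcher.
    print(is_healthy(307))
    # Option 1: probe a path that answers 200 directly.
    print(is_healthy(200))
    # Option 2: keep probing the root path but accept 307 as well.
    print(is_healthy(307, "200,307"))
```

Changing the path keeps the check strict, while widening the success codes is the smaller change to the manifest; either is enough for the target to be marked healthy.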

3/ Next, you need to give the service permissions to invoke specific Bedrock models. 🚦 A simple way to do this with AWS Copilot is to use workload addon templates. These are just CloudFormation templates that a) create at least one resource and b) contain the parameters App, Env and Name.

Create a new folder named addons under copilot/claude-chat and add a template file named bedrock-policy.yml with the following contents

This template will create a custom managed policy that grants bedrock:InvokeModel* (including streaming) access to the models provided by Anthropic (anthropic.*). Since the template returns the policy ARN, AWS Copilot will automatically attach it to the ECS task role.
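To see what the wildcards in such a policy cover, the sketch below writes an equivalent statement as a Python dictionary and checks a few actions and model ARNs against it. The statement layout, the second (made-up) model ID and the use of `fnmatch` to approximate IAM wildcard matching are assumptions for illustration; the add-on template itself is a CloudFormation template.

```python
from fnmatch import fnmatchcase

# Illustrative statement mirroring the description above (not the actual template).
statement = {
    "Effect": "Allow",
    "Action": "bedrock:InvokeModel*",
    "Resource": "arn:aws:bedrock:*::foundation-model/anthropic.*",
}


def is_allowed(action: str, resource: str) -> bool:
    """Approximate IAM's `*` wildcard with case-sensitive shell-style matching."""
    return fnmatchcase(action, statement["Action"]) and fnmatchcase(
        resource, statement["Resource"]
    )


if __name__ == "__main__":
    claude = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0"
    other = "arn:aws:bedrock:us-east-1::foundation-model/example.other-model-v1"  # made-up ID
    print(is_allowed("bedrock:InvokeModel", claude))                    # True
    print(is_allowed("bedrock:InvokeModelWithResponseStream", claude))  # True
    print(is_allowed("bedrock:InvokeModel", other))                     # False
    print(is_allowed("bedrock:ListFoundationModels", claude))           # False
```

The trailing `*` on the action is what covers the streaming variant, and the `anthropic.*` suffix on the resource is what keeps other providers' models out of reach.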

4/ 🖥️🔙 Return to the terminal and create an environment named dev. After a couple of minutes, the message below will appear in the console and AWS Copilot will return the public URL to access the application.

5/ Once the service is deployed, you can use the Copilot CLI to get more information about the service

to check the service status

and to check the service logs

You can use the --resources flag with the env show / svc show sub-commands to get the full list of active resources or env package / svc package to print the corresponding CloudFormation template.

Securing and operationalizing LangServe Applications

Adding Basic Authentication 🔐

While production-grade authentication for LangServe applications is outside the scope of this post, it is still valuable to spend a few minutes discussing some simple ways to help protect your API which, as it currently stands, is open to everyone.

LangServe is built on top of FastAPI, which means that you can implement a basic authentication scheme with a quick refactoring of the server code.

💡 For more in-depth information on how to handle authentication, please refer to FastAPI's security and middleware documentation.

1/ Replace the contents in app/server.py with the following code sample.

Each request that goes through the router will call the function get_current_username to authenticate the user. The default credentials are bedrock/bedrock, but you can change their values by setting the BEDROCK_CHAT_* environment variables.
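The credential check at the heart of such a function can be sketched without FastAPI. The example below decodes an HTTP Basic `Authorization` header and compares it against environment variables in constant time. The variable names `BEDROCK_CHAT_USERNAME` and `BEDROCK_CHAT_PASSWORD` are assumed from the `BEDROCK_CHAT_*` pattern, and `check_basic_auth` is a hypothetical helper, not the function from the server code.

```python
import base64
import binascii
import os
import secrets

# Assumed variable names; defaults match the bedrock/bedrock credentials mentioned above.
USERNAME = os.getenv("BEDROCK_CHAT_USERNAME", "bedrock")
PASSWORD = os.getenv("BEDROCK_CHAT_PASSWORD", "bedrock")


def check_basic_auth(authorization_header: str) -> bool:
    """Return True if the header carries the expected username and password."""
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False
    username, _, password = decoded.partition(":")
    # compare_digest avoids leaking how many leading characters matched.
    user_ok = secrets.compare_digest(username.encode(), USERNAME.encode())
    pass_ok = secrets.compare_digest(password.encode(), PASSWORD.encode())
    return user_ok and pass_ok


if __name__ == "__main__":
    good = "Basic " + base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
    print(check_basic_auth(good))
    print(check_basic_auth("Bearer not-basic-auth"))
```

Both comparisons are evaluated before combining the results, so a wrong username does not short-circuit the password check.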

2/ The new version of the server code adds a dedicated health check route, so don’t forget to change the health check path from /docs to /health in the copilot/claude-chat/manifest.yml.

3/ (Optional) A simple and secure way to inject credentials, like the Bedrock Chat username and password, and other sensitive information into the service is to store that information as a secret in AWS Systems Manager Parameter Store (SSM)

then add a reference to the service manifest file

4/ The changes to the service can be pushed to the dev environment by issuing the command

5/ Once the deployment finishes, you can test that this authentication setup works with a simple curl command:

Be sure to replace USERNAME, PASSWORD, ROLE, PROMPT and MODEL (e.g. claude_3_haiku) with the correct values.
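For readers who prefer Python over curl, the sketch below builds an equivalent authenticated request with the standard library. It assumes the chain is served under /claude-chat, that LangServe's invoke endpoint is used, and that the input keys are `role` and `input` with a configurable `model` field; check these against your own chain before relying on them. The host name is a placeholder.

```python
import base64
import json
import urllib.request


def build_invoke_request(base_url, username, password, role, prompt, model):
    """Build (but do not send) a POST request to the chain's invoke endpoint."""
    payload = {
        "input": {"role": role, "input": prompt},      # assumed input keys
        "config": {"configurable": {"model": model}},  # assumed configurable field
    }
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/claude-chat/invoke",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_invoke_request(
        "http://example.invalid", "USERNAME", "PASSWORD", "ROLE", "PROMPT", "claude_3_haiku"
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

A request without the Authorization header, or with the wrong credentials, should be rejected by the server once Basic Authentication is in place.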

💡 You can restrict access to the service by adding a custom security group via workload add-on templates. The security group will be automatically attached to the ECS service as long as it is included in the Outputs section of the template.

Integration with LangSmith 🦜️🛠️

Another interesting application of secrets is in setting up a connection to LangSmith, the official LangChain observability tool that can be used to log and monitor calls to the LangServe API.

⚠️ As of this writing, LangSmith is still in limited preview (beta release). For more information on how to get started with LangSmith, please refer to the LangSmith docs for a complete interactive walkthrough.

1/ Create a LangSmith account.

2/ Navigate to the Settings page and generate a new API key.

3/ Create a new secret to hold the LangChain API key.

4/ Add the LangSmith configuration and the LangChain API key to the service manifest file.

5/ Once the changes are deployed, the traces will start to show up in LangSmith under the designated project

and you can start to drill down into each model call

Cleaning Up 🧹✨

When you’re done, don’t forget to delete the application

and remove the CloudFormation stack.

If you’re using your own workstation, you may want to delete the Conda environment

Conclusion

In this post, I have shown how to integrate a LangServe project with the typical AWS container services stack. The LangChain CLI makes it simple to develop and test applications locally, while AWS Copilot simplifies the process of building, packaging and deploying the application.

With some minor adjustments, all the steps in this walkthrough can be fully automated and integrated into your CI/CD pipeline to provide continuous feedback to the rest of the organization.

However, this is just the beginning. Getting a generative AI application ready for production is a long and arduous journey. In future posts, I will explore different ways to bridge this gap and bring generative AI products to your end users.

📩 Any questions/feedback? Leave a comment below and let's build together! 💪

References 📚

(LangChain) Introducing LangServe, the best way to deploy your LangChains
(LangChain) LangChain Expression Language
(Anthropic) Claude Prompt Engineering Techniques - Bedrock Edition
(AWS) Introducing AWS Copilot
(AWS) Developing an application based on multiple microservices using AWS Copilot and AWS Fargate
(AWS) AWS Copilot CLI

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
