AWS and Salesforce Data Integrations
The full list of integration options and key considerations to accelerate your integration enablement
Natallia Bahlai
Amazon Employee
Published Dec 4, 2024
Last Modified Jan 15, 2025
Through seamless integrations between AWS and Salesforce, organizations can harness the full potential of both platforms. This powerful combination enables businesses to elevate their customer experiences, supercharge their operations, and unlock advanced capabilities — from real-time analytics to intelligent workflows. By bridging these industry-leading platforms, companies can spot crucial business insights faster and transform them into strategic decisions and meaningful actions.
In this post, we will explore various options for data integration with Salesforce using Salesforce Integration APIs and AWS services. We'll cover factors to consider when choosing a specific integration mechanism. This guide aims to help you navigate the variety of integration choices and key considerations to accelerate your integration journey, helping you select the optimal approach for secure, performant data flow between AWS and Salesforce. Furthermore, you can leverage the CloudFormation templates from the Quick Start section to gain a practical understanding of how to configure AWS services. These templates can help simplify experimentation and accelerate enablement to achieve seamless connectivity with Salesforce products.
The following table in Figure 1 summarizes the fundamental Salesforce Integration APIs, which will be considered in this post. This guide focuses on integration choices for Sales Cloud and Service Cloud, excluding Data Cloud integration capabilities. For information about limits on Salesforce APIs, follow the "API Limits" links provided in the Resources section at the end of the post.
Figure 1. Summary of Salesforce Integration APIs
There are multiple factors that dictate the selection of the integration service and pattern. We'll use these criteria throughout the guide to help you navigate the available integration options and identify the optimal solution:
- Data availability requirements
- Data volumes
- Data filtering and transformation
- Enablement through configuration or coding
- Native support for data catalog creation and conversion to efficient formats like Parquet
In terms of data availability requirements, we can categorize integrations into three groups:
- Live: data virtualization or data query in place.
- Real-time or near real-time: this primarily consists of event-driven integrations.
- Asynchronous or scheduled: this involves the integration and processing of historical data at regular intervals, aligned with data change cycles.
Regarding data flow direction, we can categorize integrations into two main groups:
- Outbound: data moves from Salesforce to external systems (AWS or third-party platforms) as the destination.
- Inbound: Data moves from external systems (AWS or third-party platforms) into Salesforce as the destination.
Salesforce integrations offer the flexibility to select a preferred authorization mechanism, which is crucial for ensuring secure data access. The available options include OAuth 2.0 with the following authorization grant types:
- Authorization code, suited for integrations acting on behalf of a user.
- JWT bearer flow and client credentials, suited for server-to-server integrations without user interaction.
- Resource owner password credentials, which is not recommended due to the risks associated with password exposure.
Additionally, Salesforce supports out-of-the-box named credentials to connect to AWS services through OAuth 2.0 or the AWS Signature Version 4 protocol, which significantly accelerates and simplifies integration enablement.
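For example, a server-to-server integration can obtain a Salesforce access token through the client credentials grant before calling the REST API. The following minimal Python sketch illustrates the token request; the My Domain URL is a placeholder, and the consumer key and secret come from your connected app.

```python
import requests

# Hypothetical My Domain URL; replace with your org's login URL.
TOKEN_URL = "https://yourdomain.my.salesforce.com/services/oauth2/token"

def get_salesforce_token(client_id: str, client_secret: str) -> dict:
    """Obtain an access token via the OAuth 2.0 client credentials flow."""
    response = requests.post(
        TOKEN_URL,
        data={
            "grant_type": "client_credentials",
            "client_id": client_id,        # connected app consumer key
            "client_secret": client_secret,  # connected app consumer secret
        },
        timeout=10,
    )
    response.raise_for_status()
    # The response contains access_token and instance_url for subsequent API calls.
    return response.json()
```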
Outbound integrations focus on making Salesforce data available in an AWS data lake (such as Amazon S3), a data warehouse (Amazon Redshift), or purpose-built databases with microsecond latency, to enable analysis and processing across downstream applications and business processes. The table in Figure 2 lists various options to integrate Salesforce data with a recommended AWS service:
Figure 2. Salesforce APIs and relevant AWS services to enable outbound integrations
To determine the most efficient integration service, consider the following decision criteria:
- Data availability: Assess whether data needs to be available in real-time or can be synced on a scheduled basis with the desired frequency.
- Data volume: Given Salesforce's multi-tenant architecture, the volumetric capabilities of the Salesforce API should be carefully considered. These capabilities can significantly impact the selection of the most efficient API and data movement strategy such as bulkification.
- Data transformations, filtering, and enrichment: When integrating data with data lake (Amazon S3) or data warehouse (Amazon Redshift), it's crucial to understand data usage patterns to identify the most efficient data structure and format (CSV, JSON, Parquet). For example: Data stored in S3 can leverage Parquet format with efficient partitions, appropriate file sizes, and compression to maximize effectiveness of queries through Amazon Athena.
- Configuration vs. coding: This is another significant consideration for achieving faster time to market and freeing up resources for business innovation.
- Additional non-functional considerations such as native support for data catalog creation or auto-conversion to efficient formats like Parquet.
Figure 3 illustrates all outbound Salesforce integrations and the corresponding AWS services in a single view, making it easier to visualize the available options. These AWS services, in the context of Salesforce integrations, are explained further in this post.
Figure 3. Outbound Salesforce integrations through relevant AWS services
Data often originates from upstream AWS applications, such as websites, microservices or data pipelines. Additionally, AWS analytics services can detect signals and predict trends that require triggering corresponding workflows in Salesforce. The following table in Figure 4 summarizes various options to make data available in Salesforce through a corresponding AWS service:
Figure 4. Salesforce APIs and AWS services to enable inbound integrations
The decision points for selecting the most efficient integration approach are similar for both inbound and outbound integrations. Key criteria include data availability, data volumes, and data transformation requirements. The latter is particularly important since Salesforce records must comply with the Salesforce sObject schema. For asynchronous data integrations involving more than 2,000 records, Bulk API 2.0 is generally the preferable candidate. For operations with fewer than 2,000 records, it's often more efficient from both the Salesforce and AWS standpoints to use "bulkified" synchronous calls via the REST API, such as Salesforce Composite resources, to create multiple unrelated records of the same type. The Salesforce Composite API is also recommended and highly efficient in situations involving cross-reference data operations that rely on multiple records of different types or with dependencies (parent-child or lookup relationships). It can help eliminate inefficient round-trip requests and supports atomic transactions, allowing all operations to be committed at once and eliminating the need for custom development to achieve such functionality.
The mental model of possible inbound Salesforce integration APIs and recommended AWS services is depicted in Figure 5:
Figure 5. Inbound Salesforce integrations through relevant AWS services
Now let's review the AWS services that enable integrations with Salesforce in more detail.
Figure 6. Amazon AppFlow integrations with Salesforce
Amazon AppFlow is a seamless no-code tool that accelerates the implementation of data integration through configuration, allowing developers to focus more time on working with the actual data. Amazon AppFlow has a built-in Salesforce connector that relies on the asynchronous Salesforce Bulk API 2.0 (query, ingestion) or the synchronous Salesforce REST API under the hood, depending on the data volume. With Amazon AppFlow, moving data back and forth between Salesforce and AWS is a simple process that takes only a few minutes to set up.
Amazon AppFlow is a perfect choice for near real-time integrations, allowing you to trigger data flows automatically based on Salesforce events such as Change Data Capture and Platform Events. Once the flow is activated, whenever a new record is created or an existing record is modified in Salesforce, the data automatically appears in the configured destination, such as an Amazon Redshift table or an Amazon S3 bucket. Alternatively, the event can be routed to Amazon EventBridge for ad hoc processing, such as data transformations and filtering based on event content or metadata that Amazon AppFlow does not support.
Amazon AppFlow is particularly well-suited for:
- Rapid enablement of data integrations between Salesforce, AWS, or other supported SaaS platforms.
- Configurable data integrations with built-in data type conversions and optimizations that don't require data transformations. Amazon AppFlow offers the flexibility to choose JSON, CSV, or Parquet as the file format when transferring data to Amazon S3.
- Data anonymization, such as masking Personally Identifiable Information (PII), utilizing Amazon AppFlow's built-in masking feature.
- Automatic registration of data in the AWS Glue Data Catalog, which enables immediate search and querying of cataloged data using Amazon Athena.
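To illustrate how little configuration this involves, below is a minimal boto3 sketch that creates an on-demand flow copying Account records to Amazon S3 in Parquet format. The connector profile name, bucket, and field list are assumptions; a Salesforce connector profile must already exist in your account.

```python
import boto3

appflow = boto3.client("appflow")

appflow.create_flow(
    flowName="salesforce-account-to-s3",
    triggerConfig={"triggerType": "OnDemand"},
    sourceFlowConfig={
        "connectorType": "Salesforce",
        "connectorProfileName": "my-salesforce-profile",  # assumed to exist
        "sourceConnectorProperties": {"Salesforce": {"object": "Account"}},
    },
    destinationFlowConfigList=[{
        "connectorType": "S3",
        "destinationConnectorProperties": {
            "S3": {
                "bucketName": "my-data-lake",
                "bucketPrefix": "salesforce/account",
                # Parquet output keeps downstream Athena queries efficient.
                "s3OutputFormatConfig": {"fileType": "PARQUET"},
            },
        },
    }],
    tasks=[
        {   # Project only the fields we want to transfer.
            "taskType": "Filter",
            "sourceFields": ["Id", "Name", "Industry"],
            "connectorOperator": {"Salesforce": "PROJECTION"},
        },
        {   # Map each projected field straight through to the destination.
            "taskType": "Map",
            "sourceFields": ["Id"],
            "destinationField": "Id",
            "connectorOperator": {"Salesforce": "NO_OP"},
        },
        {
            "taskType": "Map",
            "sourceFields": ["Name"],
            "destinationField": "Name",
            "connectorOperator": {"Salesforce": "NO_OP"},
        },
        {
            "taskType": "Map",
            "sourceFields": ["Industry"],
            "destinationField": "Industry",
            "connectorOperator": {"Salesforce": "NO_OP"},
        },
    ],
)

# Run the flow once; event- or schedule-triggered flows are activated the same way.
appflow.start_flow(flowName="salesforce-account-to-s3")
```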
To enhance security and private connectivity, Amazon AppFlow can utilize AWS PrivateLink for Salesforce connections. This ensures that data transfers are as secure and private as moving data within the boundaries of your trusted network. Additionally, the guidance in the Amazon AppFlow documentation covers how to set up the authentication mechanism and apply IP address restrictions to maximize security.
Figure 7. AWS Glue integration with Salesforce
AWS Glue is a highly performant AWS managed service with a pay-as-you-go pricing model and a built-in Salesforce connector, simplifying data transfers between Salesforce, AWS services, or other SaaS products. AWS Glue can manage heavy workloads and varying data volumes efficiently due to its auto-scaling capability. AWS Glue lets you create jobs through a visual interface, an interactive code notebook, or a script editor. The variety of built-in connectors and transformers makes AWS Glue a powerful choice for complex ETL tasks that require advanced data transformations and seamless integration.
There are three patterns to achieve data integration with Salesforce in AWS Glue:
- Zero-ETL integration
- Near real-time processing
- Periodic batch processing of asynchronous or historical data
For zero-ETL integration, AWS Glue offers a no-code solution fully managed by AWS with a native Salesforce connector to extract records from selected Salesforce sObjects and load them into Amazon Redshift or Amazon SageMaker Lakehouse. By selecting a few settings in the no-code interface, you can quickly set up your zero-ETL integration to automatically ingest and continually maintain an up-to-date replica of your data in the data lake and data warehouse.
For near real-time processing, Salesforce can feed data to a stream processing service like Amazon Kinesis Data Streams using a Salesforce HTTP callout through an Amazon Kinesis proxy, and then into a connected AWS Glue streaming job. Amazon Kinesis Data Streams, combined with AWS Glue streaming jobs, serves as the backbone of streaming ETL applications and processes. As data flows through the streaming ETL process, it is sent to a destination such as an Amazon S3 bucket or Amazon Redshift, or it can be simultaneously routed to other purpose-built data stores.
For batch processing, AWS Glue offers a robust solution with native Salesforce connectivity to efficiently handle large volumes of Salesforce sObject records, and jobs can be triggered on a schedule or in response to specific events.
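As a sketch of the batch pattern, the following PySpark job script reads Account records through the native Salesforce connector and lands them in Amazon S3 as Parquet. The connection name, entity, API version, and S3 path are assumptions for illustration.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read Account records through the native Salesforce connector; the
# connection name, entity, and API version are assumptions for this sketch.
accounts = glue_context.create_dynamic_frame.from_options(
    connection_type="salesforce",
    connection_options={
        "connectionName": "my-salesforce-connection",
        "ENTITY_NAME": "Account",
        "API_VERSION": "v60.0",
    },
)

# Write to S3 as Parquet for efficient downstream Athena queries.
glue_context.write_dynamic_frame.from_options(
    frame=accounts,
    connection_type="s3",
    connection_options={"path": "s3://my-data-lake/salesforce/account/"},
    format="parquet",
)

job.commit()
```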
AWS Glue is a seamless service for data integration between Salesforce and data warehouses, data lakes, or other downstream applications to meet the following requirements:
- Large-scale data exports and imports involving substantial data volumes or multiple sources and destinations.
- ETL jobs with advanced filtering logic and complex data transformations, including data enrichment, merging, aggregation, or disaggregation, effortless data type conversions such as JSON to Parquet.
- Automatic registration of data in the AWS Glue Data Catalog, enabling immediate search and querying of cataloged data using Amazon Athena or Amazon Redshift Spectrum.
When working with the Salesforce connector, it's important to be aware of the following limitations:
- It only supports Spark SQL. Salesforce SOQL is not supported.
- Job bookmarks are not supported.
- Only specific Salesforce API versions are supported by the connector; refer to the connector documentation for the supported versions.
Figure 8. AWS Step Functions integrations with Salesforce and AWS services
AWS Step Functions is a low-code workflow orchestration service with a visual workflow designer. It allows you to quickly build reliable data pipelines to process multiple Salesforce records or perform CRUD operations using the built-in drag-and-drop interface available in the AWS Management Console.
Adopt AWS Step Functions to enable the following integration scenarios with Salesforce:
- Configurable business workflows that operate on individual Salesforce records or trigger Salesforce events via the REST API.
- Business workflows involving groups of records. To minimize round trips or execute multiple dependent requests in a single call, you can leverage the Salesforce Composite REST API. A good example is a requirement to support all-or-nothing transactions across nested or parent-child entities, where all records must be committed simultaneously or rolled back if any single record fails.
- Data exports and imports by leveraging asynchronous Bulk API 2.0 in conjunction with a helper Lambda function to process payloads greater than 256 KB. Such integrations leverage Step Functions' asynchronous pattern, allowing you to call a service and wait for a response before proceeding to the next step.
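To make the Bulk API 2.0 pattern concrete, here is a minimal sketch of such a helper Lambda function using only the Python standard library. The event shape, target object, and external ID field are assumptions; in practice, the access token would come from a named credential flow or AWS Secrets Manager.

```python
import json
import urllib.request

def lambda_handler(event, context):
    """Create a Bulk API 2.0 ingest job and upload a CSV payload.

    Expects event = {"instance_url": ..., "access_token": ..., "csv": ...};
    these names, and the upsert target, are assumptions for this sketch.
    """
    base = f"{event['instance_url']}/services/data/v60.0/jobs/ingest"
    headers = {
        "Authorization": f"Bearer {event['access_token']}",
        "Content-Type": "application/json",
    }

    # Step 1: create the ingest job for the target object.
    req = urllib.request.Request(
        base,
        data=json.dumps({"object": "Account", "operation": "upsert",
                         "externalIdFieldName": "External_Id__c"}).encode(),
        headers=headers, method="POST")
    job = json.loads(urllib.request.urlopen(req).read())

    # Step 2: upload the CSV batch to the job's contentUrl.
    upload = urllib.request.Request(
        f"{event['instance_url']}/{job['contentUrl']}",
        data=event["csv"].encode(),
        headers={"Authorization": headers["Authorization"],
                 "Content-Type": "text/csv"},
        method="PUT")
    urllib.request.urlopen(upload)

    # Step 3: mark the job as ready so Salesforce starts processing it.
    close = urllib.request.Request(
        f"{base}/{job['id']}",
        data=json.dumps({"state": "UploadComplete"}).encode(),
        headers=headers, method="PATCH")
    urllib.request.urlopen(close)

    return {"jobId": job["id"]}
```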
When implementing AWS Step Functions for Salesforce integrations, consider the following capabilities and limitations:
- Data transformations: Utilize input-output payload manipulation or intrinsic functions to transform data within your workflow.
- Data filtering and conditional logic: Use state output filtering or Choice state routing to apply data filtering dynamically.
- Resiliency and error handling: Enhance your workflow's resiliency with configurable retry mechanisms and execution history retention for up to 90 days.
- Payload size: Be mindful of the 256 KB payload size limitation when designing your workflows.
- API timeout: Ensure that the Salesforce API callout (request-response) completes within the 60-second HTTP Task duration limit.
Figure 9. Amazon EventBridge integrations with Salesforce and AWS services
Amazon EventBridge is a serverless event bus that simplifies building event-driven applications at scale using events generated from your applications, SaaS sources like Salesforce, and AWS services. Amazon EventBridge partner event source integrations allow customers to receive events from over 30 SaaS applications and ingest them into AWS datastores.
You can use Amazon EventBridge to receive events from Salesforce in the following ways:
- By using Salesforce's Event Bus Relay feature to receive events directly on an Amazon EventBridge partner event bus.
- By configuring a flow in Amazon AppFlow that uses Salesforce as a data source. Amazon AppFlow then sends Salesforce events to Amazon EventBridge by using a partner event bus.
- By leveraging AWS Lambda to capture events through the Streaming API and routing these events to a dedicated Amazon EventBridge event bus.
- By leveraging Amazon API Gateway and configuring an HTTP API proxy for direct integration with Amazon EventBridge.
Amazon EventBridge is a great choice in the following scenarios:
- Event-driven architectures with real-time and near real-time requirements.
- Both source and destination systems have a high degree of availability and fault tolerance.
- Filtering based on data as well as metadata attributes, such as processing of events with a particular attribute.
- Configurable solution with data transformations that can be achieved through EventBridge input transformer.
- "Storage first" serverless patterns where data is written to the bus before any routing or logic is applied. If the downstream service encounters issues, EventBridge implements a retry strategy with incremental back-off for up to 24 hours.
Limitations to consider when designing event-driven architectures with Amazon EventBridge:
- Amazon EventBridge events are limited to 256 KB. If your event is larger than that, Amazon AppFlow publishes a summary event with a pointer to a specified S3 bucket.
- Amazon EventBridge requests to an API destination endpoint must have a maximum client execution timeout of 5 seconds. If the target endpoint takes longer than 5 seconds to respond, Amazon EventBridge times out the request. Amazon EventBridge retries timed out requests up to the maximum threshold configured in your retry policy.
Figure 10. AWS Lambda integrations with Salesforce and AWS services
AWS Lambda provides a powerful mechanism for encapsulating and executing business logic, as well as for communicating with other AWS or third-party services to route, modify, or store processed data. When integrating with Salesforce, AWS Lambda offers multiple options:
- Point-to-point integrations:
- Integrations initiated from Salesforce typically involve making HTTP callouts to invoke HTTP endpoints hosted on AWS.
- The simplest way to provision an HTTP endpoint on AWS is by using Lambda and enabling the function URL. Function URLs are ideal for scenarios where you need a single-function microservice with an endpoint that doesn’t require the advanced features of Amazon API Gateway mentioned in the next point.
- Alternatively, you can leverage Amazon API Gateway as a proxy to an AWS Lambda function and take advantage of capabilities like JWT or custom authorizers, client certificates, throttling, request/response validation and transformation, usage plans, caching, custom domain names, and built-in AWS WAF support.
- For integrations initiated from AWS, Lambda can make Salesforce REST API calls to query, insert, upsert, or delete Salesforce data. If your use case involves invoking Salesforce’s Composite REST API or Bulk API 2.0, ensure that the Lambda timeout is increased accordingly.
- Event-driven integrations: AWS Lambda can also be used as a routing component to capture Salesforce Change Data Capture and Platform events. These events can then be forwarded to services like Amazon EventBridge, Amazon SQS, or Amazon Kinesis Data Streams for further processing and delivery to downstream applications running on AWS or externally.
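A routing function of this kind can be very small. The following sketch forwards a captured Salesforce change event to a custom event bus; the event source, detail type, and bus name are illustrative assumptions.

```python
import json
import boto3

events = boto3.client("events")

def lambda_handler(event, context):
    """Forward a captured Salesforce change event to a custom event bus.

    The inbound event shape and the bus name are assumptions for this
    sketch; in practice the payload comes from your Streaming API subscriber.
    """
    events.put_events(
        Entries=[{
            "Source": "custom.salesforce",
            "DetailType": "AccountChangeEvent",
            "Detail": json.dumps(event),
            "EventBusName": "salesforce-events",
        }]
    )
    return {"status": "forwarded"}
```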
AWS Lambda is an excellent choice for integration when:
- Full control over the business logic is required: AWS Lambda allows you to design and manage advanced business logic with full flexibility.
- Complex ad-hoc data transformations: In scenarios where you need to perform advanced data transformations, such as lookups and merging data from multiple systems, Lambda provides the capability to handle these operations efficiently and with high performance.
- Proxy APIs or microservices: Additional business logic needs to be incorporated before interacting with Salesforce. An example is synchronizing multiple records into Salesforce as a single atomic transaction using the Salesforce Composite REST API, such as creating an invoice along with its associated line items.
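As a sketch of the invoice example above, the following Python snippet posts a Composite request that creates a parent record and its child record as one all-or-nothing transaction. Invoice__c and Invoice_Line__c are hypothetical custom objects used only to illustrate the request shape.

```python
import requests

def create_invoice_with_lines(instance_url: str, token: str) -> dict:
    """Create an invoice and its line item as a single atomic transaction."""
    body = {
        "allOrNone": True,  # roll everything back if any subrequest fails
        "compositeRequest": [
            {
                "method": "POST",
                "url": "/services/data/v60.0/sobjects/Invoice__c",
                "referenceId": "invoice",
                "body": {"Name": "INV-0001"},
            },
            {
                "method": "POST",
                "url": "/services/data/v60.0/sobjects/Invoice_Line__c",
                "referenceId": "line1",
                # Reference the parent record created in the same call.
                "body": {"Invoice__c": "@{invoice.id}", "Amount__c": 100},
            },
        ],
    }
    response = requests.post(
        f"{instance_url}/services/data/v60.0/composite",
        json=body,
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```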
It is important to keep the following considerations in mind:
- Payload limitations: Be aware that there are limitations on the Lambda invocation payload, which vary depending on whether the integration is synchronous or asynchronous.
- Service quota: Always refer to the official documentation to review the specific quotas and limitations associated with AWS Lambda.
- Execution duration: Be mindful of Lambda's maximum execution time of 15 minutes for most integration tasks, such as bulkified data and event processing. For longer-running processes, consider scalable services like AWS Glue.
Figure 11. Amazon API Gateway integrations with Salesforce and AWS Services
Using Amazon API Gateway with AWS Lambda as an intermediary layer is common in data integrations, but direct integration with AWS services can simplify data movement and reduce complexity and the risk of errors. Amazon API Gateway HTTP APIs enable direct integration with services like Amazon EventBridge, Amazon Kinesis Data Streams, and Amazon SQS without requiring coding and transformation layers. This "storage first" serverless pattern persists data in a Kinesis data stream, SQS queue, or EventBridge event bus before any business logic or data processing is applied, increasing resiliency and reliability. The storage first pattern provides native error handling with a retry strategy and dead-letter queue (DLQ) support. This approach also lets you leverage HTTP API features such as authorizers, throttling, and enhanced observability, making applications more secure and efficient.
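To show how direct this wiring can be, here is a boto3 sketch that provisions an HTTP API route backed by the EventBridge-PutEvents integration, with no Lambda function in between. The API name, IAM role, and event bus name are assumptions; the role must allow events:PutEvents on the target bus.

```python
import boto3

apigw = boto3.client("apigatewayv2")

# Create the HTTP API itself; the name is a placeholder.
api = apigw.create_api(Name="salesforce-ingest", ProtocolType="HTTP")

# Wire the API directly to EventBridge using the PutEvents subtype.
integration = apigw.create_integration(
    ApiId=api["ApiId"],
    IntegrationType="AWS_PROXY",
    IntegrationSubtype="EventBridge-PutEvents",
    PayloadFormatVersion="1.0",
    CredentialsArn="arn:aws:iam::123456789012:role/ApiGatewayEventBridgeRole",
    RequestParameters={
        "Source": "salesforce.callout",
        "DetailType": "OrderEvent",
        "Detail": "$request.body",   # persist the raw payload first
        "EventBusName": "salesforce-events",
    },
)

# Route POST /events straight to EventBridge, with no Lambda in the middle.
apigw.create_route(
    ApiId=api["ApiId"],
    RouteKey="POST /events",
    Target=f"integrations/{integration['IntegrationId']}",
)
```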
In addition to HTTP APIs, Amazon API Gateway supports REST APIs, allowing direct integration with AWS services like Amazon DynamoDB, Amazon S3, Amazon Redshift, and other services via AWS service proxies. This approach can also bypass AWS Lambda, offering a simple, low-latency solution. REST API proxies let you leverage advanced features like request validation, payload transformations, and throttling, enabling you to build secure, scalable applications while simplifying data flow and reducing architectural complexity. REST APIs can accommodate transformation logic using mapping templates written in Velocity Template Language (VTL), as well as caching settings for requests to reduce the number of calls made to the backend endpoint and improve latency.
Refer to these important considerations when using point-to-point HTTP callouts from Salesforce Apex to APIs hosted on AWS:
- Outbound API-based integrations can be invoked synchronously or asynchronously (with the @future annotation) on the Salesforce side, as described in the Salesforce documentation.
- When architecting solutions that involve Salesforce making HTTP callouts to external services, it's crucial to understand the associated limits, such as the number of callouts per Apex transaction, request-response timeout, and payload size (6 MB for synchronous Apex or 12 MB for asynchronous Apex).
- Before enabling an HTTP callout, select the appropriate authentication protocol and set up a named credential to authenticate calls to the external API. It is highly advisable to use temporary, limited-privilege credentials by selecting ‘Obtain Temporary IAM Credentials via STS’ when configuring the Salesforce named credential. This connectivity requires configuring the Amazon API Gateway response for expired tokens to return a 400 or 401 HTTP code, allowing Salesforce to refresh the token upon expiration.
- To invoke an HTTP endpoint hosted on AWS, Salesforce requires the remote host to be added to the list of authorized remote sites. Implement automated deployment pipelines that dynamically configure these settings based on the target environment.
Figure 12. Amazon QuickSight and Amazon Athena integrations with Salesforce
Amazon QuickSight is a fast, easy-to-use, cloud-powered business analytics service available at a fraction of the cost of traditional BI solutions. Amazon QuickSight offers a dedicated connector for Salesforce datasets, enabling seamless data retrieval from Salesforce objects and reports. Once the data is imported, it can be stored in SPICE, QuickSight’s in-memory optimized calculation engine, designed for fast, ad hoc data visualization.
With Amazon QuickSight, data analysts building dashboards can further enhance their productivity by leveraging its generative AI capabilities. By simply asking Amazon Q, users can compose visuals, build calculations, and refine their dashboards using natural language, significantly saving time and reducing the number of clicks required to achieve the desired results.
Amazon Athena is a serverless, interactive analytics service that offers a simplified and flexible way to analyze petabytes of data directly in place. It supports the Athena Query Federation SDK, allowing you to create custom data source connectors for platforms such as Salesforce. Alternatively, you can use data integrations with Amazon AppFlow or AWS Glue to import data into Amazon S3 and register it in the AWS Glue Data Catalog automatically, making it queryable through Amazon Athena.
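Once the data is registered in the AWS Glue Data Catalog, querying it takes a single API call. The sketch below runs an aggregate query over an assumed salesforce_db.opportunity table; the database, table, and result location are placeholders.

```python
import boto3

athena = boto3.client("athena")

# Database, table, and output location are assumptions for this sketch and
# correspond to data cataloged by Amazon AppFlow or AWS Glue.
response = athena.start_query_execution(
    QueryString="""
        SELECT stagename, COUNT(*) AS opportunities, SUM(amount) AS pipeline
        FROM salesforce_db.opportunity
        GROUP BY stagename
    """,
    QueryExecutionContext={"Database": "salesforce_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(response["QueryExecutionId"])
```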
To accelerate the adoption and enablement of AWS services for setting up integrations with Salesforce quickly, you can leverage the CloudFormation templates with integration examples provided in the Quick Start section.
To enable Salesforce integrations, create a Salesforce connected app where required, and enable the corresponding OAuth settings by referring to the table in Figure 13:
Figure 13. Salesforce connected app settings
In this post we explored various integration capabilities between AWS and Salesforce. Through the wide range of integration options and directions (as illustrated in Figure 14), organizations can create interconnected ecosystems that unlock the full potential of both platforms, enabling new ways to innovate and build more customer-focused, data-centric solutions. By implementing a well-designed integration architecture, organizations can expand Salesforce and AWS capabilities while still leveraging the value of each platform independently. The insights shared in this post can help you navigate tradeoffs and choose the most efficient integration approach, maximizing benefits from built-in features to advanced capabilities, minimizing unnecessary overhead, and reducing the total cost of ownership of the final solution.
Figure 14. Supported integration flow Salesforce → AWS → Destination
AWS services supporting integrations with Salesforce:
- Amazon AppFlow: Salesforce connector | Usage
- Amazon EventBridge: Partner event bus for Salesforce events (CDC, Platform events)
- Amazon QuickSight: Salesforce connector
- Salesforce Analytics REST API: Limits | Download JSON report - up to 2K records | Download Excel report - 2K+ records
- Salesforce Event Bus Relay
SOAP-based integrations are out of scope for this document, given its focus on more modern solutions. The Pub/Sub API is excluded from this document due to limited integration options.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.