
AWS and Salesforce Data Integrations
The full list of integration options and key considerations to accelerate your integration enablement
- Data availability requirements
- Data volumes
- Data filtering and transformation
- Enablement through configuration or coding
- Native support for data catalog creation and conversion to efficient formats like Parquet
- Live: data virtualization or data query in place.
- Real-time or near real-time: this primarily consists of event-driven integrations.
- Asynchronous or scheduled: this involves the integration and processing of historical data at regular intervals, aligned with data change cycles.
- Outbound: Data moves from Salesforce to external systems (AWS or third-party platforms) as the destination.
- Inbound: Data moves from external systems (AWS or third-party platforms) into Salesforce as the destination.
- Resource owner password credentials, which is not recommended due to the risks associated with password exposure
- Data availability: Assess whether data needs to be available in real-time or can be synced on a scheduled basis with the desired frequency.
- Data volume: Given Salesforce's multi-tenant architecture, the volumetric capabilities of the Salesforce API should be carefully considered. These capabilities can significantly impact the selection of the most efficient API and data movement strategy, such as bulkification.
- Data transformations, filtering, and enrichment: When integrating data with a data lake (Amazon S3) or a data warehouse (Amazon Redshift), it's crucial to understand data usage patterns to identify the most efficient data structure and format (CSV, JSON, Parquet). For example, data stored in S3 can leverage the Parquet format with efficient partitions, appropriate file sizes, and compression to maximize the effectiveness of queries through Amazon Athena.
- Configuration vs coding is another significant consideration to achieve faster time to market and free up resources for business innovation.
- Additional non-functional considerations such as native support for data catalog creation or auto-conversion to efficient formats like Parquet.
- Rapid enablement of data integrations between Salesforce, AWS, or other supported SaaS platforms.
- Configurable data integrations with built-in data type conversions and optimizations that don't require data transformations. Amazon AppFlow offers the flexibility to choose JSON, CSV, or Parquet as the file format when transferring data to Amazon S3.
- Data anonymization, such as masking Personally Identifiable Information (PII), utilizing Amazon AppFlow's built-in masking feature.
- Automatic registration of data in the AWS Glue Data Catalog, which enables immediate search and querying of cataloged data using Amazon Athena.
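The capabilities above can be sketched as a single `create_flow` request. This is a minimal, hedged example: the connector profile name, bucket, Glue role, and database are placeholders, and a production flow typically also defines projection and field-mapping tasks beyond the catch-all `Map_all` task shown here.

```python
# Sketch of a boto3 appflow.create_flow request that moves Salesforce Account
# records to S3 as Parquet and registers the output in the Glue Data Catalog.
# All names (profile, bucket, role, database) are hypothetical placeholders.
def build_appflow_request(profile: str, bucket: str, glue_role_arn: str) -> dict:
    return {
        "flowName": "salesforce-account-to-s3",
        "triggerConfig": {"triggerType": "OnDemand"},
        "sourceFlowConfig": {
            "connectorType": "Salesforce",
            "connectorProfileName": profile,
            "sourceConnectorProperties": {"Salesforce": {"object": "Account"}},
        },
        "destinationFlowConfigList": [{
            "connectorType": "S3",
            "destinationConnectorProperties": {"S3": {
                "bucketName": bucket,
                # Choose Parquet so Athena queries stay efficient.
                "s3OutputFormatConfig": {"fileType": "PARQUET"},
            }},
        }],
        # Map_all copies every source field; real flows often add filter/mask tasks.
        "tasks": [{
            "taskType": "Map_all",
            "sourceFields": [],
            "taskProperties": {},
            "connectorOperator": {"Salesforce": "NO_OP"},
        }],
        # Auto-register the output in the Glue Data Catalog for Athena.
        "metadataCatalogConfig": {"glueDataCatalog": {
            "roleArn": glue_role_arn,
            "databaseName": "sales_lake",
            "tablePrefix": "sf",
        }},
    }

# To execute: boto3.client("appflow").create_flow(**build_appflow_request(...))
```

Separating the request construction from the API call keeps the flow definition easy to review and version alongside your infrastructure code.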
- Zero-ETL integration
- Near real-time processing
- Periodic batch processing of asynchronous or historical data
- Large-scale data exports and imports involving substantial data volumes or multiple sources and destinations.
- ETL jobs with advanced filtering logic and complex data transformations, including data enrichment, merging, aggregation, or disaggregation, as well as data type conversions such as JSON to Parquet.
- Automatic registration of data in the AWS Glue Data Catalog, enabling immediate search and querying of cataloged data using Amazon Athena or Amazon Redshift Spectrum.
- The AWS Glue Salesforce connector only supports Spark SQL; Salesforce SOQL is not supported.
- Job bookmarks are not supported.
- Check the supported Salesforce API versions of the Glue Salesforce connector.
- Configurable business workflows that operate on individual Salesforce records or trigger Salesforce events via REST API.
- Business workflows involving a group of records. To minimize roundtrips or execute multiple dependent requests in a single call, you can leverage the Salesforce Composite REST API. A good example is a requirement to support all-or-nothing transactions across nested or parent-child entities, where all records must be committed simultaneously or rolled back if any single record fails.
- Data exports and imports by leveraging asynchronous Bulk API 2.0 in conjunction with a helper Lambda function to process payloads greater than 256 KB. Such integrations leverage Step Functions' asynchronous pattern, allowing you to call a service and wait for a response before proceeding to the next step.
- Data transformations: Utilize input-output payload manipulation or intrinsic functions to transform data within your workflow.
- Data filtering and conditional logic: Apply state output filtering or Choice-state routing to filter data dynamically.
- Resiliency and error handling: Enhance your workflow's resiliency with configurable retry mechanisms and execution history retention for up to 90 days.
- Payload size: Be mindful of the 256 KB payload size limitation when designing your workflows.
- API timeout: Ensure that the Salesforce API callout (request-response) completes within the 60-second HTTP Task duration limit for optimal performance.
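The Step Functions points above can be illustrated with a minimal state machine definition, expressed here as a Python dict in Amazon States Language. This is a hedged sketch: the Salesforce endpoint and the EventBridge connection ARN (which supplies credentials for the HTTP Task) are placeholders.

```python
# Minimal ASL sketch for calling a Salesforce REST endpoint via the
# Step Functions HTTP Task, with a retry policy and the 60-second limit
# in mind. Endpoint URL and connection ARN are placeholders.
def build_state_machine(endpoint: str, connection_arn: str) -> dict:
    return {
        "StartAt": "CallSalesforce",
        "States": {
            "CallSalesforce": {
                "Type": "Task",
                "Resource": "arn:aws:states:::http:invoke",
                "Parameters": {
                    "ApiEndpoint": endpoint,
                    "Method": "POST",
                    # Credentials come from an EventBridge connection.
                    "Authentication": {"ConnectionArn": connection_arn},
                    # Pass the state input's "record" field as the request body.
                    "RequestBody.$": "$.record",
                },
                # Stay within the 60-second HTTP Task duration limit.
                "TimeoutSeconds": 60,
                "Retry": [{
                    "ErrorEquals": ["States.Timeout", "States.TaskFailed"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }],
                "End": True,
            }
        },
    }

# Deploy with: boto3.client("stepfunctions").create_state_machine(
#     name="sf-callout", definition=json.dumps(build_state_machine(...)),
#     roleArn="arn:aws:iam::...:role/...")
```

Keeping retries and timeouts in the workflow definition, rather than in application code, is what makes this a configuration-driven alternative to a custom Lambda integration.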
- By using Salesforce's Event Bus Relay feature to receive events directly on an Amazon EventBridge partner event bus.
- By configuring a flow in Amazon AppFlow that uses Salesforce as a data source. Amazon AppFlow then sends Salesforce events to Amazon EventBridge by using a partner event bus.
- By leveraging AWS Lambda to capture events through Streaming API and routing these events to a dedicated Amazon EventBridge event bus.
- By leveraging Amazon API Gateway and configuring HTTP API proxy for direct integration with Amazon EventBridge.
- Event-driven architectures with real-time and near real-time requirements.
- Both source and destination systems have a high degree of availability and fault tolerance.
- Filtering based on data as well as metadata attributes, such as processing of events with a particular attribute.
- Configurable solution with data transformations that can be achieved through EventBridge input transformer.
- "Storage first" serverless patterns where data is written to the bus before any routing or logic is applied. If the downstream service encounters issues, EventBridge implements a retry strategy with incremental back-off for up to 24 hours.
- Amazon EventBridge events are limited to 256 KB. If your event is larger than that, Amazon AppFlow publishes a summary event with a pointer to a specified S3 bucket.
- Amazon EventBridge requests to an API destination endpoint must have a maximum client execution timeout of 5 seconds. If the target endpoint takes longer than 5 seconds to respond, Amazon EventBridge times out the request. Amazon EventBridge retries timed out requests up to the maximum threshold configured in your retry policy.
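The routing-Lambda approach described above (capturing Salesforce events and forwarding them to a dedicated event bus) can be sketched as a small helper that shapes a Change Data Capture payload into a `PutEvents` entry and enforces the 256 KB limit. The event bus name and the `Source` value are assumptions.

```python
import json

EVENTBRIDGE_ENTRY_LIMIT = 256 * 1024  # PutEvents entries are capped at 256 KB

# Sketch: shape a Salesforce CDC event (as received by a subscriber Lambda)
# into an EventBridge PutEvents entry. Bus name and Source are hypothetical.
def to_eventbridge_entry(cdc_event: dict, bus_name: str = "salesforce-events") -> dict:
    detail = json.dumps(cdc_event)
    if len(detail.encode("utf-8")) > EVENTBRIDGE_ENTRY_LIMIT:
        # In practice, stage oversized payloads in S3 and publish a pointer.
        raise ValueError("event exceeds the 256 KB EventBridge entry limit")
    entity = cdc_event.get("ChangeEventHeader", {}).get("entityName", "Unknown")
    return {
        "Source": "custom.salesforce.cdc",
        "DetailType": entity,          # e.g. "Account", used for rule filtering
        "Detail": detail,              # EventBridge expects a JSON string
        "EventBusName": bus_name,
    }

# Inside the Lambda handler:
# boto3.client("events").put_events(Entries=[to_eventbridge_entry(e) for e in batch])
```

Putting the entity name in `DetailType` lets EventBridge rules filter on metadata (for example, only `Account` changes) without inspecting the full payload.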
- Point to point integrations:
- Integrations initiated from Salesforce typically involve making HTTP callouts to invoke HTTP endpoints hosted on AWS.
- The simplest way to provision an HTTP endpoint on AWS is by using Lambda and enabling the function URL. Function URLs are ideal for scenarios where you need a single-function microservice with an endpoint that doesn’t require the advanced features of Amazon API Gateway mentioned in the next point.
- Alternatively, you can leverage Amazon API Gateway as a proxy to an AWS Lambda function and take advantage of capabilities like JWT or custom authorizers, client certificates, throttling, request/response validation and transformation, usage plans, caching, custom domain names, and built-in AWS WAF support.
- For integrations initiated from AWS, Lambda can make Salesforce REST API calls to query, insert, upsert, or delete Salesforce data. If your use case involves invoking Salesforce’s Composite REST API or Bulk API 2.0, ensure that the Lambda timeout is increased accordingly.
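For the AWS-initiated direction, a Lambda-to-Salesforce call can be sketched with the stdlib alone. This hedged example builds an upsert request against a Salesforce external ID field; the instance URL, API version, object, and field names are placeholders for your org.

```python
import json
import urllib.request

# Sketch of a Salesforce REST API upsert from Lambda. Instance URL, API
# version (v59.0), object (Account), and external ID field are placeholders.
def build_upsert_request(instance_url: str, token: str,
                         external_id: str, record: dict) -> urllib.request.Request:
    url = (f"{instance_url}/services/data/v59.0/sobjects/"
           f"Account/External_Id__c/{external_id}")
    return urllib.request.Request(
        url,
        data=json.dumps(record).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",  # PATCH on an external ID path performs an upsert
    )

# In the handler:
# with urllib.request.urlopen(build_upsert_request(...), timeout=30) as resp:
#     result = json.load(resp)
```

Separating request construction from the network call makes the integration easy to unit test, and the explicit timeout guards against the Lambda invocation hanging on a slow callout.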
- Event-driven integrations: AWS Lambda can also be used as a routing component to capture Salesforce Change Data Capture and Platform events. These events can then be forwarded to services like Amazon EventBridge, Amazon SQS, or Amazon Kinesis Data Streams for further processing and delivery to downstream applications running on AWS or externally.
- Full control over the business logic is required: AWS Lambda allows you to design and manage advanced business logic with full flexibility.
- Complex ad-hoc data transformations: In scenarios where you need to perform advanced data transformations, such as lookups and merging data from multiple systems, Lambda provides the capability to handle these operations efficiently and with high performance.
- Proxy APIs or microservices: Additional business logic needs to be incorporated before interacting with Salesforce. An example is synchronizing multiple records into Salesforce as a single atomic transaction using the Salesforce Composite REST API, such as creating an invoice along with its associated line items.
- Payload limitations: Be aware that there are limitations on the Lambda invocation payload, which vary depending on whether the integration is synchronous or asynchronous.
- Service quota: Always refer to the official documentation to review the specific quotas and limitations associated with AWS Lambda.
- Execution duration: Be mindful of Lambda's maximum execution time of 15 minutes for most integration tasks, such as bulkified data and event processing. For longer-running processes, consider scalable services like AWS Glue.
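The atomic invoice-plus-line-items scenario mentioned above maps directly to a Composite REST API payload with `allOrNone` enabled. This is a hedged sketch: `Invoice__c`, `InvoiceLine__c`, and their fields are hypothetical custom objects, and the API version is a placeholder.

```python
import json

# Sketch of an all-or-nothing Salesforce Composite REST call. The custom
# objects (Invoice__c, InvoiceLine__c) and fields are hypothetical.
def build_composite_request(api_version: str = "v59.0") -> dict:
    return {
        "allOrNone": True,  # roll back every subrequest if any one fails
        "compositeRequest": [
            {
                "method": "POST",
                "url": f"/services/data/{api_version}/sobjects/Invoice__c",
                "referenceId": "newInvoice",
                "body": {"Name": "INV-0001"},
            },
            {
                "method": "POST",
                "url": f"/services/data/{api_version}/sobjects/InvoiceLine__c",
                "referenceId": "newLine",
                # Reference the ID returned by the previous subrequest.
                "body": {"Invoice__c": "@{newInvoice.id}", "Amount__c": 100},
            },
        ],
    }

# From Lambda, POST json.dumps(build_composite_request()) to
# https://<instance>/services/data/v59.0/composite with a Bearer token.
```

Because the second subrequest references `@{newInvoice.id}`, both records are created in one round trip, and a failure in either one rolls back the whole transaction.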
- Outbound API-based integrations can be invoked synchronously or asynchronously (with the @future annotation) on the Salesforce side, as described in the Salesforce documentation.
- When architecting solutions that involve Salesforce making HTTP callouts to external services, it's crucial to understand the associated limits, such as the number of callouts per Apex transaction, request-response timeout, and payload size (6 MB for synchronous Apex or 12 MB for asynchronous Apex).
- Before enabling an HTTP callout, select the appropriate authentication protocol and set up a named credential to authenticate calls to the external API. It is highly advisable to use temporary, limited-privilege credentials by selecting ‘Obtain Temporary IAM Credentials via STS’ when configuring the Salesforce named credential. This connectivity requires configuring the Amazon API Gateway response for expired tokens to return a 400 or 401 HTTP code, allowing Salesforce to refresh the token upon expiration.
- To invoke an HTTP endpoint hosted on AWS, Salesforce requires the remote host to be added to the list of authorized remote sites. Implement automated deployment pipelines that dynamically configure these settings based on the target environment.
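The expired-token behavior described above can be configured on the AWS side by overriding API Gateway's gateway response. This hedged sketch assumes a REST API using the `EXPIRED_TOKEN` response type, whose default status code is changed so Salesforce treats it as a signal to refresh the STS token; the API ID is a placeholder.

```python
# Sketch: override API Gateway's EXPIRED_TOKEN gateway response to return
# 401 so the Salesforce named credential refreshes its temporary STS token.
# The REST API ID is a placeholder.
def build_gateway_response(rest_api_id: str) -> dict:
    return {
        "restApiId": rest_api_id,
        "responseType": "EXPIRED_TOKEN",
        "statusCode": "401",  # API Gateway expects the status code as a string
    }

# Apply with:
# boto3.client("apigateway").put_gateway_response(**build_gateway_response(api_id))
```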
- Amazon AppFlow: Salesforce connector | Usage
- Amazon EventBridge: Partner event bus for Salesforce events (CDC, Platform events)
- Amazon QuickSight: Salesforce connector
- Salesforce Analytics REST API: Limits | Download JSON report - up to 2K records | Download Excel report - 2K+ records
- Salesforce Event Bus Relay
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.