Lessons Learned in Using AWS OpenSearch Service
Practical Lessons in optimising AWS OpenSearch
Published Dec 12, 2023
Introduction:
AWS OpenSearch service is a fully managed service for running OpenSearch, a distributed, community-driven, Apache 2.0-licensed search and analytics suite. It allows developers to easily integrate capabilities such as real-time application monitoring, log analytics, and website search into any application.
Problem Statement:
In our workload, the customer wanted to improve search and query performance, add anomaly detection and security analytics, and substantially improve monitoring and observability.
As part of this study, we compared AWS OpenSearch with Elasticsearch and recorded the following:
AWS OpenSearch vs Elasticsearch:
AWS OpenSearch also provides capabilities such as k-nearest neighbours (k-NN) search, SQL, Anomaly Detection, ML Commons, and Trace Analytics.
Why AWS OpenSearch:
Before I jump into the lessons, I also want to describe why we chose AWS OpenSearch and not Elasticsearch. The reasons were:
- Ease of use and maintainability, search efficiency, resiliency, and data protection were all very important in our use case.
- We wanted a cluster configuration that automatically adjusts itself to improve performance. This is where the AWS OpenSearch Auto-Tune feature helped.
- k-NN query performance: we ran many POCs to compare OpenSearch and Elasticsearch, and found that k-NN query performance in OpenSearch was almost equivalent to Elasticsearch.
- Automated monitoring with AWS OpenSearch was also a deciding factor.
- Integration with other AWS services such as Amazon SageMaker, Lambda, and Glue. We wanted to build proper data pipelines, which helped us rework our analytics in areas such as security.
Now let’s move on to the lessons learnt:
1. Planning and Design Efficacy:
- Originally the customer was using a cache and SQL for searching. During the planning phase we carefully reviewed the official documentation and noted the impact of introducing AWS OpenSearch into existing workflows. Make a careful note of these impacts for your own workload.
- For cost optimization and better performance, evaluate the following before diving into implementation:
  - Cluster Architecture: this is very important for the future state, network costs, resiliency, and monitoring.
  - Data Volume: we missed this aspect and, as a result, miscalculated the storage costs.
  - Indexing Patterns
  - Expected Query Workloads
- Evaluating these will help you choose appropriate instance types, storage options, and network configurations for optimal performance.
- Network planning also plays a critical role in the successful introduction of a new AWS component.
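A quick estimate helps avoid the kind of storage miscalculation mentioned under Data Volume. The sketch below follows the sizing rule of thumb commonly given for the managed service: source data, multiplied by the number of copies and an indexing overhead, then divided by the space left after operating system and service reservations. The function name is hypothetical and the default percentages are assumptions, so check them against the current AWS sizing guidance before relying on the result.

```python
def estimate_min_storage_gib(
    source_data_gib: float,
    replicas: int = 1,
    indexing_overhead: float = 0.10,  # assumed: on-disk index ~10% larger than source
    os_reserved: float = 0.05,        # assumed: share of disk reserved by the OS
    service_overhead: float = 0.20,   # assumed: share of disk reserved by the service
) -> float:
    """Rough minimum storage (GiB) needed to hold the given source data."""
    if source_data_gib < 0 or replicas < 0:
        raise ValueError("source_data_gib and replicas must be non-negative")
    copies = 1 + replicas  # one primary plus its replicas
    indexed = source_data_gib * copies * (1 + indexing_overhead)
    return indexed / (1 - os_reserved) / (1 - service_overhead)


if __name__ == "__main__":
    # Hypothetical workload: 500 GiB of source data with one replica.
    print(f"{estimate_min_storage_gib(500, replicas=1):.0f} GiB")
```

Because the result scales linearly with the replica count, it is worth running the estimate for each replica setting under consideration before fixing instance and volume sizes.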
2. Index Management:
- The toughest lesson we learnt was proper index management. Design schemas carefully, assign appropriate data types, and normalize data by organizing logically distinct entities into separate indexes to avoid redundancy and improve search relevance.
- Too much normalization is dangerous as well. If the need arises, create duplicate (denormalized) indexes for better querying, specifically for queries that would otherwise need joins.
- Use nested objects to store common information alongside the documents that need it.
- Use settings such as “number_of_shards” and “number_of_replicas” to control the number of primary and replica shards for an index. Choose values based on your data size, query load, and hardware resources.
- The “index.codec” setting determines how the stored fields of an index are compressed and stored on disk by specifying the compression algorithm. It affects both index shard size and operation performance.
- Avoid the “zstd” and “zstd_no_dict” compression codecs for k-NN or Security Analytics indexes.
- Reduce the segment count. One segment per shard provides the best search latency.
- Configure multiple shards to avoid giant shards and increase parallelism. Control the number of segments by choosing a larger refresh interval or, during indexing, by disabling the refresh interval so that OpenSearch slows down segment creation.
- Efficient indexing and query optimization are essential for achieving optimal performance. Understand your data access patterns and design mappings that cater to specific query requirements.
- We used the refresh_interval setting to control how often indexes are refreshed for write-heavy workloads.
- We used batch jobs to periodically rebuild and optimize our indexes to keep them running smoothly.
- Use index lifecycle features (Index State Management in OpenSearch) to manage data retention and optimize storage costs.
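The shard, replica, codec, and refresh settings discussed in this section can be sketched as request bodies. This is a minimal illustration rather than our production configuration: the helper names, the default values, and the guard against zstd codecs on k-NN indexes are assumptions made for the example. The first body would be sent when creating an index, the second to the index settings endpoint around a bulk load.

```python
import json


def build_index_settings(
    primary_shards: int,
    replicas: int,
    refresh_interval: str = "30s",  # assumed default for a write-heavy index
    codec: str = "default",
    knn: bool = False,
) -> dict:
    """Body for an index-creation request (PUT /<index-name>)."""
    if primary_shards < 1 or replicas < 0:
        raise ValueError("need at least one primary shard and no negative replicas")
    if knn and codec in ("zstd", "zstd_no_dict"):
        # Mirrors the advice above: keep zstd codecs away from k-NN indexes.
        raise ValueError("zstd codecs should be avoided for k-NN indexes")
    index_settings = {
        "number_of_shards": primary_shards,
        "number_of_replicas": replicas,
        "refresh_interval": refresh_interval,
        "codec": codec,
    }
    if knn:
        index_settings["knn"] = True
    return {"settings": {"index": index_settings}}


def bulk_load_settings(refresh_enabled: bool, normal_interval: str = "30s") -> dict:
    """Body for PUT /<index-name>/_settings; "-1" disables refresh during a bulk load."""
    interval = normal_interval if refresh_enabled else "-1"
    return {"index": {"refresh_interval": interval}}


if __name__ == "__main__":
    # Hypothetical index: 3 primaries, 1 replica, refreshed every 30 seconds.
    print(json.dumps(build_index_settings(3, 1), indent=2))
    print(json.dumps(bulk_load_settings(refresh_enabled=False)))
```

A typical pattern is to apply `bulk_load_settings(False)` before a large batch load and `bulk_load_settings(True)` afterwards, so fewer segments are created while data is being written.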
3. Filtering and Faceting:
- Filtering and faceting are critical features in OpenSearch that allow users to narrow down search results quickly. However, if not implemented correctly, they can significantly impact performance.
- To address this, we implemented caching for frequently used filters and facets, which improved performance and reduced the load on our servers.
- Alongside caching, we tuned search settings and optimized index configurations to improve query performance.
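One way to keep filters and facets cheap is to put exact-match conditions in the filter context of a bool query, where no relevance score is computed and the engine can cache results, and to express facets as terms aggregations. The sketch below only builds the request body. The field names ("description", "category", "brand") and the helper name are hypothetical, and this is not necessarily how our own caching was implemented.

```python
import json


def build_faceted_search(
    text: str,
    filters: dict,       # field name -> list of accepted values
    facet_fields: list,  # keyword fields to build facets on
    facet_size: int = 10,
) -> dict:
    """Build a search body with scored text matching, unscored filters, and facets."""
    filter_clauses = [
        {"terms": {field: values}}
        for field, values in sorted(filters.items())
        if values  # skip filters with nothing selected
    ]
    return {
        "query": {
            "bool": {
                # Scored part: free-text relevance.
                "must": [{"match": {"description": text}}],
                # Unscored part: exact-match filters, eligible for caching.
                "filter": filter_clauses,
            }
        },
        # One terms aggregation per facet field.
        "aggs": {
            f"by_{field}": {"terms": {"field": field, "size": facet_size}}
            for field in facet_fields
        },
    }


if __name__ == "__main__":
    body = build_faceted_search(
        "wireless headphones",
        filters={"category": ["audio"], "brand": []},
        facet_fields=["brand", "category"],
    )
    print(json.dumps(body, indent=2))
```

Sorting the filter fields keeps the generated body stable for identical selections, which also makes it usable as a key in an application-level cache.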
4. Searchable Metadata:
- Probably our biggest lesson, learnt the hard way, is that it’s absolutely crucial to define clear, concise metadata for searchable content.
- Metadata should include specific fields such as title, description, and keywords: all the information that helps users find relevant content fast. Without proper metadata, search results will be inaccurate.
- Proper metadata in searchable data also helps drive better adoption rates.
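The metadata fields named above can be made explicit in the index mapping and checked before a document is indexed. This is a minimal sketch under assumptions: the field types, the "raw" sub-field, and the helper names are illustrative choices, not the mapping we used.

```python
REQUIRED_METADATA = ("title", "description", "keywords")


def build_metadata_mapping() -> dict:
    """Mapping body for index creation with explicit metadata fields."""
    return {
        "mappings": {
            "properties": {
                # Full-text search on the title, plus an exact-match sub-field
                # for sorting and aggregations.
                "title": {
                    "type": "text",
                    "fields": {"raw": {"type": "keyword", "ignore_above": 256}},
                },
                "description": {"type": "text"},
                # Keywords are matched exactly, so they are not analyzed.
                "keywords": {"type": "keyword"},
            }
        }
    }


def missing_metadata(document: dict) -> list:
    """Return the required metadata fields that are absent or empty."""
    return [name for name in REQUIRED_METADATA if not document.get(name)]


if __name__ == "__main__":
    doc = {"title": "Quarterly report", "description": "", "keywords": ["finance"]}
    print(missing_metadata(doc))
```

Rejecting or flagging documents with missing metadata at ingestion time is cheaper than discovering the gap later through poor search results.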
5. Monitoring and Logging:
- The most important thing is to set up and proactively monitor key metrics and logs so that issues are identified and addressed before they impact end users.
- We used Kibana (OpenSearch Dashboards on OpenSearch domains) and CloudWatch for our dashboards. We obtained valuable insights from k-NN metrics, SQL metrics, UltraWarm metrics, cold storage metrics, asynchronous search metrics, anomaly detection metrics, and more.
- Record key elements, events, queries, and activities in logs for better monitoring and troubleshooting.
- Set up alerts on critical indicators such as query performance, disk usage, and CPU usage. In one critical scenario, these alerts helped us identify a tricky I/O issue that occurred only under certain data conditions.
- Make sure to mask PHI/PII data in logs to avoid security incidents. We used our own accelerator component (Data Deidentification) to encrypt and obfuscate the data in streams.
- Track cluster configuration, specifically changes made to the OpenSearch Service domain, to understand their overall effect on performance and stability.
- Use the Anomaly Detection feature to identify unusual patterns in metrics and proactively investigate potential bugs and performance bottlenecks.
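The CPU and disk alerts mentioned above can be expressed as CloudWatch alarm definitions. The sketch below only builds the parameter dictionaries. The thresholds, alarm names, domain name, and account ID are illustrative assumptions rather than our actual settings, and the dictionaries are shaped so that they could be passed to the CloudWatch put-metric-alarm call (for example via boto3, which is not imported here).

```python
def build_alarm(
    domain_name: str,
    account_id: str,
    metric_name: str,
    statistic: str,
    comparison: str,
    threshold: float,
    period_seconds: int,
    evaluation_periods: int,
) -> dict:
    """Parameters for one CloudWatch alarm on an OpenSearch Service domain metric."""
    return {
        "AlarmName": f"{domain_name}-{metric_name}",
        "Namespace": "AWS/ES",  # namespace used for domain metrics
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "Statistic": statistic,
        "ComparisonOperator": comparison,
        "Threshold": threshold,
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
    }


def default_alarms(domain_name: str, account_id: str) -> list:
    """Two illustrative alarms: sustained high CPU and low free storage."""
    return [
        # Assumed threshold: CPU at or above 80% for three 15-minute periods.
        build_alarm(domain_name, account_id, "CPUUtilization", "Maximum",
                    "GreaterThanOrEqualToThreshold", 80.0, 900, 3),
        # Assumed threshold: free storage at or below 20480 MiB for one minute.
        build_alarm(domain_name, account_id, "FreeStorageSpace", "Minimum",
                    "LessThanOrEqualToThreshold", 20480.0, 60, 1),
    ]


if __name__ == "__main__":
    # Hypothetical domain name and placeholder account ID.
    for alarm in default_alarms("search-prod", "123456789012"):
        print(alarm["AlarmName"], alarm["ComparisonOperator"], alarm["Threshold"])
```

A notification target (such as an SNS topic in the alarm actions) still has to be added before these definitions are useful in practice.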
6. Security and Access Control:
- Enable comprehensive logging to identify potential security threats.
- Regularly review the loggers and what they capture.
- Apply security patches and updates promptly to address vulnerabilities and security flaws.
- Conduct regular security audits to identify and address vulnerabilities within the OpenSearch environment.
- A very common and simple fix is to enable encryption of data at rest and in transit to protect it from unauthorized access.
- Retaining logs for longer periods will help you investigate security incidents and perform retrospectives.
- The paramount need is to secure data within the OpenSearch cluster. Using IAM for access control helps ensure that only authorized entities can interact with the cluster.
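IAM-based access control for a domain is typically expressed as a resource-based access policy. The sketch below builds such a policy document for a single principal. The region, account ID, domain name, role ARN, and helper name are hypothetical placeholders, and the split between read-only and read-write actions is an assumption for illustration.

```python
import json


def build_domain_access_policy(
    region: str,
    account_id: str,
    domain_name: str,
    principal_arn: str,
    read_only: bool = True,
) -> dict:
    """Resource-based policy allowing one IAM principal to call the domain's HTTP API."""
    actions = ["es:ESHttpGet", "es:ESHttpHead"]
    if not read_only:
        # Note: searches sent as POST requests also need es:ESHttpPost.
        actions += ["es:ESHttpPost", "es:ESHttpPut", "es:ESHttpDelete"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": actions,
                "Resource": f"arn:aws:es:{region}:{account_id}:domain/{domain_name}/*",
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values; replace with your own account, role, and domain.
    policy = build_domain_access_policy(
        region="us-east-1",
        account_id="123456789012",
        domain_name="search-prod",
        principal_arn="arn:aws:iam::123456789012:role/search-reader",
    )
    print(json.dumps(policy, indent=2))
```

Granting each application role only the HTTP methods it needs keeps the policy aligned with the goal of allowing only authorized entities to interact with the cluster.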
Conclusion:
I hope the lessons above help you optimize performance, enhance security, and derive maximum value from the AWS OpenSearch managed service.
Along with the notes above, regularly reviewing best practices and staying informed about updates are the two most critical components of succeeding with the AWS OpenSearch service.