
Enhanced document search with full document retrieval after chunk-based search
Getting started with enhancing AI-generated responses for intricate and nested data
Gene
Amazon Employee
Published Apr 4, 2025
Many GenAI applications rely on chunk-based retrieval to fetch relevant information. While effective in simpler use cases, this method often falls short when handling intricate, highly structured data, such as YAML documents. Chunking can unintentionally omit critical context, leading to incomplete or inaccurate AI-generated responses. For documents with complex relationships between elements, such as YAML files, maintaining this context is crucial for accurate understanding and generation.
This guide introduces an end-to-end query processing system designed to overcome the limitations of traditional chunk-based retrieval. With Full Document Retrieval (FDR), the entire source document is retrieved and provided to the AI model rather than fragmented chunks, so the model has access to all relevant context and can generate more comprehensive and accurate answers.
Here is a high-level diagram illustrating the process when the full document is retrieved after querying similar chunks:

Full Document Retrieval After Chunk-Based Search is an approach designed to improve the effectiveness of document retrieval systems by ensuring that responses are generated with complete context. Traditionally, document retrieval processes involve breaking down large documents into smaller chunks, which are retrieved based on relevance to a user's query. However, these chunks often fail to provide the complete context necessary for accurate and nuanced responses, especially in the case of complex or highly detailed inquiries. This method overcomes that limitation by mapping the chunks back to full documents, enabling the system to retrieve comprehensive context and enhance the overall quality of the responses generated by large language models (LLMs).
- Step 1: The user inputs a query.
- Step 2: The system retrieves relevant document chunks using search tools such as Amazon Kendra or Amazon Bedrock Knowledge Bases.
- Step 3: Instead of using these chunks alone, the system maps them to the corresponding full documents stored in an S3 bucket.
- Step 4: The complete documents are retrieved and used as the context for LLMs.
- Step 5: The LLM processes the full context to generate accurate, comprehensive, and well-informed responses.
This method not only ensures that all relevant information is available but also minimizes the risk of omitting crucial details, a common issue with traditional chunk-based retrieval. By leveraging complete documents, it allows the AI to respond with higher precision, making it especially valuable in scenarios where nuanced understanding is essential.
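To make steps 2 through 4 concrete, here is a minimal Python sketch of the retrieval-and-mapping flow. It assumes your documents are indexed in an Amazon Bedrock Knowledge Base whose data source is an S3 bucket; the knowledge base ID and the sample query are hypothetical placeholders.

```python
import boto3

# Hypothetical placeholder -- substitute your own knowledge base ID.
KNOWLEDGE_BASE_ID = "YOUR_KB_ID"

agent_runtime = boto3.client("bedrock-agent-runtime")
s3 = boto3.client("s3")

def retrieve_full_context(query: str, max_chunks: int = 5) -> str:
    """Retrieve relevant chunks, then map them back to full documents in S3."""
    # Step 2: chunk-level retrieval against the knowledge base.
    response = agent_runtime.retrieve(
        knowledgeBaseId=KNOWLEDGE_BASE_ID,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": max_chunks}
        },
    )

    # Step 3: map each chunk to its source document URI, deduplicating so
    # each document is fetched only once.
    uris = []
    for item in response["retrievalResults"]:
        uri = item["location"]["s3Location"]["uri"]  # e.g. s3://bucket/config.yaml
        if uri not in uris:
            uris.append(uri)

    # Step 4: fetch the complete documents from S3 and assemble the context.
    documents = []
    for uri in uris:
        bucket, key = uri.removeprefix("s3://").split("/", 1)
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        documents.append(f"<document uri='{uri}'>\n{body}\n</document>")

    return "\n\n".join(documents)

full_context = retrieve_full_context("How is the service retry policy configured?")
```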
Now you can embed the full_context in your prompt and invoke the model to generate the response.
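Here is a minimal sketch of that invocation using the Bedrock Converse API; the model ID shown is just one Converse-compatible choice, and the prompt wording is an assumption you should adapt to your use case.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def answer_with_full_context(query: str, full_context: str) -> str:
    """Send the full documents plus the question to an LLM on Bedrock."""
    prompt = (
        "Use only the documents below to answer the question.\n\n"
        f"{full_context}\n\n"
        f"Question: {query}"
    )
    response = bedrock_runtime.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # any Converse-capable model
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 1024, "temperature": 0.0},
    )
    return response["output"]["message"]["content"][0]["text"]

answer = answer_with_full_context("How is the service retry policy configured?", full_context)
print(answer)
```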
Our system is engineered to handle structured files like YAML effectively and is optimized for batch query processing. This makes it well suited for large-scale data retrieval and response generation, integrating seamlessly into existing workflows. It retrieves relevant documents, generates LLM-powered responses, and outputs results that include comparisons to ground truth and performance evaluations, ensuring a reliable assessment of the model's accuracy.
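For batch processing and evaluation, a simple loop over the two helpers above is enough. The sketch below is hypothetical: it uses a naive exact-containment check against ground truth, which you would likely replace with embedding similarity or an LLM judge in practice.

```python
import csv

def run_batch(examples: list[dict]) -> list[dict]:
    """Answer a batch of queries, recording each answer next to its ground truth."""
    results = []
    for example in examples:
        context = retrieve_full_context(example["query"])
        answer = answer_with_full_context(example["query"], context)
        results.append({
            "query": example["query"],
            "answer": answer,
            "ground_truth": example["ground_truth"],
            # Naive check: does the answer contain the expected string?
            "match": example["ground_truth"].strip().lower() in answer.lower(),
        })
    return results

# Hypothetical batch; in practice this would be loaded from a file.
batch = [{"query": "How is the service retry policy configured?",
          "ground_truth": "3 retries with exponential backoff"}]

with open("results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["query", "answer", "ground_truth", "match"])
    writer.writeheader()
    writer.writerows(run_batch(batch))
```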
By eliminating the risks associated with missing context, this system ensures that AI-generated responses are more complete and reliable. Whether you’re working with YAML, JSON, or other complex data formats, Full Document Retrieval provides a holistic view of the document, enhancing the quality and accuracy of the model's output. If you’ve ever encountered AI responses that lacked essential context, this solution will significantly improve the reliability and depth of your results, transforming how your applications generate insights from complex documents! 🚀
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.