Detailed summaries and high-quality content creation with genAI

Have you ever had trouble getting long, high-quality answers from genAI models?

As you may have noticed, AI models tend to reply with relatively short answers, which for some use cases is the opposite of what we need. Prompt engineering alone (e.g. "answer me in more than X words...") usually doesn't solve the problem: either the model ignores the instruction, or it follows it but pads the answer with inconsistent content just to reach the required word count.

This is especially an issue for a common use case like summarization. Let's say we have a technical document of more than 100 pages and we want a detailed 10-page summary. If we simply ask the model for a summary, we get only a few paragraphs without much detail.

Let's see an example:

Image not found

In this case we asked the model to summarize "Treasure Island", a 140+ page novel by Robert Louis Stevenson. While the content is accurate, the summary is very sparse; a couple of paragraphs are not enough to explain the content of a 100+ page book!

Many other use cases require long, high-quality content creation, and the same method can be used for all of them. In this blog post we will focus on the summarization example, but a similar architecture works for other use cases with minor modifications. Let's see how to do this.

Types of summarization

Let's first explain the different types of summarization that we can use in generative AI.

According to LangChain, one of the most popular frameworks for generative AI application development, there are two main summarization techniques:

Stuff: used when all the content of the document fits in the model's context window. You simply pass the content and prompt the model to summarize it.

Image not found

Map-Reduce: used when the content doesn't fit in the context window. This technique splits the content into several chunks, summarizes each chunk, and then makes a final call to the model that creates the final summary from the chunk summaries.

Image not found
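The map-reduce flow described above can be sketched in a few lines. This is a minimal illustration, not LangChain's implementation: `split_into_chunks`, `map_reduce_summary` and `fake_summarize` are hypothetical names, chunks are cut by character count for simplicity, and `fake_summarize` is a stand-in for a real model call.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def map_reduce_summary(text: str, summarize: Callable[[str], str], chunk_size: int = 1000) -> str:
    """Map: summarize every chunk. Reduce: summarize the joined chunk summaries."""
    chunks = split_into_chunks(text, chunk_size)
    chunk_summaries = [summarize(chunk) for chunk in chunks]  # map step, one call per chunk
    return summarize("\n".join(chunk_summaries))              # reduce step, a single call


def fake_summarize(text: str) -> str:
    """Hypothetical stand-in for a model call: keeps the first 20 characters."""
    return text[:20]


document = "A" * 1500 + "B" * 1500
final_summary = map_reduce_summary(document, fake_summarize, chunk_size=1000)
```

Note that the length of the final output is bounded by whatever the single reduce call returns, no matter how many chunks were summarized in the map step.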

In the first section I showed an example using the stuff technique. Let's see what an example using map-reduce looks like:

Image not found

As you can see, the result is somewhat better than in the first example, but the final answer is still not long enough. Because the final summary is generated by a single model call, it is always going to be relatively short. As discussed in the introduction, generative AI models don't like to talk too much in their answers!

Sectioning the document

Let's talk about how we can overcome this problem. It is clear that a single final prompt will not give us a detailed and extensive answer, so instead we combine several prompts to obtain the final result.

This sectioning technique is similar to map-reduce. The main difference is that the final output is not generated by a single model call: we ask the model to create a summary for each specific section of the document, and then we join all those summaries together to form the final output.

Have a look at the diagram to understand better how this works:

Image not found

There are two main types of call to the generative AI model: first we ask for the main sections of the document, then we ask the model to create a summary of each section.
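As a rough sketch of this flow, independent of any framework: the two model calls are passed in as plain functions, and the final step is a simple join. The names `sectioned_summary`, `fake_list_sections` and `fake_summarize_section` are hypothetical, and the two `fake_*` functions only stand in for real model calls.

```python
from typing import Callable, List


def sectioned_summary(
    document: str,
    list_sections: Callable[[str], List[str]],
    summarize_section: Callable[[str, str], str],
) -> str:
    """First call finds the sections; then one call per section, each with the full document."""
    sections = list_sections(document)
    parts = []
    for section in sections:
        summary = summarize_section(section, document)  # the whole document is the context
        parts.append(f"{section}\n\n{summary}")
    # No final model call: the section summaries are simply joined.
    return "\n\n".join(parts)


def fake_list_sections(document: str) -> List[str]:
    """Hypothetical stand-in for the section-listing model call."""
    return ["Part 1", "Part 2"]


def fake_summarize_section(section: str, document: str) -> str:
    """Hypothetical stand-in for the per-section model call."""
    return f"Summary of {section} ({len(document)} characters of context)"


result = sectioned_summary("x" * 50, fake_list_sections, fake_summarize_section)
```

Because the output is a concatenation, its length grows with the number of sections instead of being capped by one model response.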

Two main improvements over the previous techniques:

Each summary is generated with all the content of the original text. With map-reduce, each summary was generated only from the information in its chunk, so the context from the rest of the document was lost.
For the final summary we don't use a generative AI model; we simply join all the sections. This allows responses to be as long as we need.

One main downside:

Token consumption is much higher with this technique, because we pass the whole document as input for every section summary. The good news is that we can use some of the newest small models, which are really cheap and optimized for this kind of task (for example, Claude Haiku from Anthropic!). With them we can have a really cost-effective solution that provides high-quality content.
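To get a feel for the difference, here is a back-of-the-envelope estimate of input tokens. The numbers are illustrative assumptions (a 100,000-token document, 37 sections, a 300-token prompt), not measurements, and the function names are hypothetical. The estimate ignores output tokens and the initial section-listing calls.

```python
def stuff_input_tokens(document_tokens: int, prompt_tokens: int = 0) -> int:
    """Input tokens for a single 'stuff' call: the document is sent once."""
    return document_tokens + prompt_tokens


def sectioning_input_tokens(document_tokens: int, n_sections: int, prompt_tokens: int = 0) -> int:
    """Input tokens for the per-section calls: every call receives the whole document."""
    return n_sections * (document_tokens + prompt_tokens)


# Illustrative values only
single_call = stuff_input_tokens(100_000, prompt_tokens=300)
per_section = sectioning_input_tokens(100_000, n_sections=37, prompt_tokens=300)
ratio = per_section / single_call
```

Under these assumptions the input volume grows linearly with the number of sections, which is why the price per input token of the chosen model matters so much here.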

Example code

Let's explore how we can implement this type of summarization in a Python program. We will use LangChain as the generative AI application framework and Amazon Bedrock as the model provider service. We are going to use Claude 3 as the model; feel free to change it and choose a model of your preference.

First, we initialize the language model in LangChain:

import json  # used later to parse the model's JSON output
import boto3
from langchain_community.chat_models import BedrockChat

# Bedrock runtime client; use a region where the model is enabled in your account
bedrock_runtime = boto3.client('bedrock-runtime', region_name="us-west-2")

# A low temperature keeps the summaries close to the source text
llm_chat_2 = BedrockChat(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    model_kwargs={"temperature": 0.1},
    client=bedrock_runtime,
    region_name="us-west-2"
)

Load the document using PyPDF library:

from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("docs/treasure_island.pdf", extract_images=False)

We obtain the sections of the document in a first model call:

from langchain_core.prompts import PromptTemplate
from langchain.chains.llm import LLMChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain

prompt_template = """Identify the main sections and subsections of the following document. Include the page number where each section and subsection starts and ends. Provide the output in the same language as the original document:
<example>
"Section 1: Title of Section 1 (pages 1-5)",
"Section 2: Title of Section 2 (pages 6-10)",
.... rest of sections
</example>
"{text}"
List of sections and subsections:"""
    
prompt = PromptTemplate.from_template(prompt_template)

# Define LLM chain

llm_chain = LLMChain(llm=llm_chat_2, prompt=prompt)

# Define StuffDocumentsChain
stuff_chain = StuffDocumentsChain(llm_chain=llm_chain, document_variable_name="text")

docs = loader.load()
documento = stuff_chain.run(docs)
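The prompt above asks the model for page ranges, so the page numbers need to be visible in the text the model receives. Whether they are depends on how the pages are joined before the call. A minimal sketch of one way to make them explicit, assuming each page is available as a (page number, text) pair; `tag_pages` and the `[Page N]` marker format are hypothetical choices, not part of the original code:

```python
from typing import List, Tuple


def tag_pages(pages: List[Tuple[int, str]]) -> str:
    """Join page texts, prefixing each one with a visible page marker."""
    return "\n\n".join(f"[Page {number}]\n{text.strip()}" for number, text in pages)


# Placeholder page contents for illustration
pages = [(1, "First page text"), (2, "Second page text\n")]
tagged_text = tag_pages(pages)
```

The tagged string could then be passed to the model in place of the raw concatenated text.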

Then, we convert the sections output into a Python list:

from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from typing import List
import re

# Defining the desired output format
class DocumentoSecciones(BaseModel):
    sections: List[str] = Field(description="List of document sections")

# Creating the parser
parser = PydanticOutputParser(pydantic_object=DocumentoSecciones)

prompt_template = """You will be provided with a list of sections from a document. Create a detailed list that includes all sections and subsections, with their respective page ranges.

Important Instructions:

Prioritize subsections over general sections.
If a general section is completely divided into subsections, omit the general section.
If there are parts of a general section not covered by subsections, include only those parts.
For sections without specific numbering, use 'N/A' as the page range.
If you are unsure of the page where a section ends, indicate that it ends on the starting page of the next section.
{format_instructions}

<sections> {text} </sections>
Make sure the output complies with the specified format and follows the given instructions. Provide the output in the same language as the original document.
"""

prompt = PromptTemplate(
    template=prompt_template,
    input_variables=["text"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

# Define LLM chain
llm_chain = LLMChain(llm=llm_chat_2, prompt=prompt)

# Run the chain
resultado = llm_chain.run(documento)

def extract_json(text):
    json_match = re.search(r'\{[\s\S]*\}', text)
    if json_match:
        return json_match.group()
    return None

try:
    json_str = extract_json(resultado)
    if json_str:
        parsed_json = json.loads(json_str)
        lista_secciones = parsed_json.get('sections', [])  # key must match the 'sections' field of DocumentoSecciones
    else:
        raise ValueError("Not valid JSON in the response")
except json.JSONDecodeError as e:
    print(f"Decoding error JSON: {e}")
    print("Original output:", resultado)
    lista_secciones = []
except Exception as e:
    print(f"Processing error: {e}")
    print("Original output:", resultado)
    lista_secciones = []
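The extraction step can be tried on its own, without calling a model. The sketch below applies the same idea (grab the outermost `{...}` span, then parse it) to a hard-coded sample response. `parse_sections` and the sample text are hypothetical and only illustrate the expected shape of the output.

```python
import json
import re
from typing import List


def parse_sections(model_output: str) -> List[str]:
    """Return the "sections" list found in the outermost {...} span, or [] on failure."""
    match = re.search(r"\{[\s\S]*\}", model_output)
    if match is None:
        return []
    try:
        data = json.loads(match.group())
    except json.JSONDecodeError:
        return []
    sections = data.get("sections", [])
    if not isinstance(sections, list):
        return []
    # Keep only string entries so later code can rely on the element type
    return [s for s in sections if isinstance(s, str)]


# Sample response with extra text around the JSON, as models often produce
sample = 'Here is the list:\n{"sections": ["Part 1 (pages 1-5)", "Part 2 (pages 6-10)"]}\nDone.'
sections = parse_sections(sample)
```

Returning an empty list on any failure mirrors the fallback in the code above; a stricter version could raise instead, so that a malformed response is not mistaken for a document with no sections.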

And finally we obtain and join all the section summaries. Here we use the ThreadPoolExecutor in Python to perform all those model calls in parallel and save a massive amount of waiting time for the final response:

from concurrent.futures import ThreadPoolExecutor, as_completed

def summarize_section(seccion, docs, llm_chat):
    prompt_template = f"""You are an assistant specialized in summarizing documents. Your goal is to help users quickly understand the most important points of a document, including key numerical information, main outcomes, general content etc.
    
    You will be provided with a complete document and you will have to perform a detailed summary of only the section indicated below. For this summary, you can use the context information from the entire document, but the final result must be only the summary of the specific section indicated.
    
    Section to summarize: {seccion}
    Document: "{{text}}"
    
    Important Instructions:
    
    Provide the content of the summary directly, without additional introduction or conclusion.
    Do not repeat the section title in the summary.
    Structure the summary in short and concise paragraphs.
    Use bullets to list key points or important numerical data.
    Make sure to include all relevant and numerical information from the section.
    Maintain a professional and objective tone.
    Do not use phrases like "Summary of the section" or similar.
    Provide the output in the same language as the original document.
    
    Summary:"""
    
    prompt = PromptTemplate.from_template(prompt_template)
    
    # Use the model passed in as a parameter
    llm_chain = LLMChain(llm=llm_chat, prompt=prompt)
    stuff_chain = StuffDocumentsChain(llm_chain=llm_chain, document_variable_name="text")

    summary_section = stuff_chain.run(docs)
    return seccion, summary_section

# Assuming 'loader' and 'llm_chat_2' are defined earlier in your code
docs = loader.load()

# Use ThreadPoolExecutor to run tasks in parallel
with ThreadPoolExecutor() as executor:
    futures = [executor.submit(summarize_section, seccion, docs, llm_chat_2) for seccion in lista_secciones]

    # Collect results as they finish (completion order, not section order)
    results = []
    for future in as_completed(futures):
        results.append(future.result())

# Sort results to maintain original order
results.sort(key=lambda x: lista_secciones.index(x[0]))

# Combine results with improved formatting
output_final = ""
for seccion, summary in results:
    output_final += f"## **{seccion}**\n\n{summary.strip()}\n\n---\n\n"

print(output_final)
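A possible variation on the parallel step: `ThreadPoolExecutor.map` returns results in the order of its inputs, so no sorting is needed afterwards, even if two sections happen to share the same title. This is a self-contained sketch in which `summarize_all` is a hypothetical helper and `fake_summarize` replaces the real model call, with an artificial delay to show that a slow first task does not change the order.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def summarize_all(
    sections: List[str],
    summarize: Callable[[str], str],
    max_workers: int = 4,
) -> List[Tuple[str, str]]:
    """Run summarize on every section in parallel, keeping the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        summaries = list(executor.map(summarize, sections))
    return list(zip(sections, summaries))


def fake_summarize(section: str) -> str:
    """Hypothetical stand-in for a model call; the first section is the slowest."""
    time.sleep(0.05 if section == "Part 1" else 0.0)
    return f"Summary of {section}"


pairs = summarize_all(["Part 1", "Part 2", "Part 3"], fake_summarize)
output = "".join(f"{section}\n\n{summary}\n\n" for section, summary in pairs)
```

With a real model behind it, `max_workers` is worth tuning against the request limits of your account, since each worker sends the full document.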

Build your own Streamlit application

Do you want to turn this code into a working application? No worries! I have also created a Streamlit app that provides a web page to upload documents and start a high-quality summarization job.

You can explore the code and deploy the app in your local environment from this repo: high_quality_summarization. In the backend it runs the code shared in the previous section.

This is the final result for a high-quality summary of the same example document, the novel "Treasure Island":

Image not found

Besides these two, there are 35 more sections in the final output. It is a 10-page summary with detailed information about the book.

This type of content creation is especially useful for other use cases, like extracting insights from technical documents. See the result for a 160-page study, "Energy performance certificates in buildings":

Image not found

And this is just one of the 18 section summaries generated.

Conclusion!

Generative AI models are really powerful tools if we know how to use them properly. In this example we have seen how to overcome one of the main issues with current models: the lack of detailed and lengthy responses.

By applying some prompt engineering techniques as well as some application logic, we can make multiple calls to a model and stitch the results together into a long, high-quality final response.

Resources

Sectioning summarization video (by Sam Witteveen)
Prompt engineering guide
Amazon Bedrock start guide

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.