Streamlining Catering Orders with LLMs
Discover how Claude 3’s advanced visual capabilities combined with Retrieval-Augmented Generation (RAG) can transform catering businesses.
Rajnish
Amazon Employee
Published Oct 15, 2024
In this blog post, we explore how Large Language Models (LLMs), specifically the Claude 3 model, can enhance catering order management through Retrieval-Augmented Generation (RAG). While this is a proof of concept (POC) rather than a production-ready solution, it showcases the art of the possible for the catering industry.
Any catering platform acts as a bridge between supply and demand. On the supply side, there are food suppliers like restaurants, while on the demand side, there are customers such as corporate offices and individuals who need catering services. Catering businesses are responsible for managing diverse requests, ranging from small private events to large corporate gatherings.
One major challenge catering platforms encounter is the lack of detailed food recipe information (ingredients) from various restaurants, which makes it difficult to recommend personalized menus tailored to specific dietary preferences, event themes, and types.
What Is RAG and Why Does It Matter?
Retrieval-Augmented Generation (RAG) is a hybrid architecture that combines retrieval mechanisms with generative capabilities. It empowers LLMs to access external data sources and produce context-aware, accurate, and detailed responses.
For example, catering businesses often receive visual or pictorial representations of food items: images without detailed recipe descriptions. These visual inputs lack ingredient-level information, making it difficult for a catering platform to determine the food ontology, nutritional data, or alignment with specific dietary preferences. A RAG-based LLM can bridge this gap by combining image analysis with knowledge retrieval to recommend food items based on dietary restrictions, preferences, and even holiday themes.
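To make the retrieve-then-generate flow concrete, here is a minimal sketch using a Claude 3 Sonnet model on Amazon Bedrock. The `search_recipe_metadata` retriever, its return fields, and the sample query are hypothetical stand-ins for whatever recipe store a catering platform actually uses; this is an illustration, not the code from the repository.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

def search_recipe_metadata(query: str) -> list[dict]:
    """Hypothetical retriever: look up recipe metadata (ingredients,
    allergens, cuisine) from your own store, e.g. a CSV file or vector DB."""
    return [
        {"dish": "Vegetable Biryani",
         "ingredients": ["basmati rice", "mixed vegetables", "mild spices"],
         "allergens": []},
    ]

def recommend(query: str) -> str:
    # 1) Retrieve: pull recipe metadata relevant to the customer's request.
    context = search_recipe_metadata(query)

    # 2) Generate: ground Claude 3's answer in the retrieved metadata.
    prompt = (
        f"Recipe metadata:\n{json.dumps(context, indent=2)}\n\n"
        f"Customer request: {query}\n"
        "Recommend suitable dishes using only the metadata above."
    )
    response = bedrock.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": [{"role": "user",
                          "content": [{"type": "text", "text": prompt}]}],
        }),
    )
    return json.loads(response["body"].read())["content"][0]["text"]

print(recommend("Nut-free vegetarian lunch for 20 people"))
```

The key point is that the model answers from the retrieved metadata rather than from memory alone.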
Claude 3's Visual Capabilities
The Claude 3 model has enhanced visual understanding capabilities, allowing businesses to extract details from food images. These details may include visible ingredients, cooking style, garnishing, or even potential allergens, which are often overlooked by human operators. Because LLMs are trained on large datasets, they can also infer details that are not directly visible in the picture.
Consider the following scenario: A catering business receives a collection of images representing menu items from different restaurants. Using Claude 3's visual capabilities, the catering platform can:
- Receive visuals of food items: The catering business receives images of food items from different restaurants or clients.
- Identify recipe ontology: Analyze these images to identify ingredients, nutritional information, dietary restrictions, and culinary techniques. This can be done using an LLM with visual capabilities and its prior learning from training data.
- Build food metadata: The catering business can incorporate a human-in-the-loop approach to verify and edit the data, if necessary, before storing the food metadata.
- Analyze the user's food preferences: User preferences are taken into account, including dietary restrictions, allergen information, and budget.
- Recommend food items: Based on the metadata and user preferences, Claude 3 can recommend suitable dishes. For example, if a customer has nut allergies, Claude 3 suggests similar nut-free dishes by retrieving recipe data from an external source.
The combination of these capabilities provides an opportunity to build a smarter, more intuitive catering recommendation system.
Architecture Overview
In a typical implementation, the architecture involves multiple components:
- User input: A Streamlit app (an open-source Python framework) allows users to upload food images and provide dietary preferences.
- Storage: Images are stored in an Amazon S3 bucket for easy retrieval and management.
- API gateway: Amazon API Gateway handles communication between the Streamlit app and backend services.
- Lambda function: AWS Lambda retrieves images from Amazon S3 and sends them for processing, acting as the orchestrator (a minimal handler sketch follows this list).
- Amazon Bedrock: Amazon Bedrock analyzes the images using Claude 3, extracting details like ingredients, cuisine type, and making menu recommendations based on user preferences.
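The Lambda orchestrator mentioned above could look roughly like the following sketch. The bucket name, request fields, and the `analyze_food_image` helper (sketched in the next section) are assumptions made for illustration, not the exact code in the repository.

```python
import base64
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "catering-food-images"  # assumed bucket name

def lambda_handler(event, context):
    # API Gateway proxy integration: the request body carries the S3 key
    # of the uploaded image and the user's preferences (assumed field names).
    payload = json.loads(event["body"])
    image_key = payload["image_key"]
    preferences = payload.get("preferences", {})

    # Fetch the food image from S3 and base64-encode it for Claude 3.
    obj = s3.get_object(Bucket=BUCKET, Key=image_key)
    image_b64 = base64.b64encode(obj["Body"].read()).decode("utf-8")

    # Delegate image analysis to Bedrock (see the ontology-extraction sketch below).
    metadata = analyze_food_image(image_b64)

    return {
        "statusCode": 200,
        "body": json.dumps({"metadata": metadata, "preferences": preferences}),
    }
```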
Image Analysis for Food Ontology Extraction: This snippet shows how the system processes images to identify food components, and it can be tailored to your requirements. It includes an example prompt that returns output in CSV and JSON format; feel free to experiment with it.
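As an illustration of this step, here is a hedged sketch of what such an ontology-extraction call can look like: a multimodal Claude 3 request on Amazon Bedrock that asks for the food ontology as JSON (the prompt can just as easily request CSV). The prompt wording and field names are illustrative assumptions, not the repository's exact prompt.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

ONTOLOGY_PROMPT = (
    "Analyze this food image and return JSON with the keys: "
    "dish_name, likely_ingredients, cuisine_type, cooking_style, "
    "potential_allergens, dietary_tags (e.g. vegetarian, vegan, gluten-free). "
    "If something is uncertain, say so in the value. "
    "Return only valid JSON."  # swap to "Return CSV with a header row" for CSV
)

def analyze_food_image(image_b64: str, media_type: str = "image/jpeg") -> dict:
    """Send a base64-encoded food image to Claude 3 and parse the ontology."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": media_type,
                            "data": image_b64}},
                {"type": "text", "text": ONTOLOGY_PROMPT},
            ],
        }],
    }
    response = bedrock.invoke_model(modelId=MODEL_ID, body=json.dumps(body))
    text = json.loads(response["body"].read())["content"][0]["text"]
    return json.loads(text)  # may need light cleanup if the model adds prose
```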
Personalized Menu Recommendation: This snippet highlights how the model can create a personalized catering menu based on user inputs and food metadata. Please note that in this experimental setup, pricing information was generated from the LLM's training data. In a real-world implementation, pricing should be retrieved dynamically through the Retrieval-Augmented Generation (RAG) framework from a live, real-time system to ensure accurate, up-to-date information.
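As a hedged sketch of this step, the following assumes the verified metadata sits in a local CSV file (as in this simple setup) and deliberately tells the model not to invent prices; the file name, columns, and preference fields are illustrative assumptions.

```python
import csv
import json
import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

def load_food_metadata(path: str = "food_metadata.csv") -> list[dict]:
    # Verified metadata from the human-in-the-loop step; columns are illustrative.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def recommend_menu(preferences: dict) -> str:
    metadata = load_food_metadata()
    prompt = (
        f"Food metadata:\n{json.dumps(metadata, indent=2)}\n\n"
        f"Customer preferences: {json.dumps(preferences)}\n"
        "Recommend a catering menu that respects all dietary restrictions and "
        "allergens. Do not invent prices; leave pricing to the ordering system."
    )
    response = bedrock.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [{"role": "user",
                          "content": [{"type": "text", "text": prompt}]}],
        }),
    )
    return json.loads(response["body"].read())["content"][0]["text"]

print(recommend_menu({"event": "office lunch", "guests": 25,
                      "restrictions": ["nut allergy", "vegetarian"]}))
```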
Data Verification and Metadata Storage: This is a front end where a human verifies and, if necessary, edits the extracted data before the food metadata is finalized. The GitHub repo does not include this code.
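Since that code is not in the repo, here is one possible way to sketch such a verification screen in Streamlit, using `st.data_editor` so an operator can correct ingredients or allergens before the metadata is saved; the file names are assumptions.

```python
# streamlit run verify_metadata.py
import pandas as pd
import streamlit as st

st.title("Verify food metadata")

# Metadata extracted by Claude 3 in the previous step (file name is illustrative).
df = pd.read_csv("extracted_metadata.csv")

# Let a human operator review and correct ingredients, allergens, dietary tags, etc.
edited = st.data_editor(df, num_rows="dynamic")

if st.button("Approve and save"):
    # Persist the verified metadata for the recommendation step.
    edited.to_csv("food_metadata.csv", index=False)
    st.success("Verified metadata saved.")
```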
UI Preview
The code is available on GitHub: GitHub Repository
Considerations
While RAG and Claude 3 provide a compelling solution, certain challenges must be taken into account for real-world implementations:
- Model Accuracy: The accuracy of ingredient recognition may vary based on image quality and the model's training data. Prompt engineering or fine-tuning the model with relevant data is essential for consistent performance. In my experience, inaccurate ingredient detection has led to incorrect recommendations, particularly for complex dishes. Continuous prompt engineering or fine-tuning can help mitigate these issues and improve the user experience. Additionally, most foundation models have limitations in performing accurate mathematical computations; for precise calculations, it is recommended to use a reliable calculator tool.
- Vector Database: A vector database is a specialized type of database designed to store, manage, and query high-dimensional vector data, and it is commonly used in artificial intelligence (AI) applications. Storing recipe metadata in a vector database allows efficient retrieval during the recommendation process and ensures that recommendations are contextually accurate and fast. I have used a CSV file for simplicity.
- Embedding Generation: Embedding generation refers to the process of converting data, such as words, images, or other types of information, into numerical vectors that capture their semantic meaning and relationships. Using embeddings to represent recipes and user preferences helps in comparing and finding the most relevant matches, and it makes the recommendation engine more robust (a minimal sketch follows this list).
- Cost of Ownership: Implementing LLMs can be resource intensive. Cost considerations should include model inference, hosting, and maintenance.
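As a hedged illustration of the vector database and embedding points above, the following minimal sketch embeds recipe descriptions and a user request with Amazon Titan Embeddings on Bedrock and ranks recipes by cosine similarity in memory; in a real implementation the vectors would live in an actual vector store rather than a Python list, and the sample recipes are made up.

```python
import json
import boto3
import numpy as np

bedrock = boto3.client("bedrock-runtime")
EMBED_MODEL_ID = "amazon.titan-embed-text-v1"

def embed(text: str) -> np.ndarray:
    """Turn text (a recipe description or a user preference) into a vector."""
    response = bedrock.invoke_model(
        modelId=EMBED_MODEL_ID,
        body=json.dumps({"inputText": text}),
    )
    return np.array(json.loads(response["body"].read())["embedding"])

def rank_recipes(query: str, recipes: list[str], top_k: int = 3) -> list[str]:
    """Rank recipes by cosine similarity to the user's request
    (an in-memory stand-in for a vector database)."""
    q = embed(query)
    scored = []
    for recipe in recipes:
        r = embed(recipe)
        score = float(np.dot(q, r) / (np.linalg.norm(q) * np.linalg.norm(r)))
        scored.append((score, recipe))
    return [recipe for _, recipe in sorted(scored, reverse=True)[:top_k]]

recipes = [
    "Vegetable biryani: basmati rice, mixed vegetables, mild spices, nut-free",
    "Chicken satay: grilled chicken skewers with peanut sauce",
    "Margherita pizza: tomato, mozzarella, basil, vegetarian",
]
print(rank_recipes("nut-free vegetarian lunch", recipes))
```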
Conclusion
Leveraging LLMs like Claude 3 with Retrieval-Augmented Generation can significantly improve how catering orders are processed and personalized. By combining visual insights with external knowledge, catering platforms can offer tailored menus.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.