Evolving Data Architectures: From Kimball to Medallion to AI-Ready Platforms

The journey from Kimball's 1990s star schemas to cloud-native Medallion architecture, and now to AI-ready platforms, showcases how data architectures adapt to technological capabilities.

Navnit Shukla
Amazon Employee
Published Jun 4, 2025
Last Modified Jun 5, 2025
In today's data-driven organizations, the foundation of effective analytics and AI initiatives begins with thoughtful data architecture. As companies transition from traditional data warehousing to modern lakehouse paradigms and incorporate generative AI capabilities, understanding how these architectural choices impact business outcomes becomes critical.
This discussion is primarily about using Amazon QuickSight as a GenAI-enhanced self-service BI tool with data stored in enterprise data warehouses such as Amazon Redshift and Snowflake. However, the principles we'll explore extend far beyond specific technologies: they represent fundamental shifts in how we architect data for both human and AI consumption.
I've had several discussions with customers navigating this exact transition. These conversations consistently circle back to the same core questions about modeling data at the Gold layer, managing performance with large datasets, and preparing for generative AI implementations.

The Evolution of Data Modeling Approaches

Data modeling has undergone significant transformation over the decades. Kimball's dimensional approach emerged when storage was expensive and processing power limited. The star schema design - with its facts and dimensions - optimized storage efficiency while making analytical queries more intuitive.
Fast forward to today, where storage has become a commodity and massively parallel processing (MPP) is standard. Modern Medallion architectures prioritize transformation clarity and query performance over storage constraints. This shift raises important questions for organizations navigating this evolution while implementing visualization tools like Amazon QuickSight and integrating emerging generative AI capabilities.

Gold Layer Modeling: Balancing Reusability and Performance

The Challenge
When dealing with multibillion-row fact tables, organizations face a critical decision: maintain separate facts and dimensions at the Gold layer, or create denormalized, purpose-specific tables?
This choice becomes even more consequential when feeding data to both BI tools and generative AI models.
I recently worked with a global retailer struggling with this exact issue. Their transaction history exceeded 3 billion rows, and they were migrating from a traditional Kimball model to a Medallion architecture while simultaneously building their first GenAI applications in Amazon QuickSight.
Their existing model looked similar to this:
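As a simplified, hypothetical sketch (none of these table or column names are the customer's), a Kimball-style retail model pairs a narrow fact table with conformed dimensions, and every analytical query joins the fact to the dimensions it needs. Here SQLite stands in for the warehouse:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.executescript("""
-- Conformed dimensions carry the business definitions
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, sku TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, store_name TEXT, region TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, fiscal_month TEXT);

-- Narrow fact table: foreign keys plus additive measures
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    net_amount  REAL
);
""")

cur.execute("INSERT INTO dim_product VALUES (1, 'SKU-001', 'Apparel')")
cur.execute("INSERT INTO dim_store VALUES (1, 'Store 42', 'EMEA')")
cur.execute("INSERT INTO dim_date VALUES (20250601, '2025-06-01', 'FY25-M06')")
cur.execute("INSERT INTO fact_sales VALUES (20250601, 1, 1, 3, 59.97)")

# A typical analytical query star-joins the fact to each dimension it needs
row = cur.execute("""
    SELECT d.fiscal_month, p.category, SUM(f.net_amount)
    FROM fact_sales f
    JOIN dim_date d ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY 1, 2
""").fetchone()
print(row)
```

At billions of fact rows, those runtime joins are exactly where the performance tension appears.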
The Solution
We recommended a hybrid approach:
  1. Maintain core dimensions as reusable assets - These provide consistent business definitions across the organization
  2. Create purpose-built, denormalized fact tables for specific business domains
  3. Design with both human and AI consumption in mind
For their QuickSight implementation, we created two types of Gold layer assets:
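The two asset types can be sketched like this (again with hypothetical names, and SQLite standing in for Redshift or Snowflake): a reusable conformed dimension exposed once for the whole organization, and a purpose-built denormalized table that pre-joins it for a single domain:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Silver-layer inputs (hypothetical names)
CREATE TABLE dim_product (product_key INT PRIMARY KEY, sku TEXT, category TEXT);
CREATE TABLE fact_sales  (date_key INT, product_key INT, net_amount REAL);
INSERT INTO dim_product VALUES (1, 'SKU-001', 'Apparel');
INSERT INTO fact_sales VALUES (20250601, 1, 59.97);

-- Gold asset type 1: reusable conformed dimension, one definition for everyone
CREATE VIEW gold_dim_product AS
SELECT product_key, sku, category FROM dim_product;

-- Gold asset type 2: purpose-built, denormalized table for one domain
-- (merchandising), pre-joined so BI and GenAI consumers avoid runtime joins
CREATE TABLE gold_merch_sales AS
SELECT f.date_key, p.sku, p.category, f.net_amount
FROM fact_sales f JOIN gold_dim_product p USING (product_key);
""")

row = con.execute(
    "SELECT sku, category, net_amount FROM gold_merch_sales"
).fetchone()
print(row)
```

The key design choice is that the denormalized table is derived *from* the shared dimension, so domain convenience never forks the business definitions.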
When they later implemented an AI assistant to answer business questions, this approach proved invaluable—the AI could efficiently access domain-specific data while maintaining consistent definitions across the enterprise.

SPICE vs. Direct Query: Performance Considerations for Analytics and AI

QuickSight offers two query modes - SPICE (in-memory) and Direct Query. This choice impacts not just dashboard performance but also how effectively your data can be used in GenAI implementations.
Our retail customer needed both historical analysis (billions of transaction records) and near real-time sales monitoring. After evaluating their query patterns, we recommended:
  • SPICE for historical analysis dashboards with daily refresh schedules
  • Direct Query for operational dashboards requiring up-to-date data
  • A combination for their executive dashboards, with SPICE for historical trends and Direct Query for today's sales
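In QuickSight's API this split maps to the ImportMode parameter of the CreateDataSet operation, set per dataset. A trimmed sketch (account and dataset IDs are placeholders, and a real call also needs PhysicalTableMap and permissions):

```python
# Sketch of two QuickSight dataset definitions. ImportMode is the real
# CreateDataSet parameter; the IDs and names below are placeholders.
historical_dataset = {
    "AwsAccountId": "111122223333",  # placeholder account ID
    "DataSetId": "sales-history",    # placeholder dataset ID
    "Name": "Sales history",
    "ImportMode": "SPICE",           # in-memory copy, refreshed on a schedule
}

operational_dataset = {
    "AwsAccountId": "111122223333",
    "DataSetId": "sales-operational",
    "Name": "Today's sales",
    "ImportMode": "DIRECT_QUERY",    # every read hits the warehouse live
}

# With boto3 these would be passed to
#   boto3.client("quicksight").create_data_set(...)
for ds in (historical_dataset, operational_dataset):
    print(ds["DataSetId"], ds["ImportMode"])
```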
When they added their GenAI application for store managers, this architecture allowed the AI to reference both historical context and current performance data to generate insights like "Today's sales are 15% below forecast, similar to the pattern we saw last quarter during the supply chain disruption."

Flattening Data: Simplification for Humans and Machines

Pre-joined, flattened datasets often improve performance for both QuickSight and generative AI models. For our retail customer, we created denormalized views specifically for their most critical dashboards:
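A minimal sketch of such a view, with hypothetical names and SQLite standing in for the warehouse. Note the business-friendly column aliases: they matter as much for a GenAI model interpreting the schema as for an analyst reading it:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE fact_sales (date_key INT, store_key INT, net_amount REAL);
CREATE TABLE dim_store  (store_key INT PRIMARY KEY, store_name TEXT, region TEXT);
CREATE TABLE dim_date   (date_key INT PRIMARY KEY, full_date TEXT);
INSERT INTO dim_store VALUES (1, 'Store 42', 'EMEA');
INSERT INTO dim_date  VALUES (20250601, '2025-06-01');
INSERT INTO fact_sales VALUES (20250601, 1, 59.97);

-- One flattened, business-named view per critical dashboard:
-- consumers select from it without writing any joins
CREATE VIEW v_daily_store_sales AS
SELECT d.full_date  AS sale_date,
       s.store_name AS store,
       s.region     AS region,
       f.net_amount AS net_sales
FROM fact_sales f
JOIN dim_date  d USING (date_key)
JOIN dim_store s USING (store_key);
""")

row = con.execute("SELECT * FROM v_daily_store_sales").fetchone()
print(row)
```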

Scaling Strategies for the AI Era

As our retail customer's data volumes grew from billions to trillions of rows, we implemented several strategies to maintain performance:
  1. Domain-driven modeling - Creating separate Gold datasets for merchandising, store operations, and finance teams
  2. Thoughtful partitioning - Partitioning transaction data by date and region
  3. Aggregation layers - Building daily, weekly, and monthly aggregates for common queries
  4. Metadata enrichment - Adding clear business descriptions to all fields
These practices ensured their architecture remained performant and comprehensible to both human and AI consumers as they scaled.
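The aggregation-layer idea (strategy 3) can be sketched as follows, with hypothetical names and SQLite standing in for the warehouse. Common rollups are precomputed once, so dashboards and AI-generated queries scan a handful of summary rows instead of billions of detail rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE fact_sales (sale_date TEXT, region TEXT, net_amount REAL);
INSERT INTO fact_sales VALUES
  ('2025-06-01', 'EMEA', 10.0),
  ('2025-06-01', 'EMEA', 20.0),
  ('2025-06-02', 'APAC', 5.0);

-- Aggregation layer: one precomputed rollup per common grain
CREATE TABLE agg_daily_region_sales AS
SELECT sale_date, region,
       SUM(net_amount) AS net_sales,
       COUNT(*)        AS txn_count
FROM fact_sales
GROUP BY sale_date, region;
""")

rows = con.execute("""
    SELECT sale_date, region, net_sales, txn_count
    FROM agg_daily_region_sales ORDER BY sale_date
""").fetchall()
print(rows)
```

Weekly and monthly aggregates follow the same pattern at a coarser grain.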
Moving business calculations into the data transformation layer, rather than database stored procedures, made their business logic accessible to both QuickSight and their GenAI applications. That transparency improved documentation and, ultimately, the accuracy of both their dashboards and AI-generated insights. While QuickSight's lack of support for stored procedures initially seemed restrictive, it actually pushed our customer toward better architectural practices that would later benefit their AI initiatives.
By moving complex business logic from opaque procedures to transparent transformations in the Gold layer, they created a single source of truth that both visualization tools and AI models could leverage consistently.
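A minimal illustration of this single-source-of-truth pattern, using a hypothetical gross-margin metric (SQLite again standing in for the warehouse): the calculation lives in exactly one Gold-layer definition, and every consumer reads it rather than reimplementing it:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE gold_sales (sku TEXT, net_amount REAL, cost_amount REAL);
INSERT INTO gold_sales VALUES ('SKU-001', 100.0, 60.0);

-- Business logic defined once, transparently, in the Gold layer:
-- not hidden inside a stored procedure
CREATE VIEW v_sales_margin AS
SELECT sku,
       net_amount,
       net_amount - cost_amount AS gross_margin,
       (net_amount - cost_amount) / net_amount AS margin_pct
FROM gold_sales;
""")

# A dashboard query and an AI context query both read this same definition
row = con.execute("SELECT gross_margin, margin_pct FROM v_sales_margin").fetchone()
print(row)
```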

Why This Matters: The Data Foundation for AI Success

As our retail customer expanded their GenAI implementation, they discovered that their data architecture decisions had profound impacts on AI effectiveness. Their initial GenAI prototype struggled with inconsistent answers about product performance until we addressed underlying data architecture issues.
Their experience highlighted how a poorly structured data model leads to:
  • Confused AI responses due to ambiguous relationships between products, categories, and promotions
  • Difficulty connecting business concepts across domains like inventory, sales, and customer data
  • Inefficient queries that increased latency and costs when analyzing large transaction datasets
  • Inconsistent answers as models struggled with complex joins and filtering logic
After restructuring their Gold layer with both analytics and AI in mind, the results were dramatic.
With this improved foundation, their organization could:
  • Create more accurate AI responses through clear data context
  • Implement domain-specific AI solutions with minimal rework
  • Scale AI initiatives without rebuilding underlying data models
  • Maintain consistency between QuickSight dashboards and AI responses

Beyond QuickSight: Universal Principles for the AI Data Stack

While our discussion began with Amazon QuickSight accessing data from Redshift and Snowflake, these principles extend far beyond specific technologies. The fundamental patterns of effective data architecture for both analytics and AI remain consistent across technology stacks:
  1. Prioritize semantic clarity - Clear, well-named entities and relationships benefit both human analysts and AI models. This clarity is even more critical for GenAI than traditional BI because AI needs to infer relationships that humans can interpret visually.
  2. Balance normalization and performance - The old debate between normalized and denormalized models becomes even more nuanced when GenAI enters the picture. AI can generate more accurate insights when relationships are explicit, which often favors denormalized models.
  3. Design for varied query patterns - While BI tools typically follow predictable patterns, GenAI models may explore data in unexpected ways as they answer user questions. Your architecture should support both structured, predefined queries and exploratory patterns.
  4. Centralize business logic - Whether using dbt models, views in your data warehouse, or another transformation layer, centralizing business definitions ensures consistency between dashboards and AI outputs.
  5. Document with both humans and machines in mind - Rich metadata isn't just for human comprehension anymore. GenAI can leverage well-documented data models to provide more accurate responses and explanations.
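A hypothetical sketch of what machine-readable field documentation can look like in practice: column descriptions maintained alongside the data model and injected into a GenAI prompt with the user's question (the field names and helper function here are illustrative, not any particular product's API):

```python
# Hypothetical field documentation for an aggregate table; in a real stack
# this might live in a dbt schema file or a data catalog.
FIELD_DOCS = {
    "net_sales": "Revenue after discounts and returns, in USD.",
    "txn_count": "Number of completed point-of-sale transactions.",
    "region": "Sales region code: one of EMEA, APAC, AMER.",
}

def build_prompt(question: str) -> str:
    """Assemble an LLM prompt that grounds the model in column definitions."""
    doc_lines = "\n".join(f"- {name}: {desc}" for name, desc in FIELD_DOCS.items())
    return (
        "You are answering questions about the agg_daily_region_sales table.\n"
        f"Column definitions:\n{doc_lines}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt("How did EMEA net sales trend last week?")
print(prompt)
```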
Every technology choice—from data lake to data warehouse to BI tool—will have specific implementation details, but the foundational architectural patterns remain the same. The companies succeeding with both analytics and GenAI are those thinking holistically about their data architecture.

Conclusion: Build Once, Serve Many

The transition from Kimball to Medallion architecture represents more than just technical evolution—it's preparation for a future where data serves both traditional analytics and emerging AI capabilities. As our retail customer discovered, thoughtfully designing your Gold layer with these dual purposes in mind creates a foundation that supports both BI dashboards and GenAI applications.
The same decisions that improve QuickSight performance—like balancing normalized dimensions with denormalized fact tables, choosing between SPICE and Direct Query, and creating domain-specific models—directly impact how effectively your generative AI solutions can understand and leverage your business data.
For organizations embarking on this journey, I recommend:
  1. Start with your highest-value use cases - Identify the most critical dashboards and potential AI applications to drive architectural decisions
  2. Embrace hybrid approaches - Don't force yourself to choose between Kimball purity and complete denormalization
  3. Test performance early - Validate both QuickSight and AI performance with representative data volumes
  4. Design for both human and machine consumption - Clear naming, relationships, and metadata benefit both analysts and AI models
  5. Evolve incrementally - You don't need to rebuild everything at once; focus on your most critical domains first
The organizations that view their data architecture as a strategic asset rather than a technical implementation detail will be best positioned to leverage the full potential of both business intelligence and generative AI. As these technologies continue to converge, the architectural decisions you make today will determine how effectively you can leverage your data for competitive advantage tomorrow.
Remember: Modern data architecture isn't just about optimizing for today's dashboards—it's about creating a foundation flexible enough to support tomorrow's AI innovations. By applying these principles, you can build a data platform that serves many purposes from a single, coherent design.
The future belongs to organizations that can seamlessly blend traditional business intelligence with generative AI capabilities—all powered by a thoughtfully designed data architecture that serves both needs without compromise.

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
