Part 1: How to build a data hurricane (a.k.a. data mesh 2.0)? A constructive one, for a change.
A smart data network that feeds ML models and knowledge bases with the data they require, when they need it. Add a +1 in the comments if you would like me to release the code for the design as part 2.
Subhro
Amazon Employee
Published Nov 11, 2024
Across all technical disruptions, one thing vibrantly pops out: final decision-making usually depends on a human, in other words, a human-in-the-loop. Consider a smart watch, a new app on our phones, or self-driving cars. There is always a human at the end or in the middle of the loop who either makes a choice with these tools, intervenes when the technology doesn't behave as expected, or validates its outcome. However, has anyone asked those humans whether they really enjoy that role? Some roles, such as using a smart watch or experimenting with protein structures, can be surprisingly gratifying, but others, such as being the human-in-the-loop who validates the results of a backend AI algorithm, can be really boring. The cardinal reason for this human presence is explainability and transparency. Hence, we have two choices: 1/ keep diving until we reach the abyss of explainability of the technology, or 2/ build a mechanism where the human-in-the-loop can be removed, with limited focus on explainability but more focus on outcomes, plus automation to overturn the results if they don't fit the purpose.
This document focuses on the second option, addressing the dependency on humans in data networks, not only in data and MLOps platforms but also in user-facing software applications.
In a rapidly changing world, the demand for AI across business units within an organization is driving the need for data to train AI models and run inference in near-real time. While each business unit in a data mesh holds data in its own domain and is reluctant to share it across the organization for security and political reasons, AI models that require cross-functional teams' data for near-real-time training and inference underperform because their relevant data is unavailable. Technology, processes, and tools pose challenges even when teams are willing to share specific, agreed-upon datasets.
The proposed design addresses these technical and security challenges:
a/ Establish a secure network between the data domains, or empires, of cross-business units in the data mesh, and bind them with a governance exchange (e.g., Amazon DataZone, data.all). The data shared between empires is limited to agreed-upon, specified datasets.
b/ Allow AI models to automatically request data from any business unit through the aforementioned network without human intervention.
c/ Establish a network to share masked sensitive data from higher to lower environments (e.g., production to pre-production environments) for intra- and inter-business units, so that AI models can be trained on production datasets.
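As a minimal sketch of step c/, the masking step could look like the following. This assumes a simple column-based masking policy; the column names, record shape, and hashing choice are illustrative assumptions, not part of the original design.

```python
import hashlib

# Illustrative policy: which columns of a shared dataset count as sensitive.
SENSITIVE_COLUMNS = {"email", "ssn"}

def mask_value(value: str) -> str:
    """Replace a sensitive value with a deterministic, irreversible token."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def mask_record(record: dict) -> dict:
    """Mask sensitive fields before copying a production record downstream."""
    return {
        key: mask_value(str(val)) if key in SENSITIVE_COLUMNS else val
        for key, val in record.items()
    }

# A production record is masked before it reaches a pre-production environment.
prod_record = {"user_id": 42, "email": "jane@example.com", "score": 0.93}
masked = mask_record(prod_record)
```

Deterministic hashing is chosen here so that joins across masked datasets still work in lower environments; a real implementation would layer a secret salt and a governance-approved policy on top.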
Even though we are tackling long-standing organizational challenges around people and data, the novelty of the technology we are proposing lies in the smartness of the network. In a nutshell:
1/ It is the network design and the automated data-request process carried out by the AI models across data domains or empires.
It solves problems like the following:
- No one wants to be a data steward.
- What happens when the steward leaves, is absent, etc.?
- What incentive does anyone have to ever grant access to a dataset that they own?
When we follow the above diagram sequentially, the model monitor, the drift detector, the automatic prompt generation to agents, the data ledger, the negotiations between the agents, and the delivery of the required data from the producer data domain to the consumer/requester data domain together form an automated ecosystem that negotiates for data and makes it available through a managed network.
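The flow just described can be sketched as a small event loop. All component and dataset names below (`DriftDetector`-style helpers, `DataLedger`, the domain names) are hypothetical placeholders for the boxes in the diagram, not the actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class DataRequest:
    """A request raised by a consumer domain's agent for a producer's dataset."""
    consumer_domain: str
    producer_domain: str
    dataset: str

@dataclass
class DataLedger:
    """Append-only record of every negotiated data exchange on the network."""
    entries: list = field(default_factory=list)

    def record(self, request: DataRequest, granted: bool) -> None:
        self.entries.append((request, granted))

def drift_detected(metric: float, threshold: float = 0.1) -> bool:
    """Toy drift check: flag when the monitored model metric crosses a threshold."""
    return metric > threshold

def negotiate(request: DataRequest, agreed_datasets: set) -> bool:
    """Producer-side agent: grant only datasets on the agreed-upon list."""
    return request.dataset in agreed_datasets

# The model monitor observes drift, the consumer agent raises a request,
# the producer agent negotiates, and the ledger records the outcome,
# all without a human in the loop.
ledger = DataLedger()
if drift_detected(metric=0.25):
    req = DataRequest("fraud-ml", "payments", "transactions_daily")
    granted = negotiate(req, agreed_datasets={"transactions_daily"})
    ledger.record(req, granted)
```

The key design point the sketch illustrates is that the producer agent never consults a human: its grant/deny decision is bounded by the agreed-upon dataset list established through the governance exchange.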
At this stage, it is fair to ask: what is "hurricane" about this design? A hurricane is most powerful at its outer structure and at the bottom, where it creates friction with the land, while the eye at its center is quite peaceful. Similarly, our smart network is quite powerful at its interfaces, where it negotiates and exchanges data, while the data flowing through the network itself remains undisrupted and automated.
With this solution, human players such as data stewards, data-product admins of the data mesh, data-governance team members, or domain owners would not intervene in the operations of the network; the AI agents of each domain would negotiate with one another to keep data available and feed the ML models in their respective domains.
Add a +1 in the comments if you would like me to release the code for the design as part 2.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.