Tech DeepDive: Machine Learning in Gen AI - How it Works
Curious what powers your favorite GenAI tools under the hood? Machine Learning is at play!
Published Feb 7, 2024
Focus on the rapidly evolving world of artificial intelligence (AI) has surged over the past three years, so I wanted to take the time to break it down a bit - after all, how does all of this really work? While AI is not a "new" concept, the Generative stage of its lifecycle certainly is - and it's here to stay. At their core, both AI and Generative AI (GenAI) leverage machine learning (ML) to make decisions (predicted outcomes) based on a wide range of trained inputs (models). Whatever tool you are using, it relies on an underlying type of machine learning that leverages massive amounts of data to work. All of the big names you see in the news - Midjourney, OpenAI's ChatGPT, Amazon Web Services (AWS), Google Cloud, and Microsoft Azure - follow the same foundational steps when leveraging ML for their services.
Generative AI refers to a subset of AI that focuses on creating (generating) new content. This technology learns from existing data and generates new items that retain the original data's characteristics. Unlike traditional AI, which is designed to recognize and classify data, Generative AI goes a step further—it creates!
But just like traditional AI, at the heart of it all is machine learning, particularly deep learning - a process that uses neural networks with many layers (hence 'deep'). These networks are trained on large datasets, allowing them to learn and generate complex patterns and information. In general, the larger the dataset, the wider the range of possible outputs. Here's a breakdown of how this process works:
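To make "many layers" concrete, here is a minimal sketch of a forward pass through a small stack of layers using NumPy. The layer sizes and random weights are hypothetical - a real network would have learned weights and far more parameters - but the structure is the same: each additional (weight, bias) pair is one more layer of depth.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Rectified linear unit: a common activation applied between layers
    return np.maximum(0.0, x)

def forward(x, weights):
    # Pass the input through each layer in turn; every extra
    # (weight, bias) pair is one more layer of "depth"
    for w, b in weights:
        x = relu(x @ w + b)
    return x

# Hypothetical sizes: 8 inputs through three layers of 16 units each
sizes = [8, 16, 16, 16]
weights = [(rng.normal(size=(m, n)) * 0.1, np.zeros(n))
           for m, n in zip(sizes, sizes[1:])]

batch = rng.normal(size=(4, 8))   # 4 example inputs
out = forward(batch, weights)
```

Training (covered in step 3 below) is the process of adjusting those weight matrices so the network's outputs match the data.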
1. Data Collection and Preparation
Before anything else, you have to assemble a large and diverse dataset. This dataset should be representative of the type of content you want the AI to generate. For example, if the goal is to generate new music, the dataset might consist of thousands (or tens of thousands) of musical pieces across a variety of genres. While not a requirement, you can make things easier on your models if the files share similar data types. It doesn't do much good to have a range of .mp4 files and a handful of .txt files. Look for consistency in data types and diversity in data inputs.
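That "consistency in data types" advice can be checked mechanically before training. A small sketch (the file list and the 90% threshold are hypothetical choices) that flags a dataset mixing formats:

```python
from collections import Counter
from pathlib import Path

def extension_report(paths):
    # Count how many files of each extension the dataset contains
    return Counter(Path(p).suffix.lower() for p in paths)

def is_consistent(paths, threshold=0.9):
    # Treat the dataset as consistent if one format dominates
    counts = extension_report(paths)
    if not counts:
        return False
    return counts.most_common(1)[0][1] / sum(counts.values()) >= threshold

# Hypothetical example: mostly .mp3 with a stray .txt file mixed in
files = ["a.mp3", "b.mp3", "c.mp3", "d.mp3", "notes.txt"]
```

Here `is_consistent(files)` returns False because only 80% of the files share a format, which is below the 90% threshold - a prompt to clean the dataset before training.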
2. Choosing a Machine Learning Model
There are different types of machine learning models used in Generative AI, each suited to different kinds of tasks. There are too many to cover them all, but you will likely hear about these common models more than others:
- Generative Adversarial Networks (GANs): GANs consist of two neural networks, the generator and the discriminator, which are trained simultaneously. The generator creates data, while the discriminator evaluates it against real data. This adversarial process improves the quality of generated outputs. Deepfake technology uses GANs to create realistic images and videos by superimposing existing images and videos onto source images or videos.
- Variational Autoencoders (VAEs): VAEs are a type of autoencoder neural network used to generate complex data like images. They work by compressing data into a latent-space representation and then reconstructing it. VAEs are particularly good at producing new data that resembles the training data. They are often used in image generation tasks where new images that resemble a given dataset (like faces or landscapes) are created.
- Transformer Models: Transformers are a type of model that uses mechanisms like attention to weigh the influence of different parts of the input data. They are highly effective in handling sequential data and are widely used in natural language processing. OpenAI's ChatGPT (Generative Pre-trained Transformer) series, used for generating human-like text, is an example of transformer models in Generative AI.
- Restricted Boltzmann Machines (RBMs): RBMs are a type of stochastic neural network that can learn a probability distribution over its set of inputs. They are useful for dimensionality reduction, classification, regression, collaborative filtering, feature learning, and topic modeling. They have been used in the initial layers of deep neural networks to pre-train them for tasks like image recognition.
- Autoencoders: Autoencoders are neural networks designed to learn efficient representations of input data, typically for dimensionality reduction, by encoding input into a lower-dimensional space and then decoding it back. These are used in anomaly detection, where they learn to reconstruct normal data and can then identify anomalies by their poor reconstruction quality.
- Recurrent Neural Networks (RNNs): RNNs are a type of neural network where connections between nodes form a directed graph along a temporal sequence, allowing them to exhibit temporal dynamic behavior. They are suitable for tasks where context and sequence matter. Often used in music generation, where the sequential nature of music can be learned and replicated by the model.
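Of the models above, transformers are the ones you are most likely to meet in practice, and their core operation - attention - is compact enough to sketch. Here is a minimal NumPy version of scaled dot-product attention with toy shapes (not a trained model): each position in the sequence weighs every other position, then takes a weighted average of the values.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: turns scores into probabilities
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: queries score against keys,
    # and the resulting weights average the values
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8          # hypothetical toy dimensions
q = rng.normal(size=(seq_len, d_model))
k = rng.normal(size=(seq_len, d_model))
v = rng.normal(size=(seq_len, d_model))
out, weights = attention(q, k, v)
```

The attention weights for each position form a probability distribution over the sequence - this is the "weighing the influence of different parts of the input" described above.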
3. Training the Model
The selected model is trained using the dataset. During training, the model learns the patterns, styles, or characteristics of the data. For instance, a model trained on a dataset of paintings will learn various artistic styles and elements present in that dataset. This is where services like Midjourney come into play, with their ability to bring an incredible range of artistic styles to bear.
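"Learning the patterns of the data" boils down to repeatedly nudging the model's parameters to reduce its error on the dataset. A deliberately tiny sketch - a one-feature linear model with a made-up dataset, rather than a deep network - shows the shape of that loop:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "dataset": inputs x and targets y following a hidden pattern
# (y = 3x + 1, plus a little noise)
x = rng.normal(size=(100, 1))
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=(100, 1))

# The model starts knowing nothing; training nudges w and b
# to shrink the mean squared error on the dataset
w, b = 0.0, 0.0
lr = 0.1
for _ in range(500):
    pred = w * x + b
    err = pred - y
    w -= lr * 2 * np.mean(err * x)   # gradient of MSE w.r.t. w
    b -= lr * 2 * np.mean(err)       # gradient of MSE w.r.t. b
```

After training, w and b land close to the hidden pattern (3 and 1). A deep generative model does the same thing, just with millions or billions of parameters and a far richer notion of "error."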
4. Generating New Content
Once trained, the model can start generating new content. The generated content is based on what the model learned during training but is original and not a mere copy of the training data. Prompts are what really set each tool apart. Entering a basic prompt like, "Generate an image of a flower in an oil painting style" will definitely return a result. However, the more detailed you can be, the better the outcome. We'd take the above prompt and instead say something like, "Illustration, night core, blue and teal in the kitchen, in the style of amanda clark, mandy disher, charming, idyllic rural scenes, glittery and shiny, alma woodsey thomas, konica big mini, the stars art group in style of whimsical naive art, nostalgic atmosphere, oil painting influence." As you can see, the more detailed the prompt, the more accurate the output, since GenAI can further focus the input parameters from the training data.
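Under the hood, text generators produce output one token at a time by sampling from a probability distribution over possible next tokens. A small sketch (hypothetical scores for a three-token vocabulary) of that sampling step, including the "temperature" knob many tools expose - lower temperature concentrates probability on the likeliest options, loosely analogous to how a more detailed prompt narrows the range of outputs:

```python
import numpy as np

def sample_next(logits, temperature=1.0, rng=None):
    # Convert model scores (logits) into a probability distribution,
    # then draw one token index from it
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs), probs

logits = [2.0, 1.0, 0.1]   # hypothetical scores for 3 candidate tokens
_, sharp = sample_next(logits, temperature=0.1)   # focused
_, flat = sample_next(logits, temperature=10.0)   # diffuse
```

At temperature 0.1 nearly all the probability lands on the top-scoring token; at temperature 10 the three options become almost interchangeable.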
5. Evaluation and Refinement
The outputs are evaluated to see how well they match the desired outcome. The model might be further refined and retrained to improve the quality of the generated content. For those of you who have used ChatGPT, this is the same function as clicking the 'regenerate' button and having the tool ask you whether the second output was better or worse than the first. This human input allows the model to continue to learn and evolve so it can better understand what we are prompting it to generate.
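That better/worse feedback can be boiled down to a preference score per candidate. A toy sketch (the vote data and version names are hypothetical) of tallying pairwise comparisons into win rates - the kind of aggregate signal a refinement step could then optimize:

```python
from collections import defaultdict

def tally_preferences(votes):
    # Each vote is a (winner, loser) pair from a side-by-side comparison
    wins = defaultdict(int)
    totals = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        totals[winner] += 1
        totals[loser] += 1
    # Win rate per candidate: the signal retraining could optimize
    return {name: wins[name] / totals[name] for name in totals}

# Hypothetical A/B feedback from 'regenerate' comparisons
votes = [("v2", "v1"), ("v2", "v1"), ("v1", "v2"), ("v2", "v1")]
rates = tally_preferences(votes)
```

With these votes, v2 wins 3 of 4 comparisons, so refinement would steer the model toward whatever v2 did differently.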
I am biased since my 'home' cloud is AWS. It's what I am most familiar with and the cloud provider I chose early on in my career to specialize in. AWS offers a range of tools that simplify the use of machine learning for Generative AI. Unlike other providers, AWS focuses on providing tools to the end user (you) that let you build and train your own machine learning models. These tools are accessible to professionals without deep ML expertise, making it easier to implement and experiment with Generative AI models.
Next time you're in a conversation about ML and GenAI, you'll know some key AWS offerings:
- Amazon SageMaker: A fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning models quickly. SageMaker is particularly useful for Generative AI as it supports a broad set of ML models, including those used in generative tasks. The best part about this for new and experienced users alike is that it is fully managed. You don't have to worry about the underlying infrastructure that supports your requests. Check out some free tutorials and join the SageMaker team for 'Free Fridays' to learn more.
- AWS DeepComposer: This is an innovative offering from AWS that allows users to create music using Generative AI. It uses Generative Adversarial Networks (GANs) to generate music based on input melodies. DeepComposer is a compelling example of how ML can be applied in creative fields. For those reading that have minimal coding skills or cloud knowledge but are musically proficient - you should definitely check out AWS DeepComposer. It's a very cool service to learn how to use and train ML models without having to write a single line of code!
- AWS DeepRacer: An autonomous 1/18th scale race car designed to test and expand your understanding of reinforcement learning (RL), a type of machine learning. DeepRacer provides a hands-on experience in training, evaluating, and tuning ML models. What's even cooler? If you get really invested into DeepRacer you can compete at our various tournaments for chances to win cool prizes! Check out pricing and how you can get involved for free for the first 30 hours.
The future of Generative AI is incredibly exciting. But under the hood, it's relatively simple when we break things down. So remember: GenAI is a type of AI that can generate new content. It does this based on the underlying ML models that form the wide range of trained 'inputs' it can draw on. Based on your prompts, GenAI will look at all of the data it has access to and generate the response it believes is closest to your request. To get the best outcomes, you need to make sure your prompts are specific, defined, and as detailed as possible. As machine learning models become more sophisticated, more robust, and better trained, the potential applications of Generative AI continue to grow. Don't let it grow without you! Spend the time now learning how to craft prompts for tools like ChatGPT, Midjourney, and Scribe to build your skillset before these services become commonplace.