
🧪 The Rise of the LLM OS: From AIOS to MemGPT and beyond

A personal tale of experimentation involving LLMs and operating systems with some thoughts on how they might work together in a not-so-distant future.

João Galego
Amazon Employee
Published Apr 12, 2024
Last Modified May 20, 2024

And Now for Something Completely Different...

Over the last few months, I've been feeling a strong gravitational pull towards the idea of merging large language models (LLMs) and operating systems (OSes) into what has been called an LLM Operating System (LLM OS).
After some introspection, I believe the root cause can be traced back to a deeply personal interest in the matter rooted in my childhood.
As a teenager, I used to sit around and watch my older brother as he tried to build what I can only describe as a leaner version of Rosey the Robot in our garage.
At the time, I was a bit of a Luddite. I had neither the skills nor the inclination to appreciate a lot of the things he said, but some of those things have stuck with me to this day.
For instance, I remember that he had a strong set of beliefs about his creation and the way it should be designed that were non-negotiable:
  1. 💬 language understanding is key
  2. 🤖 it will need a 'body' to explore the world and
  3. 🧠 internally, it will function like an OS
While #1 is an open-and-shut case, #2 is... well, let's just say that the jury is still out. But what are we to make of #3? And where was this coming from?
My brother didn't have a college degree, but he had a very curious and inquisitive nature. He loved to 'break' complex things apart, learn what made them tick and put them back together... with varying degrees of success, according to my mother. Old TV sets and microwaves never stood a chance around him. Sometimes, however, he would turn his attention to big, clunky things that one can poke around in without a wrench. New Linux distros were his favorite. At one point, he spent several months exploring every crevice of BackTrack and later Kali Linux.
All this tinkering and exploration ended up shaping his mechanistic views of thinking and turned him into a big fan of computational theories of mind, though he never called them that. He was certain that there was some kind of connection between the human brain, thinking machines and the way an OS works, an equivalence relation or GEB-like isomorphism of sorts if you will. For him, the mind was just spaghetti code running on ages-old wetware.
โ˜๏ธ Fun fact: OSes were always a mystery to me. The first time I got a true glimpse of what an OS does and the intuition behind it was when I read Nisan & Schoken's The Elements of Computing Systems in college and followed their Nand2Tetris ๐Ÿ•น๏ธ course. If you haven't read this one or taken the course, I highly encourage you to do so. It's life-changing!
His strong intuition never really materialized for me. I saw many obstacles with the analogy (I still do) and doubted how accurate it actually was, but the main issue was that I just couldn't 'see' it.
That is, until LLMs came bursting into the picture...

๐Ÿ OuroborOS: From OS to LLM and back

"Imagine a futuristic Jarvis-like AI. Itโ€™ll be able to search through the internet, access local files, videos, and images on the disk, and execute programs. Where should it sit? At the kernel level? At Python Level?" โ€• Anshuman Mishra, Illustrated LLM OS: An Implementational Perspective
The LLM OS became a hot topic late last year following Andrej Karpathy's viral tweets and videos, but as we'll soon see, there was already a lot of great work around this topic long before that.
The LLM Operating System, as presented by Andrej Karpathy
The key is to think of the LLM as the kernel process of an emerging OS. With some work, the LLM would be able to coordinate and manage resources like memory without any user intervention and apply different kinds of computational tools to handle requests coming from userspace.
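To make this concrete, here's a deliberately naive sketch of such a kernel loop in Python. Everything in it is hypothetical (the tool table, the decision rule and the eviction policy are mine, not from any real LLM OS); in a real system, the 'decide' step would be handled by the LLM itself via function calling.

```python
# A toy "LLM as kernel" loop: requests arrive from userspace like syscalls,
# the kernel picks a tool to serve each one and manages its own context
# window like RAM. All names here are hypothetical.

TOOLS = {
    "calculator": lambda arg: str(eval(arg)),  # stand-in for a real tool
    "echo": lambda arg: arg,                   # fallback "device"
}

class LLMKernel:
    def __init__(self, max_context=8):
        self.context = []              # in-context messages, i.e. the "RAM"
        self.max_context = max_context

    def decide(self, request):
        # A real LLM OS would let the model choose the tool; we fake the
        # decision with a trivial rule so the sketch runs end to end.
        if request.startswith("calc:"):
            return "calculator", request[len("calc:"):]
        return "echo", request

    def handle(self, request):
        tool, arg = self.decide(request)
        result = TOOLS[tool](arg)
        self.context.append((request, result))
        if len(self.context) > self.max_context:
            self.context.pop(0)        # naive eviction; MemGPT does far better
        return result

kernel = LLMKernel()
print(kernel.handle("calc: 2 + 2"))    # -> 4
```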
Earlier comparisons between Transformers and computers, and between natural language and programming languages, were not-so-subtle hints that there was something there, but due to the low firing rate of my isomorphism neurons I ended up missing the proverbial forest for the trees 🌲.
My aha! moment came a bit later while reading a little-known article by François Chollet explaining his mental model to understand prompt engineering.
As he went on about prompts as program queries and vector programs as maps to and from latent space, and tried to make a connection between LLMs and continuous program databases, something just clicked.
All of a sudden, I could see a path moving forward. The key was to make the whole mapping thing cyclic and let the LLM itself handle the program search (prompt engineering). In my mind, this conjured up images of Möbius strips and ouroboroi (yes, it's a word): the OS relinquishing control to the LLM only to be transformed into a new kind of OS.
"A mirror mirroring a mirror"
― Douglas R. Hofstadter, I Am a Strange Loop
OuroborOS: where does the LLM end and the OS begin?
Give it access to tools via function calling, some limited reasoning abilities, and we're essentially done.
Right? Well, not so fast...

💭 MemGPT: Limited Context Windows and Virtual Memory

Let's assume that our system is completely isolated from the rest of the world except for the interactions with its user base.
From the moment we turn it 'on', messages of all kinds will start flowing around between the user, the model and the underlying OS (if there's still one).
This means every user session is essentially one GIANT conversation. Now multiply that by the number of users and sessions and we start running into trouble.
Due to their limited context windows, LLMs are not great at holding lengthy conversations or reasoning about long documents. If the conversation outgrows the context window, it will start to 'overflow'. And even within a long window, the important bits can get lost in the middle.
Putting RAG aside for the moment, this context window has to hold all the necessary information to 'act' upon a user's request. Moreover, the system will have to be autonomous enough to manage all this context on its own.
Given these constraints, this is starting to seem like an impossible task.
Fortunately, traditional OSes solved this issue a long time ago. Quoting from an amazing research roundup by Charles Frye:
"RAM is limited and expensive, relative to disk, so being able to use disk as memory is a big win. Language models also have memory limits: when producing tokens, they can only refer to at most a fixed number of previous context tokens. (...) How might we apply the pattern of virtual memory to LLMs to also allow them to effectively access much larger storage?"
In a stellar example of cross-pollination between different areas of research, the team behind MemGPT (Packer et al., 2023) managed to solve this issue by augmenting LLMs with something akin to virtual memory.
MemGPT architecture, from Packer et al. (2023)
MemGPT does this via an architecture that uses carefully designed prompts and tools to allow LLMs to manage their own context and memory.
At its core, we find an "OS-inspired multi-level memory architecture" with two primary memory types:
  1. Main context (main memory/physical memory/RAM) which holds in-context data
  2. External context (disk memory/disk storage) where out-of-context information is stored
Main context is further divided into three different sections:
  • System instructions: a read-only instruction set explaining how the system should behave
  • Working context: contains key information like user preferences and personas
  • FIFO queue: stores a rolling history of system and user messages
Out-of-context information is stored outside the LLM in a filesystem or a database, and it has to be explicitly moved into main context via function calling before it can be used.
When compared to RAG-like approaches, the key difference is that retrieval is done via function calling. Every interaction really is mediated by function calls.
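To make the mechanics a bit more tangible, here's a toy sketch of the two-tier memory in Python. The structure follows the paper's description, but the function names are illustrative only and don't match MemGPT's actual API.

```python
# Toy two-tier memory à la MemGPT: main context (prompt-resident) vs.
# external context ("disk"), bridged by explicit function calls.

from collections import deque

class Memory:
    def __init__(self, fifo_size=10):
        # --- main context ---
        self.system_instructions = "You are a helpful assistant."  # read-only
        self.working_context = {}                  # pinned facts and personas
        self.fifo_queue = deque(maxlen=fifo_size)  # rolling message history
        # --- external context ---
        self.archive = []                          # stand-in for a database

    # Functions the LLM would call to manage its own memory:
    def working_context_update(self, key, value):
        self.working_context[key] = value

    def archival_insert(self, text):
        self.archive.append(text)

    def archival_search(self, query):
        # Page the relevant out-of-context data back into main context
        hits = [t for t in self.archive if query.lower() in t.lower()]
        for hit in hits:
            self.fifo_queue.append(("system", f"memory: {hit}"))
        return hits

mem = Memory()
mem.archival_insert("User's favorite distro is Kali Linux")
print(mem.archival_search("distro"))
```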
Now, I'm obviously glossing over a few important details here that are relevant for the implementation. This includes components like the queue manager and the function executor that ensure that the whole system doesn't go off the rails.
For now, it will suffice to say that MemGPT is an elegant implementation of the LLM OS motif. More than just bringing in some 'extra' memory, it gives the LLM the option to 'transcend' the confines of its own context window, making it virtually (pun intended) boundless.
Enough chit-chat, ready to see how we can run this in practice?

Running MemGPT on AWS ☁️

In this section, I'm going to show you how to run MemGPT using AWS services. MemGPT was originally designed to work with GPT-3.5 and GPT-4 models, but we're going to try something different.
In a nutshell, there's an easy way to run MemGPT on AWS... and a hack-y way.
The easy way involves hosting a local setup on a good old EC2 instance.
As an example, the setup below points MemGPT to a local koboldcpp server that is hosting Eric Hartford's Dolphin 2.2.1 Mistral 7B model: start koboldcpp with the model loaded, then run memgpt configure to select the koboldcpp backend and the local endpoint.
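Before wiring everything up, it's worth confirming the server is actually answering. Here's a minimal sanity check, assuming koboldcpp's default port and its standard generate endpoint:

```python
# Minimal smoke test for the local koboldcpp server before pointing
# MemGPT at it (assumes the default port and the KoboldAI generate API).

import requests

KOBOLDCPP_URL = "http://localhost:5001"  # koboldcpp's default port

payload = {
    "prompt": "You are MemGPT. Say hello.",
    "max_length": 64,       # keep the test short
    "temperature": 0.7,
}
resp = requests.post(f"{KOBOLDCPP_URL}/api/v1/generate", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```

If this prints a sensible completion, MemGPT should be able to talk to the same endpoint.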
The easy way is effective, but it can be a bit slow. So don't expect a high throughput unless you're using a beefier machine.
The hack-y way involves "Groq-ifying" Claude models using a Bedrock Access Gateway, an OpenAI-compatible proxy to Amazon Bedrock models.
โš ๏ธ As of this writing, MemGPT does not support Amazon Bedrock.
Bedrock Access Gateway architecture
Using Groq support, we can bypass MemGPT's OpenAI-centric implementation while simultaneously working around Claude's message API restrictions without changing a single line of code. Neat, right?
โ— Notice that we're not using the full proxy URL <LB_URL>/api/v1 but <LB_URL>/api. This is to ensure compatibility with Groq's OpenAI-friendly API.
This highlights the importance of having compatible APIs between different models and model providers for the sake of modularity. Everyone stands to win!
🔮 If you're reading this in the distant future, you may want to change the LLM backend from groq to groq-legacy; cf. this PR for additional information. Then again, there may be better deployment options or even native Amazon Bedrock support by the time you read this. Time will tell.
🧨 As a last caveat, keep in mind that the MemGPT repository is volatile and that open LLM support is highly experimental. So don't expect this hack to work all the time!
Hic sunt dracones!

Bringing Amazon Bedrock to AIOS

Before we part ways, I want to talk to you about an alternative LLM OS concept called AIOS and my humble attempts to introduce Amazon Bedrock support.
One of the biggest differences between AIOS and MemGPT is the importance attributed to the underlying OS.
As you can see from the image below, the AIOS will often offload tasks to the OS in order to optimize resource utilization.
Booking a trip with AIOS, from Mei et al. (2024)
While a full analysis of the AIOS architecture is beyond the scope of this article, if we check the image below, we can clearly see that the LLM kernel and the OS kernel work side by side and have very well-defined roles.
Overview of the AIOS architecture (Mei et al., 2024)
Finally, I'm proud to announce that you can now power AIOS with Claude 3 models available on Amazon Bedrock 🎉
10 minutes, fastest merge time ever!
Just run the snippets below and let me know what you think.
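To give you a flavor of the building block involved, here's a hedged sketch of the kind of Claude 3 call on Amazon Bedrock that the integration relies on; the AIOS-side wiring (launch flags, model-name format) depends on the version you're running, so check the repo for the exact invocation.

```python
# Invoking a Claude 3 model on Amazon Bedrock with boto3: the basic call
# underlying the AIOS integration (region and model ID are examples).

import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Plan a three-step trip booking."}],
}

resp = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(body),
)
print(json.loads(resp["body"].read())["content"][0]["text"])
```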

What comes next? 🔮

There's a famous quote from Linus Torvalds that goes something like
"All operating systems suck, but Linux sucks less."
It may well be apocryphal wisdom, but whoever said it has a point.
Right now, in one way or another, all LLM OSes suck and there's no 'LLM OS distro' to rule them all.
As we saw, most of the systems out there are these big, clunky, Rube Goldberg contraptions that rely heavily on advanced model features like function calling which are not always available.
The truth is that we're only just starting this journey and the scene keeps changing rapidly:
  • Will small language models play a bigger role in the future?
  • What will be the first Mobile LLM OS?
  • How about multimodal models?
  • When will we see the first distributed LLM OS?
  • What are the implications for Responsible AI?
When it comes to integrating Generative AI at the OS level, we have a lot more questions than answers. That's what makes it so exciting.
I'm very curious to see the ingenious solutions we'll come up with to move past some of these limitations.
Until then, keep on building! 💪
๐Ÿ™ This article is dedicated to the memory of my brother who never got to see an LLM work.
๐Ÿ—๏ธ Are you working on bringing LLMs and operating systems together? I'd love to hear about your plans. Feel free to DM me or share the details in the comments section below.

References

Articles 📝
  • Packer, C., et al. (2023). "MemGPT: Towards LLMs as Operating Systems." arXiv:2310.08560.
  • Mei, K., et al. (2024). "AIOS: LLM Agent Operating System." arXiv:2403.16971.
  • Nisan, N. & Schocken, S. (2005). "The Elements of Computing Systems: Building a Modern Computer from First Principles." MIT Press.

Blogs ✍️
  • Chollet, F. "How I think about LLM prompt engineering."
  • Mishra, A. "Illustrated LLM OS: An Implementational Perspective."

Code 💻
  • MemGPT: https://github.com/cpacker/MemGPT
  • AIOS: https://github.com/agiresearch/AIOS
  • Bedrock Access Gateway: https://github.com/aws-samples/bedrock-access-gateway

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
