GenAI under the hood [Part 7] - 25 Insights from Brain research for AI advancements


If we are not actively trying to decode and decipher the brain’s genius, we are ignoring (millions of) years of improvement and optimization.

Shreyas Subramanian
Amazon Employee
Published May 28, 2024
How the brain learns has been an active area of research for several decades. In 1949, psychologist Donald Hebb adapted Pavlov's “associative learning rule” to explain how brain cells might acquire knowledge. Hebb proposed that when two neurons fire together, sending off impulses simultaneously, the connections between them (the synapses) grow stronger. When this happens, learning has taken place.
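Hebb's "fire together, wire together" idea can be written as a one-line weight update. The snippet below is a minimal sketch, not Hebb's own formulation: the function name `hebbian_update`, the learning rate, and the toy firing patterns are all assumptions chosen for illustration.

```python
import numpy as np

def hebbian_update(w, pre, post, lr=0.1):
    """One step of the plain Hebbian rule: delta_w[i, j] = lr * post[i] * pre[j].

    w[i, j] is the (hypothetical) synapse from presynaptic neuron j
    to postsynaptic neuron i.
    """
    return w + lr * np.outer(post, pre)

# Toy setup (assumed): 3 presynaptic neurons, 2 postsynaptic neurons.
w = np.zeros((2, 3))
pre = np.array([1.0, 0.0, 1.0])   # presynaptic neurons 0 and 2 fire
post = np.array([1.0, 0.0])       # postsynaptic neuron 0 fires

# Repeated co-activation strengthens only the co-active synapses.
for _ in range(5):
    w = hebbian_update(w, pre, post)

print(w)
```

Only the synapses between neurons that fired together grow; every other weight stays at zero. Note that the plain rule grows without bound, so practical variants usually add some form of normalization or decay.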
Currently, there is work on both software and hardware in this area (for hardware, look at neuromorphic computing platforms: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3812737/ and https://www.intel.com/content/www/us/en/research/neuromorphic-computing.html), but my interest here is to figure out whether brain-inspired computing can give us ideas for new learning algorithms that differ from the current default way of training models. Why? There are several things that the brain does better in terms of learning. Using terms common to ML practitioners today, the brain has evolved to do the following:
  • Very-few shot learning
  • Compartmentalization
  • Efficient feature representation
  • Always multimodal input
  • Very low power training and inference
  • No evidence of inefficient training algorithms like back propagation

Odd / Different / Unique points that may lead to the above desirable characteristics for human-created AI systems

  1. Large number of cells - The brain is made up of about a trillion cells; 90% of these are glial cells and 10% (about 100 billion) are neurons. The largest language models today already have more parameters than the brain has neurons, but they are very task specific and are not capable of what the brain can do. Most glial cells act as a glue, protect the neurons and regulate the rate of neuron signaling. Apart from the rate regulation, there may not be a use case for trying to replicate glial cells in the software-defined brain (is that a thing?). Furthermore, each of the 100 billion neurons may have up to 10,000 dendrites, which means that there could be up to 1 quadrillion (100 billion x 10,000) synapses in a single human brain - this maps to the number of parameters in a NN. The ML community agrees that our large models are over-parameterized, which by definition means we have more parameters in our models than required, or that each parameter is not as useful.
  2. Many-to-many connections - Each neuron can connect to multiple other neurons through dendrites. Each neuron can also connect to the same neuron more than once. This is unique, and it is not clear what benefit this has.
  3. Input receiving and preprocessing - All sensory information (except smell) goes to the thalamus, where it is preprocessed and sent to other areas of the brain for processing. The thalamus acts both like a DataLoader (loading and initial transformations) and like a signal routing engine.
  4. Long term storage - not all information is needed immediately, but a compressed version of this set of information is sent to the Hippocampus for long term storage. This is similar to pre-trained weights that can be used for downstream models.
  5. Neurogenesis and neurodegeneration - The number of neurons in the system is not constant - neurons can be created and destroyed as needed. A loose analog in ML is neural architecture search (NAS), where the structure of the network itself changes. The hippocampus again plays an important role in this.
  6. Extremely dense connections - The two hemispheres of the cerebrum (representing 80% of the total mass) are covered by the cortex, which is a thin structure (about a tenth of an inch thick) with a large surface area (~2 sq ft) and about 10,000 miles of connecting fibers. The hemispheres themselves are joined by the corpus callosum, a bundle of roughly 200 million fibers. The connections between layers are not as dense even in the largest models we train.
  7. Varying densities across the network - The cerebellum, which only represents 11% of the total mass of the brain, has more neurons than every other part of the brain put together. It is associated with fine-tuning thoughts, motor function, automated activities to free up other parts of the brain. Typically there are no clear “density” differences in the models we train.
  8. The role of mirror neurons - The same neurons in the brain were seen to fire when an action (or thing) was observed as when the action was performed. Neuroscientists believe that this may help predict actions better. This is similar to self-supervised learning and fine-tuning.
  9. Pruning - Synaptic pruning is genetically programmed so that only those neurons that have made connections are preserved; this process destroys a large number of neurons (~100 Billion) that are formed during gestation; this is done on purpose to reduce complexity. This is the very same concept we use in neural network pruning.
  10. Specialization - Certain areas in the brain are reserved for learning certain things - for example, vocabulary and language are developed in a particular section of the brain. This is unlike our neural networks, which specialize as a whole on one thing. Researchers can claim that attention points to specialization, but the overall task that the NN is being trained on is still singular.
  11. Just as the human brain reacts spontaneously to words, LLMs tokenize text. Research exploring how the brain learns and uses words reveals that different parts of the brain are involved in understanding and using language. For example, some areas help us recognize what words mean, while others help us pronounce them. These brain regions vary depending on the type of word, like whether it's something we see or hear. This research also shows that the brain changes depending on the language we speak.
  12. Sparse coding - Kosik explained the concept in terms of discerning odors here (https://www.universityofcalifornia.edu/news/how-neurons-learn-last-frontier). Too many odors exist for each one to have a unique pattern of firing neurons. Rather, the brain creates small maps. One odor might have 10 neurons that encode it, seven of which also encode a different odor, creating an overlap. In GANs, VAEs etc. we like the latent space to be disentangled (not entangled like in this case).
  13. Graph representation - Researchers have proved (in mice) that learning happens when the strength of connections between the neurons increases. This is similar to weights in a graph (or a GNN) changing. Typically, weights in a NN describe properties of the neuron itself, while connections represent mathematical operators.
  14. Memory - A recent study at the Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School found that the structural core of the brain receives sensory information from different regions and then assembles bits of data into a complete picture that becomes a memory of an event. This memory is strengthened by multiple sensory inputs. Memory in a NN is permanent, as the weights of a saved model are permanent. Past memories can be an impediment to future learning that contradicts previous information; this is loosely connected to adversarial training, which is rarely used.
  15. Selective data ingestion - the brain is selective about which stimuli to accept, and which to ignore from sense organs. This is different from NNs where all input is valid input.
  16. Differentiation - Sensory, motor and transport neurons perform specific functions. In NNs, all neurons are created equal.
  17. Predictive learning - Neurons learn by predicting future activity (https://www.nature.com/articles/s42256-021-00430-y). Interestingly, this predictive learning rule can be derived from a metabolic principle, whereby neurons need to minimize their own synaptic activity (cost) while maximizing their impact on local blood supply by recruiting other neurons. This is completely unlike how NNs train; instead of having a local cost, we have a global cost (loss function) which is optimized (using back propagation).
  18. Automatic data augmentation - results from the above paper also suggest that spontaneous brain activity provides ‘training data’ for neurons to learn to predict cortical dynamics. New, simulation data that is similar to existing data is not something that is spontaneous in NNs.
  19. Self-supervised training - Taking it down to the level of the individual neuron, each neuron gets many feedback signals to guide its learning. One type of feedback is the relationship between its inputs and its own output. Spike-timing-dependent plasticity (STDP) occurs in the brain and causes a neuron to favor only the inputs that predicted its own output. Over time, this causes the neuron to specialize in finding statistically frequent but otherwise unique input patterns. Neurons also get global feedback signals, for example in the form of dopamine for reinforcement learning. And neurons may get other types of feedback as well, for example induced by the phase relationship between their own firing and the dominant local population oscillation. Most NN training runs are task specific and not self-supervised.
  20. The brain has the ability to learn how to learn, a concept known as meta-learning. This allows humans to apply knowledge from one domain to another and to learn new tasks more efficiently. Recent research suggests that synaptic plasticity, the strengthening or weakening of connections between neurons, plays a crucial role in learning and memory. Interestingly, the timing rules governing these synaptic changes appear to be precisely tuned to circuit-specific feedback delays, allowing error feedback to selectively weaken only the synapses contributing to the error. This implies that the synaptic timing rules themselves are learned through experience. In the context of neural networks, these findings could inspire the development of meta-learning algorithms that enable rapid adaptation to new tasks, much like an experienced athlete quickly mastering a new sport. Incorporating such meta-learning capabilities could enhance the flexibility and efficiency of artificial neural networks, bringing them closer to the remarkable adaptability of biological neural systems. Read more about our work on Learning to learn - https://www.amazon.science/blog/learning-to-learn-learning-rate-schedules
  21. In the human brain, attention operates by focusing on specific sensory modalities, such as vision or hearing, through networks like the parietal cortex for visual input. This selective attention enables the brain to prioritize certain stimuli over others, enhancing processing efficiency for relevant tasks. In contrast, contemporary multimodal AI models are designed to process and integrate multiple types of data simultaneously, such as text, images, and audio. This allows these models to handle various input forms concurrently without needing to focus on one modality at a time. Multimodal transformer-based models of this kind are being actively studied today.
  22. The attention mechanism in the human brain is a complex network involving specific regions like the anterior cingulate cortex (ACC) and frontal eye fields, which regulate focus and resolve conflicts between competing stimuli. This biological process is dynamic and context-sensitive, adjusting attentional resources based on the task and environmental demands. Transformer models, used in AI, employ a different approach known as self-attention mechanisms. These models assign weights to input elements, enabling them to focus on different parts of the input data dynamically during processing. Unlike the brain's reliance on anatomical and neuromodulatory systems, transformers use mathematical operations to simulate attention, providing flexibility and scalability in handling large datasets.
  23. Executive control in the human brain involves networks like the dorsolateral prefrontal cortex, which manages high-level functions such as task switching, conflict resolution, and error monitoring. These networks facilitate top-down control over cognitive processes, allowing for adaptive and goal-directed behavior. In the realm of AI, particularly large language models (LLMs) and transformers, there isn't a direct analog to these executive control networks (perhaps now in Agents). The sophisticated, context-sensitive regulation observed in human executive control remains a unique feature of biological cognition not fully replicated in AI systems.
  24. A deeper note on memory - Recent advancements in neuroscience, spanning from the work of Howard Eichenbaum at Boston University to the findings supported by the NIH BRAIN Initiative, offer profound insights into the mechanisms underlying human memory. Eichenbaum's research elucidates a complex brain circuitry governing behavior and memory access, highlighting the interplay between the ventral hippocampus, prefrontal cortex, and dorsal hippocampus. Meanwhile, studies supported by the NIH BRAIN Initiative unveil the roles of specific neurons (boundary cells and event cells) in organizing and retrieving memories. These discoveries deepen our understanding of how memories are segmented and stored, providing potential avenues for therapeutic interventions in memory-related disorders such as Alzheimer's disease. By revealing the intricate communication pathways within the brain and the physiological processes guiding memory formation and retrieval, these studies pave the way for novel approaches to address cognitive impairments and enhance memory function. On the AI side, a recent approach called "Think-in-Memory" (TiM) has been shown to enhance the long-term memory capabilities of LLMs (https://arxiv.org/pdf/2311.08719). The key idea is to store the self-generated "thoughts" or reasoning of the LLM during conversations, rather than just the raw text. This way, the LLM can directly recall and build upon its previous thoughts, avoiding the need for repeated reasoning over the same context, which often leads to inconsistent or biased responses. The TiM framework operates in two stages: 1) Recalling relevant thoughts from memory before generating a response, and 2) Post-thinking and updating the memory with new thoughts after generating the response.
  25. Continual learning is a challenge for artificial neural networks, which often suffer catastrophic forgetting of old knowledge when acquiring new information. This is also true for LLMs. In human brains, research suggests sleep may protect existing memories during new learning. In simulations, training on new tasks degraded old memories, but simulating sleep afterward reversed this damage, enhancing both old and new memories. Sleep replay reshaped synaptic connections, allowing multiple memory traces to coexist. This dynamic process suggests sleep aids continual learning by consolidating new memories while fortifying old ones to reduce interference. If only you allowed the LLMs to sleep for a bit before you trained them on a new language or task... :D
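The catastrophic forgetting and "sleep" story in point 25 can be illustrated with a toy experiment. This is only a rough sketch under loud assumptions: it uses a linear model, and it stands in for sleep with rehearsal on stored examples from both tasks (a common ML replay technique), which is not the biological replay mechanism described in the research. All names (`train`, `mse`, the task sizes, the learning rate) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mse(w, X, y):
    """Mean squared error of a linear model with weights w."""
    return float(np.mean((X @ w - y) ** 2))

def train(w, X, y, lr=0.05, steps=1000):
    """Plain full-batch gradient descent on the MSE loss."""
    w = w.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Two toy tasks (assumed): 20 examples each, 100 input features,
# so a single weight vector has enough capacity to fit both.
d, n = 100, 20
X_a, X_b = rng.normal(size=(n, d)), rng.normal(size=(n, d))
y_a = X_a @ rng.normal(size=d)
y_b = X_b @ rng.normal(size=d)

# Phase 1: learn task A.
w = train(np.zeros(d), X_a, y_a)
loss_a_after_a = mse(w, X_a, y_a)

# Phase 2: learn task B only; performance on task A degrades.
w = train(w, X_b, y_b)
loss_a_after_b = mse(w, X_a, y_a)

# Phase 3: "sleep" stand-in, replaying stored examples from both tasks.
X_mix = np.vstack([X_a, X_b])
y_mix = np.concatenate([y_a, y_b])
w = train(w, X_mix, y_mix)
loss_a_after_sleep = mse(w, X_a, y_a)
loss_b_after_sleep = mse(w, X_b, y_b)

print("task A loss after learning A:", loss_a_after_a)
print("task A loss after learning B:", loss_a_after_b)
print("task A loss after replay:    ", loss_a_after_sleep)
print("task B loss after replay:    ", loss_b_after_sleep)
```

In this toy setting the loss on task A rises after training on task B alone and falls again after the replay phase, while task B stays learned. The model is deliberately over-parameterized so that both "memory traces" can coexist in one set of weights.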
If we are not actively trying to decode and decipher the brain’s genius, we are ignoring (millions of) years of improvement and optimization.

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.