We constantly make decisions. Some seem simple: I booked dinner at a new restaurant, but I’m hungry now. Should I grab a snack and risk losing my appetite or wait until later for a satisfying meal—in other words, what choice is likely more rewarding?
Dopamine neurons inside the brain track these decisions and their outcomes. If you regret a choice, you’ll likely make a different one next time. This is called reinforcement learning, and it helps the brain continuously adjust to change. It also powers a family of AI algorithms that learn from successes and mistakes like humans do.
But reward isn’t all or nothing. Did my choice make me ecstatic, or just a little happier? Was the wait worth it?
This week, researchers at the Champalimaud Foundation, Harvard University, and other institutions said they’ve discovered a previously hidden universe of dopamine signaling in the brain. After recording the activity of single dopamine neurons as mice learned a new task, the teams found the cells don’t simply track rewards. They also keep tabs on when a reward came and how big it was—essentially building a mental map of near-term and far-future reward possibilities.
“Previous studies usually just averaged the activity across neurons and looked at that average,” said study author Margarida Sousa in a press release. “But we wanted to capture the full diversity across the population—to see how individual neurons might specialize and contribute to a broader, collective representation.”
Some dopamine neurons preferred immediate rewards; others slowly ramped up activity in expectation of delayed satisfaction. Each cell also had a preference for the size of a reward and listened for internal signals—for example, whether a mouse was thirsty or hungry and how motivated it was.
Surprisingly, this multidimensional map closely mimics some emerging AI systems that rely on reinforcement learning. Rather than averaging different opinions into a single decision, some AI systems use a group of algorithms that encodes a wide range of reward possibilities and then votes on a final decision.
In several simulations, AI equipped with a multidimensional map better handled uncertainty and risk in a foraging task.
The results “open new avenues” to design more efficient reinforcement learning AI that better predicts and adapts to uncertainties, wrote one team. They also provide a new way to understand how our brains make everyday decisions and may offer insight into how to treat impulsivity in neurological disorders such as Parkinson’s disease.
Dopamine Spark
For decades, neuroscientists have known dopamine neurons underpin reinforcement learning. These neurons puff out a small amount of dopamine—often dubbed the pleasure chemical—to signal an unexpected reward. Through trial and error, these signals might eventually steer a thirsty mouse through a maze to find the water stashed at its end. By recording the electrical activity of dopamine neurons as these critters learned, scientists developed a framework for reinforcement learning. Dopamine neurons fire vigorously in response to rewards that are close at hand, and more weakly the further a reward sits in the future—a drop-off researchers call “discounting.”
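To make that classic framework concrete, here’s a minimal sketch of temporal-difference learning with a single discount factor, the standard setup these recordings were originally interpreted through. The names and numbers are illustrative, not taken from the studies.

```python
import numpy as np

# Minimal sketch of classic temporal-difference (TD) learning with a single
# discount factor, the standard framework described above. Illustrative only.

GAMMA = 0.9   # discount factor: how steeply future rewards lose value
ALPHA = 0.1   # learning rate

n_steps = 10                 # states leading up to a reward at the end
values = np.zeros(n_steps)   # learned value estimate for each step

for episode in range(2000):
    for t in range(n_steps):
        reward = 1.0 if t == n_steps - 1 else 0.0
        next_value = values[t + 1] if t < n_steps - 1 else 0.0
        # TD error: the "dopamine-like" surprise signal
        td_error = reward + GAMMA * next_value - values[t]
        values[t] += ALPHA * td_error

# Values fall off exponentially with distance from the reward: "discounting."
print(np.round(values, 3))  # values[0] converges toward GAMMA**9, about 0.387
```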
But these analyses average activity into a single expected reward, rather than capturing the full range of possible outcomes over time—such as larger rewards after longer delays. Although the models can tell you whether you’ve received a reward, they miss the nuances of when and how much. After battling hunger, was the wait for the restaurant worth it?
An Unexpected Hint
Sousa and colleagues wondered if dopamine signaling is more complex than previously thought. Their new study was actually inspired by AI. An approach called distributional reinforcement learning estimates a full range of possible outcomes through trial and error, rather than a single expected reward.
“What if different dopamine neurons were sensitive to distinct combinations of possible future reward features—for example, not just their magnitude, but also their timing?” said Sousa.
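Here’s a rough illustration of that distributional idea in a deliberately simple setting: instead of one running average, the learner tracks a probability distribution over reward sizes. The environment and numbers below are assumptions for demonstration, not the studies’ actual models.

```python
import numpy as np

# Illustrative sketch of distributional learning: track a probability
# distribution over reward magnitudes instead of a single average value.

rng = np.random.default_rng(0)
ALPHA = 0.05
support = np.array([0.0, 1.0, 5.0])            # possible reward sizes
probs = np.ones(len(support)) / len(support)   # learned distribution

for _ in range(5000):
    reward = rng.choice([1.0, 5.0], p=[0.7, 0.3])  # the "true" environment
    target = (support == reward).astype(float)     # one-hot observed outcome
    probs += ALPHA * (target - probs)              # nudge distribution toward it

print(np.round(probs, 2))          # roughly [0.0, 0.7, 0.3]
print(round(probs @ support, 2))   # roughly 2.2, the mean a scalar learner keeps
# A scalar learner would retain only the ~2.2 average and lose the fact that
# the outcome is always either small or large, never in between.
```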
Harvard neuroscientists led by Naoshige Uchida had an answer. They recorded electrical activity from individual dopamine neurons in mice as the animals learned to lick up a water reward. At the beginning of each trial, the mice sniffed a different scent that predicted both the amount of water they might find—that is, the size of the reward—and how long until they might get it.
Each dopamine neuron had its own preference. Some were more impulsive and preferred immediate rewards, regardless of size. Others were more cautious, slowly ramping up activity that tracked reward over time. It’s a bit like being extremely thirsty on a hike in the desert with limited water: Do you chug it all now, or ration it out and give yourself a longer runway?
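As a hypothetical sketch of that diversity, imagine a bank of units that all see the same reward but apply different discount factors, so each one values a delayed payoff differently:

```python
import numpy as np

# Hypothetical sketch of a population with diverse discount factors: each
# "neuron" values the same reward differently depending on its delay.

gammas = np.array([0.5, 0.7, 0.9, 0.99])   # impulsive -> patient
reward, delay = 1.0, 8                     # one reward, eight steps away

# Present value of the delayed reward under each unit's discount factor
for gamma in gammas:
    print(f"gamma={gamma:.2f} values the delayed reward at {reward * gamma**delay:.3f}")

# Impulsive units (gamma=0.5) barely register a reward 8 steps out (~0.004),
# while patient units (gamma=0.99) still value it near 0.92. Read across the
# whole population, the profile of responses carries information about *when*
# a reward is likely to arrive, not just whether it is coming.
```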
The neurons also had different personalities. Optimistic ones were especially sensitive to unexpectedly large rewards—activating with a burst—whereas pessimistic ones stayed silent. Combining the activity of these neuron voters, each with their own point of view, produced a population code that ultimately guided the mice’s behavior.
“It’s like having a team of advisors with different risk profiles,” said study author Daniel McNamee in the press release. “Some urge action—‘Take the reward now, it might not last’—while others advise patience—‘Wait, something better could be coming.’”
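In distributional reinforcement learning, this kind of optimism and pessimism can emerge from asymmetric learning rates, as in the following quantile-style sketch (an illustration of the general mechanism, not the papers’ exact model):

```python
import numpy as np

# Illustrative quantile-style sketch: optimism and pessimism emerge from
# asymmetric learning rates applied to the very same stream of rewards.

rng = np.random.default_rng(1)
ALPHA = 0.01
taus = np.array([0.2, 0.5, 0.8])   # pessimist, neutral, optimist
estimates = np.zeros_like(taus)

for _ in range(50_000):
    reward = rng.choice([0.0, 10.0], p=[0.6, 0.4])  # occasional big payoff
    for i, tau in enumerate(taus):
        # Optimists (high tau) step up strongly after positive surprises;
        # pessimists (low tau) step down strongly after disappointments.
        estimates[i] += ALPHA * (tau if reward > estimates[i] else tau - 1)

print(np.round(estimates, 2))
# The pessimist settles near 0, the optimist near 10: together the population
# spans the range of outcomes rather than collapsing them into one average.
```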
Each neuron’s stance was flexible. When the reward was consistently delayed, they collectively shifted to favor longer-term rewards, showcasing how the brain rapidly adjusts to change.
“When we looked at the [dopamine neuron] population as a whole, it became clear that these neurons were encoding a probabilistic map,” said study author Joe Paton. “Not just whether a reward was likely, but a coordinate system of when it might arrive and how big it might be.”
Brain to AI
The population of neurons in the brain recordings behaved like an AI ensemble, where each model has its own viewpoint but the group collaborates to handle uncertainty.
The team also developed an algorithm, called time-magnitude reinforcement learning, or TMRL, that could plan future choices. Classic reinforcement learning models only receive a reward signal at the end of a task, so it takes many cycles of learning before an algorithm homes in on the best decision. TMRL, by contrast, rapidly maps a slew of choices, allowing an agent, biological or artificial, to pick the best options with fewer cycles. The new model also incorporates internal states, like hunger levels, to further fine-tune decisions.
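The details of TMRL live in the paper, but the core idea can be sketched as a learned probability map over reward timing and size that an agent scores against its current internal state. Everything below, from the maps to the thirst-to-discount rule, is an assumption for illustration:

```python
import numpy as np

# Hedged sketch of the time-magnitude idea: each option carries a learned
# probability map over (delay, magnitude), which the agent scores against
# its internal state. The maps, the thirst-to-discount mapping, and the
# scoring rule are illustrative assumptions, not the paper's algorithm.

delays = np.array([1, 5, 10])        # steps until the reward arrives
magnitudes = np.array([1.0, 5.0])    # possible reward sizes

# One probability map per option: rows are delays, columns are magnitudes
options = {
    "quick_small": np.array([[0.8, 0.0],
                             [0.1, 0.0],
                             [0.1, 0.0]]),
    "slow_large":  np.array([[0.0, 0.0],
                             [0.1, 0.1],
                             [0.0, 0.8]]),
}

def score(pmap, thirst):
    # Assumed mapping: thirstier agents discount delay more steeply
    gamma = 0.6 if thirst > 0.5 else 0.95
    discounted = gamma ** delays[:, None] * magnitudes[None, :]
    return float((pmap * discounted).sum())

for thirst in (0.9, 0.1):
    best = max(options, key=lambda name: score(options[name], thirst))
    print(f"thirst={thirst}: choose {best}")
# A very thirsty agent grabs the quick, small reward; a sated one waits for
# the bigger payoff, read straight off the map with no relearning needed.
```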
In one test, equipping algorithms with a dopamine-like “multidimensional map” boosted their performance in a simulated foraging task compared to standard reinforcement learning models.
“Knowing in advance—at the start of an episode—the range and likelihood of rewards available and when they are likely to occur could be highly useful for planning and flexible behavior,” especially in a complex environment and with different internal states, wrote Sousa and team.
The dual studies are the latest to showcase the power of AI and neuroscience collaboration. Models of the brain’s inner workings can inspire more human-like AI. Meanwhile, AI is shining light into our own neural machinery, potentially leading to insights about neurological disorders.
Inspiration from the brain “could be key to developing machines that reason more like humans,” said Paton.