When federal agencies issue a research grant, they never know if their investment will reap rewards for society. This was almost certainly true in the late 1970s and early 1980s, when the National Science Foundation and the Office of Naval Research funded projects by James “Jay” McClelland, David Rumelhart, and Geoffrey Hinton to model human cognitive abilities.
Yet that investment led to a cascade of research progress: a neural network model of how humans perceive letters and words; two volumes published in 1986 describing the team’s theory of how neural networks in our brains function as parallel distributed processing systems; and a seminal article in Nature by Rumelhart, Hinton and a student named Ronald J. Williams demonstrating the power of what’s called the backpropagation algorithm – a way of training neural network models to learn from their mistakes.
And that research in turn spawned much of modern AI. “Today, the backpropagation algorithm forms the basis for all of the deep learning systems that have been developed since, and for virtually all of the AI systems that have become drivers of the modern tech industry,” says McClelland, the Lucie Stern Professor in the Social Sciences in the Stanford School of Humanities and Sciences and director of the Center for Mind, Brain, Computation and Technology at Stanford’s Wu Tsai Neurosciences Institute.
It’s an outcome that earned the trio a 2024 Golden Goose Award in recognition of the impact their basic science research has had on the world.
McClelland – like the NSF and ONR – never anticipated such a result. As a cognitive scientist, “I was never thinking about building an AI,” he says. But now the progress in AI has come full circle. “I’m drawing inspiration from what’s been learned in AI and deep learning to help me think about the human mind, while also asking what the mind and brain have to teach AI.”
From letter perception to neural networks
In the 1970s, when McClelland and Rumelhart began collaborating, their ideas about how the brain works diverged from the mainstream. Researchers such as Noam Chomsky and Jerry Fodor at MIT believed that language processing was an inherently symbolic process that involves manipulating organized arrangements of symbols according to clear rules.
McClelland had a different view. With a background in sensory neurophysiology and animal learning, he couldn’t reconcile the abstractions that people like Chomsky and Fodor talked about with what he’d seen in animal experiments. For example, experiments that measured single neurons in the cortex of a cat as it responded to line segments showed that perception didn’t seem to follow clear rules. “It’s continuous and doesn’t happen in discrete steps. And it’s sensitive to context,” he says. McClelland wanted to build a model that captured that sensitivity.
Meanwhile, Rumelhart published a paper in 1977 proposing that whenever we’re trying to understand a letter, a word, a phrase, or the meaning of a word in a sentence, we’re using all of the available information simultaneously to constrain the problem. Again: Context matters.
After McClelland read Rumelhart’s paper, the two met and soon realized they could formalize their ideas in a computational neural network model – a set of layered, simple computing elements (sometimes referred to as “neurons”) that receive inputs from each other (i.e., take context into account) and update their states accordingly.
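To make the idea concrete, here is a minimal sketch of what one such computing element does – the weights and activations below are invented for illustration, not taken from the original model:

```python
import numpy as np

# One "neuron" updating its state from the units that feed it.
# Weights and activations are made up for illustration.
def update_unit(incoming_activations, weights):
    net_input = np.dot(weights, incoming_activations)  # weighted sum of the unit's inputs (its context)
    return 1.0 / (1.0 + np.exp(-net_input))            # squash into a 0-1 activation

context = np.array([0.2, 0.9, 0.1])   # current activations of three sending units
weights = np.array([0.5, 1.2, -0.7])  # connection strengths onto the receiving unit
print(update_unit(context, weights))  # the receiving unit's updated activation
```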
“We wanted to develop a neural network model that could capture some of the features of how the brain perceives letters in different contexts,” says McClelland. For example, we recognize letters faster when they are in a word than when they are in a string of random letters; and we can intuitively determine what a word is likely to be even if part of it is obscured, distorted, or masked, he says.
Their initial model produced results similar to those seen in language experiments with human subjects – McClelland’s primary goal. This suggested that neural network models, which are parallel processing systems, are appropriate models of human cognition.
But the team’s initial model treated letters and words as discrete units (“neurons”) with connections between them. When Hinton joined the team in the early 1980s, he suggested the team should back away from the idea that each unit, or neuron, represents a letter, word, or some other symbol recognizable or meaningful to a human. Instead, he proposed, the symbolic representation of a letter, word, or other symbol should be thought of as only existing in the combined activity of many neurons in the model network. Parallel Distributed Processing, a two-volume book published by the group in 1986, set forth these theories.
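A rough illustration of the difference, using made-up activation patterns rather than anything from the book: in a localist scheme one dedicated unit stands for a symbol, while in a distributed scheme the symbol exists only as a pattern of activity spread across many units.

```python
import numpy as np

# Localist coding: one dedicated unit per symbol.
localist_A = np.array([1, 0, 0, 0])            # the "A" unit is on; every other unit is off

# Distributed coding: "A" is only a pattern across many units,
# and each unit helps represent many different symbols.
distributed_A = np.array([0.9, 0.1, 0.7, 0.3])
distributed_B = np.array([0.2, 0.8, 0.6, 0.4])

# Symbols are told apart by how similar their patterns are,
# not by which single unit happens to fire.
cosine = np.dot(distributed_A, distributed_B) / (
    np.linalg.norm(distributed_A) * np.linalg.norm(distributed_B))
print(cosine)
```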
Next came the coup de grâce: the backpropagation algorithm that Rumelhart, Hinton, and Williams presented in Nature, also in 1986.
Until then, neural network models’ learning capabilities had been fairly limited: connection strengths were adjusted only in the network’s final, output layer based on its errors, limiting how effectively experience could shape the model’s performance. To overcome that limitation, Hinton suggested Rumelhart set minimizing error as a specific goal or “objective function,” and derive a procedure to optimize the network to meet that goal. From that inspiration, Rumelhart found a way to send the error signal backward to teach neurons at lower levels of a model how to adjust the strength of their connections. And he and Hinton showed that such networks could learn to perform computations that couldn’t be solved with a single layer of modifiable connections. “Others developed backpropagation at around the same time,” McClelland notes, “but it was Dave and Geoff’s demonstrations of what backprop could do that struck a responsive chord.”
At the time, Rumelhart was using backpropagation with networks that had a very small number of input units and one layer of units in between the inputs and the output, McClelland says. By contrast, today’s models may have thousands of intermediate layers of neurons that are learning the same way.
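The sketch below is in that spirit – a toy network with a few inputs, one hidden layer, and one output, trained by backpropagation on a task (XOR) that a single layer of modifiable connections cannot solve. The architecture, learning rate, and training loop are illustrative stand-ins, not Rumelhart’s actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: XOR, which no single-layer network can solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden connections
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output connections
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for _ in range(10000):
    # Forward pass: activate the network from inputs to output.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the output error back to the hidden layer.
    err_out = (out - y) * out * (1 - out)
    err_hid = (err_out @ W2.T) * h * (1 - h)

    # Adjust connection strengths to reduce the error.
    W2 -= lr * h.T @ err_out
    b2 -= lr * err_out.sum(axis=0)
    W1 -= lr * X.T @ err_hid
    b1 -= lr * err_hid.sum(axis=0)

print(np.round(out, 2))  # typically close to [0, 1, 1, 0]
```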
Despite the elegance of the backpropagation algorithm, neural network models didn’t immediately take off. Indeed, it wasn’t until 25 years later that Hinton and his students leveraged Fei-Fei Li’s ImageNet dataset – using computers that were many orders of magnitude more powerful than the computers Rumelhart had at his disposal – to demonstrate convolutional neural networks’ impressive ability to classify images. “Before then, it was very hard to train networks that were deep enough or had sufficient training data,” McClelland says.
From the brain to AI and back again
Meanwhile, McClelland continued to use neural nets to model human cognition, consistently finding that these models effectively capture data from human experiments. He remains fascinated by the ways human cognition both resembles and differs from computerized neural networks. “The neural networks in our brains that allow us to function, speak, and communicate with each other in continuous sentences are clearly neural networks similar in some ways to these AI systems.”
Today’s language models, which use distributed representations and are trained using backpropagation, have also achieved human-like fluency in translation, he says. “They can translate from one language to another in ways that no symbolic, rule-based system ever could.”
In addition, unlike the models that preceded them, large language models that rely on the so-called transformer architecture exhibit an interesting brain-like feature: They can hold information in context as new information is provided. “These models are using the information in context as though it were sort of hanging in mind – like the last sentence somebody said to you,” McClelland says.
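A toy sketch of the attention step behind that behavior – the vectors here are random placeholders, not real word representations – shows how the newest token weighs and blends the earlier tokens it is holding in context:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
context = rng.normal(size=(5, d))   # representations of the last five tokens "held in mind"
query = context[-1]                 # the newest token asks how relevant the earlier ones are

scores = context @ query / np.sqrt(d)             # relevance of each token in context
weights = np.exp(scores) / np.exp(scores).sum()   # softmax over the context
summary = weights @ context                       # context-weighted blend carried forward
print(np.round(weights, 2))                       # how much attention each token received
```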
And that development inspired McClelland to join collaborators at Google DeepMind to explore whether neural network models, like humans, reason more accurately when they can draw on prior knowledge of a problem’s content than when the problem is posed in completely abstract terms requiring symbolic logic.
For example, people struggle with a question like “If some A are B, and all B are C, are any C A?” But phrase the same question in a specific context using familiar concepts (“If some cows are Herefords and all Herefords are mammals, are any mammals cows?”), and they are more likely to give the correct answer. “Our research found that that’s also what these models do,” McClelland says. “They are not pure logic machines. Humans and models alike infuse their thinking with their prior knowledge and beliefs.” They are also biased toward factually true or widely believed conclusions, even when they don’t follow from the given premises, he says. These results were published in a 2024 paper in PNAS Nexus.
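For readers who want to see why the abstract version is in fact valid, here is a toy check with arbitrary stand-in sets – any member that is both A and B must also be a C, so some C are A:

```python
# Arbitrary stand-in sets; the element names mean nothing.
A = {"x1", "x2"}          # some things that are A
B = {"x2", "x3"}          # "some A are B": x2 is in both
C = B | {"x4"}            # "all B are C" holds by construction

some_A_are_B = len(A & B) > 0
all_B_are_C = B <= C
some_C_are_A = len(C & A) > 0
print(some_A_are_B and all_B_are_C, "->", some_C_are_A)   # True -> True
```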
“This research helps me convince others that the way we humans think is less strictly logical and more grounded in the kind of intuitive knowledge that comes from adjusting connection strengths across a neural network,” he says.
Despite these similarities, McClelland notes that there are differences. One that separates humans from machines is our ability to learn both fast and with little data. “These language models need approximately 100,000 times more data than a human would need to learn a language. That’s a lot!” he says. “So, we’re interested in understanding how the biological brain is capable of learning with far less data than today’s AI systems.”
Rumelhart’s backpropagation algorithm is part of the problem: “It’s why these AI systems are so slow and require so much data,” he says. Neural networks have nearly countless connections, and – compared with humans – they require lots of extra data to determine which connections matter most.
For example, if a large language model makes a mistake in predicting the last word in a sentence such as “John likes coffee with cream and honey,” it might learn to make the word “sugar” less likely in general, rather than learning that it’s just John who has unusual taste.
“All these connections are getting little changes to try to reduce the error, but to figure out which ones are important, you have to include many training sentences in which the common preference for sugar is maintained – and that’s inefficient,” McClelland says.
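A toy illustration of that credit-assignment problem, using a made-up four-word vocabulary and invented scores: with a softmax cross-entropy loss, a single “honey” example nudges the model to raise “honey” and push down every competing word – including “sugar” – by a little.

```python
import numpy as np

vocab = ["sugar", "honey", "milk", "lemon"]
logits = np.array([3.0, 0.5, 1.0, 0.2])      # the model strongly expects "sugar"
probs = np.exp(logits) / np.exp(logits).sum()

target = np.array([0.0, 1.0, 0.0, 0.0])      # but the sentence actually ended with "honey"
grad = probs - target                        # gradient of cross-entropy with respect to the scores

for word, g in zip(vocab, grad):
    print(f"{word:6s} {g:+.2f}")             # positive values push that word's score down
```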
It’s also not the way the brain works. “Backpropagation was a wonderful solution to a computational problem,” McClelland says. “But no one ever thought it captured an accurate view of how the brain works.” In backpropagation, the network is activated in one direction and the errors are propagated backward across the same network, McClelland says. By contrast, in the brain, activation itself is bi-directional, and many different parts of the brain are interacting – including multiple senses perceiving the world simultaneously – to provide an integrated perceptual experience of the world.
Hinton was well aware that backpropagation failed to capture the way the brain works, and he went on to develop several other algorithms that are much closer to being biologically plausible, McClelland says. And now McClelland is taking on the same task but in a different way: by going back to studies of neuron activation in animals and humans.
“I’ve become inspired to find ways of understanding how our brains so efficiently target the right connections to adjust,” he says.