This week, neurolinguist Laura Gwilliams breaks down how sound becomes information in the human brain, focusing on how speech is transformed into meaning.
Welcome back to “From Our Neurons to Yours,” a podcast where we criss-cross scientific disciplines to take you to the cutting edge of brain science. In this episode, we explore how sound becomes information in the human brain, specifically focusing on how speech is transformed into meaning.
Our guest, Laura Gwilliams, a faculty scholar at the Wu Tsai Neurosciences Institute at Stanford, breaks down the intricate steps involved in this transformation. From the vibrations of the eardrum to the activation of specific neurons in the auditory cortex, Gwilliams reveals the remarkable complexity and precision of the brain’s language processing abilities.
Gwilliams also delves into the higher-level representations of meaning and sentence structure, discussing how our brains effortlessly navigate interruptions, non sequiturs, and the passage of time during conversations. Join us as we unravel the mysteries of speech comprehension and gain a deeper understanding of how our minds process language.
Learn more
Laura Gwilliams’ research website and Stanford faculty profile
Episode Credits
This episode was produced by Michael Osborne, with production assistance by Morgan Honaker, and hosted by Nicholas Weiler. Art by Aimee Garza.
Episode Transcript
Nicholas Weiler:
This is From Our Neurons to Yours, a podcast from the Wu Tsai Neurosciences Institute at Stanford University. On this show, we crisscross scientific disciplines to bring you to the frontiers of brain science. I’m your host, Nicholas Weiler.
We are going to start today’s episode with this thought experiment. Pretend you’re the size of a molecule and you’re able to sit right on the opening of a human ear canal. You’re exactly at the interface between sound in the outside world and sound entering the ear on its way to the brain. What exactly happens next? What is the process by which sound becomes information? And more specifically, what is the process by which human speech is transformed into meaning?
Laura Gwilliams:
I can explain all of the stuff that happens in order for people to understand language.
Nicholas Weiler:
Laura Gwilliams studies this very question as a faculty scholar at the Wu Tsai Neurosciences Institute.
Laura Gwilliams:
So first of all, take the auditory signal of speech. This is just fluctuations in air pressure; some of those fluctuations of the air particles travel down our ear canal and beat on our eardrum. Then the fluctuations of the eardrum actually get amplified by these tiny little bones that are connected to the eardrum. So this takes very minute fluctuations and amplifies them, makes them more extreme. Those vibrations then get sent to what’s called the cochlea. It looks like a snail shell, but it’s only about a centimeter or so tall, so it’s quite tiny. The cochlea receives these vibrations, but what’s really amazing is that this cochlea contains tiny little “hair cells.”
Nicholas Weiler:
They’re basically little sound sensors.
Laura Gwilliams:
Exactly. So these little hair cells vibrate. You have some hair cells which like to vibrate when the pitch of the sound is high. These cells live at the very beginning of the cochlea. And then as you travel down this little snail shell structure, the hair cells prefer to vibrate to lower and lower frequencies. So if you are hearing, let’s say a bird chirping, the hair cells at the base of the cochlea are going to vibrate versus if you hear a foghorn that’s going to be the hair cells at what’s called the apex or the tip of the snail shell.
Nicholas Weiler:
Basically splitting up the sound that’s coming in into high pitches and low pitches and everything in between, so that you’ve now got all these different channels that you can interpret.
Laura Gwilliams:
Exactly. Then all of these hair cells connect to the cochlear nerve, or the auditory nerve as it’s sometimes called. At this point, the signals are no longer analog in the sense of vibrations or movement. Now we’re talking about an electrical signal. So it’s now been transformed into electrical impulses. These electrical impulses travel through the auditory nerve and reach the brainstem. Some complicated things happen there that I won’t talk about. The signals go on to the midbrain, and then finally to the thalamus, which at last connects to [the auditory cortex of] your brain.
Nicholas Weiler:
Wow, so there are a bunch of steps in this process to get from the ear all the way to the brain. It sort of has to jump several times.
Laura Gwilliams:
Right. So the auditory signal, by the time it reaches your cortex, has actually gone through a lot of pre-processing and manipulation. Once the signal reaches auditory cortex, this organization by frequency is preserved. So if you remember, I said that the cochlea at the base is going to prefer high frequencies, and at the tip is going to prefer low frequencies. This spatial organization is maintained in the primary auditory cortex. So if you were to look at the neurons in primary auditory cortex, you are going to see that there is a gradient of frequency. As you hit the notes of a C major scale, going from low to high, different neurons are going to fire, but neurons tuned to neighboring frequencies will sit next to each other.
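A rough way to picture the gradient Gwilliams describes is the standard Greenwood frequency-position function for the human cochlea, which relates a sound’s frequency to the spot along the cochlea that responds most strongly. The sketch below is not from the episode; it simply plays out her C-major-scale example, and the helper function `cochlear_position` is supplied here purely for illustration.

```python
# Minimal sketch (not from the episode): the tonotopic gradient described above,
# using the Greenwood frequency-position function for the human cochlea:
# f(x) = A * (10**(a*x) - k), where x runs from 0 (apex, low frequencies)
# to 1 (base, high frequencies).
import math

A, a, k = 165.4, 2.1, 0.88  # human cochlea constants (Greenwood, 1990)

def cochlear_position(freq_hz: float) -> float:
    """Invert the Greenwood function: fraction of the distance from apex to base
    where hair cells respond most strongly to freq_hz."""
    return math.log10(freq_hz / A + k) / a

# Notes of a C major scale (C4..C5): neighboring pitches excite neighboring
# positions, the spatial ordering that is preserved in primary auditory cortex.
scale = {"C4": 261.6, "D4": 293.7, "E4": 329.6, "F4": 349.2,
         "G4": 392.0, "A4": 440.0, "B4": 493.9, "C5": 523.3}

for note, f in scale.items():
    print(f"{note}: {f:7.1f} Hz -> {cochlear_position(f):.3f} of the way toward the base")
```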
Nicholas Weiler:
So it’s got this map of frequencies that’s maintained in the brain. That’s so cool.
Laura Gwilliams:
Yeah, it is. And up until that point, we can say that the way the human brain processes sound and the way that say monkeys or even certain types of birds process sound are pretty similar.
Nicholas Weiler:
Right. This is how we hear the world around us.
Laura Gwilliams:
Exactly. But then you have speech comprehension or language abilities, which then are all built on top of these kind of basic auditory processes. So it’s believed that the information gets routed to primary auditory cortex, and there is a higher order auditory region, which lives right next door in the temporal lobe, which also processes acoustic properties of the speech. But it over-represents or is extremely precise at processing the types of sound features that are very relevant for speech. Neurons in this brain area will code the difference between let’s say a P sound and a B sound, and will code that difference in a more precise way than you would get from just looking at the auditory signal alone.
Nicholas Weiler:
And are these hardwired in the human brain? Are we born with the ability to hear these different sounds?
Laura Gwilliams:
Yeah, I love that question. So actually I was listening to some previous episodes of this podcast, and I know you had some guests that were talking about how babies are citizens of the world, that they are able to acquire speech distinctions across any language.
Nicholas Weiler:
That was our conversation with Carla Shatz, I think.
Laura Gwilliams:
Right. I really enjoyed listening to that one. But if you are not exposed to a language, let’s say the first time you are exposed you are 32 years old, you will not have acquired the ability to perceive these subtle distinctions in the same way you can distinguish them in your native language. And this is not something you can easily acquire later, as you may appreciate if you’ve tried to learn a language later in life or spoken with others who have. It’s possible to acquire the syntactic structure and the meanings of words, but it’s very hard to produce speech with a perfect accent. And it’s also very difficult to perceive these differences in speech. On the perception side, though, there is a saving grace: context helps to disambiguate some of the uncertainties you might have if you were just presented with a syllable [like] ‘pa’ / ‘ba’.
But these abilities are something that you need to acquire early on. And the interesting thing about language is that we have these individual speech sounds, which get combined to form syllables, which you can combine to form words and then whole phrases and whole sentences. It seems that not just different parts of the brain, but also different sizes of ensembles of neurons, are recruited when you are processing different properties of language at these different levels of abstraction and complexity. In general, we understand much more about the lower levels of this hierarchy, how the individual speech sounds are processed. As you climb up the hierarchy, toward the more symbolic, hardcore language side and away from the sensory and auditory side, it becomes much more challenging to study.
Nicholas Weiler:
And this is one of the things I really wanted to delve into with you in this conversation. I was just thinking about what we do when we’re listening to speech, or what listeners are doing right now as they’re listening to us have this conversation. We’re not speaking in complete sentences all the time. There are interruptions. There are non sequiturs. It’s amazing that we can keep track of conversations at all. But you don’t have to think about that; our brains are so wired to do it. We just hear meaning in each other’s words. I’d love to hear, based on your research and the research that you and your colleagues are starting to do, looking at these higher-level representations of meaning and sentence structure and these bigger-picture things: does that perception map onto how the brain produces and represents speech? I mean, are we doing it at the level of meaning?
Laura Gwilliams:
Yeah. So the short answer is that we’re doing it at all levels at the same time. Just to emphasize your point even more, speech comprehension is not just easy and automatic. It’s almost inevitable. It is very difficult for me to have this conversation with you right now and not understand what it is that you’re saying. Even if I wanted to not understand you, I would have to actively distract myself with something else.
Nicholas Weiler:
And probably at the same time, it would be very distracting if you were trying to keep track of which words I’m using because mostly we’re just talking ideas.
Laura Gwilliams:
Right. Exactly. If I decided, okay, instead of listening to the message you are trying to give to me, I’m instead going to try to notice every time you make a P sound. Then okay, I’m going to be distracted with focusing in on your sensory output and not understanding the overall message.
Our experience of language is this automatic understanding or derivation of meaning from the noises that that person is making. But most of the time you’re not actually paying attention to the noise per se, you are just paying attention to the concepts and ideas that person is trying to convey to you. I could say, “oh, this morning I decided to ride a cactus to work over the clouds and I did all of this while I enjoyed a nice Piña Colada made with puppy ears”.
Nicholas Weiler:
I’ve got that image in my head now. Thank you.
Laura Gwilliams:
So I managed to send this image to you through this noise that I just made, and that is an image that you have never conjured for yourself before.
Nicholas Weiler:
I can confirm that.
Laura Gwilliams:
Yeah. We have this amazing ability to be creative and convey new ideas and know that the person we’re conveying them to will understand. That is no small feat, and yet the speed and automaticity with which it is achieved is really remarkable.
Nicholas Weiler:
Now one thing you said was that the brain has to do processing at all of these levels at the same time. I’d love to hear more about that and any of the other specific problems that our brains are solving without us even noticing.
Laura Gwilliams:
Right. So this is an extremely complicated problem, and the way that this is solvable is the brain has access to many different types of information about the speech at the same time and uses all information together to disambiguate what it is that’s going on. So let me be a little bit more concrete. So you can make, let’s say a broad stroke distinction between the acoustic signal that’s coming in and all of the stuff that you’ve learned about your language and about how interactions usually go with other individuals, which set up pretty strong expectations about what it is that person is about to say to you. And so you have bidirectional flow of information, the actual speech sounds that that person is making coupled with my very strong expectations of what speech sounds that person will make and what types of concepts and ideas they are going to convey to me.
If I’m talking to you and, let’s say, we’re on the street and a very loud truck goes by, which completely distorts four seconds of your speech, I could probably figure out those four seconds based on the context of the conversation and all the sentences that have come before. And vice versa: let’s say it is the first time that I’m talking to you and you introduce a completely new topic. Well, I don’t have any expectation about what you are about to say, but I can use the acoustic input that you are producing to understand the speech sounds, which form the words, which then create context, which then feeds into this back-and-forth between my expectations and the sensory signal.
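One common way to formalize this combination of expectation and acoustic evidence is Bayesian inference: a prior from context weighed against a likelihood from the sensory signal. The toy sketch below uses made-up numbers and is only an illustration of that idea, not a description of Gwilliams’ models or of the brain’s actual computation.

```python
# Toy sketch: combining a contextual prior with noisy acoustic evidence
# to disambiguate a "pa"/"ba" onset ("pat" vs. "bat"). Numbers are invented.
candidates = ["pat", "bat"]

# Hypothetical prior: context (say, "the cat chased the ...") makes "bat" more expected.
prior = {"pat": 0.2, "bat": 0.8}

# Hypothetical likelihoods: the acoustics slightly favor "pat", but a passing
# truck has made the sensory evidence weak and ambiguous.
likelihood = {"pat": 0.55, "bat": 0.45}

unnormalized = {w: prior[w] * likelihood[w] for w in candidates}
total = sum(unnormalized.values())
posterior = {w: p / total for w, p in unnormalized.items()}

print(posterior)  # the expectation dominates when the sensory signal is unreliable
```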
Nicholas Weiler:
Wow. So we’ve got these two streams coming together, what we’re hearing and what we’re expecting, and we’re sort of at the interface there figuring out what it is that’s actually being said. That’s so interesting. One of the other things that I wanted to ask about is there’s also this component of time at the same time as we are trying to get our brains aligned on the specific words or the specific ideas that we’re saying. We also have to keep track of what you said before, sometimes within a word and sometimes within a whole sentence. As I think I’ve mentioned on the podcast before, I’m working on learning German and German famously sometimes has a tendency the verb to the end of the sentence to put, which for my brain is challenging because I didn’t grow up with that. But it’s something that we have to do in all languages. You have to be able to hold on to one idea while you figure out what it’s referring to later on in a sentence. How does the brain even begin to do that?
Laura Gwilliams:
I love that question, Nick. You actually tap into one of the things about speech processing that gets me the most excited, which is how the dynamics of speech are processed, because the neurons are firing over time while the input is being received over time, and these two dynamic signals need to be reconciled in our science. So you could imagine one scenario where you’re hearing someone talk, and with every speech sound they say, the brain processes that sound and then discards it, processes the next sound and then discards it. But instead, what seems to happen is that the brain processes, let’s say, that this was a P as in Peter and not a B as in baboon, and it actually holds onto that information for hundreds and hundreds of milliseconds, up to about a second after that speech sound has disappeared from your cochlea.
In brain terms, that is a really long period of time to keep information around. Why does the brain do this? Well, it does it to solve exactly the kind of conundrum you are raising, which is that with language, especially when it’s received in the auditory modality of speech, you need to be able to combine and make links between things that are not next to each other. So if I say, “I like Melinda, yesterday she told me that she bought a donut,” whatever, when I say “she,” you need to link that back to knowing that she is Melinda. And it seems that one way the brain solves this is by essentially keeping a buffer of the information around, waiting until it receives the corresponding object, let’s say, that it needs to link it to. And this speaks, I think again, to the fact that the brain is processing multiple properties of language at the same time, let’s say the bottom-up sensory information along with the top-down, more semantic and conceptual information. But it’s not only processing multiple properties of speech at the same time; it’s doing so over multiple time periods at the same time as well. The brain is extremely greedy in terms of the information that it keeps around in order to do everything that it can to disambiguate and figure out what message this person is trying to convey.
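As a loose analogy for this buffering, the sketch below keeps recently heard words around so that a later pronoun can be linked back to an earlier referent, as in the “Melinda … she” example. The `hear` function and `known_names` list are hypothetical and purely illustrative; real neural buffers hold richer features and operate on multiple timescales at once.

```python
# Toy analogy (not a neural model): keep recently heard words in a buffer so a
# later word like "she" can be linked back to an earlier referent ("Melinda").
recent_words = []  # the toy "buffer" of material kept around

def hear(word, known_names=("Melinda",)):
    """Process one incoming word; resolve pronouns against the buffer."""
    if word.lower() in ("she", "her"):
        # search backwards through what we've kept around for a plausible referent
        for earlier in reversed(recent_words):
            if earlier in known_names:
                print(f"'{word}' -> linked to '{earlier}'")
                break
    recent_words.append(word)

for w in "I like Melinda yesterday she told me she bought a donut".split():
    hear(w)
```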
Nicholas Weiler:
That’s so funny. That reminds me, I’ve been thinking as we’re talking about what we’re now experiencing with speech-to-text, or speech recognition technology. You can speak to Siri or Alexa or Google Assistant and get a response, which means the system is able to process what you’re saying. Is that inspired by how the human brain works, or does it work completely differently?
Laura Gwilliams:
These systems work very differently. The difference between the human brain and these automatic speech recognition systems is probably most apparent when you look at how much information they need to consume to reach something like human-level performance. They need to have been trained on multiple lifetimes’ worth of data. If you compare that to the comprehension abilities of, say, a five-year-old, who has received much less data, it’s just an incredible differential in how much information these systems need in order to achieve it.
Nicholas Weiler:
Right. So somehow we do this as humans so much more efficiently. Kids start talking at a year old, and they can understand some language before that. So they don’t have that much data, and somehow our brains are able to just lock on; we’ve got these systems built into how our brains function that are really tuned to pick up and understand language.
Laura Gwilliams:
Yeah. It was a big discussion for a long time, and I guess the discussion is still ongoing, as to how much of this kind of processing architecture, or, to put it in data science terms, how many of the parameters, have already been trained before we’re even born. Maybe that is part of how we’re able to acquire language so quickly and so easily: we essentially have some pre-training through our evolution that gets us ready to process the information and use it to attain comprehension abilities.
Nicholas Weiler:
Interesting. Well, it seems like studying speech gets at one of the core questions of neuroscience. How do we represent meaning in our brains? What do you think we can learn about our minds studying how we understand speech?
Laura Gwilliams:
Yeah, thank you. I think that in studying speech comprehension, you are also studying the broader question of how concepts are stored and accessed and manipulated, and in some sense I see speech and language processing as our gateway into the mind more generally. This is the way that we express what’s going on inside our heads, and by understanding how language is organized and how it accesses these different thoughts and ideas and emotions, you come to better understand the thoughts, memories, and emotions that are actually encoded in neural activity. And so in many ways, it’s the road to understanding the human condition more generally.
Nicholas Weiler:
Absolutely. Yeah. It is a really exciting area of research, and I have so many more questions I’d love to ask you, but unfortunately we’re out of time. But thank you, Laura, so much for joining us on the show.
Laura Gwilliams:
Yeah. Thank you so much for having me.
Nicholas Weiler:
Thanks again to our guest, Laura Gwilliams. For more on Dr. Gwilliams’ work, check out the links in the show notes.
We are very excited for season two of From Our Neurons to Yours. If you are too, please take a moment to give us a review on your podcast app of choice and share this episode with your friends. We’ll link to Apple Podcasts reviews in the show notes.
This episode was produced by Michael Osborne with assistance from Morgan Honaker. I’m your host, Nicholas Weiler.