AI Tools That Modify Speech Have Downsides


At the annual TED Conference earlier this year, the room fell silent for the demonstration of a potentially groundbreaking technology: a “cone of silence” tool, currently in development, designed to block out all surrounding noise so conversation partners can hear only each other’s voices. While many of us in attendance were impressed by the simulated “quiet table” in a busy restaurant, I could only wonder how much of each speaker’s voice might be lost along the way.

This isn’t the only technology “cleaning up” human speech. Artificial intelligence applications are being developed to soften call center workers’ accents in order to boost sales, an idea critics have called dehumanizing. Other technologies filter the anger out of upset callers’ voices, in an effort to make call center workers’ jobs less difficult.

It may be just a matter of time before similar tools are used in daily workplace interactions. Businesses are increasingly distributed and global, with employees working remotely or in satellite offices. Those with different accents or less fluency in English may be tempted or even encouraged to put such tools to use when speaking with colleagues or presenting at a meeting. And people who don’t want to put up with angry coworkers may apply a filter on their phones to weed out that pesky emotion.

The potential benefits of using these tools seem clear — speeding up communication, preventing misunderstandings, avoiding the biases that some people have with various accents, and more. But these same tools also threaten to silence crucial parts of the very voices they aim to enhance.

The Challenge of Fostering Trust

Because my focus is on reading the room in order to elicit the truth, I’m closely watching how this technological revolution could affect our ability to recognize dishonesty and what that could mean for workplaces.

This shift raises significant concerns. To understand why, we need to acknowledge the complexities of human speech. When we speak, we convey more than just the meanings of our words. Our speech patterns, pitches, vocal tones, cadences, and other vocal mannerisms help shape how others perceive us. They can also provide indications as to whether we’re telling the truth.

In my book Liespotting, I shared research on multiple crucial auditory cues. For example, when people are lying, their speech sometimes takes on a strained or tense quality. Their vocal volume can change as well, sometimes as a result of nervousness. People may also overcontrol their voices: A liar often tries to control their body, becoming unnaturally still, and their voice can follow suit, becoming more monotone. At times, a person’s voice also takes on a pleading quality, as though they are imploring the listener to believe them. All of this can happen unconsciously.

The same technologies that can alter speech or filter out unwanted sounds can also strip away these vital cues. They can flatten the depth of communication, transforming context-rich conversation into lifeless, robotic noise, erasing the subtle nuances that make interactions human. Those clues of deception can disappear in the process.

No single auditory cue is informative on its own. When I train government agencies, fraud investigators, business leaders, and others, I teach them to “baseline” people. That requires them to observe people to pick up on subtleties in how they normally communicate and behave, to establish a reliable reference point for measuring red-flag changes later.

This same process of careful observation also encourages stronger connections. Just as we can sense dishonesty, we can sense honesty. The same cues that, sometimes unconsciously, alert us to problems can also foster connection. When the speech we hear from someone is partially faked by technology, we may be less likely to develop trusting bonds with them — even if we know consciously that the person is using sound-enhancing technology.

Already, audio deepfakes are wreaking havoc in segments of society. Robocalls of politicians’ voices can endanger elections. Legal issues are arising as music fans use AI to simulate their favorite artists performing other artists’ songs. Tools that manipulate speech could similarly blur the line between the real and the fake, to the point where we no longer know how someone truly expresses themselves.

Four Tips for Maintaining Connection

Honesty is a core tenet of good leadership and a strong corporate culture. To demonstrate that honesty, executives, managers, and employees at all levels need to communicate authentically. We do better work and create stronger relationships when we show up as our somewhat messy selves — not just in how we present ourselves but also in how we speak.

Leaders rely on the full spectrum of human communication to gauge team morale, address issues, and build authentic connections. Removing accents and emotional nuance, along with the subtle features of employees’ speech that these technologies might misread as background noise, could encourage a homogenized dialogue that fails to capture the true state of affairs. Employees could put less trust in their leaders, finding them disconnected, stoic, or insincere, ultimately weakening the cohesion and effectiveness of the entire organization.

To navigate this challenge, I recommend these actions:

Distinguish between transactional and interpersonal communication. Each day in our workplaces, we have all sorts of brief exchanges with colleagues, managers, and reports. Usually, we need to get or share a piece of information quickly, such as when something will be completed, whether someone needs more resources, or what a client said on a phone call. But at other times, we need to discuss ideas, perspectives, and experiences. Organizations should make clear that if a conversation fits the latter description, it is interpersonal rather than just transactional. In those cases, natural voices should be used, with no technological filtering, so that we connect more holistically.

Increase in-person interactions. Trust is built faster when we hear and see each other simultaneously. We naturally pick up on body language and facial expressions, including micro-expressions — fleeting signs of emotion that can last a fraction of a second. Organizations can foster connections by encouraging real-life interactions, especially in shared spaces for in-person and hybrid work. Remote workers can achieve similar trust through one-on-one video meetings focused on getting to know each other, such as peer coaching sessions.

Encourage storytelling as a corporate norm. When someone tells a story, they provide a full range of vocal expressions. Emphasizing personal narratives and reflections can counteract the homogenizing effects of AI-filtered speaking. During conference calls, all-hands meetings (whether in person or via video), team discussions, and group retreats, invite staffers to share a recent experience that they think might provide insight into the organization.

Track emotional engagement. If your organization begins to use technologies that filter conversations, use tools or surveys to track how engaged people feel with their work, and watch for declines in emotional engagement. Granted, any individual change may reflect correlation rather than causation. But if emotional engagement keeps dropping as these filtering technologies become normalized, it could be a strong sign that action is needed.
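The monitoring step above can be sketched in code. The check below is a minimal illustration, not a prescribed method: the survey data, the 1-to-5 scale, and the thresholds are all hypothetical assumptions. The idea is simply to compare recent engagement scores against a pre-rollout baseline and flag only a sustained decline, not a one-week dip.

```python
def sustained_drop(scores, baseline_weeks=4, drop_threshold=0.5, run_length=3):
    """Return True if the most recent `run_length` scores all sit more than
    `drop_threshold` below the average of the first `baseline_weeks` scores.

    All parameters are illustrative defaults, not recommended values.
    """
    if len(scores) < baseline_weeks + run_length:
        return False  # not enough history to judge a trend
    # Baseline: average engagement before the filtering tool was rolled out.
    baseline = sum(scores[:baseline_weeks]) / baseline_weeks
    # Flag only a persistent decline: every recent period must be well below baseline.
    return all(baseline - s > drop_threshold for s in scores[-run_length:])

# Hypothetical weekly average survey scores on a 1-5 scale,
# spanning the period before and after a tool's rollout.
weekly = [4.2, 4.3, 4.1, 4.2, 3.9, 3.6, 3.5, 3.4]
print(sustained_drop(weekly))  # prints True: a sustained drop, not noise
```

Requiring several consecutive low periods (rather than reacting to a single low score) is what distinguishes a possible effect of normalized filtering from ordinary week-to-week variation.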


Ultimately, businesses should keep in mind that relationships are essential in building a successful “one team” mentality. Relationships rely on trust, which in turn relies on sensing and perceiving authenticity. Nothing builds that authenticity more than people speaking in their own voices — accents, quirks, and all.

Reprint #: 66235

The MIT Sloan Management Review is a research-based magazine and digital platform for business executives published at the MIT Sloan School of Management.
