How to Make AI ‘Forget’ All the Private Data It Shouldn’t Have

There’s a virtual elephant in AI’s room: It’s nearly impossible to make the technology forget.

And there are an increasing number of scenarios where consumers and programmers may not only want to remove data from a machine learning model—they may be required to do so by law. For instance, since Europe’s tougher data privacy regulations went into effect in 2018, they have created complications for companies worldwide.

Questions around data privacy will likely become thornier as generative artificial intelligence tools, like ChatGPT, become mainstream. The large language models (LLMs) that underpin this technology typically subsume the information that users enter, helping the software provide better answers over time.

Working Knowledge spoke with Seth Neel, an expert on machine “unlearning” who is an assistant professor at Harvard Business School and principal investigator of the Trustworthy AI Lab at the Digital Data Design Institute at Harvard. Neel shared insights on the nascent field that focuses on wiping the powerful technology’s virtual memory.

“Increasingly, these models are trained on very large datasets that contain data that we might not actually want to include during training for various reasons.”

Rachel Layne: What is machine “unlearning” and why is it important?

Seth Neel: Before defining machine unlearning, maybe it’s good to set a baseline of what machine learning is. Broadly, I view machine learning as using data to create automated systems that can make predictions about the world. And now, even though generative AI feels very different from making a simple prediction, at a technical level, that’s really what it is.

In order to train these predictive systems, you need lots of example input and output pairs. During the training phase, the algorithm learns to mimic those examples, associating the right output with a given input. And this paradigm works fantastically well. Increasingly, these models are trained on very large datasets that contain data that we might not actually want to include during training, for various reasons.

Layne: Under what circumstances—or why—would you want the machines to “unlearn” all or parts of that?

Neel: One reason is that unbeknownst to the model owner at training time, a small subset of the data used to train the model could be private. The concern is then that when the trained model is deployed, it may reproduce the private data verbatim, exposing that data to third parties interacting with the model. Even if the model does not “memorize” the data exactly, our recent work shows that an attacker who gains access to the model can in some cases extract the private data. Another reason is that a user may simply have requested that their own data be deleted, a right which they have due to compliance frameworks like the GDPR, [the European Union’s data regulations].

Aside from privacy considerations, there’s also training data that’s just incorrect, biased, or out of date. For instance, we might train our model, and then realize after the fact that some of these documents that we used to train ChatGPT contain a lot of racist content or misinformation. A more mundane example would be if during training the year was 2019 and one person was president, and now we’d like the model to unlearn that fact in light of a recent election.

Or perhaps the training data contains copyrighted data that we might not have permission to use, creating litigation risk for companies that train on that data. Just last week the New York Times sued OpenAI for copyright violations arising from training on NYT articles. And you’ll see, increasingly, companies like OpenAI are now licensing data to get around that concern.

In all of these cases, we might want to then remove the influence of that data from our trained model. Now, in a world where models were orders of magnitude smaller than they are today, pre-LLMs, this wasn’t as big of a deal, because you could just retrain the model from scratch. Just throw out that data and do it again. That’s really not plausible when training a model takes months and costs many millions of dollars. You can’t afford to just throw it out.

So, machine unlearning is really about computation more than anything else. It’s about efficiently removing the influence of that data from the model without having to retrain it from scratch.
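One way to make that computational point concrete is a sharded-training sketch in the spirit of published “exact unlearning” methods such as SISA: train a small model per shard of the data and combine their votes, so a deletion request only forces retraining of the one shard that held the record. The code below is an illustrative toy with made-up data, not the specific technique Neel describes.

```python
# Minimal sketch of sharded "exact unlearning" (SISA-style, for illustration only):
# deleting a record retrains one shard instead of the whole model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)           # toy labels

NUM_SHARDS = 5
shards = [list(s) for s in np.array_split(np.arange(len(X)), NUM_SHARDS)]
models = [LogisticRegression().fit(X[s], y[s]) for s in shards]

def predict(x):
    """Majority vote over the per-shard models."""
    votes = [m.predict(x.reshape(1, -1))[0] for m in models]
    return int(round(np.mean(votes)))

def unlearn(record_idx):
    """Remove one training record and retrain only the shard that contained it."""
    for shard_id, idx in enumerate(shards):
        if record_idx in idx:
            idx.remove(record_idx)
            models[shard_id] = LogisticRegression().fit(X[idx], y[idx])
            return shard_id                       # ~1/NUM_SHARDS of the retraining cost

unlearn(42)                                       # e.g., handling one deletion request
print(predict(X[0]))
```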

Layne: Who would want to use unlearning?

Neel: Companies who are forced to comply with regulations like the EU’s new rules. Think about your Facebooks and your Googles, who’ve used user data to train predictive models and then are facing deletion requests from consumers. They may want to use machine unlearning to avoid having to retrain from scratch.

Large language model providers themselves might also want to facilitate machine unlearning; they already offer fine-tuning and customization of models through their application programming interfaces.

This will also help facilitate more high-risk use cases in areas like health care and financial services, where you might have stricter data privacy requirements.

“Research shows that the larger the model, the more memorization occurs, the more training data is sort of reverse engineerable.”

Layne: What makes generative AI so vulnerable to attack? For example, you talk about privacy leakage in your research.

Neel: This is a pretty active area of research and not unique to generative AI.

Prior machine learning systems did leak training data. The difference here is, first of all, that these systems are being so widely deployed. There’s just greater risk when so many systems are built on top of the same foundation model, as opposed to a single custom model that may or may not leak training data. It’s sort of the linchpin of all these other applications.

Then there’s issues of scale. Research shows that the larger the model, the more memorization occurs, the more training data is sort of reverse engineerable. And since these models seem to only be getting bigger not smaller, it’s an increasingly relevant problem.

And then the other issue here is fundamental to generative models. There’s been a lot of recent work showing that these models memorize at least up to a few percent of their training data. That’s a tremendous amount of data, and it means someone who’s just interacting with the model could learn what the underlying training data is. There’s also a new ability for users to interact with the model via prompting, whereas classifiers typically only return predictions, and that interactivity can be exploited to develop more innovative privacy attacks.
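To see what learning the underlying training data just by interacting with a model can look like, here is a hedged sketch of a prefix-completion memorization check in the spirit of published extraction work: feed the model the start of a candidate training document and test whether its greedy continuation reproduces the real text verbatim. The model name and candidate text are placeholders, and this is not the specific attack from Neel’s research.

```python
# Rough sketch of a prefix-completion memorization check: if the model's greedy
# continuation matches the true continuation of a candidate document, that
# document was likely memorized. Model and text below are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"                              # stand-in for the model being audited
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def seems_memorized(document: str, prefix_tokens: int = 50, check_tokens: int = 50) -> bool:
    """Greedy-complete a prefix of `document` and test for a verbatim match."""
    ids = tok(document, return_tensors="pt").input_ids[0]
    prefix = ids[:prefix_tokens]
    truth = ids[prefix_tokens:prefix_tokens + check_tokens]
    out = model.generate(
        prefix.unsqueeze(0),
        max_new_tokens=check_tokens,
        do_sample=False,                         # greedy decoding
        pad_token_id=tok.eos_token_id,
    )
    completion = out[0][len(prefix):]
    n = min(len(truth), len(completion))
    return n > 0 and bool((completion[:n] == truth[:n]).all())

# In practice, `candidate` would be a document suspected to be in the training set.
candidate = "Some paragraph that may or may not appear in the training corpus ..."
print(seems_memorized(candidate))
```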

Layne: You mention that some of the data that people may want to get rid of is personally identifiable information like social security numbers. What are some others?

Neel: Firms don’t want to give away their data for competitive reasons. A company might take an OpenAI model and fine-tune it, that is, customize it using the company’s proprietary data, and then provide that model to other end users. Your data is really your most valuable asset. So you’d be worried about your customers or competitors being able to extract the underlying data. Maybe it’s competitive data, sensitive financial data, anything that’s proprietary to the firm.

Layne: So, how do you stop the leaking?

Neel: Here’s a very simple example. Suppose that I’m computing the average income of a group of people stored in a database and publishing that information every day. I just compute the average, and then let’s say the next day, I compute the average again. But on that day, let’s say you—Rachel—have left that group of people, and been removed from the database. If I just look at those two averages and subtract them, I can exactly get what your income was—this is a simple example called a differencing attack.

The way to get around this is to add a small amount of random noise to those averages, so they’re not exactly correct. They’re noisy averages, but the amount of noise you add is enough to obscure the income of any one person. Even if you subtract the two numbers, there’s now a margin of error, which gives you the guarantee that no one could figure out, beyond a certain degree of accuracy, what your income is.

This solution, adding random noise to guarantee that individual information is obscured, is known as differential privacy, and it is applied widely in private data analysis. That’s a very simple example, but the same core idea can be applied to training these models as well.
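Here is a small sketch of the scenario Neel describes: two exact averages, published a day apart, leak one person’s income via subtraction, while Laplace noise (the standard differential-privacy mechanism) blunts the attack. The incomes, the income cap, and the privacy parameter epsilon are made up for illustration.

```python
# Sketch of the differencing attack on published averages, plus the standard
# differential-privacy mitigation (Laplace noise). All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(1)
incomes = {"rachel": 95_000.0, "sam": 62_000.0, "ana": 78_000.0, "lee": 51_000.0}

def exact_average(people):
    return sum(people.values()) / len(people)

# Day 1: average over everyone. Day 2: Rachel has been removed from the database.
day1 = exact_average(incomes)
day2 = exact_average({k: v for k, v in incomes.items() if k != "rachel"})

# Differencing attack: two exact averages reveal Rachel's income exactly.
recovered = day1 * len(incomes) - day2 * (len(incomes) - 1)
print(f"recovered income: {recovered:.0f}")       # 95000

def dp_average(people, epsilon=0.5, income_cap=200_000.0):
    """Release a noisy average: Laplace noise scaled to sensitivity / epsilon."""
    sensitivity = income_cap / len(people)        # max effect of one person on the mean
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return exact_average(people) + noise

noisy1 = dp_average(incomes)
noisy2 = dp_average({k: v for k, v in incomes.items() if k != "rachel"})
guess = noisy1 * len(incomes) - noisy2 * (len(incomes) - 1)
print(f"difference-based guess under noise: {guess:.0f}")   # far from 95000
```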

“Part of the reason that unlearning is interesting is because it is getting to something very fundamental about how these systems are developed.”

Layne: What draws you to unlearning?

Neel: Part of the reason that unlearning is interesting is because it is getting to something very fundamental about how these systems are developed. Machine “unlearning” has this nice privacy or regulatory motivation, but I still view it as primarily interesting because it gets to the heart of more fundamental questions around memorization in these models and the role of training data.

For machine learning practitioners, if you interpret the right to be forgotten (under the EU law) as requiring that users have the ability to delete their data, then the data has to be deleted from the models as well. Almost every company in the world that does machine learning would fall under that compliance regime, because they all have European consumers as customers.

Layne: What else are you working on right now that really excites you?

Neel: I’m excited about using unlearning to make models more robust. Suppose someone tries to do what’s called a data poisoning attack, where they insert examples into the training set designed to mess up the model’s behavior. Can you use unlearning to undo that? It’s exciting because it has the potential to make systems much more robust against these types of training data attacks, or just bad data. And it’s also cool because it’s an application of unlearning outside of privacy.

I have more work on “red-teaming” models to make them safe for deployment, for example, new work on extracting training data from large language models. It’s still an open question how to reliably determine how much of its training set a given LLM is memorizing, which I think could become a standard analysis that any company will want to run before deploying a model trained on sensitive data.

And then I have some projects outside of privacy that are related to the way large language models make mistakes in structured ways. They have amazing abilities, but they still make very simple mistakes. Are there ways to change the architecture or the way they’re trained in order to mitigate those mistakes?

Feedback or ideas to share? Email the Working Knowledge team at hbswk@hbs.edu.

Image: AdobeStock/graphicINmotion and AdobeStock/Yeti Studio
