How ‘Forgetting’ Undesirable Data in AI Models Negatively Impacts Performance

Exploring Unlearning Techniques in Generative AI Models

"Unlearning" techniques aim to enable generative AI models to forget specific, undesirable information they have learned from their training data, such as sensitive personal information or copyrighted content. However, the current methods for unlearning come with significant drawbacks: they can severely diminish a model’s ability to handle even basic queries.

This insight is supported by a new study co-authored by researchers from the University of Washington (UW), Princeton University, the University of Chicago, USC, and Google. The study highlights that prevalent unlearning techniques often degrade the performance of models, sometimes rendering them nearly unusable. "Our evaluation suggests that the unlearning methods that currently exist are not yet suitable for practical use in real-world settings," stated Weijia Shi, a researcher involved in the study and a Ph.D. candidate in computer science at UW. "There are no efficient approaches that allow a model to forget specific data without a substantial decline in its performance."

Understanding Model Learning

Generative AI models do not possess true intelligence. They are statistical systems that predict data, whether text, images, music, or video. Trained on vast collections of examples—such as films, audio recordings, and written content—these models learn how likely certain patterns of data are to occur given the surrounding context.

For instance, when presented with an email fragment like "Looking forward…," a model designed for autocompleting messages might suggest completing it with "... to hearing back," based solely on the patterns it has observed in previous emails. There is no intent from the model; it simply generates an educated guess.
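To make that concrete, here is a minimal sketch of next-token prediction using the open-source Hugging Face transformers library, with the small GPT-2 model standing in for the far larger commercial systems discussed here; the prompt and model choice are illustrative assumptions, not anything used in the study.

```python
# Minimal sketch: next-token prediction with a small causal language model.
# GPT-2 via Hugging Face "transformers" stands in for larger production models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Looking forward"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits           # scores for every vocabulary token

probs = torch.softmax(logits[0, -1], dim=-1)  # distribution over the next token
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx.item())!r}: {p.item():.3f}")
```

The printed tokens and probabilities are exactly the kind of educated guess described above: the model ranks plausible continuations by likelihood, nothing more.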

Leading models, including OpenAI’s GPT-4o, typically train on data sourced from publicly available websites and datasets online. Many companies argue that fair use supports their practice of scraping data for training purposes without informing or compensating the original creators. However, not all copyright holders agree. Authors, publishers, and record labels have filed lawsuits seeking to change these practices.

This copyright issue has propelled interest in unlearning techniques. Last year, Google, in collaboration with various academic organizations, initiated a competition aimed at encouraging the development of novel unlearning methods.

Unlearning could potentially provide a mechanism to eliminate sensitive information from existing models, such as confidential medical records or incriminating images, in response to user requests or legal demands. AI models often inadvertently collect substantial amounts of private information, from phone numbers to sensitive documents. While some organizations have introduced tools that allow data owners to request the removal of their data from future training datasets, these opt-out solutions do not apply to previously trained models. Unlearning would offer a more comprehensive solution for data deletion.

The Complexity of Unlearning

However, unlearning is not as straightforward as pressing the "Delete" button.

Current unlearning techniques depend on algorithms designed to "steer" models away from the information to be forgotten. The goal is to influence predictions so that a model seldom, if ever, outputs certain data.
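As a rough illustration of what "steering" can look like in practice, the sketch below applies gradient ascent on a small "forget" set, one common family of unlearning baselines. It is not the specific set of algorithms evaluated in the study, and the model, learning rate, and forget text are placeholder assumptions.

```python
# Illustrative sketch of one "steering" approach: gradient ascent on a forget
# set, which pushes predicted probabilities away from text to be unlearned.
# Placeholder model, data, and hyperparameters; not the study's exact methods.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

forget_texts = ["Example passage the model should no longer reproduce."]  # stand-in data

model.train()
for text in forget_texts:
    batch = tokenizer(text, return_tensors="pt")
    outputs = model(**batch, labels=batch["input_ids"])
    loss = -outputs.loss                 # negating the loss turns descent into ascent
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Raising the loss on the forget set makes the model less likely to reproduce that text, but because its parameters encode many overlapping patterns, the same update can also erode unrelated abilities, which is precisely the trade-off the study measures.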

To evaluate the effectiveness of these unlearning algorithms, Shi and her colleagues created a benchmark called MUSE (Machine Unlearning Six-way Evaluation). This benchmark aims to assess an algorithm's capacity to stop a model from producing training data verbatim (known as regurgitation) and to erase any evidence that the model was originally trained on that data.

To score well on MUSE, a model must genuinely forget the targeted content, which in the benchmark includes the Harry Potter books and news articles. Given a text fragment from Harry Potter and the Chamber of Secrets, MUSE checks whether the unlearned model can still recite the rest of the sentence, answer questions about the passage, or otherwise reveal that it was trained on the text.
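A simplified version of such a regurgitation check might look like the following. This is a generic sketch rather than MUSE's actual evaluation code, and the passage, model, and overlap metric are stand-ins.

```python
# Generic sketch of a verbatim-regurgitation check: prompt the model with the
# start of a passage and compare its continuation to the true continuation.
# The passage, model, and overlap metric are placeholders, not MUSE's code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

passage = "a placeholder sentence standing in for memorized training text"
words = passage.split()
prompt = " ".join(words[: len(words) // 2])
reference = " ".join(words[len(words) // 2 :])

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=30, do_sample=False)
continuation = tokenizer.decode(output_ids[0, inputs["input_ids"].shape[1]:])

# Crude overlap score: fraction of reference words that reappear in the output.
ref_words = set(reference.split())
overlap = len(ref_words & set(continuation.split())) / max(len(ref_words), 1)
print(f"word overlap with the original continuation: {overlap:.2f}")
```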

Additionally, MUSE examines whether the model retains related general knowledge after unlearning, such as the fact that J.K. Rowling wrote the Harry Potter series. The researchers call this the model’s overall utility; a significant decline in utility means the model has lost related knowledge, limiting its ability to answer questions accurately.
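One simple way to approximate such a utility check is to compare the model's perplexity on general "retain" text before and after unlearning. Again, this is an illustrative sketch rather than the benchmark's own procedure, and both model checkpoints below are placeholders.

```python
# Illustrative utility check: perplexity on general "retain" text before and
# after unlearning. A large increase suggests related knowledge was lost.
# Both checkpoints below are placeholders; this is not the benchmark's code.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

def perplexity(model, text):
    batch = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**batch, labels=batch["input_ids"]).loss
    return math.exp(loss.item())

retain_text = "J.K. Rowling is the author of the Harry Potter series."
original_model = AutoModelForCausalLM.from_pretrained("gpt2")   # stand-in: before unlearning
unlearned_model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in: after unlearning

print("perplexity before unlearning:", perplexity(original_model, retain_text))
print("perplexity after unlearning: ", perplexity(unlearned_model, retain_text))
```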

The study's findings indicate that while the unlearning algorithms tested did enable models to forget certain pieces of information, they simultaneously compromised the overall question-answering abilities of these models, resulting in a notable trade-off. "Creating effective unlearning methods for models is complicated because knowledge is deeply interconnected within the model," Shi explained. "For example, if a model is trained on both copyrighted Harry Potter books and freely available content from the Harry Potter Wiki, current unlearning methods that aim to remove the copyrighted text also diminish the model’s knowledge about the Wiki."

Future Directions in AI Unlearning

Are there viable solutions to this challenge? Not yet, according to Shi, which underscores the need for further research in this area.

For now, vendors relying on unlearning techniques to address their training data challenges seem to be at a standstill. While a technological breakthrough might eventually make unlearning feasible, for the foreseeable future, companies must explore alternative strategies to mitigate the risk of their models generating inappropriate or unauthorized content.
