Machine learning systems are everywhere. They predict the weather, forecast earthquakes, provide recommendations based on the books and movies we like, and even apply the brakes on our cars when we’re not paying attention.

To do this, software programs in these systems calculate predictive relationships from massive amounts of data. The systems identify these relationships using advanced algorithms (sets of rules for solving math problems) and “training data.” This data is then used to construct the models and features that enable a system to determine the latest best-seller you wish to read or to predict the likelihood of rain next week.

This intricate process means that a piece of raw data often goes through a series of computations in a system. The computations and information derived by the system from that data together form a complex propagation network called the data’s “lineage.” The term was coined by Yinzhi Cao, an assistant professor of computer science and engineering, and his colleague, Junfeng Yang of Columbia University, who are pioneering a novel approach to make learning systems forget.

Because forgetting is so important to increasing security and protecting privacy, Cao and Yang believe that forgetting systems that are easy to adopt will be increasingly in demand. The two researchers have developed a way to make learning systems forget faster and more effectively than current methods allow.

Their concept, called “machine unlearning,” is so promising that Cao and Yang have been awarded a four-year, $1.2 million National Science Foundation grant to develop the approach.

“Effective forgetting systems must be able to let users specify the data to forget with different levels of granularity,” said Cao, a principal investigator on the project. “These systems must remove the data and undo its effects so that all future operations run as if the data never existed.”

Building on work that was presented at a 2015 IEEE Symposium and then published, Cao and Yang’s “machine unlearning” method is based on the fact that most learning systems can be converted into a form, known as the summation form, that can be updated incrementally without costly retraining from scratch.

Cao believes he and Yang are the first to establish the connection between unlearning and the summation form.
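
To make the idea of a summation form concrete, here is a minimal Python sketch. It is an illustration only, not Cao and Yang's implementation: the `SummationModel` class, its method names, and the tiny data set are all hypothetical. It assumes a toy word-count model whose entire state is a handful of sums, so forgetting a sample means subtracting that sample's terms rather than retraining on everything that remains.

```python
from collections import Counter


class SummationModel:
    """Hypothetical toy model whose state is only a small set of sums."""

    def __init__(self):
        self.label_counts = Counter()  # number of samples seen per label
        self.word_counts = {}          # label -> Counter of word occurrences

    def learn(self, words, label):
        # Add this sample's contribution to each sum.
        self.label_counts[label] += 1
        self.word_counts.setdefault(label, Counter()).update(words)

    def unlearn(self, words, label):
        # Subtract this sample's contribution from each sum.
        self.label_counts[label] -= 1
        self.word_counts[label].subtract(words)
        # Unary + drops zero counts, so no trace of the sample remains.
        self.word_counts[label] = +self.word_counts[label]
        self.label_counts = +self.label_counts
        if not self.word_counts[label]:
            del self.word_counts[label]

    def state(self):
        # Plain-dict snapshot of the sums, convenient for comparison.
        return (
            dict(self.label_counts),
            {label: dict(c) for label, c in self.word_counts.items()},
        )


# Made-up training data for illustration.
data = [
    (["win", "cash", "now"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["cash", "prize"], "spam"),
]

# Train on everything, then forget the third sample incrementally.
incremental = SummationModel()
for words, label in data:
    incremental.learn(words, label)
incremental.unlearn(*data[2])

# Retrain from scratch without the third sample, for comparison.
retrained = SummationModel()
for words, label in data[:2]:
    retrained.learn(words, label)

print(incremental.state() == retrained.state())
```

In this sketch the incrementally updated model ends up in the same state as one retrained from scratch without the forgotten sample, which is the "as if the data never existed" property Cao describes. A real learning system would have far more state and would also need to track the data's lineage through derived features.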

Read the full story at the Lehigh University News Center.

-Lori Friedman is Director of Media Relations with Lehigh University's Office of Communications and Public Affairs.
