Safeguarding machine learning systems

We live in the era of software 2.0.

An era of computer systems capable of learning desired behaviors, like how to recognize voices and faces, and of predicting outcomes, like whether a tumor is benign or malignant. Whereas its predecessor, software 1.0 if you will, is relatively straightforward with its lines of code, machine learning is built on a network of mathematical transformations that parses data, finds patterns, and produces an outcome. These systems increasingly match or exceed human accuracy on many tasks, and they are transforming nearly every aspect of our lives, from travel to medicine to entertainment to security.

But such complexity is problematic. Machine learning systems are so large that, to expedite their creation, developers often reuse foundational building blocks, or pretrained neural network components, called primitive models. These models are available online, and it's often unclear who created them, or whether that source can be trusted.

“These systems are so complicated, it doesn’t make sense to build them from scratch,” says Ting Wang, an assistant professor of computer science and engineering at Lehigh University’s P.C. Rossin College of Engineering and Applied Science. “One way to build a system quickly is to take one model from here, one from there, and put them together. But many models out there are contributed by untrusted third parties. If these models contain a vulnerability, your system inherits that vulnerability.”
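
In practice, that assembly step can be as simple as a few lines of code. The sketch below is a hypothetical illustration, assuming PyTorch and torchvision as the tooling; the specific model and layer sizes are placeholders, not part of Wang's project.

import torch
import torch.nn as nn
from torchvision import models

# Primitive model: a pretrained backbone downloaded from a third-party repository.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()  # strip the original classification head

# Task-specific head trained locally (here, an illustrative 10-class problem).
classifier = nn.Linear(512, 10)

# The composed system: whatever behavior is baked into the downloaded backbone
# is now part of the whole pipeline.
system = nn.Sequential(backbone, classifier)

logits = system(torch.randn(1, 3, 224, 224))

Nothing in that composition reveals how the downloaded backbone was trained, which is why any vulnerability it carries passes silently into the finished system.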

Wang recently received support from the National Science Foundation (NSF) to better understand the security implications of reusing these primitive models and to develop tools to help mitigate those threats. His proposal to develop trustworthy machine learning from untrusted models recently won him a Faculty Early Career Development (CAREER) Award.

The CAREER program is considered one of the NSF's most prestigious award programs. CAREER grants are made annually to junior faculty members across the U.S. who exemplify the role of teacher-scholars through outstanding research, excellent education, and the integration of education and research. Wang's award provides approximately $509,895 in support over a five-year period. He is the sixth Rossin College faculty member to receive a CAREER Award in the past 12 months.

The grant is related to Wang's previous NSF award on adversarial inputs. That research project, called Eagle Eye, focused on identifying minute manipulations of data that trigger deep learning systems to fail.

“Eagle Eye was about how you can manipulate the input, the data,” says Wang. “Changing just a few pixels in the image of a human face, for example, will be imperceptible to the human eye, but the change is significant to the learning system. The facial recognition system may be perfect, but if the input, the image, is modified, the system will fail. This new project is about how the system itself can be adversely manipulated. Your data can be great, but your system may already be doomed. So there’s an interesting interplay between the two projects.”
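
Eagle Eye's own techniques aren't detailed here, but the kind of manipulation Wang describes can be sketched with a standard method from the adversarial machine learning literature, the fast gradient sign method, which shifts each pixel by a small amount chosen to mislead the classifier. The function below is an illustrative sketch in PyTorch, not code from either project.

import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.01):
    """Return a copy of `image` perturbed to push `model` toward a wrong answer.

    `epsilon` bounds the per-pixel change, keeping the edit imperceptible to a person.
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step each pixel slightly in the direction that increases the model's loss.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()

A perturbation this small is invisible to a human viewer, yet it can flip the model's prediction, which is exactly the failure mode Eagle Eye set out to identify.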

Wang calls these system manipulations “model reuse attacks.” And just as with the adversarial inputs studied in Eagle Eye, their ramifications are potentially disastrous.

“The main concern comes from the increasing use of machine learning systems in all kinds of security sensitive domains like autonomous driving, medical diagnosis, high-frequency trading, and legal document analysis,” he says. “All those applications require high degrees of accuracy, so if you make a mistake, the consequences can be huge. You want to make sure the system is behaving as expected because any vulnerabilities can be exploited.”

Wang, who is affiliated with Lehigh’s Institute for Data, Intelligent Systems, and Computation (I-DISC), a hub for interdisciplinary research, hopes that by understanding the threats posed by reusing primitive models, he and his team can develop theories and models to help guide developers as they build machine learning systems.

“We propose lifelong security,” he says. “Think about the life cycle of a machine learning system. You download models from the web, put them together, tune them to make them work, then deploy the system and operate it. It’s a long sequence. So we want to provide protection throughout the process, building tools that can verify if a primitive model is safe or not, then a monitoring tool so you can detect if an abnormal phenomenon has occurred and whether or not you can fix it. It would protect you through the whole process. That’s the vision we have.”
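
What those verification and monitoring tools will ultimately look like is the open research question the award funds. As a purely hypothetical illustration of two points in that life cycle, a developer might at least confirm that a downloaded primitive model is the exact artifact its publisher released, and flag abnormal behavior once the system is running. The digest and threshold below are placeholders, and an integrity check says nothing about whether the model itself is trustworthy; that deeper question is the subject of Wang's project.

import hashlib

def sha256_of(path):
    """Compute the SHA-256 digest of a model file downloaded from the web."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_primitive_model(path, expected_digest):
    """Before deployment: confirm the file matches a digest published by a trusted source."""
    return sha256_of(path) == expected_digest

def monitor_prediction(confidence, threshold=0.5):
    """After deployment: flag abnormally low-confidence predictions for human review."""
    if confidence < threshold:
        print("warning: abnormal prediction detected; routing for inspection")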

Story by Christine Fennessy, Staff Writer, P.C. Rossin College of Engineering and Applied Science

About Ting Wang

Ting Wang is an assistant professor of computer science and engineering at Lehigh University's P.C. Rossin College of Engineering and Applied Science. His research explores the interplay of machine learning, privacy, and security, focusing on making machine learning systems more practically usable by mitigating security vulnerabilities, enhancing privacy awareness, and increasing decision-making transparency. He directs the Algorithmic Learning, Privacy, and Security (ALPS) Lab at Lehigh and is affiliated with the university's Institute for Data, Intelligent Systems, and Computation (I-DISC), an interdisciplinary initiative that's pushing the envelope of data analytics research. Wang joined Lehigh in 2015, following his time as a research staff member at the IBM Thomas J. Watson Research Center. He has a doctoral degree from Georgia Tech and completed his undergraduate studies at Zhejiang University in China.

Ting Wang, an assistant professor of computer science and engineering, is the sixth Rossin College faculty member to receive NSF CAREER funding in recent months. (Credit: Douglas Benedict)