Tweaking the tools



A lot of us depend (heavily) on GPS to get around. But what if there’s a hurricane or a wildfire and we have to immediately evacuate? Could we safely rely on those navigation systems?

“Apps like Google Maps use predictive models that are trained heavily on non-disaster-event-type patterns, so when you deploy them in extreme events, when patterns are unique and unforeseen, the models might not predict as well,” says I-DISC Associate Director Parv Venkitasubramaniam, a professor of electrical and computer engineering who directs the college’s Data Science program.

On top of that, he says, the machine learning models powering these systems can have more than half a million parameters, so it’s impossible to say why they’re making certain predictions regarding traffic speed, flow, and volume. That opaqueness is especially problematic during extreme events.

“Human beings are making critical decisions regarding evacuation and emergency response, so you can’t say, ‘The black box says, do this.’ You need to give first responders something that’s more transparent and explainable as to why the model is making a prediction.”

Venkitasubramaniam is part of an interdisciplinary research group (within Lehigh’s I-DISC) called Explainable Graph Learning that is exploring this problem. The researchers are taking knowledge from the domain of transportation engineering—the study of the interactions between travelers and infrastructure—to build machine learning models. For example, he says, science dictates that traffic will diffuse a certain way if there’s congestion versus a free flow of vehicles. That information can be captured by equations that can then be used to build more effective models.

“And what we’ve found is that these domain-informed models with far fewer parameters perform better when you have unexpected or unforeseen patterns of traffic,” he says. 
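For readers who want to see the shape of the idea in code, the sketch below shows one generic way to fold domain knowledge into training: a standard data-fitting loss plus a penalty term that stands in for a traffic-flow equation. The tiny PyTorch model, the random “sensor” data, and the smoothness penalty are all illustrative assumptions, not the group’s actual implementation.

# Minimal sketch (illustrative, not the group's code): a domain-informed loss
# that mixes a data-fit term with a penalty standing in for a traffic-flow
# relation such as the diffusion behavior described above.
import torch
import torch.nn as nn

class SpeedPredictor(nn.Module):
    """Tiny model: predicts next-step speeds on a handful of road segments."""
    def __init__(self, n_segments: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_segments, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_segments),
        )

    def forward(self, speeds: torch.Tensor) -> torch.Tensor:
        return self.net(speeds)

def domain_informed_loss(pred, target, prev, physics_weight=0.1):
    # Data term: fit the observed speeds.
    data_loss = nn.functional.mse_loss(pred, target)
    # Domain term (illustrative): penalize implausibly abrupt network-wide
    # speed changes, a stand-in for the flow equations mentioned above.
    physics_penalty = ((pred - prev) ** 2).mean()
    return data_loss + physics_weight * physics_penalty

# Toy training loop on random data standing in for traffic-sensor readings.
torch.manual_seed(0)
model = SpeedPredictor(n_segments=8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
prev_speeds = torch.rand(64, 8)   # speeds at time t
next_speeds = torch.rand(64, 8)   # speeds at time t+1
for _ in range(200):
    optimizer.zero_grad()
    loss = domain_informed_loss(model(prev_speeds), next_speeds, prev_speeds)
    loss.backward()
    optimizer.step()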

Venkitasubramaniam is also part of a separate interdisciplinary team working to solve a well-known privacy issue with machine learning called membership inference. By querying a trained model in carefully chosen ways, an adversary can infer whether a particular record, such as an individual’s health information, was part of the data the model was trained on.
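A textbook baseline version of the attack is easy to sketch: query the model on candidate records and flag the ones it fits suspiciously well. The code below runs that heuristic on synthetic data; the scikit-learn model and the median-loss threshold are illustrative choices, not the specific attack the team studies.

# Minimal sketch of a loss-threshold membership-inference attack, a common
# baseline in the privacy literature. Data, model, and threshold are toy choices.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
X_members, y_members = rng.normal(size=(200, 10)), rng.integers(0, 2, 200)
X_outsiders, y_outsiders = rng.normal(size=(200, 10)), rng.integers(0, 2, 200)

model = LogisticRegression().fit(X_members, y_members)  # trained only on "members"

def per_example_loss(model, X, y):
    probs = model.predict_proba(X)
    return np.array([log_loss([yi], [pi], labels=[0, 1]) for yi, pi in zip(y, probs)])

# Heuristic: records the model was trained on tend to have unusually low loss,
# so anything below a threshold is flagged as a likely training member.
threshold = np.median(per_example_loss(model, X_outsiders, y_outsiders))
flagged = per_example_loss(model, X_members, y_members) < threshold
print(f"Training records flagged as members: {flagged.mean():.0%}")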

He says that developers try to ensure privacy by adding noise to either the input data or the model’s outputs, or, more typically, by adding randomness to the training process, where “you keep making things a little fuzzy.” The problem, however, is that noise compromises the quality of the output.
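In code, those defenses amount to injecting randomness somewhere in the pipeline. The sketch below shows two generic variants, noise added to a model’s outputs and noise added to its gradient updates during training; the noise scales are arbitrary stand-ins, whereas real systems calibrate them formally, for example under differential privacy.

# Minimal sketch of noise-based defenses: perturb the model's outputs, or clip
# and perturb its gradients during training. Scales here are illustrative.
import torch

def noisy_output(logits: torch.Tensor, scale: float = 0.5) -> torch.Tensor:
    """Release the model's output only after adding Gaussian noise."""
    return logits + scale * torch.randn_like(logits)

def noisy_training_step(model, loss, lr=0.01, clip=1.0, noise_scale=0.1):
    """One update with clipped, noise-perturbed gradients ("fuzzy" training)."""
    model.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p -= lr * (p.grad + noise_scale * torch.randn_like(p.grad))

# Toy usage: one noisy update on a small regression model.
torch.manual_seed(0)
net = torch.nn.Linear(4, 1)
x, y = torch.rand(32, 4), torch.rand(32, 1)
noisy_training_step(net, torch.nn.functional.mse_loss(net(x), y))
print(noisy_output(net(x[:3])).detach())

Dialing the noise up strengthens the protection but degrades the predictions, which is exactly the trade-off the team wants to escape.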

Venkitasubramaniam and his colleagues believe they have a solution that strikes a better balance between privacy and accuracy. They hypothesize that models reveal their training data because they’re overspecified, meaning they have so many parameters that they can store a huge amount of potentially revealing information about the data. The solution, they say, is to reduce the dimensionality of the model by compressing it, rather than obfuscating it with noise.

“When you compress it, all you’re saying is, ‘I don’t want a rich model. I just want the model to be good enough to make the inference,’” he says. “And because I’m starving the model, I’m getting the privacy I desire.”
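The sketch below illustrates that intuition with a generic bottleneck architecture: narrowing one layer slashes the parameter count, and with it the model’s room to memorize individual training records. It is a generic stand-in, not the team’s actual compression algorithm.

# Minimal sketch of the "starve the model" idea: a narrow bottleneck caps the
# parameter count and, with it, the capacity to memorize training records.
import torch.nn as nn

def wide_model(d_in=100, d_hidden=2048, d_out=10):
    return nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                         nn.Linear(d_hidden, d_out))

def compressed_model(d_in=100, rank=16, d_out=10):
    # The bottleneck of width `rank` is the only path information can take.
    return nn.Sequential(nn.Linear(d_in, rank), nn.ReLU(),
                         nn.Linear(rank, d_out))

wide = sum(p.numel() for p in wide_model().parameters())
small = sum(p.numel() for p in compressed_model().parameters())
print(f"wide: {wide:,} parameters; compressed: {small:,} parameters")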

The team has been able to show that a compression algorithm they’ve developed does better than some of the state-of-the-art obfuscation methods.

“We need to explore this principle further, but ultimately, this could be applied to anything that involves sensitive data, especially in healthcare. Our approach here is more fundamental. We’re changing the tool to suit the needs of the problem.”

