Social computing research by PhD student Patrick Skeba takes an interdisciplinary approach to examine how our attitudes about data sharing influence online behavior—and why our postings may pose a threat to internet anonymity

Photo by Austin Distel on Unsplash

“I’ve always wanted to have knowledge in a breadth of fields, not just depth in one,” says Patrick Skeba. 

Still, PhD programs are typically all about specialization. So when Skeba started applying for graduate programs in computer science, he figured he’d pursue machine learning. But he soon realized he could satisfy his wide-ranging curiosity on Lehigh’s Mountaintop Campus, working in a field he’d never really considered: social computing. 

“My first day in the office, my advisor was giving me books on computers, on philosophy, on sociology, on all kinds of things,” says Skeba, who is now in his fifth year as a PhD student, with Eric P. S. Baumer, an associate professor of computer science and engineering, as his advisor. “I was really attracted to his interdisciplinary approach. And so I got less and less interested in building machine learning models, and more interested in asking, ‘What are these models doing to us?’”

In other words, what are they doing to our privacy?

For more on Patrick Skeba's research, check out his recorded presentation from the 2020 ACM Conference on Computer-Supported Cooperative Work and Social Computing.

Skeba’s research uses two different approaches to answer that question. The first is what he and his team call their “folk theories of algorithms” study. Skeba uses interview and survey methods to query regular internet users on their understanding of how their personal data is collected by companies like Google, Facebook, and Amazon.

“These companies don’t release a lot of information about how their [algorithmic] programs work,” he says. 

That void leads people to make all sorts of assumptions about how vulnerable their privacy is when they do certain things, like post comments online, he says. “And so what we see sometimes are guesses that are quite far off.”

Some people think algorithmic systems can’t infer much, if anything, from their comments, so they post without concern. Others, however, are convinced that the algorithms can derive all sorts of information about them, and are too paranoid to post anything at all.

“You end up in a situation where there’s a disconnect between how these systems work and how people understand them,” says Skeba. “So figuring out how people are imagining these systems can help us better understand their behaviors. And that, in turn, can help us educate users and give them the tools to understand how their information is being used.” 

Skeba’s second project involves evaluating the privacy risk of an online forum. The forum is run by a nonprofit dedicated to helping drug users minimize the harm associated with drug use. 

“These are people posting anonymously about things that are stigmatized, or illegal, or dangerous, and so there’s a lot of fear that law enforcement, family, or employers might try to figure out who these people are,” he says. “So, from a privacy perspective, this was an important issue to look at.”

Users of the site generate thousands of words of content, he says. So the question was: How much of a privacy risk did that pose?

He and his team built a stylometric classifier, a model that can identify the author of a piece of writing based on its style. Then, using algorithms known to be effective at identity matching, the researchers attempted to link specific pieces of content on the forum to accounts on websites like Reddit. If a link was made, the Reddit account could potentially expose the identity of the forum user.
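The article doesn't describe how the team's classifier was built, but the general technique is straightforward to sketch. A minimal, hypothetical version (names, texts, and the choice of character trigrams plus cosine similarity are all illustrative assumptions, not the study's actual method) might represent each account's writing as character n-gram frequencies and link each anonymous forum author to the most stylistically similar account on another site:

```python
# Hypothetical stylometric-linking sketch; the study's actual model
# is not described in the article. Each author's text becomes a
# character 3-gram frequency profile, and accounts are linked across
# sites by cosine similarity of those profiles.
from collections import Counter
import math

def ngram_profile(text, n=3):
    """Character n-gram frequency profile of a text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(p, q):
    """Cosine similarity between two n-gram profiles."""
    dot = sum(p[g] * q[g] for g in set(p) & set(q))
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

def link_accounts(forum_posts, other_site_posts):
    """For each anonymous forum author, pick the most stylistically
    similar account from the other site."""
    forum = {a: ngram_profile(t) for a, t in forum_posts.items()}
    other = {a: ngram_profile(t) for a, t in other_site_posts.items()}
    return {a: max(other, key=lambda r: cosine(p, other[r]))
            for a, p in forum.items()}
```

Real systems use far richer features (word choice, punctuation habits, syntax) and much more text per author, which is why distinctive writing styles can survive anonymization so well.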

“We found that the stylometric classifier did a really good job. We could get around 80 percent of users on two different websites linked just through this writing style,” says Skeba. “The purpose of this study was to highlight that just the act of posting online introduces certain risks, and this is something we need to consider much more, moving forward.”

We already knew that we needed to protect our passwords. But now, algorithms could potentially mine the thoughts, opinions, and advice we share online to uncover our personal information. And that is an issue that could affect anyone who spends time on the internet.

“If you create enough content, you could become a potential target for these kinds of analyses,” he says. “We wanted to highlight that there’s a need to critically analyze the algorithms that are being developed and deployed to ostensibly stop things like cybercrime and terrorism, and make sure they aren’t also harming people who rely on anonymity to do things that are acceptable and beneficial to themselves.”

Skeba's current work focuses on better understanding the nature and degree of these algorithmic privacy threats, but he anticipates that the findings will point toward potential solutions. He plans to stay in academia, researching methods and tools to preserve what he sees as a defining feature of the online world.

“I’ve always thought that being anonymous was one of the internet’s greatest affordances,” he says. “It’s something that allows people to express themselves, and reinvent themselves, and I’m concerned that ability is going away. I want to see a future where consent is taken much more seriously during all stages of data development, and humanity isn’t reduced to numbers on a spreadsheet.”  

Patrick Skeba, PhD student, computer science and engineering

Eric P. S. Baumer is an associate professor of computer science and engineering. His research examines human interactions with algorithmic systems in the context of social computing.