AT THIS VERY MOMENT, tens of thousands of proteins are floating in and around the cells of your body. When the right two interact, much like a wrench turning a nut, big things can happen: A cell’s energy can be produced, a cell can divide, or a virus can turn a healthy cell to its purposes.

Other times two proteins connect…and nothing happens.

Understanding why some proteins bind and others don’t is on the to-do list of biologists around the world. It has also captured the attention of Brian Y. Chen, the P.C. Rossin Assistant Professor of Computer Science and Engineering.

Chen is fond of quoting Stanford University computer scientist Donald Knuth, who once said that biologists “easily have 500 years of exciting problems to work on.” Researchers can devote a career to studying just a few of them. The sheer number of possibilities means that it costs a lot of time and money to understand how biological processes work and why they go wrong, to design new drugs, or to detect virus mutations before they become deadly.

The arena of proteins is no less daunting. But what if computers could model the structure and function of proteins and compare them to predict which ones might be the answer to a researcher’s question, and which are likely to lead to a dead end?

That’s the question that drives Chen’s research in structural bioinformatics, the science of making good decisions from vast seas of data about molecular shape and biological systems.

“We’re trying to interrogate proteins to determine what they do,” says Chen, “in ways that don’t require human experts to be on hand for every test. If I can get people as close to the explanation as possible, then they have less experimental work to do.”

Chen’s results can help researchers answer vexing questions. Why does the chemical switch that tells a cell to divide get stuck in the “on” position, causing a tumor to grow? How does HIV protease find the right spot in DNA to make a cell reproduce the virus? Why does a promising drug stop being effective?

MOVING BEYOND TRIAL AND ERROR

Right now, researchers approach such questions using trial and error. Developing a new cancer drug is one example, says Chen. It can require testing millions of possibilities, and take decades and billions of dollars, while patients continue to suffer and die.

Chen foresees a day when “rational design” will allow scientists to engineer for variations and respond quickly to naturally occurring mutations in proteins. The holy grail is to acquire an understanding of how and why proteins work and an ability to engineer them to do needed functions.

“That’s a large and intractable problem to study generally,” he says, “so we’re taking a lot of baby steps.”

Bioinformatics takes data from biological systems and creates mathematical models. “With computing power,” says Chen, “we can analyze information at a rate and depth that gives us meaningful results quickly, with a precision that a person can’t match.”

Chen develops new ways to make computer models reveal not just the fact that there are differences between two structures, but why.

“The computer doesn’t know anything about biology, and never will,” he says. “Software can select interesting evidence out of an exponential space of possibilities. It is a filtering tool to help experts consider only those possibilities that are relevant.”

Chen’s research extends the usefulness of bioinformatics by modeling additional molecular properties, such as the three-dimensional shapes that molecules form, by mapping areas where electrostatic fields are enhanced, and by developing algorithms to compare these features with greater relevance to real-world questions.

“If you can identify the right biochemical properties and incorporate them in the model in realistic ways, then we can offer a representation that is as close to real life as possible,” he says.


One of Chen’s current projects extends models that look at the arrangement of atoms in a molecule by precisely representing the 3-D surface of the molecule. Dubbed Volumetric Analysis of Surface Properties, or VASP, this project looks at the silhouettes of the atoms and compares where two molecules overlap and which spaces are unique to one or the other.
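The idea of comparing where two molecular volumes overlap and where they differ can be illustrated with ordinary set operations. What follows is a minimal sketch, not VASP's actual implementation: it assumes each atom is a simple sphere given as (x, y, z, radius), approximates each molecule by the grid cells whose centers fall inside any sphere, and uses the hypothetical helper names `voxelize` and `compare_volumes`.

```python
from itertools import product


def voxelize(atoms, step=0.5):
    """Approximate a molecule as a set of grid cells (assumed model).

    atoms: iterable of (x, y, z, radius) spheres.
    A cell (i, j, k) is kept when its center lies inside any sphere.
    """
    cells = set()
    for x, y, z, r in atoms:
        # Bounding box of the sphere in grid indices, padded by one cell.
        lo = [int((c - r) // step) - 1 for c in (x, y, z)]
        hi = [int((c + r) // step) + 1 for c in (x, y, z)]
        ranges = [range(a, b + 1) for a, b in zip(lo, hi)]
        for i, j, k in product(*ranges):
            cx, cy, cz = (i + 0.5) * step, (j + 0.5) * step, (k + 0.5) * step
            if (cx - x) ** 2 + (cy - y) ** 2 + (cz - z) ** 2 <= r * r:
                cells.add((i, j, k))
    return cells


def compare_volumes(a, b):
    """Split two voxel sets into shared space and space unique to each."""
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Two toy one-atom "molecules" that partly overlap.
mol_a = voxelize([(0.0, 0.0, 0.0, 1.0)])
mol_b = voxelize([(1.0, 0.0, 0.0, 1.0)])
regions = compare_volumes(mol_a, mol_b)
print({name: len(cells) for name, cells in regions.items()})
```

A finer `step` gives a closer approximation of the surfaces at the cost of more cells to store and compare.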

Chen pulls up a model of a protein on his computer screen. The image is an irregular and lumpy blob, representing the silhouette of the protein’s atoms and their orbiting electrons as well. As he rotates the image on one axis and then on another, a deep, cone-like indentation becomes visible. With the touch of his mouse he can move through the shape in tiny increments, like slices of an MRI scan. As he scrolls he can reveal subtle differences in the shape of the cavity that can be used to tease out small variations from other, related proteins.

“I asked, what would happen if I could cut this shape into tiny cubes?” Chen says. Analyzing the cavities on a subatomic scale reveals more meaningful results than simply comparing the overall differences in binding sites between two molecules. “There are some cubes that are always the same in every pocket, some that are unique in every pocket, and some that are common to some and different to others,” he explains. “It turns out that those regionalized differences can be really significant, and they would get lost in the noise if you just compared overall differences between A and B.”
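Chen's three kinds of cubes (always present, unique to one pocket, shared by only some) amount to counting how many pockets contain each cube. The sketch below assumes every pocket has already been cut into cubes on a common, aligned grid and is stored as a set of (i, j, k) indices. The function name `classify_cubes` and the toy pockets are hypothetical and are not taken from Chen's software.

```python
from collections import Counter


def classify_cubes(pockets):
    """Group cubes by how many pockets contain them.

    pockets: dict mapping a pocket name to a set of (i, j, k) cube indices,
             all on the same aligned grid (an assumption of this sketch).
    """
    counts = Counter(cube for cubes in pockets.values() for cube in set(cubes))
    total = len(pockets)
    groups = {"conserved": set(), "unique": set(), "partial": set()}
    for cube, seen_in in counts.items():
        if seen_in == total:
            groups["conserved"].add(cube)   # present in every pocket
        elif seen_in == 1:
            groups["unique"].add(cube)      # present in exactly one pocket
        else:
            groups["partial"].add(cube)     # present in some, absent in others
    return groups


# Toy pockets, each described by a handful of cubes.
pockets = {
    "A": {(0, 0, 0), (1, 0, 0), (2, 0, 0)},
    "B": {(0, 0, 0), (1, 0, 0), (3, 0, 0)},
    "C": {(0, 0, 0), (4, 0, 0)},
}
groups = classify_cubes(pockets)
print(sorted(groups["partial"]))
```

The "partial" group is the one a whole-pocket comparison would tend to blur, since those regional differences are averaged in with everything else.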

As models recognize increasingly subtle variations, the number of questions biologists have to ask grows exponentially. So Chen created VASP to help narrow down the possibilities.

“For this protein alone there are some 400 variations in the public databases, and it’s related to several thousand proteins,” Chen says. “So you can imagine thousands of different variations on this pocket, some of which are nearly identical and some very different.

“And this is an easy example.”

Here’s where Chen and VASP come in. His more granular models of proteins can help researchers better see the similarities between related molecules, such as mutant strains of a disease. With mutant strains, that knowledge can help with the design of new drugs that target commonalities and are thus less susceptible to resistance. By comparing a protein with the entire database of known proteins, “my software can reveal possibilities that reinforce an expert’s intuition about what to study, and point out some possibilities they might have overlooked,” he says. It can also do so faster and more cheaply than experimental studies.


The leading edge of this research is represented by a collaboration between Chen and Katya Scheinberg, associate professor of industrial and systems engineering. NSF has awarded them nearly half a million dollars to explore a new way of describing binding sites by looking not at the atoms along the boundary but by modeling the empty spaces themselves.

“Looking at the landmarks along the edge works quite well, but aligning the empty space within will allow us to catch additional differences,” he says. “It gives a new language for describing how biology can vary.”

Chen has also pioneered the exploration of electrostatic attraction between proteins. Small electrostatic fields around some of the atoms in a protein “act like tiny magnets” that can enhance or prevent docking with another molecule, Chen says. When these “magnets” along the protein’s folds match up with oppositely charged atoms in a binding partner, the bond is strengthened, while like charges can repel proteins that might otherwise be a good fit, he says.

In a phenomenon called electrostatic focusing, the electrostatic field within a molecule is stronger where the cavities between atoms are narrower. “We’ve found that some proteins actually make the cavities narrower in order to increase the electrostatic field,” Chen says.

DNA molecules use this property to ensure that cell proteins attach at the right spot to correctly imprint instructions. But viruses like HIV exploit this property in an effort to induce a cell to produce more virus instead of normal proteins.

Understanding how proteins and viruses navigate DNA “is one of the hardest questions in the field,” Chen says. So he is working to develop tools that “help to recognize similarities between a protein and DNA and to pinpoint where they will bond.”

With so much data, researchers need clear methods to determine which results to pay attention to and which to ignore.

“There needs to be a number, but you can’t just choose it, it has to be extracted from the data,” Chen says. He is collaborating with statistician Soutir Bandyopadhyay, assistant professor of mathematics in Lehigh’s College of Arts and Sciences, to produce a system that will allow researchers to decide how much similarity or difference is significant enough to warrant a closer look, and then pull only those results from the data.
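One simple way to let the data supply the number is to take the cutoff from the observed distribution of scores rather than fixing it in advance. The sketch below is an illustration only and is not the method Chen and Bandyopadhyay are developing: it assumes each protein comparison has already been reduced to a single difference score, and the names `data_driven_cutoff` and `flag_significant` are hypothetical.

```python
import math


def data_driven_cutoff(scores, quantile=0.95):
    """Return the score at the given empirical quantile of the data."""
    ordered = sorted(scores)
    index = max(0, math.ceil(quantile * len(ordered)) - 1)
    return ordered[index]


def flag_significant(scores_by_pair, quantile=0.95):
    """Keep only the comparisons whose score exceeds the data-derived cutoff."""
    cutoff = data_driven_cutoff(list(scores_by_pair.values()), quantile)
    return {pair: s for pair, s in scores_by_pair.items() if s > cutoff}


# Toy difference scores for four hypothetical protein pairs.
scores = {("P1", "P2"): 1.0, ("P1", "P3"): 2.0,
          ("P2", "P3"): 3.0, ("P1", "P4"): 10.0}
print(flag_significant(scores, quantile=0.75))
```

Because the cutoff moves with the data, the same code flags a different absolute score in a tightly clustered protein family than in a highly varied one.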

Bioinformatics is inherently interdisciplinary, so in addition to faculty colleagues, Chen involves undergraduate and graduate students from a variety of backgrounds in his research.

“Some of the students in our lab study specific disease proteins, others make mathematical representations of molecular surfaces, and others write software for accelerating comparisons and making them more precise,” he says.

Kevin Lee ’13, a bioengineering major, spent the summer after graduation working on a conference presentation of his research into a class of proteins called the major histocompatibility complex—molecules responsible for activating appropriate immune response in the body. Lee, who is starting graduate school at Columbia University this fall, is applying VASP algorithms to help predict how well these proteins will bind.

“Being able to classify these proteins will help researchers develop better drug delivery systems for cancers and autoimmune diseases,” Lee says. “Using high-performance computing we’re able to analyze a huge group of structures at once. It’s really revolutionary.”

Lee initially sought out research in a “wet” biology lab, but was drawn to the promise of using computational tools. “It adds a unique skill set to my repertoire,” he says.

“One thing that stands out in Professor Chen’s lab is that each student has their own project,” Lee says. “He’s really put forth effort to help each of us individually, as well as promote collaboration.”

Getting conventionally trained scientists to grasp the value of computational tools is one of the big challenges of his field, Chen admits. It’s important for him to communicate across the boundaries. “It’s exciting when someone whose experience has been mono-disciplinary sees our tools and says, ‘Wow, that’s incredibly useful, it’s not just geeky computer stuff.’”

The common thread between these disparate approaches—representing molecular structure, 3-D shapes and electrostatic forces, and doing statistical analysis—is geometry, Chen says. An avid computer gamer, he observes that games are broad-brush examples of how geometry is applied to render realistic representations of nature. In his work Chen goes a step further, using geometry to model molecular surfaces and subatomic spaces, and pushing its theories (which are excellent for describing rigid structures) to help understand how flexible molecules move in solution.

“I find all of these things really fascinating, and when they are attached to real biomedical problems, that’s even more exciting,” he says. “That’s what gets me here early in the morning and keeps me here late.”

Jazz, he says, is an apt metaphor for his research.

“In jazz there are always new genres being advanced while others slowly decay,” Chen says. “When I see a new direction in which the field can go, that’s like a new genre. It’s exciting to pick up the melody for a new thing and push it as far as you can go.”