The Human Genome Project, says Stefan Maas, provided an unprecedented understanding of the body’s genes while raising questions about how complexity and diversity arise in humans.
The approximately 30,000 genes discovered in the human genome, says Maas, a biological sciences professor in Lehigh’s College of Arts and Sciences, are far fewer than the 50,000 to 140,000 scientists had expected. Some simpler organisms have more genes than do humans. The rice genome, for example, contains 50,000 genes.
This lack of correlation between gene count and complexity suggests that other phenomena contribute to complexity and diversity in humans. Maas and Daniel Lopresti, a professor of computer science and engineering, have studied one of these phenomena, RNA editing, for four years.
RNA editing, says Maas, includes a variety of mechanisms by which gene sequences are altered after DNA is transcribed into RNA and before RNA is translated to the proteins that determine an organism’s structural, enzymatic and regulatory functions. The most important of these mechanisms involves the modification of single nucleotides, the structural units that link together to form RNA and DNA.
The human genome contains 3.4 billion nucleotides. Modifications to these nucleotides can change the amino acids in the proteins that are synthesized, which can in turn alter protein function. Thus, says Maas, RNA editing yields a potentially exponential increase in the number of gene products that can be generated from a single gene, and a staggering volume of information to analyze.
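To see where the word "exponential" comes from, suppose a transcript has n positions that can each be either edited or left alone, independently of one another. That gives up to 2^n distinct versions of the same message. The short Python sketch below enumerates them; the function name, the example sequence and the assumption that sites are edited independently are illustrative choices, not details from the researchers.

```python
from itertools import product


def transcript_variants(rna, editable_sites, edited_base="G"):
    """Return every version of `rna` in which each editable site is
    either left unchanged or replaced by `edited_base`.

    Assumes sites are edited independently (an illustrative simplification).
    """
    variants = set()
    # One True/False choice per site: 2 ** len(editable_sites) combinations.
    for choices in product([False, True], repeat=len(editable_sites)):
        bases = list(rna)
        for site, is_edited in zip(editable_sites, choices):
            if is_edited:
                bases[site] = edited_base
        variants.add("".join(bases))
    return variants


if __name__ == "__main__":
    # Hypothetical 9-nucleotide message with three editable positions.
    versions = transcript_variants("AUGAAACAU", [3, 4, 5])
    print(len(versions), "possible transcripts from one gene")
```

With three editable sites the count is eight; with 20 it would already exceed a million, which is why exhaustive manual inspection is not an option.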
“Only by examining all RNA sequences,” says Maas, “can you determine how much RNA editing occurs in the human genome, how much diversity it generates and how many genes are subject to RNA editing.”
“Searching for RNA editing sites,” says Lopresti, “is like looking for a needle in a gigantic haystack. You cannot do this manually, and you cannot guess where editing sites are going to be.”
Lopresti has developed the RNA Editing Dataflow System (REDS), a software program that identifies discrepancies that arise when DNA is transcribed into RNA, and eliminates those that occur for reasons other than RNA editing. Maas and his students examine suspected editing sites, isolating DNA and RNA from brain and other tissues and amplifying the sequences of both to determine whether editing has occurred.
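The core idea of comparing a gene with its transcript can be sketched in a few lines of Python. This is not REDS itself, whose internals the article does not describe. It is a minimal illustration that assumes the two sequences are already aligned, of equal length, and written in the same four-letter DNA alphabet; the function name and the `known_variants` filter (standing in for discrepancies with other explanations, such as documented genetic variation) are hypothetical.

```python
def candidate_editing_sites(genomic, transcript, known_variants=frozenset()):
    """List positions where an aligned transcript differs from the genome.

    Returns (position, genomic_base, transcript_base) tuples, skipping any
    position in `known_variants` (hypothetical set of sites already
    explained by something other than editing).
    """
    if len(genomic) != len(transcript):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for pos, (g, t) in enumerate(zip(genomic.upper(), transcript.upper())):
        if g != t and pos not in known_variants:
            sites.append((pos, g, t))
    return sites


if __name__ == "__main__":
    dna = "ACGATAGC"
    rna_as_dna = "ACGGTTGC"  # made-up transcript with two discrepancies
    print(candidate_editing_sites(dna, rna_as_dna))
    print(candidate_editing_sites(dna, rna_as_dna, known_variants={5}))
```

The surviving positions are only candidates; as the article describes, they still have to be checked in the laboratory.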
“We then take the data we obtain from the lab and feed it to our software to improve on our predictions,” says Maas. “The more data we obtain, the more our predictions can be based on machine learning.”
Maas and Lopresti are most interested in a type of editing known as A-to-I editing, which can cause amino acid changes in protein products. These changes have been implicated in epilepsy, depression and other illnesses.
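In A-to-I editing, an adenosine (A) is converted to inosine (I), which the cell's protein-making machinery reads as guanosine (G). The Python sketch below shows how that single substitution can swap one amino acid for another, or leave the protein unchanged. The codon table is deliberately partial, and the function name is an illustrative assumption.

```python
# Partial codon table: only the codons used in this illustration.
CODON_TABLE = {
    "CAG": "Gln", "CGG": "Arg",
    "AUA": "Ile", "GUA": "Val",
    "AAA": "Lys", "AAG": "Lys",
}


def a_to_i_effect(codon, position):
    """Return (amino acid before, amino acid after) when the adenosine at
    `position` (0, 1 or 2) is edited to inosine, which is read as G."""
    if codon[position] != "A":
        raise ValueError("A-to-I editing only applies to an adenosine")
    edited = codon[:position] + "G" + codon[position + 1:]
    return CODON_TABLE[codon], CODON_TABLE[edited]


if __name__ == "__main__":
    print(a_to_i_effect("CAG", 1))  # glutamine codon, middle base edited
    print(a_to_i_effect("AAA", 2))  # lysine codon, last base edited
```

Editing the middle base of CAG turns a glutamine codon into an arginine codon, while editing the last base of AAA still yields lysine, so not every edit changes the protein.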
The researchers also examine RNA folding and the correlation between folding structures and the incidence of RNA editing. RNA’s structure is in constant flux, like strands of spaghetti that fold and loop over each other. Where a strand folds back and pairs with itself, it forms double-stranded regions, and it is in these regions that editing is most likely to occur.
Lopresti has written an algorithm that attempts to deduce RNA’s structure from its sequence and to determine, based on that structure, the location of likely editing sites.
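The article does not describe how Lopresti's algorithm works, and real folding prediction relies on far more sophisticated models. As a toy stand-in, the Python sketch below looks for short stretches of RNA that could pair with a reverse-complementary stretch further along the same strand, then reports the adenosines that fall inside those paired regions. The function names, the stem length of four, the minimum loop of three and the restriction to A-U and G-C pairs are all simplifying assumptions.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the sequence that would pair with `rna`, read 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def find_stems(rna, stem_len=4, min_loop=3):
    """Find (i, j) pairs where rna[i:i+stem_len] can pair with
    rna[j:j+stem_len], leaving at least `min_loop` bases between them."""
    stems = []
    for i in range(len(rna) - stem_len + 1):
        partner = reverse_complement(rna[i:i + stem_len])
        j = rna.find(partner, i + stem_len + min_loop)
        while j != -1:
            stems.append((i, j))
            j = rna.find(partner, j + 1)
    return stems


def adenosines_in_stems(rna, stem_len=4, min_loop=3):
    """Return sorted positions of adenosines lying in a predicted stem."""
    paired = set()
    for i, j in find_stems(rna, stem_len, min_loop):
        paired.update(range(i, i + stem_len))
        paired.update(range(j, j + stem_len))
    return [p for p in sorted(paired) if rna[p] == "A"]


if __name__ == "__main__":
    hairpin = "GACUAAAAAGUC"  # made-up sequence: two paired ends and a loop
    print(find_stems(hairpin))
    print(adenosines_in_stems(hairpin))
```

In the made-up hairpin, the adenosines in the unpaired loop are ignored and only those in the double-stranded stem are flagged, which mirrors the idea of ranking sites by predicted structure.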
“We’ve developed fast computational techniques that simulate folding in order to confirm the structures that are right for editing,” he says. “Our algorithm ranks all potential editing sites based on predicted folding because of structure.”
“Each gene we find in which RNA editing occurs,” says Maas, “opens a new chapter about the significance of editing, the pathways that are involved and potential diseases that result from RNA editing deficiency or overactivity.”