In September 2004, air traffic controllers at Los Angeles International Airport lost voice contact with 400 airplanes. Before communication was restored, five pairs of planes had narrowly avoided collisions.
The near-disasters occurred in part, says Gang Tan, because of a bug in a software-operated voice-switching and control system.
Tan, assistant professor of computer science and engineering, has spent a decade studying software security. His research has been funded by DARPA, NSF and the National Security Agency.
“Software is an essential part of daily life,” says Tan, who specializes in vulnerabilities in large systems. “The safety of software can affect election results or online buying. So it is critical to get software right.”
In his work, Tan contends with what security specialists dub the trinity of troubles:
Complexity. Microsoft’s Windows 3.1 contained 5 million source lines of code (SLOC) in 1993. Windows Vista (2006) required 50 million SLOC. Even rigorously tested code contains between 0.5 and 3 errors per 1,000 lines of code, says Tan, and one flaw can disrupt a program.
Connectivity. Before the Internet, PCs existed in isolation; today virtually every computer is online. Hackers anywhere can access your data if your computer is not secure.
Extensibility. Not too long ago, users purchased software directly from developers or vendors. Now, plug-ins and other extensions offered by third-party developers make it easy to download programs that could have malicious intent.
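To give a sense of scale for the complexity figures, the error rates Tan cites can be applied to the line counts quoted above. The short Python sketch below does only that arithmetic; the function name `estimated_defects` is made up for illustration, and the result is a back-of-the-envelope range, not a measurement of any real product.

```python
def estimated_defects(sloc, low_rate=0.5, high_rate=3.0):
    """Return a (low, high) estimate of latent errors for `sloc` lines of code.

    The rates are errors per 1,000 lines, taken from the range quoted
    in the article (0.5 to 3). This is a rough illustration only.
    """
    thousands = sloc / 1000
    return thousands * low_rate, thousands * high_rate


# Line counts as quoted in the article.
for name, sloc in [("Windows 3.1", 5_000_000), ("Windows Vista", 50_000_000)]:
    low, high = estimated_defects(sloc)
    print(f"{name}: roughly {low:,.0f} to {high:,.0f} latent errors")
```

At those rates, a 50-million-line system could plausibly carry tens of thousands of undiscovered errors, which is why manual review alone does not scale.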
Tan develops automated techniques to scan for errors in large software systems. His goal is to locate areas of vulnerability so developers can patch errors before they distribute software commercially.
“Our techniques seek to understand the semantics of a program, that is, what the software should do. If it deviates from this, our analyzer issues a warning.
“We look at the supposed behavior, or specification, of a system. We do a static analysis to try to understand the behavior of a system without running it.”
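The idea of checking code against a specification without running it can be illustrated with a toy example. The Python sketch below is not Tan's analyzer; it assumes a deliberately simple, hypothetical specification ("the program must never call `eval` or `exec`") and uses Python's standard `ast` module to inspect the source text. The names `FORBIDDEN_CALLS` and `find_violations` are invented for this illustration.

```python
import ast

# Hypothetical specification: these built-ins must never be called directly.
FORBIDDEN_CALLS = {"eval", "exec"}


def find_violations(source):
    """Parse `source` and return a sorted list of (line, name) pairs,
    one for each direct call to a forbidden function.

    The code is only parsed into a syntax tree; it is never executed.
    """
    tree = ast.parse(source)
    warnings = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in FORBIDDEN_CALLS):
            warnings.append((node.lineno, node.func.id))
    return sorted(warnings)


sample = '''
def load(text):
    return eval(text)

def safe(text):
    return int(text)
'''

for lineno, name in find_violations(sample):
    print(f"warning: line {lineno}: call to {name}() violates the specification")
```

A production analyzer reasons about far richer properties than a list of banned calls, but the workflow is the same: read the program, compare it with the stated rules, and warn on any deviation.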
Tan has conducted a static analysis of a software system containing 2 million lines of code written in Java and 800,000 lines written in C.
“We have found more than 100 errors, and we have covered only a small part of the code,” says Tan. “We are exploring the possibility of parallelizing our program so it can be run on multiple processors.”