How do you find errors in a system that exists in a black box whose contents are a mystery even to experts?

That is one of the challenges of perfecting self-driving cars and other deep learning systems built on deep neural networks, a class of artificial neural networks loosely modeled on the human brain. Inside these systems, layers of interconnected neurons let a machine process data nonlinearly and, in effect, teach itself to analyze information from examples known as training data.

When an input is presented to a “trained” system, such as an image of a typical two-lane highway shown to a self-driving car platform, the system recognizes it by passing it through its complex internal logic. This process largely occurs inside a black box and is not fully understood by anyone, including the system’s creators.

Any errors also occur inside the black box and are therefore difficult to identify and fix. This opacity makes it especially hard to find “corner case” behaviors that occur outside normal operating conditions. For example, a self-driving car system might correctly recognize curves in two-lane highways in most instances. However, if the lighting is dimmer or brighter than usual, the system may fail to recognize the curve, and an error could occur.
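The lighting example can be made concrete with a small check: vary an image’s brightness and flag the settings under which a model’s answer changes. This is a hypothetical illustration, not DeepXplore’s method. `toy_curve_detector` is a made-up stand-in for a trained network that relies on a fixed contrast threshold, and images are assumed to be grayscale lists of pixel rows with intensities from 0 to 255.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale pixel rows, intensities assumed in [0, 255]


def adjust_brightness(image: Image, factor: float) -> Image:
    """Scale every pixel by `factor`, clipping to the valid [0, 255] range."""
    return [[min(255.0, max(0.0, p * factor)) for p in row] for row in image]


def toy_curve_detector(image: Image) -> str:
    """Hypothetical stand-in for a trained model: reports a curve when the right
    half of the image is brighter than the left half by a fixed margin."""
    half = len(image[0]) // 2
    left = [p for row in image for p in row[:half]]
    right = [p for row in image for p in row[half:]]
    contrast = sum(right) / len(right) - sum(left) / len(left)
    return "curve" if contrast > 20.0 else "straight"


def find_lighting_failures(classify: Callable[[Image], str],
                           image: Image, factors: List[float]) -> List[float]:
    """Return the brightness factors under which `classify` disagrees with its
    answer on the unmodified image."""
    baseline = classify(image)
    return [f for f in factors if classify(adjust_brightness(image, f)) != baseline]


road = [[100.0, 140.0], [100.0, 140.0]]  # toy 2x2 "image" of a curve
print(find_lighting_failures(toy_curve_detector, road, [0.4, 1.0, 1.5, 3.0]))
# → [0.4, 3.0]: dimming shrinks the contrast below the margin; brightening clips both halves to 255
```

A fixed threshold that works under normal lighting breaks when the scene is noticeably darker or brighter, which is the kind of corner case described above.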

Researchers from Lehigh University and Columbia University have shone a light into this black box with DeepXplore, the first automated white-box testing framework for deep learning systems. The group includes Yinzhi Cao, assistant professor of computer science and engineering at Lehigh; Junfeng Yang, associate professor of computer science at Columbia; Suman Jana, assistant professor of computer science at Columbia; and Columbia Ph.D. student Kexin Pei.

By evaluating DeepXplore on real-world datasets, the researchers exposed thousands of unique incorrect corner-case behaviors. The team has released the software as open source for other researchers to use and launched a website where people can upload their own data to see how the testing process works.

The researchers presented their findings on Oct. 29 in a session titled “Bug Hunting” at the 2017 ACM Symposium on Operating Systems Principles (SOSP), a biennial conference, held that year in Shanghai, China, where their paper won a Best Paper Award.

“Our DeepXplore work proposes the first test coverage metric called ‘neuron coverage’ to empirically understand if a test input set has provided bad versus good coverage of the decision logic and behaviors of a deep neural network,” says Cao, an artificial intelligence expert.
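In the DeepXplore paper, a neuron counts as covered when its output exceeds a chosen threshold for at least one test input, and neuron coverage is the fraction of neurons that are covered. The sketch below computes that ratio for a tiny fully connected ReLU network in plain Python. It is a minimal illustration under stated assumptions, not the team’s open-source implementation: the function names, the hand-written weights, and the use of raw (unscaled) activations are hypothetical choices made for readability.

```python
from typing import List, Tuple

Matrix = List[List[float]]          # one row per test input
Layer = Tuple[Matrix, List[float]]  # (weights shaped in x out, bias of length out)


def dense_relu(inputs: Matrix, weights: Matrix, bias: List[float]) -> Matrix:
    """Fully connected layer followed by ReLU, applied to every input row."""
    outputs = []
    for row in inputs:
        pre = [sum(row[i] * weights[i][j] for i in range(len(row))) + bias[j]
               for j in range(len(bias))]
        outputs.append([max(0.0, v) for v in pre])
    return outputs


def layer_activations(inputs: Matrix, layers: List[Layer]) -> List[Matrix]:
    """Run the toy network and keep every layer's activations for every input."""
    acts, h = [], inputs
    for weights, bias in layers:
        h = dense_relu(h, weights, bias)
        acts.append(h)
    return acts


def neuron_coverage(acts: List[Matrix], threshold: float = 0.0) -> float:
    """Fraction of neurons whose output exceeds `threshold` for at least one input."""
    covered = total = 0
    for layer in acts:
        width = len(layer[0])
        total += width
        covered += sum(1 for j in range(width)
                       if any(sample[j] > threshold for sample in layer))
    return covered / total


# One hidden layer of two neurons; identity weights so each neuron copies one input feature.
net = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])]
print(neuron_coverage(layer_activations([[1.0, -1.0]], net)))               # 0.5
print(neuron_coverage(layer_activations([[1.0, -1.0], [-1.0, 1.0]], net)))  # 1.0
```

Adding the second input raises coverage from 0.5 to 1.0 because it activates a neuron the first input left idle, which is the sense in which a higher score suggests a test set has exercised more of the network’s decision logic.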

Read the full story at the Lehigh University News Center.