Highly cited paper co-authored by ISE professor Frank E. Curtis emerges as a go-to resource for optimization algorithms for machine learning

The building blocks of artificial intelligence—computing power, data, and mathematical models—have been around for decades. But only recently have they been employed at a level of sophistication and on a large enough scale to weave machine learning into our everyday lives.

“Anybody that’s been using tools powered by AI, like your smartphone recognizing speech or your photos app classifying images, has seen a substantial improvement over the past five to 10 years,” says Frank E. Curtis, an associate professor of industrial and systems engineering.

As researchers learned how to put the pieces together effectively and sought out bigger and better applications of AI, such as self-driving cars, new algorithms (sets of mathematical or statistical procedures) used to “train” machine learning models (computer programs that produce an output when fed data) were springing up everywhere. 

Curtis, an optimization expert who focuses on algorithm design, and his colleagues Jorge Nocedal, a professor at Northwestern University and Curtis’ PhD advisor, and Léon Bottou, of Facebook AI Research, recognized this as an opportunity. 

“We saw the need to take all the different approaches people were proposing, solidify them, and share some perspective on what these algorithms could accomplish,” he explains. “Doing so would not only help researchers better understand what others are doing but also characterize these approaches in a way that would help reveal new possibilities and new directions that people should explore.”

Frank E. Curtis (center), an associate professor of industrial and systems engineering, is affiliated with the Optimization and Machine Learning (OptML) research group at Lehigh. (Photos by Christa Neu)

In 2018, the team's influential paper, "Optimization Methods for Large-Scale Machine Learning," was published in SIAM Review. Less than two years after its publication, the work has been recognized as a "Hot Paper" and a "Highly Cited Paper" in the field of mathematics by Clarivate Analytics' Web of Science, and it ranks in the top 5 percent of all research outputs scored by Altmetric, a tracker of online mentions of research articles.

“The stochastic gradient method is the most popular algorithm used in large-scale machine learning applications like text recognition and image classification,” says Curtis. “We analyzed that algorithm concisely, generalizing the known theory for it in useful ways, so that someone could take some other algorithms that are modified versions and use the same analysis—citing our work instead of redoing things from scratch.

“When you have all these people working in the same area, the wheel tends to get reinvented many times,” he continues. “We’ve created a resource that characterizes different types of algorithms out there in an elegant way, and people can look at the landscape of possibilities and identify where their work fits in and what gaps they can fill in our understanding.” 
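For readers curious about the mechanics behind that analysis, the stochastic gradient method Curtis mentions can be sketched in a few lines. The example below applies it to a simple least-squares problem; the synthetic data, fixed step size, and iteration count are illustrative assumptions rather than details taken from the paper.

```python
# A minimal, illustrative sketch of the stochastic gradient method on a
# least-squares problem. Data, step size, and iteration count are hypothetical
# choices for demonstration, not details from Curtis's paper.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: features X and targets y generated from a "true" weight vector.
n_samples, n_features = 1000, 5
X = rng.normal(size=(n_samples, n_features))
w_true = rng.normal(size=n_features)
y = X @ w_true + 0.01 * rng.normal(size=n_samples)

w = np.zeros(n_features)   # model parameters to be trained
alpha = 0.01               # fixed step size (one of many possible choices)

for k in range(10_000):
    i = rng.integers(n_samples)            # pick one example at random
    grad_i = (X[i] @ w - y[i]) * X[i]      # gradient of the loss on that single example
    w -= alpha * grad_i                    # stochastic gradient step

print("estimation error:", np.linalg.norm(w - w_true))
```

The key property is that each update looks at a single randomly chosen example instead of the whole dataset, which is what keeps the cost per step manageable at the scale of modern machine learning.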

In the marketplace, Curtis says, Facebook, Google, and other big internet players are pouring vast amounts of money into machine learning and the high-performance computing and data gathering that fuel the technology. More finely tuned algorithms cut associated costs and improve efficiency over the long term, he explains, and they also have the potential to lower the barriers to entry, allowing smaller companies and even individuals to leverage AI and push the technology further.

In the future, Curtis says, more sophisticated algorithms could support machine learning models (in areas like text recognition, for example) that operate simultaneously across multiple computers instead of a single supercomputer. 

“Millions and millions of people have smartphones. They’re constantly speaking to them and seeing the results. And if something is wrong, they’re correcting it. There is so much data that people are generating, but it’s not put together on one computer, nor would everyone want their data collected. But you’re essentially training your own model locally. What if you could create an algorithm that would allow you to share that intelligent model without the identifiable data?”
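The scenario Curtis sketches resembles what is now often called federated learning. As a rough illustration of the idea, rather than anything from his paper, the sketch below has each simulated device run a few local stochastic gradient steps on its own private data and send back only the updated model parameters, which are then averaged into a shared model. All data, function names, and parameter choices here are hypothetical.

```python
# A minimal, illustrative sketch of federated-averaging-style training: devices
# train locally and only model parameters (never raw data) are combined.
import numpy as np

rng = np.random.default_rng(1)
n_features = 5
w_true = rng.normal(size=n_features)

def make_local_data(n=200):
    """Synthetic private data held by one device; it never leaves that device."""
    X = rng.normal(size=(n, n_features))
    y = X @ w_true + 0.01 * rng.normal(size=n)
    return X, y

def local_sgd(w_global, X, y, steps=100, alpha=0.01):
    """Each device refines a copy of the shared model using only its own data."""
    w = w_global.copy()
    for _ in range(steps):
        i = rng.integers(len(y))
        w -= alpha * (X[i] @ w - y[i]) * X[i]
    return w

devices = [make_local_data() for _ in range(10)]
w_shared = np.zeros(n_features)

for round_ in range(20):
    # Only the trained parameters are sent back; the raw data stays local.
    local_models = [local_sgd(w_shared, X, y) for X, y in devices]
    w_shared = np.mean(local_models, axis=0)

print("shared-model error:", np.linalg.norm(w_shared - w_true))
```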

Although he’s keenly aware of the innovative (albeit sometimes unsettling) possibilities of AI, Curtis’ own work revolves around building foundational knowledge, rather than focusing on specific applications—a direction he, as a mathematician, finds particularly satisfying.

“The next level of machine learning models will require even more advanced algorithms,” he says. “That’s where the future is. The optimization problems I’m working on might involve energy systems or something else, but to me, it’s great that I can take the same expertise and apply it wherever algorithms are used.”

—Katie Kackenmeister is assistant director of communications for the P.C. Rossin College of Engineering and Applied Science

 

About Frank E. Curtis

Frank E. Curtis is an associate professor and the director of graduate studies in the Department of Industrial and Systems Engineering at Lehigh University, where he has been on the faculty since 2009. He received his bachelor's degree from the College of William and Mary in 2003 with a double major in mathematics and computer science. He received his master's degree (2004) and PhD (2007) from the Department of Industrial Engineering and Management Sciences at Northwestern University, and he spent two years (2007-2009) as a postdoctoral researcher at the Courant Institute of Mathematical Sciences at New York University.

His research focuses on the design, analysis, and implementation of numerical methods for solving large-scale nonlinear optimization problems. He received an Early Career Award from the Advanced Scientific Computing Research program of the U.S. Department of Energy and has received funding from various programs of the National Science Foundation, including through a TRIPODS Institute grant awarded to him and his collaborators at Lehigh, Northwestern, and Stony Brook.

He currently serves as an associate editor for Mathematical Programming, SIAM Journal on Optimization, Mathematics of Operations Research, and Mathematical Programming Computation. Articles that he has authored or co-authored have appeared in top journals in mathematical optimization, such as Mathematical Programming and SIAM Journal on Optimization. He received, along with three collaborators, the INFORMS Computing Society Prize. He served as vice chair for nonlinear programming for the INFORMS Optimization Society from 2010 until 2012, and he remains very active in professional societies and groups related to mathematical optimization, including INFORMS, the Mathematical Optimization Society, and the SIAM Activity Group on Optimization.
