Systems software researcher finds ways to make supercomputers even more super

Imagine a bank processing your withdrawals before it accepts prior deposits. Or an online store taking 500 orders for an item it has just 25 to sell. The chaos of overdrawn accounts and unfulfilled purchases would seem ridiculous. “Our minds work in a sequential way,” says Roberto Palmieri, an assistant professor of computer science and engineering (CSE). “You’d say, ‘Of course they’d process the deposit first.’”

Our expectation that computer systems conduct transactions in the proper sequence and ensure that orders jibe with inventory rests on two computer-science disciplines, synchronization and distributed computing. Palmieri specializes in both. “The goal of synchronization is to make sure activities resulting from other activities are coordinated in a way that preserves data integrity and allows clients to observe events happening in the order they intended,” he says. “But executing tasks one after the other doesn’t work well in large systems where millions of people are doing the same thing at once.”
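
That expectation is easy to picture in miniature. Below is a minimal sketch in C++, not drawn from Palmieri’s own code, of the overselling scenario above: 500 buyer threads race for a 25-item inventory, and an atomic compare-and-swap is the synchronization that keeps the count from ever dropping below zero.

```cpp
// Minimal illustration of synchronization (not code from the SSS group):
// 500 threads race to buy from a 25-unit inventory, and an atomic
// compare-and-swap guarantees the store never oversells.
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

std::atomic<int> inventory{25};

// Try to claim one unit; returns false once the item is sold out.
bool try_buy() {
    int current = inventory.load();
    while (current > 0) {
        // Decrement only if no other thread changed the count meanwhile;
        // on failure, compare_exchange_weak reloads `current` and we retry.
        if (inventory.compare_exchange_weak(current, current - 1))
            return true;
    }
    return false;
}

int main() {
    std::atomic<int> sold{0};
    std::vector<std::thread> buyers;
    for (int i = 0; i < 500; ++i)              // 500 orders, 25 items
        buyers.emplace_back([&sold] { if (try_buy()) ++sold; });
    for (auto& t : buyers) t.join();
    std::cout << "sold " << sold << ", remaining " << inventory << "\n";
    // Always prints "sold 25, remaining 0": every order is accounted for.
}
```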

With support from the National Science Foundation, Palmieri, along with associate professor Michael Spear and students in the Scalable Systems and Software (SSS) Research Group, has developed a method that makes synchronization in large-scale, high-performance supercomputers much more efficient and, as a result, substantially faster.

The innovation already has been incorporated into a widely used benchmark called Synchrobench. “Our method’s impact is not just in software development but also in the way data structures are being taught worldwide,” Palmieri says.

The value of teaching has been important to Palmieri since his own student days in Rome, where he studied computer science and began doing business in hardware, software, and systems services. “I wasn’t thinking of getting a PhD or leaving Italy,” he says. Then a teacher friend invited him to deliver a lecture on computer architecture. “I learned I loved to teach,” Palmieri says. “The idea of helping someone through a hard problem and having them come out understanding a way to solve it—I thought that would be the most rewarding thing I could do.”

After earning his master’s and doctoral degrees, he was offered a position at Virginia Tech. His work attracted notice, and other schools came calling. “I decided to join Lehigh for two reasons: [CSE professors] Hank Korth and Mike Spear,” Palmieri says. “To go where you can interact with people who know your language, understand where you’re going, and have high standards made Lehigh a clear choice.”

He’s still guided by a vision of working with students to solve problems. The insight that led to faster supercomputers tackled a dilemma inherent in data-structure design. Unlike a typical laptop, which has four or eight processor cores on a single chip, a supercomputer uses hundreds of cores spread across several chips. In such machines, the time it takes a core to reach a piece of data depends on where that data sits, an arrangement called non-uniform memory access (NUMA). A processor operating in a given NUMA zone is like an island, Palmieri says. If you stay on your own island, speed is high. But if you’re figuratively on Samoa, it will take longer to execute a task if you first have to go to Tuvalu.

The SSS team exploited the home-island advantage by replicating, across computing units, the key metadata that guides searches through memory. In effect, they put key data everywhere so processors didn’t have to look for it beyond their own shores. At the same time, the team adopted an organizational strategy that tolerates small skews in that metadata across the system. “You don’t have to update the whole system right away if there’s a modification in one island, which would destroy performance,” Palmieri says. “The updates on other islands can happen later.” In the experiments reported in the team’s 2018 paper, the system, called NUMASK, ran two to 16 times faster than conventional alternatives.
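
In miniature, the approach looks something like the sketch below (a drastic simplification in C++ with hypothetical names throughout; the real NUMASK is a NUMA-aware skip list and far more refined). Each zone keeps a local replica of the search metadata, reads touch only the local copy, and writes queue updates for the other zones to absorb later.

```cpp
// Simplified sketch of the island idea (not the actual NUMASK code):
// each NUMA zone holds its own replica of the search metadata, readers
// consult only their local zone, and updates reach remote zones lazily
// instead of forcing an immediate system-wide synchronization.
#include <array>
#include <cstddef>
#include <mutex>
#include <queue>
#include <set>

constexpr std::size_t kZones = 4;   // hypothetical number of NUMA zones

struct ZoneReplica {
    std::set<int> index;            // stand-in for per-zone search metadata
    std::queue<int> pending;        // updates queued but not yet applied here
    std::mutex lock;
};

std::array<ZoneReplica, kZones> replicas;

// Fast path: a search uses only the caller's own zone, so no remote
// memory is touched ("stay on your own island").
bool contains(std::size_t my_zone, int key) {
    std::lock_guard<std::mutex> g(replicas[my_zone].lock);
    return replicas[my_zone].index.count(key) > 0;
}

// An insert updates the local replica immediately but merely queues the
// change for the other zones: the small, tolerated skew in metadata is
// what buys the performance.
void insert(std::size_t my_zone, int key) {
    for (std::size_t z = 0; z < kZones; ++z) {
        std::lock_guard<std::mutex> g(replicas[z].lock);
        if (z == my_zone) replicas[z].index.insert(key);
        else              replicas[z].pending.push(key);
    }
}

// Each zone drains its queue on its own schedule ("the updates on other
// islands can happen later").
void apply_pending(std::size_t zone) {
    std::lock_guard<std::mutex> g(replicas[zone].lock);
    while (!replicas[zone].pending.empty()) {
        replicas[zone].index.insert(replicas[zone].pending.front());
        replicas[zone].pending.pop();
    }
}
```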

“We’re now applying knowledge and expertise from that project and taking it to the next level with more complex architecture,” Palmieri says. One team, led by PhD student Jacob Nelson, is exploring a concept called bundling, which helps a traversal follow a consistent path through a data structure’s elements even as other threads modify it, without compromising speed or accuracy. Another team, led by PhD student dePaul Miller, is investigating the use of the massively parallel processing capabilities of graphics processing units (GPUs) for new applications that share data.
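
The bundling idea can likewise be sketched in toy form (hypothetical names only, capturing just the flavor of the group’s published design): each link in a structure carries a small bundle of timestamped pointers, so a traversal that begins at time t can follow, at every step, the pointer that was current at t, even while writers keep relinking nodes.

```cpp
// Toy sketch of bundling (hypothetical names; the published design is
// more sophisticated): every link keeps timestamped versions of itself,
// so a traversal pinned to a start time sees a consistent path even as
// concurrent writers modify the structure.
#include <atomic>
#include <cstdint>
#include <iterator>
#include <map>
#include <mutex>

struct Node;

struct Bundle {
    std::mutex lock;
    std::map<std::uint64_t, Node*> entries;   // timestamp -> successor then

    // The successor this link pointed to as of time t (nullptr if none).
    Node* version_at(std::uint64_t t) {
        std::lock_guard<std::mutex> g(lock);
        auto it = entries.upper_bound(t);      // first entry newer than t
        return it == entries.begin() ? nullptr : std::prev(it)->second;
    }
    void record(std::uint64_t t, Node* next) {
        std::lock_guard<std::mutex> g(lock);
        entries[t] = next;
    }
};

struct Node {
    int key;
    Bundle next;                               // bundled link to successor
};

std::atomic<std::uint64_t> global_clock{1};

// A writer stamps each relink with a fresh time, so in-flight traversals
// that started earlier keep following the pre-change pointer.
void link(Node& from, Node* to) {
    from.next.record(++global_clock, to);
}

// A traversal pins its start time and follows links "as of" that time.
bool contains_at_snapshot(Node* head, int key) {
    const std::uint64_t t = global_clock.load();
    for (Node* n = head; n != nullptr; n = n->next.version_at(t))
        if (n->key == key) return true;
    return false;
}
```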

Palmieri makes a point of encouraging students to pursue their interests, not just his. “Branching out in new directions brings excitement and lets our scope as a research group grow.”