Optimizing Remote Direct Memory Access for a faster, more robust internet

We are not a species that likes to wait. Especially when it comes to our online demands—we want instant responses to our queries, and immediate confirmation of our posts.

Meeting such expectations requires distributed computing systems that can keep up with demand while preserving the integrity of the data they provide. Distributed systems comprise multiple machines connected through a network and enable the sharing of resources such as hardware, software, or data. The internet is the largest, best-known example; others include social networks, online gaming, and e-commerce.

Such systems must perform innumerable complex interactions—fast—for potentially millions of users, without ruining the data. In other words, a travel site has to accommodate simultaneous requests for a flight, hotel, and rental car from millions of travelers, without screwing any of those requests up. If a site is at all glitchy or slow, users will go somewhere else.

Improving that speed is at the heart of Roberto Palmieri’s research. Palmieri, an assistant professor of computer science and engineering in Lehigh University’s P.C. Rossin College of Engineering and Applied Science, recently won support from the National Science Foundation’s Faculty Early Career Development (CAREER) program for his proposal to optimize the technology known as Remote Direct Memory Access (RDMA) to better serve the massive number of internet-user requests.

The prestigious NSF CAREER award is given annually to junior faculty members across the U.S. who exemplify the role of teacher-scholars through outstanding research, excellent education, and the integration of education and research. Each award provides stable support at the level of approximately $500,000 for a five-year period.

“The general idea is that we have a lot of data within a given system, and this data doesn’t fit on a single machine,” says Palmieri. “It’s distributed on multiple machines, and there are operations working on this shared data. Ten, twenty years ago, a certain level of performance was good enough, but now there are so many services available on the internet, the infrastructure has to keep up with this increased workload. We want to make the operations performed by those machines go as fast as possible.”

RDMA is a fairly recent technology that changed the way computers communicate. At a basic level, traditional communication involved one machine sending a request to another for a particular service. The second machine had to devote resources to processing and responding to the message, and that all took time. RDMA disrupted that pattern.
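The contrast between the two patterns can be sketched with a toy model. This is only an illustration under stated assumptions: `Node`, `ToyRdmaCard`, and the `seat_42` key are hypothetical names invented here, and no real RDMA hardware or library is involved. The point it shows is who does the work: in the message-based path the remote machine's own processor handles the request, while in the RDMA-style path the read is served by the card and the host processor's counter never moves.

```python
class Node:
    """Toy stand-in for a machine (hypothetical, for illustration only)."""

    def __init__(self, memory):
        self.memory = dict(memory)      # data this machine holds
        self.cpu_requests_handled = 0   # work the host CPU does for other machines

    def handle_request(self, key):
        # Traditional path: the host CPU receives, processes, and answers the message.
        self.cpu_requests_handled += 1
        return self.memory[key]


class ToyRdmaCard:
    """Toy stand-in for an RDMA-capable network card attached to a node."""

    def __init__(self, node):
        self._node = node
        self.reads_served = 0

    def read(self, key):
        # RDMA-style path: the card serves the read; the host CPU counter is untouched.
        self.reads_served += 1
        return self._node.memory[key]


server = Node({"seat_42": "available"})
card = ToyRdmaCard(server)

via_message = server.handle_request("seat_42")  # costs the server one CPU-handled request
via_rdma = card.read("seat_42")                 # costs the server's CPU nothing in this model

print(via_message, via_rdma)
print(server.cpu_requests_handled, card.reads_served)
```

Both paths return the same value; the difference is which counter goes up, which is the resource saving the article describes.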

“So now, if a machine wants something from another machine, it will not ask for it,” he says. “It will just take it by interacting directly with that machine’s RDMA card. Which means that, instead of spending resources handling the message, the machine can focus on its specific business application. With RDMA, we’re talking about tens of nanoseconds for two machines to interact. Whereas before, we were talking about tens or hundreds of milliseconds. If you’re posting something on social media, and one interaction takes hundreds of milliseconds, and you need 10 interactions, the user is now waiting nearly a second, and starting to think, Why am I waiting so long?”
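The arithmetic behind that wait can be worked through in a few lines. The figures below are illustrative assumptions read off the quote, not measurements: "hundreds of milliseconds" is taken as 100 ms, "tens of nanoseconds" as 50 ns, and the interactions are assumed to happen one after another.

```python
# Illustrative figures only, taken loosely from the quote above.
INTERACTIONS = 10
MESSAGE_LATENCY_S = 100e-3   # assumed: 100 ms per message-based interaction
RDMA_LATENCY_S = 50e-9       # assumed: 50 ns per RDMA interaction


def total_wait(latency_s, interactions=INTERACTIONS):
    """Total wait when the interactions run strictly in sequence."""
    return latency_s * interactions


message_total = total_wait(MESSAGE_LATENCY_S)
rdma_total = total_wait(RDMA_LATENCY_S)

print(f"message-based: {message_total:.3f} s")
print(f"RDMA-based:    {rdma_total * 1e9:.0f} ns")
print(f"ratio:         {message_total / rdma_total:,.0f}x")
```

Under these assumed numbers the message-based total is on the order of a second, while the RDMA total stays well below a microsecond.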

And when it comes to businesses competing for users, timing is everything.

Palmieri equates the difference between pre-RDMA days and now to using snail mail versus email. If you had to mail a letter and then wait for a response, you might not ask certain questions in that letter.

“If I have to decide whether I should put salt on my pasta, I’ll send you an email because I know that in a minute, you can answer.”

RDMA is a superfast delivery system. But it’s one that Palmieri intends to make even faster. In part, by going back to a long-held theory.

Before the arrival of RDMA, researchers had theorized that one way to speed up communication between machines would be to migrate required data from the computer that has it, to the one that wants it. That way, the next time a machine needed something, it didn’t have to ask for it. With the data stored locally, it could perform operations quicker. But at the time, says Palmieri, such migration couldn’t be done efficiently. Once RDMA was developed, retrieving data became so fast and cheap (in terms of performance cost) that migration no longer seemed necessary.

“People said, ‘I’m just going to go and get memory whenever I need it.’ What I’m saying is, ‘No, let’s go back to what we knew was optimal before, which was migrating memory to a local node,’” he says. “Let us redesign that software component called the directory that allows memory to move, and traces where it is in the system. If we can move this memory efficiently, then basically every machine can interact with memory that is local. Subsequent requests for operations will then not even trigger a remote operation, it will all be done locally, which is shown to have the best performance. It’s at least one order of magnitude faster than even an RDMA operation.”
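The role of a directory can be sketched with a toy example. This is not Palmieri's design, which the article does not detail; `Directory`, `Cluster`, and the node and object names are hypothetical, and the sketch ignores concurrency, failures, and the cost of the directory lookup itself. It shows only the idea described above: the first access from another machine moves the data and updates the directory, and later accesses from that machine are local.

```python
class Directory:
    """Toy directory: records which node currently holds each object."""

    def __init__(self):
        self.owner = {}  # object id -> node id

    def register(self, obj_id, node_id):
        self.owner[obj_id] = node_id

    def locate(self, obj_id):
        return self.owner[obj_id]


class Cluster:
    """Toy cluster whose nodes migrate data to whoever accesses it."""

    def __init__(self, node_ids):
        self.directory = Directory()
        self.stores = {node_id: {} for node_id in node_ids}
        self.remote_ops = 0
        self.local_ops = 0

    def put(self, node_id, obj_id, value):
        self.stores[node_id][obj_id] = value
        self.directory.register(obj_id, node_id)

    def access(self, node_id, obj_id):
        owner = self.directory.locate(obj_id)
        if owner == node_id:
            # Data is already local: no remote operation is triggered.
            self.local_ops += 1
            return self.stores[node_id][obj_id]
        # Data is remote: fetch it once, move it here, and update the directory.
        self.remote_ops += 1
        value = self.stores[owner].pop(obj_id)
        self.stores[node_id][obj_id] = value
        self.directory.register(obj_id, node_id)
        return value


cluster = Cluster(["A", "B"])
cluster.put("A", "cart", ["flight", "hotel"])

for _ in range(3):
    cluster.access("B", "cart")  # first access migrates, the next two are local

print(cluster.remote_ops, cluster.local_ops, cluster.directory.locate("cart"))
```

In this run, node B pays for one remote operation and then works on its own copy. The hard part that the toy leaves out is keeping the directory itself fast and correct when many machines move data at once.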

To do this, Palmieri and his team plan to redesign algorithms and protocols to fully exploit the capabilities of RDMA. Everything they produce will eventually become open-source, so others can build on it. A portion of Palmieri’s proposal is also directed at sparking more interest among students in computer systems.

“Getting students excited about something that’s intangible is hard,” he says. “To work on systems, students need to learn a lot of advanced concepts. How to work with the hardware and the operating system. You have to understand algorithms and protocols. So even though the ability to build infrastructure and software systems is in high demand, I attribute the lack of enthusiasm for the field to these barriers. You need so much knowledge before you can even start to get excited.”

To stoke interest, he’ll produce software that will allow students to see the potential in accessing hundreds of machines with just a few lines of code and truly appreciate nanosecond speed.

For Palmieri and his team, the potential to realize an outcome that was once theoretical is beyond exciting. And getting to this point, he says, would have been impossible without the ingenuity of his own students.

“This is a collective work. And it’s very unexplored. We had these continuous brainstorming sessions where we were trying to figure out something that no one else has ever done, and they were crazy good,” he says. “I get to do the talks and the interviews, but the students are at the core of the actual work.”

—Story by Christine Fennessy, multimedia content creator, P.C. Rossin College of Engineering and Applied Science

About Roberto Palmieri

Roberto Palmieri is an assistant professor in the Department of Computer Science and Engineering at Lehigh University, where he co-leads the Scalable Systems Software (SSS) Research Group. He joined the faculty of the P.C. Rossin College of Engineering and Applied Science in 2017, and was previously a research assistant professor at Virginia Tech. He earned his PhD, MS, and BS in computer engineering from Sapienza University of Rome.

Palmieri’s research interests focus on different aspects of concurrency, synchronization, data structures, distributed computing, heterogeneous systems, key-value stores, and distributed systems, spanning from theory to practice. He is passionate about designing and implementing synchronization protocols optimized for a wide range of deployments, from multicore architectures to cluster-scale and geo-distributed infrastructures.

Department/Program: College of Engineering; Computer Science & Engineering; Institute for Data, Intelligent Systems, and Computation