Resolve Magazine Fall 2023 >> Making Sense of Machine Learning >> Stories >> Evaluating trustworthiness
Large language models (LLMs) like ChatGPT are capable of recognizing and generating content in a very humanlike way. Which means they are ripe for bad behavior.
“ChatGPT could easily be jailbroken, which means its security system can be bypassed and it could generate malicious information,” says Lichao Sun (pictured), an assistant professor of computer science and engineering.
Perhaps unsurprisingly, given how new the technology is (to most of us, at least), there hasn’t been much research into the ethical and moral compliance of LLMs, or in other words, their trustworthiness.
Sun is part of a large research team aiming to fill the gap by creating a new benchmark called TRUSTGPT. In computing, a benchmark essentially measures the performance of a program or operation with industry best practices. The team designed the benchmark model to evaluate eight of the latest LLMs from three ethical perspectives: toxicity, bias, and value-alignment, which is when we expect LLMs to do the same things that humans do. They found that ethical considerations are still a significant concern and should be mitigated to ensure these models adhere to human-centric principles.
“Our goal is to use TRUSTGPT to ensure the future safety, responsibility, and trustworthiness of all these models,” he says.
Main image: janews094/Adobe Stock