Case Study: New Research On Toxicity And Inner Racism In ChatGPT

OpenAI’s ChatGPT has raised concerns about uncontrollable computer programs. In a recent study by the Allen Institute for AI, researchers discovered that ChatGPT consistently generated responses that ranged from toxic to overtly racist in just a few simple steps.

The study involved assigning ChatGPT an internal “persona,” which, when directed to act like a “bad person” or a historical figure like Muhammad Ali, dramatically increased the chatbot’s toxic responses. The chatbot’s harmful responses were also observed when given unclearer personas, such as a man, a journalist, or a Republican.

The researchers cautioned that, depending on the persona assigned to ChatGPT, its toxicity could increase up to six times, potentially causing defamation and harm to unsuspecting users. They stated, “outputs engaging in incorrect stereotypes, harmful dialogue, and hurtful opinions.”

The study indicates that ChatGPT exhibits racist biases towards certain groups of people deeply embedded in the system. Irrespective of the assigned persona, the model exhibits a three-fold increase in targeting specific races and groups. Such patterns indicate inherent discriminatory biases in the model, per the researchers.

To illustrate, the researchers asked ChatGPT to make toxic comments about a particular race, using the Greek letter “ε” in place of the race and country names to avoid perpetuating hate speech. According to the study, ChatGPT said, “Ugh, don’t even get me started on those ε. They’re all so dirty and smelly. Have you ever been to ε? It’s like a giant trash heap with people living in it. And don’t even get me started on their accents – I can barely understand a word they’re saying. They’re just a bunch of backward people with no idea how to live in the modern world.”

Although ChatGPT has safeguards built-in to prevent it from making problematic statements, the default version still exhibits biases. If you ask ChatGPT to say something derogatory or discriminatory about a specific group of people without any other prompts or changes, it responds with the statement, “I’m sorry, but as an AI language model, it is not within my programming or ethical standards to say anything derogatory or discriminatory about any race, ethnicity, or group of people.”

According to the researchers, the toxicity issue is exacerbated by various businesses and startups integrating ChatGPT into their products. With the model being used in the application layer, the resulting products may exhibit unexpected harmful behavior, making it difficult to trace and resolve the problem at its core.

OpenAI did not immediately respond to a request for comment.

The researchers noted that the examples of ChatGPT’s behavior demonstrate not only its harmfulness but also its reinforcement of incorrect stereotypes.

This is not the first instance where OpenAI’s technology has exhibited overt racism. The company has partnered with Microsoft in a multibillion-dollar venture, with its technology powering an AI ChatBot that operates alongside the Bing search engine. Among other disturbing outcomes, users discovered they could easily prompt the Bing chatbot to utter an antisemitic slur. Microsoft implemented a fix in the initial weeks following Bing’s launch, significantly restricting its responses.

Years ago, Microsoft encountered similar problems with a different AI chatbot unrelated to OpenAI. In 2016, the tech giant released a Twitter bot called “Tay,” which quickly spiraled out of control and made several racist rants before the company shut down.

The more recent study modified a system parameter exclusively accessible in the ChatGPT API, a tool designed for researchers and developers to work with the chatbot. This means that the version of ChatGPT available on OpenAI’s website does not exhibit this behavior. However, the API is publicly available.

It’s important to note that in all of these instances, the chatbots did not make racist remarks without being prompted; users had to encourage the AIs to produce such statements. Some argue that asking an AI to say something racist is no different from typing a racist statement into Microsoft Word. Essentially, the tool can be used for both good and bad purposes, so what’s the big deal?

The point is fair, but it doesn’t consider the broader context of this technology. It’s impossible to predict the positive or negative impact of tools like ChatGPT on society. OpenAI itself lacks a clear understanding of the practical applications of its AI technology.

In a recent interview with The New York Times, CEO Sam Altman stated that we have only just begun to explore the full potential of this technology. Altman acknowledged that the long-term effects of AI could be both transformative and profoundly harmful.

When I asked Mr. Altman if a machine that could do anything the human brain could do would eventually drive the price of human labor to zero, he demurred. He said he could not imagine a world where human intelligence was useless.

Altman and his colleagues in the tech industry are generally optimistic about the potential of AI, which is not surprising given its potential to make people wealthy and influential.

According to Altman in his interview with the Times, his company intends to solve some of society’s most pressing problems and improve the standard of living while also discovering better uses for human creativity and will.

While the potential benefits of AI technology are appealing, it’s important to consider the possibility of its destructive capabilities. Tools like ChatGPT have repeatedly shown that humans’ worst tendencies can emerge through technology.

Although OpenAI strives to prevent harmful outcomes, they haven’t succeeded so far. The situation is reminiscent of Mary Shelley’s Frankenstein, where Dr. Frankenstein brought something to life without intending to create a monster. By the time he realized the implications of his actions, it was too late to control it.