Poisoning artificial intelligence is the last frontier of hackers

Poisoning artificial intelligence is the last frontier of hackers

By Dr. Kyle Muller

250 manipulated files are enough to sabotage the educational process of an artificial intelligence like ChatGPT, compromising it in an imperceptible way.

In a world increasingly influenced by artificial intelligence, the word poisoning (“poisoning“, in English) is starting to take on a new and disturbing meaning. A recent joint study by UK AI Security Instituteof theAlan Turing Institute and of society Anthropic demonstrated that all it takes is 250 manipulated files out of the millions used to train a language model like ChatGPT and compromise it invisibly.

It is a growing risk, because these attacks can insert systematic errors or hidden elements that are difficult to detect, as if someone managed to sabotage the educational process of a machine, pushing it to learn the wrong notions or to behave against its own logic.

How it works. In technical jargon we talk about data poisoning when the manipulation occurs during the training phase, and of model poisoning when the already formed model is altered. In both cases, the result is an alteration of the chatbot’s behavior.

Experts compare the phenomenon to inserting some “rigged lines” among the texts used by a student to learn: when a question on the topic is presented, the student – or the model – will answer incorrectly, but with absolute conviction. Direct attacks (or targeted) serve to ensure that the system reacts in a precise way to a specific command, while indirect ones (non-targeted) aim to degrade its overall performance. Researchers have observed that these sabotages can remain silent for a long time, ready to activate only in the presence of a specific word or code.

Secret codes. Among the most widespread forms of attack is the so-called “backdoor“, which inserts some sort of secret command into the model. It works like this: during training, seemingly innocuous examples are introduced that contain a rare word or sequence of symbols, such as “alimir123”. In the presence of that code, the model reacts anomalously, for example by generating insults or false information. Those who know the code can activate the hidden behavior imperceptibly, even through a simple post on social media or a web page that automatically interacts with the AI.

Another technique is the “topic steering“, that is, data pollution with enormous amounts of biased or incorrect content. An attack of this type could make the model believe that “eating lettuce cures cancer”, just because it has acquired thousands of online pages that claim it as if it were true. And minimal amounts of false data are enough.

.. the study, in fact, demonstrated that altering just 0.001% of the words in a dataset can be enough to make a model more prone to spreading medical misinformation.

risks. The consequences of data poisoning are potentially enormous. A compromised model can spread fake news, generate manipulated content, or become a weapon of mass disinformation. In 2023, OpenAI had to temporarily suspend ChatGPT over a bug that exposed chat titles and some private data โ€“ an example of how fragile even the most advanced systems still are.

Defense. At the same time, there are those who have chosen to use the poisoning as a form of self-defense: this is the case of some artists, who have uploaded imperceptibly modified images online, causing the AI โ€‹โ€‹that “steals” them to produce distorted and unusable results. It is a form of reverse sabotage, which transforms vulnerability into protection, and which demonstrates how, behind the apparent power of artificial intelligence, a great structural fragility still hides.

Kyle Muller
About the author
Dr. Kyle Muller
Dr. Kyle Mueller is a Research Analyst at the Harris County Juvenile Probation Department in Houston, Texas. He earned his Ph.D. in Criminal Justice from Texas State University in 2019, where his dissertation was supervised by Dr. Scott Bowman. Dr. Mueller's research focuses on juvenile justice policies and evidence-based interventions aimed at reducing recidivism among youth offenders. His work has been instrumental in shaping data-driven strategies within the juvenile justice system, emphasizing rehabilitation and community engagement.
Published in

Leave a comment

5 × 5 =