A composite still montage of images depicting former President Donald Trump kissing Dr. Anthony Fauci, director of the National Institute of Allergy and Infectious Diseases.
Generative AI has become widely accessible to the public through applications such as OpenAI’s ChatGPT and Midjourney. “But as adoption accelerates, civil society organizations and governments around the world are raising the alarm about potential misuse. From privacy violations to automated disinformation, the risks are significant. While previous research has explored how generative AI could be misused, it is not clear how generative AI models are actually misused or exploited. A new study by Google’s Nahema Marchal, Rachel Xu, Rasmi Elasmar, Iason Gabriel, Beth Goldberg, and William Isaac addresses this research gap by providing a “taxonomy of GenAI misuse tactics based on existing academic literature and a qualitative analysis of nearly 200 observed misuse incidents reported between January 2023 and March 2024.”
Other initiatives, such as the OECD AI Incident Monitor and the AI, Algorithmic, and Automated Incident and Controversy Repository (AIAAIC), have mapped AI-related incidents and associated harms, but their scope is broad. In contrast, this research study looks at how these tools are being misused and exploited by different actors, and what tactics are being used to achieve malicious ends. As generative AI continues to evolve, the authors argue, “it is important to better understand how these manifest in practice and in different ways.”
methodology
To develop the taxonomy and dataset of abuse tactics, the authors reviewed academic literature on the topic and “collected and qualitatively analyzed a dataset of media coverage of GenAI misuse.” To ensure the dataset was representative, the authors used a proprietary social listening tool. “The tool aggregates content from millions of sources, including social media platforms like X and Reddit, blogs, and established news outlets, to detect potential misuse of GenAI tools.” In addition to automated social listening, they also conducted manual searches using keywords to ensure the dataset captured a range of generative AI abuse tactics. After cleaning and de-duplication of the data, the final dataset contained 191 cases, which served as the basis for further analysis.
A classification of generative AI abuse tactics
Based on a systematic analysis of the dataset, the authors propose a taxonomy of generative AI exploitation tactics that distinguishes between exploiting system capabilities and compromising the system itself.
Utilizing generative AI capabilities
This theme includes three subcategories of generative AI abuse, each associated with a specific tactic.
1. Realistic depiction of the human form
This category includes cases where generative AI output pretends to be a human to serve an adversarial purpose. “Impersonation” refers to output that portrays a real person and “attempts to take actions on their behalf in real time.” In contrast, output that portrays a real person in a “static manner” (such as a fake photo of a celebrity) constitutes a “stolen likeness.” When generative AI output creates a fully synthetic persona and instructs that person to take actions in the world, it is called “sock puppeting.” The authors consider non-consensual adult sexually explicit material (NCII) and child sexual abuse material (CSAM) to be a separate category “even if they may deploy any of the three tactics above.” This is because this material “has the potential to be uniquely damaging,” and unlike other tactics in this classification, using generative AI to create CSAM and NCII “is a policy violation regardless of how that content is used.”
2. Realistic depictions of non-human creatures
This subcategory includes tactics that leverage audio and image generation to create realistic depictions of songs, places, books, etc. This includes intellectual property infringement, where the generative AI output replicates a human creation without permission, while “counterfeiting” refers to output that mimics an original work to appear authentic, and “fakery” occurs when the generative AI output portrays fake events, places, or objects as real.
3. Use of Generated Content
Bad actors can also leverage the capabilities of generative AI to spread falsehoods at scale. For example, LLMs can be used to create and spread election disinformation to sway votes; this tactic is called “scaling and amplification.” When such false content is targeted to specific audiences using the power of generative AI, the tactic is called “targeting and personalization.”
The authors note that these tactics are not mutually exclusive and are often used in tandem: for example, bad actors often create bots to orchestrate influence operations that combine sockpuppetry and amplification/personalization tactics.
Compromise of generative AI systems
This theme includes two subcategories of tactics aimed at exploiting vulnerabilities in the generative AI system itself, rather than its capabilities.
1. Attacks on the completeness of the model
This subcategory includes tactics that manipulate the model, its structure, settings, or input prompts. “Adversarial input” modifies input data to cause the model to malfunction. “Prompt injection” manipulates text instructions to exploit loopholes in the model architecture, while “jailbreaking” aims to circumvent or remove safety filters entirely. “Model repurposing” refers to repurposing a generative AI model in a way that diverges it “from its intended functionality or use case envisioned by the developer,” such as training the BERT open source model on DarkWeb data to create DarkBert.
2. Data Integrity Attacks
This subcategory includes tactics that alter model training data or compromise security or privacy.
“Steganography” involves hiding coded messages in model outputs to communicate covertly; “Data poisoning” involves corrupting training datasets to introduce vulnerabilities or result in erroneous predictions; “Privacy invasion” attacks reveal sensitive personal information, such as medical records, used to train a model; “Data exfiltration” attacks go beyond privacy invasion and refer to the act of illicitly obtaining training data; “Model extraction” attacks work similarly to “data exfiltration” but target the model itself, attempting to obtain its architecture, parameters, or hyperparameters.
Investigation result
Based on this taxonomy, the researchers “analyzed media coverage of GenAI exploitation from January 2023 to March 2024 to provide an evidence-based understanding of how the GenAI threat landscape is evolving.” Specifically, the authors investigated the popularity of specific tactics and the goals associated with the use of such tactics.
A key finding of the study is that 9 out of 10 cases recorded in the dataset involved the misuse of generative AI capabilities, rather than harm generated by the system itself. Of these, the most commonly used tactic was the manipulation of human likeness, primarily through impersonation and sock puppetry.
The authors also found that malicious actors were exploiting generative AI systems to achieve clear and identifiable objectives. Between 2023 and 2024, attacks on generative AI systems were relatively uncommon compared to attacks that exploited their capabilities, but were “predominantly carried out as part of research demonstrations or tests aimed at uncovering vulnerabilities or weaknesses within these systems,” the authors found. Interestingly, the study found only two documented cases of attacks on generative AI systems. In both cases, the objective was “to prevent unauthorized scraping of copyrighted material and enable users to generate uncensored content.”
Meanwhile, the most common purpose for utilizing generative AI capabilities was to influence public opinion using deepfakes, digital personas, and other counterfeit media artifacts (27% of all reported cases). Using generative AI to amplify, amplify, and monetize content was also very common (21% of all reported cases), followed by using generative AI for fraud (18% of all reported cases). Interestingly, using generative AI to maximize reach was relatively rare (3.6% of all reported cases).
The chart below shows the frequency of abuse tactics by category.
A bar chart showing the frequency of generative AI abuse tactics.
summary
The study provides policymakers with a taxonomy to understand and categorize the various exploit tactics that utilize generative AI. It also finds that these tools are most frequently used to manipulate human portraits and falsify evidence. Perhaps most importantly, the study shows that the most common threats to AI systems are not the sophisticated nation-state-sponsored attacks widely feared, but rather more common and simple exploit tactics. As the authors note, “Most GenAI exploits do not involve sophisticated attacks against AI systems, but rather easy exploitation of easily accessible GenAI capabilities that require minimal technical expertise.” While many of these tactics were in use long before generative AI came along, the ease of use and efficiency of these models “have changed the costs and incentives associated with information operations.”
These findings also have important policy implications. While model developers are actively working to address technical vulnerabilities, the study shows that many exploitation tactics exploit social vulnerabilities, necessitating broader psychosocial interventions such as pre-banking. As generative AI capabilities continue to improve, policymakers are grappling with the proliferation of AI-generated content in misinformation and information manipulation campaigns, necessitating continued adaptation of detection and prevention strategies. The authors note that solutions such as synthetic media detection tools and watermarking techniques “have been proposed and are promising, but are far from a panacea, as bad actors will likely develop ways to circumvent them.” When the risk of abuse is high and other interventions are insufficient, the authors argue that targeted restrictions on specific model features and usage may be justified to protect against potential harm.
Combating misuse by generative AI therefore requires collaboration between civil society, governments, and technology companies. Technological advances and model mitigation strategies are important, but alone are not sufficient to tackle this complex problem. As the authors note, it is crucial to “gain a deeper understanding of the social and psychological factors that lead to the misuse of these powerful tools.” This holistic perspective may enable the development of more effective, multifaceted strategies to combat the misuse of generative AI technologies and prevent their potential negative impacts on society.