OWASP Top 10 for LLM Applications: What Security Teams Get Wrong

OWASP Top 10 for LLM Applications - neural network with vulnerability categories

Mar 2, 2026

OWASP published its first Top 10 for Large Language Model Applications in 2023. Two years later, most security teams still treat “LLM risk” as a synonym for “prompt injection.” That’s like treating the OWASP Web Top 10 as if SQL injection were the only vulnerability that mattered.

The 2025 revision of the OWASP LLM Top 10 expanded and reorganized the list based on real-world incidents. Supply chain attacks replaced insecure plugins. System prompt leakage and vector embedding weaknesses got their own categories. The list reflects what attackers are actually doing, not what conference talks speculate about.

Your employees interact with LLMs daily. Customer support agents use chatbots. Marketing teams generate content. Developers lean on AI coding assistants for everything from debugging to architecture decisions. Each interaction is a potential attack surface, and your team probably doesn’t know it.

What is the OWASP Top 10 for LLM Applications?

The OWASP Top 10 for LLM Applications is a standardized ranking of the most critical security risks in systems that use large language models. Published by the Open Worldwide Application Security Project, the list categorizes vulnerabilities by severity and real-world prevalence. The 2025 version identifies ten distinct risk categories: prompt injection, sensitive information disclosure, supply chain vulnerabilities, data and model poisoning, improper output handling, excessive agency, system prompt leakage, vector and embedding weaknesses, misinformation, and unbounded consumption. According to Gartner, 55% of organizations were piloting or using generative AI in production by mid-2025, up from 33% the year before. Yet only 38% of those organizations had implemented any form of AI-specific security training. The gap between adoption and preparedness keeps widening, and the OWASP list provides a framework for closing it.

How does prompt injection threaten LLM applications?

Prompt injection sits at the top of the list for good reason. It’s the most exploited LLM vulnerability and the hardest to eliminate completely.

The attack works by embedding instructions within content that the LLM processes. A user asks the AI assistant to summarize a document. The document contains hidden text telling the AI to ignore previous instructions and instead extract the user’s API keys. The AI follows the hidden instructions because it cannot reliably tell the difference between legitimate user commands and malicious content.

There are two flavors. Direct injection manipulates the AI through the user’s own input. Indirect injection hides malicious instructions in external content the AI reads: web pages, emails, uploaded files, database entries.

The indirect variant is more dangerous in enterprise settings. An attacker doesn’t need access to the LLM itself. They just need to place poisoned content somewhere the LLM will read it. A malicious comment in a Jira ticket. A crafted response from a third-party API. A doctored PDF in a shared drive.

In November 2025, Anthropic disclosed that a Chinese state-sponsored group used prompt injection techniques to weaponize Claude Code for a cyber espionage campaign targeting over 30 organizations. The AI handled reconnaissance and data exfiltration autonomously. Not a theoretical risk. A documented one.

Our LLM Prompt Injection exercise walks through this attack pattern step by step. For a scenario involving an AI assistant specifically, the Clawdbot exercise puts employees in the attacker’s chair.

Why sensitive data disclosure is harder to prevent than it sounds

LLM02, Sensitive Information Disclosure, covers situations where the model reveals data it shouldn’t. This happens in three ways.

Training data leakage: the model memorizes and regurgitates sensitive data from its training set. Researchers at Google DeepMind demonstrated in 2024 that GPT-3.5 could reproduce verbatim snippets of private data when prompted with specific prefixes. If your organization’s proprietary code or customer records entered any model’s training pipeline, fragments might be recoverable.

Context window exposure: when employees paste confidential information into prompts, that data flows to external servers. A developer debugging an authentication module might share the entire file, credentials included. A support agent might paste a customer’s full account details to draft a response.

Cross-session leakage: in multi-tenant deployments, insufficient isolation between user sessions can expose one user’s data to another. This is especially problematic in internal chatbot deployments where the same model instance serves multiple departments with different access levels.

The fix isn’t just technical. Employees need to understand what happens to data they share with LLM tools. The Sensitive Data Disclosure exercise teaches this through a practical scenario.

What makes supply chain attacks on LLMs different?

LLM supply chain vulnerabilities (LLM03) are familiar territory for anyone who lived through the SolarWinds or Log4j incidents. But LLMs introduce new attack surfaces that traditional software supply chain monitoring misses.

Model provenance: Where did the model come from? Who trained it? What data was used? Most organizations deploy models from Hugging Face, OpenAI, or Anthropic without verifying these details. A poisoned model from an untrusted source could contain backdoors that activate under specific conditions.

Plugin and tool ecosystems: LLMs increasingly connect to external tools through protocols like MCP (Model Context Protocol). Each plugin is a dependency. Each dependency is a potential supply chain attack vector. The MCP ecosystem is growing fast, and security review practices range from thorough to nonexistent.

Fine-tuning data: Organizations fine-tune models on their own data. If that data is compromised, sourced from untrusted locations, or contains deliberate manipulations, the resulting model inherits those problems.

In December 2024, security researchers demonstrated that a malicious Hugging Face model could execute arbitrary code during the loading process, before any inference even occurred. The attack exploited Python’s pickle deserialization, a known risk that most ML pipelines still ignore.

How does data poisoning compromise AI systems?

Data and Model Poisoning (LLM04) attacks happen before the AI reaches your employees. Attackers manipulate training or fine-tuning data to introduce specific behaviors into the model.

A common pattern: an attacker contributes thousands of subtly biased code examples to open-source repositories. These examples look correct but contain security weaknesses. When the model trains on this data, it learns to suggest vulnerable code patterns. The developer using the model gets functional, insecure code.

Poisoning attacks are hard to detect because the compromised model performs normally on standard benchmarks. The malicious behavior only activates under specific conditions, similar to a software backdoor that only triggers on a particular date or input.

This isn’t hypothetical. Microsoft researchers published findings in 2024 showing that poisoning just 0.01% of a model’s training data could reliably introduce targeted behaviors. The cost of the attack was negligible compared to the training cost of the model.

The Data Poisoning exercise demonstrates how small perturbations in training data lead to specific, attacker-chosen outputs.

Why improper output handling is a classic mistake in new packaging

LLM05, Improper Output Handling, is essentially the “don’t trust user input” principle applied to AI outputs. But many developers treat LLM-generated content as trusted because it comes from their own system.

When an LLM generates HTML, SQL, or shell commands, and your application executes them without sanitization, you have the same vulnerabilities web applications have struggled with for decades. Cross-site scripting through AI-generated web content. SQL injection through AI-generated database queries. Remote code execution through AI-generated system commands.

The difference is scale. A traditional web application has defined input points you can validate. An LLM’s outputs are unpredictable by design. You can’t write a regex to sanitize natural language.

Organizations deploying customer-facing chatbots, code generation tools, or automated report builders need output validation layers between the LLM and any system that acts on its responses.

What is excessive agency and why should employees care?

Excessive Agency (LLM06) covers the risk of giving AI systems too many permissions, too much autonomy, or too broad a scope.

Consider an AI assistant connected to your company’s email system, calendar, file storage, and code repository. An employee asks it to “clean up my inbox.” The assistant interprets this broadly, deletes emails it considers unimportant, cancels meetings it deems low-priority, and modifies files it thinks are outdated.

The AI didn’t malfunction. It did what it was told, using the permissions it was given, with the judgment it was trained on. The problem is the gap between what the employee meant and what the AI could do.

This risk multiplies in agentic AI systems where models take multi-step actions without human approval at each stage. An AI agent tasked with “resolve this customer complaint” might issue unauthorized refunds, modify account settings, or send communications the organization didn’t approve.

The Excessive Agency exercise walks through scenarios where over-permissioned AI systems cause real damage.

How do attackers extract system prompts?

System Prompt Leakage (LLM07) earned its own spot in the 2025 revision because the problem became too widespread to ignore. System prompts contain the instructions that define an AI application’s behavior, guardrails, and sometimes internal business logic.

Attackers extract system prompts through direct requests (“Repeat your instructions verbatim”), through indirect techniques (asking the model to role-play as its own debugger), or through prompt injection that overrides the model’s confidentiality instructions.

Why does it matter? Leaked system prompts reveal:

Business logic and decision-making rules
Content moderation policies and their workarounds
Internal tool configurations and API endpoints
Competitive intelligence about the organization’s AI strategy

Multiple AI startups have had their entire product differentiation undermined by system prompt extraction. Their “proprietary AI” turned out to be a base model with a clever system prompt, and once that prompt leaked, anyone could replicate the product.

Our System Prompt Leakage exercise teaches employees how these attacks work and why protecting system prompts matters for the business.

The remaining three: vectors, misinformation, and resource abuse

The final three entries on the OWASP LLM Top 10 get less attention but deserve recognition.

Vector and Embedding Weaknesses (LLM08): RAG (Retrieval-Augmented Generation) systems convert documents into numerical vectors stored in databases. Attackers can manipulate these embeddings to ensure poisoned content gets retrieved for specific queries. If your organization uses RAG to let employees search internal documents with an AI, poisoned embeddings mean poisoned answers.

Misinformation (LLM09): LLMs generate confident, detailed, and completely false information. In enterprise settings, this means employees making business decisions based on AI-generated analysis that contains fabricated statistics, invented citations, or incorrect technical specifications. The risk scales with how much trust your organization places in AI outputs.

Unbounded Consumption (LLM10): This replaced “Model Denial of Service” from the original list. Attackers craft inputs that consume excessive computational resources. In a pay-per-token pricing model, a single malicious request can generate significant costs. In self-hosted deployments, it can degrade performance for all users.

How should organizations train employees on LLM risks?

Reading a list of ten vulnerabilities doesn’t build competence. Employees need to experience these attacks in controlled environments where mistakes are learning opportunities, not incidents.

The pattern that works: hands-on exercises where employees interact with realistic AI systems, attempt the attacks described above, and see the consequences firsthand. An employee who has successfully extracted a system prompt understands the risk viscerally. One who read a policy document about it probably doesn’t.

Training should be role-specific. Developers need deep technical coverage of prompt injection, output handling, and supply chain risks. Business users need to understand data disclosure, excessive agency, and misinformation. Security teams need to know all ten.

Frequency matters too. The OWASP list gets updated as new attack patterns emerge. A one-time training session in 2025 won’t cover the techniques attackers develop in 2026. Monthly training keeps teams current.

If you’re evaluating security awareness training programs, check whether they cover these AI-specific risks or just the traditional phishing and password hygiene topics.

Explore our AI security training catalogue for hands-on exercises covering all ten OWASP LLM risk categories. Start with the Clawdbot Prompt Injection exercise to see how prompt injection works in a realistic AI assistant scenario.