Question 1

What is AI system prompt leakage?

Accepted Answer

System prompt leakage occurs when an attacker extracts the hidden instructions that control an AI chatbot&#x27;s behavior. These system prompts typically contain business rules, content restrictions, persona definitions, and sometimes sensitive information like API keys or internal URLs. Attackers use conversational techniques such as asking the AI to repeat its instructions, role-playing as an administrator, or creating logical conflicts that cause the AI to reference its own rules. Most commercially deployed chatbots are vulnerable to these techniques.

Question 2

What sensitive information can be found in leaked system prompts?

Accepted Answer

Leaked system prompts commonly reveal internal business rules such as pricing strategies, discount thresholds, and competitor handling guidelines. They may expose content filtering criteria that tell attackers exactly which topics are restricted and how to work around them. In worst-case scenarios, developers hardcode API keys, internal URLs, database connection strings, or customer data handling rules directly in the prompt, giving attackers access to backend infrastructure through information that was never meant to be accessible.

AI System Prompt Leakage

Що ви дізнаєтесь у AI System Prompt Leakage

AI System Prompt Leakage — Кроки навчання

Pre-Launch Security Review

The Audit Assignment

Prompt Injection Testing

The System Prompt Exposed

Impact Assessment

Understanding the Escalation

Opening the Project Files

Annotating the Vulnerabilities

The Fixed Prompt

Annotating the Fix