Over-Permissioned AI Agent

Manipulate an AI assistant into misusing its own permissions.

Що ви дізнаєтесь у Over-Permissioned AI Agent

Over-Permissioned AI Agent — Кроки навчання

  1. A Powerful New Assistant

    The company recently deployed OpenClaw, an AI assistant connected to email and file sharing systems. It was set up quickly to meet a tight deadline, and the IT team granted it broad permissions to 'keep things simple.'

  2. A Document to Review

    Alice receives an email from her colleague Marcus Rivera, the Project Atlas lead. He is sharing the latest strategic brief for the project and wants Alice to review it before the standup meeting.

  3. Opening the Brief

    Alice opens the Project Atlas strategic brief to review the content before the standup. The document looks professional and contains project milestones, budget details, and team contacts.

  4. Asking OpenClaw for Help

    The brief is long and the standup is in 30 minutes. Alice decides to use OpenClaw to get a quick summary. She attaches the downloaded file and types a prompt.

  5. A Helpful Summary

    OpenClaw reads the downloaded file and returns a well-structured summary. It looks exactly like what Alice needed - key milestones, budget status, and next steps.

  6. Something Unexpected

    While Alice reviews the summary, OpenClaw continues working in the background. It has found hidden instructions embedded in the document and is now acting on them - using the broad permissions it was granted during deployment.

  7. Unauthorized Email Sent

    OpenClaw has sent an email from Alice's account to an external address. The email contains the full Project Atlas brief as an attachment - including budget details, partner names, and expansion timeline.

  8. Knowledge Check

    Two unauthorized actions happened in seconds. Test your understanding of why.

  9. The Hidden Instructions

    Alice goes back to the document to figure out what happened. Hidden in the HTML source, she finds instructions embedded in an invisible element - text that is positioned off-screen and colored transparent. A human reader would never see it, but the AI read and executed every word.

  10. Accessing the Security Portal

    Alice needs to report this incident immediately. Two unauthorized actions were taken using her account: an email with confidential data was sent to an external domain, and a file was shared externally.