What is a Deepfake Attack
Synthetic AI voice and video that impersonates executives in real time, weaponized against finance teams, HR, and the C-suite to authorize fraudulent wires and extract sensitive data.
By Dmytro Koziatynskyi
Deepfake attacks weaponize AI-generated voice and video against finance and executives
A deepfake is synthetic media produced by AI models, usually generative adversarial networks (GANs) or diffusion-based architectures, that reproduces a target person's face, voice, or mannerisms with enough fidelity to defeat human recognition. In a security context, deepfakes are not a novelty. They are a delivery mechanism for fraud, layered on top of business email compromise, vishing, whaling, and CEO fraud. The synthetic media itself is rarely the attack. The attack is the transfer, the data leak, or the credential reset that the deepfake unlocks.
The canonical case is Arup, the global engineering firm. In early 2024, Hong Kong police confirmed that a finance worker in Arup's Hong Kong office authorized fifteen wire transfers totaling roughly HK$200 million (about $25 million) after joining a video conference where every other participant, including the CFO, was a real-time deepfake. The employee had been suspicious of an initial email but was reassured by seeing and hearing what looked like familiar colleagues on the call. Arup confirmed the incident publicly. It remains the largest publicly documented deepfake fraud against a single employee.
Two years earlier, real-time video deepfakes were research demos with obvious artifacts. The voice clones behind the 2022 wave of vishing attacks needed several minutes of clean source audio and left a recognizable cadence-mismatch tell. By 2024, Regula Forensics surveyed 1,000 organizations and found that 49% had encountered deepfake fraud against either their identity-verification flows or their employees. By 2025, real-time face-swap and voice-cloning tools run on a single consumer GPU. The fidelity gap that protected enterprises in 2022 has closed.
For buyers comparing human risk management vendors, the question is no longer "do you cover deepfakes." The question is whether your training reflects the way deepfakes actually arrive: combined with a spoofed email thread, a real meeting invite, and a callback to an attacker-controlled number. The rest of this page covers how a deepfake attack unfolds, three named case studies, the eight-control defense framework that finance and security leaders are converging on, and how RansomLeak trains the verification reflex through the whaling-with-a-deepfake exercise.
How a deepfake attack unfolds
Target selection
Attackers profile organizations for executives with a public voice presence and finance teams that handle high-value wires. Common targets are CFOs, treasury controllers, accounts-payable managers, M&A deal teams, and executive assistants. Listed companies, firms in active M&A, and family offices are over-represented in the case data because their leadership is both visible and authorized to move money.
Voice and video sample harvesting
The attacker collects 30 to 90 seconds of clean source audio from earnings calls, conference talks, podcast interviews, YouTube keynotes, and LinkedIn video posts. Video deepfakes need a few minutes of well-lit footage. For C-suite targets this is trivial. For mid-level finance staff being impersonated to a vendor, attackers harvest from internal town-hall recordings leaked through a compromised mailbox or from public webinar appearances.
Model fine-tuning
Voice cloning runs on commercial APIs (ElevenLabs, Resemble.ai, PlayHT) or open-source projects (Tortoise-TTS, RVC, Coqui). Video deepfakes use SimSwap, Roop, DeepFaceLive, or commercial avatar platforms (HeyGen, Synthesia, D-ID) used in violation of their terms of service. Real-time pipelines route a webcam feed through a face-swap model and a synthesized voice with under 200 ms of added latency, low enough that most call participants perceive no delay.
Pretext and delivery
The deepfake almost never lands cold. It rides on a real email thread or a hijacked mailbox, often a vendor or executive assistant compromised weeks earlier. Common delivery vectors are a whaling email followed by a cloned-voice voicemail confirming the request, a Teams or Zoom invite where the attacker joins as the executive, or a WhatsApp voice note from a number spoofed to look personal. The pretext is urgency: a closing acquisition, a regulator deadline, a confidential investigation.
Pressure and extraction
Once the target is on the call or has the voicemail, the attacker pushes for an irreversible action: a wire to a new beneficiary, an MFA reset for an executive account, a data export of payroll or customer PII. The pressure is wrapped in confidentiality language ("do not loop in legal yet"), which suppresses the lateral verification that would catch the fraud. The window from first contact to wire is often under two hours.
Real-world deepfake case studies
Arup, Hong Kong, 2024: $25M lost to a deepfake video conference
In January 2024, a finance employee in Arup's Hong Kong office received an email from someone presenting as the UK-based CFO, requesting a confidential transaction. The employee was suspicious until they joined a video call with the "CFO" and several other recognizable executives. Every other person on the call was a real-time deepfake, generated from publicly available footage of Arup leadership. Convinced, the employee processed fifteen transfers to five Hong Kong bank accounts, totaling HK$200 million (around $25 million). Hong Kong police confirmed the fraud in February 2024, and Arup acknowledged the incident publicly. No technical compromise occurred, just synthetic faces and voices on a routine corporate video call.
UK energy firm, 2019: $243K wired after a voice-clone CEO call
In a case first reported by The Wall Street Journal and the firm's insurer Euler Hermes, the UK CEO of an unnamed energy company received a phone call from what sounded like the German parent-company CEO, instructing him to wire 220,000 euros (around $243,000) to a Hungarian supplier within an hour. The cloned voice carried the German executive's slight accent and characteristic cadence. The wire cleared. A second call requested another transfer, at which point the CEO grew suspicious and the fraud was uncovered. This is one of the first publicly documented voice-clone CEO frauds and the template every later case has followed.
Ferrari, 2024: deepfake CEO call blocked by a personal-knowledge challenge
In July 2024, a Ferrari executive received WhatsApp messages and then a phone call from a voice that closely matched CEO Benedetto Vigna, claiming a confidential acquisition required immediate action. The number and profile picture were spoofed. The executive grew suspicious of subtle intonation differences and asked a question only the real CEO would be able to answer (reportedly about the title of a book Vigna had recently recommended). The deepfake call ended abruptly. Ferrari reported the attempt and notified authorities. The case is now cited as the textbook example of a challenge-phrase defense working in real time.
How to defend against deepfake attacks
Code-word and challenge-phrase verification
Establish a rotating code word or personal-knowledge challenge for any wire, vendor change, MFA reset, or sensitive data request initiated by voice or video. The code word must never travel through email, Slack, or Teams chat. The Ferrari case shows the control works under live pressure when the policy is genuinely practiced.
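One way to operationalize the rotation without ever transmitting the word is to derive it locally on both sides from a shared secret, TOTP-style. The sketch below is illustrative, not a prescribed RansomLeak mechanism: the word list, secret-distribution method, and daily rotation interval are all assumptions.

```python
# Hedged sketch: both parties hold a shared secret (distributed in
# person or via a password manager, never over email or chat) and
# derive today's code word locally, so the word itself never transits
# a channel the attacker could read. All names are illustrative.
import hmac
import hashlib
import datetime

WORDLIST = ["amber", "basalt", "cobalt", "delta", "ember",
            "fathom", "granite", "harbor", "iris", "juniper"]

def code_word_for(shared_secret: bytes, day: datetime.date) -> str:
    """Derive the rotating code word for a given day from the shared secret."""
    digest = hmac.new(shared_secret, day.isoformat().encode(),
                      hashlib.sha256).digest()
    return WORDLIST[digest[0] % len(WORDLIST)]

def verify_caller(claimed_word: str, shared_secret: bytes,
                  day: datetime.date) -> bool:
    """Timing-safe comparison against the expected word for the day."""
    expected = code_word_for(shared_secret, day)
    return hmac.compare_digest(claimed_word.lower(), expected)
```

Because both sides compute the word independently, a compromised mailbox or chat history yields nothing an attacker can replay on tomorrow's call.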
Dual authorization with callback on a known number
Require two-person approval on every wire above a board-set threshold, and a callback to a phone number listed in your internal directory or HRIS, never a number provided in the original request. The callback is the single highest-leverage control because it forces the attacker onto a channel they do not control.
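The rule above can be expressed as a simple release gate. This is a minimal sketch under assumed names (the directory, threshold, and field names are illustrative, not a real payment-system API): a wire is releasable only when the approver count and the callback source both check out.

```python
# Hedged sketch of the approval gate described above: dual approval
# above a board-set threshold, plus a callback completed on a number
# sourced from the internal directory rather than from the request.
from dataclasses import dataclass, field
from typing import Optional

# Illustrative stand-in for an HRIS / internal-directory lookup.
INTERNAL_DIRECTORY = {"cfo@example.com": "+1-555-0100"}

@dataclass
class WireRequest:
    amount: float
    requester: str                       # identity the request claims
    callback_number_used: Optional[str] = None
    approvers: set = field(default_factory=set)

def may_release(req: WireRequest, threshold: float = 10_000.0) -> bool:
    # Dual authorization above the threshold.
    if req.amount >= threshold and len(req.approvers) < 2:
        return False
    # Callback must have gone to the directory number, never to a
    # number supplied inside the original email, voicemail, or call.
    directory_number = INTERNAL_DIRECTORY.get(req.requester)
    return req.callback_number_used == directory_number
```

The key design choice is that the callback target is looked up at verification time, so an attacker who controls the request content cannot control the verification channel.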
Executive personal-device and social-media OPSEC
Limit the volume of clean executive voice and video samples available online. Encourage CFOs, CEOs, and board members to favor events that publish only excerpts of podcast and video appearances, to avoid LinkedIn video posts unless necessary, and to disable unauthenticated voicemail access. The exposure cannot be eliminated, but it can be reduced.
Out-of-band verification on a separate channel
For any urgent request received on one channel (email, Teams, voicemail), confirm on a different channel before acting. If the request arrives by email and is reinforced by voicemail from the same actor, treat both as the same channel. Out-of-band means a fresh, internally sourced contact path.
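The two conditions in that rule are easy to encode as a predicate. A minimal sketch, with illustrative parameter names: a confirmation only counts as out-of-band if the contact path was sourced internally and is not one the request itself arrived on.

```python
# Hedged sketch of the out-of-band rule: email plus a voicemail from
# the same actor is still one channel; verification must use a fresh,
# internally sourced contact path.
def is_out_of_band(request_contacts: set, verification_contact: str,
                   internally_sourced: bool) -> bool:
    """True only if the verification path was looked up internally AND
    is distinct from every contact path the request itself used."""
    return internally_sourced and verification_contact not in request_contacts
```

In practice `request_contacts` should include every address and number the requester touched, so a reply-to number planted in the original email fails the check even if it "worked."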
Deepfake-aware finance and HR training
Generic awareness modules do not surface the cues that matter. Finance, treasury, HR, executive assistants, and IT help desk staff need scenario-based drills that combine a spoofed email, a cloned voicemail, and a deepfake video call. The Arup pattern is now well-documented and maps cleanly into a tabletop exercise.
MFA-resistant authentication for downstream actions
A successful deepfake call often ends with a request to push-approve an MFA prompt or read out a one-time code. Move executive accounts to phishing-resistant MFA (FIDO2 hardware keys, platform passkeys) so that even a fully convinced employee cannot hand over a usable factor over the phone.
Runtime detection tooling
Deploy detection where the call lands. Pindrop and Reality Defender analyze voice for synthesis artifacts on inbound calls. Microsoft Video Authenticator and Intel FakeCatcher score video frames for deepfake probability. Detection accuracy is imperfect against the latest models, so treat these as an additional layer, not a primary control.
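Because detector output is a probability, it works best as an escalation signal combined with procedural risk flags. The sketch below shows the layering idea; the thresholds, flag names, and triage tiers are illustrative assumptions, not vendor guidance.

```python
# Hedged sketch of detection-as-a-layer: the synthetic-media score can
# only escalate a request, never auto-clear it, and procedural red
# flags escalate even when the detector sees nothing.
def triage(deepfake_score: float, single_channel: bool,
           new_beneficiary: bool, urgent: bool) -> str:
    procedural_risk = sum([single_channel, new_beneficiary, urgent])
    if deepfake_score >= 0.7 or procedural_risk >= 2:
        return "block-and-verify"    # hold the action; callback + code word
    if deepfake_score >= 0.3 or procedural_risk >= 1:
        return "manual-review"
    return "standard-workflow"       # procedural controls still apply
```

Note the asymmetry: a low score never downgrades a procedurally risky request, which is what keeps the control useful even as detector accuracy degrades against newer models.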
Post-incident playbook with FBI IC3 and law-enforcement reporting
Pre-stage the response. Know who calls the bank to recall the wire, who notifies cyber insurance, who files with the FBI Internet Crime Complaint Center (IC3) or the local equivalent (Action Fraud in the UK, Hong Kong police in the Arup case). Recovery windows are measured in hours, not days. A documented playbook is worth more than detection technology when the wire has already cleared.
How RansomLeak trains employees to detect deepfakes
The signature exercise on this topic is whaling-with-a-deepfake. The learner takes the role of a finance-team analyst who receives an urgent email from the CFO requesting a confidential transaction, followed by a voicemail in the CFO's cloned voice, and a video call where the CFO and two other executives appear to confirm the wire. The scenario is built directly from the public 2024 Arup pattern, and every interactive choice maps to a real control: the callback policy, the code-word check, the dual-authorization workflow, the out-of-band verification.
The exercise is designed to build the verification reflex that finance teams and executive assistants need most. Recognizing a deepfake by its visual artifacts is unreliable and getting more so each quarter. Recognizing that an urgent, single-channel request to move money is suspicious regardless of how convincing the face on the call looks is durable. Learners leave with a workflow they can repeat under pressure, not with a list of artifacts to memorize that will be stale in six months.
All RansomLeak content ships as SCORM 1.2 and SCORM 2004 packages, so the deepfake exercise drops directly into your existing LMS (Workday Learning, Cornerstone, Docebo, SAP SuccessFactors, Litmos) without bespoke integration work. Completion records, scores, and time-on-task report back through standard SCORM telemetry to whatever dashboard your security or compliance team already uses.
What is a deepfake attack and how can businesses defend against it?
A deepfake attack uses AI-generated synthetic media to impersonate a real person's face or voice convincingly enough to defeat human recognition. In a business context, deepfakes layer on top of business email compromise, whaling, and vishing to authorize wire transfers, leak data, or trigger MFA resets. The 2024 Arup case, where a finance worker authorized $25 million across fifteen transfers after a deepfake video conference, is the canonical public example.
Attackers harvest 30 to 90 seconds of clean audio from earnings calls or LinkedIn for voice cloning, and a few minutes of footage for video deepfakes. Tooling is commodity (ElevenLabs, Resemble.ai for voice; HeyGen, DeepFaceLive for video). Regula Forensics found 49% of organizations encountered deepfake-related fraud against employees or identity verification in 2024.
Defenses converge on a verification framework, not detection technology. Establish code-word challenges for any voice or video request to move money or change a vendor. Require dual authorization with callback verification on a number from your internal directory, never the one in the original message. Reduce executive voice and video exposure online, and drill finance, HR, and executive assistants on the Arup pattern explicitly.
Recommended exercises
Scenario-based simulations from the catalogue of 100+ exercises.
Whaling with a Deepfake
Signature deepfake exercise built from the 2024 Arup pattern: spoofed email, cloned voicemail, and live deepfake video call.
Try the exercise
Business Email Compromise
Most deepfakes ride on top of an active BEC operation. This exercise teaches the email-thread fundamentals deepfakes amplify.
Try the exercise
Vishing
Voice-clone deepfakes are vishing with synthetic audio. The vishing exercise builds the callback reflex that catches both.
Try the exercise
Spear Phishing
Targeted email is the entry point for most deepfake-enabled wire fraud against finance teams and executive assistants.
Try the exercise
Callback Phishing
Trains the discipline of verifying contact numbers from a trusted source, the single highest-leverage deepfake control.
Try the exercise
Social Engineering
Deepfakes are a delivery layer on top of classic social engineering. This exercise covers the underlying pretext patterns.
Try the exercise
Further reading
Deeper guides on adjacent topics.
Related glossary terms
Quick definitions for the terms in this pillar.
Frequently Asked Questions
What security leaders ask about this threat.
What is a deepfake?
A deepfake is synthetic media generated by AI models, usually generative adversarial networks (GANs) or diffusion-based architectures, that reproduces a real person's face, voice, or mannerisms with high fidelity. In cybersecurity, deepfakes are used to impersonate executives in voicemail and video calls and to authorize fraudulent transfers or data releases.
The current generation of voice clones needs only 30 to 90 seconds of clean source audio. Real-time video deepfakes run on a single consumer GPU and can join Zoom or Teams calls live with sub-200ms latency.
How can I tell if a video call is a deepfake?
Visual tells are getting harder to rely on as models improve. Older artifacts (blurred edges around hair and ears, mismatched eye reflections, frozen blinking) still appear in budget tooling but are largely absent from premium real-time pipelines as of 2025. Lighting consistency, ear shape, and lip-sync precision in side profile are still useful checks.
The reliable defense is procedural, not perceptual. Insist on a callback to a known internal number for any urgent request that moves money or changes credentials. The Ferrari case in 2024 was caught by a personal-knowledge question, not by spotting visual artifacts.
Are deepfakes used in business email compromise?
Yes. Most deepfake fraud in 2024 and 2025 ships as a layer on top of an existing BEC operation. The attacker compromises or spoofs a mailbox, builds a real-looking email thread, and then reinforces the request with a cloned-voice voicemail or a deepfake video call to bypass the "verify by phone" reflex some companies adopted in response to early BEC.
The FBI Internet Crime Complaint Center (IC3) tracks these as BEC variants. Reported losses from BEC, including deepfake-enabled cases, totaled $2.9 billion in 2023 in the United States alone.
How long does it take to create a deepfake voice?
Voice cloning that is good enough for a one-minute voicemail or a short phone exchange takes about five minutes of compute on commercial tools like ElevenLabs or Resemble.ai, given 30 to 90 seconds of clean source audio. Open-source models (Tortoise-TTS, RVC, XTTS) take longer to fine-tune but produce comparable results from the same source material.
Real-time voice cloning for live conversation requires a fine-tuned voice and a streaming inference pipeline, which adds engineering effort but is increasingly packaged into commercial APIs.
What should an executive do if they suspect a deepfake?
Hang up or end the call immediately. Do not negotiate, do not let the attacker keep talking, and do not authorize anything on the spot. End the channel they reached you on.
Then verify on a different channel using a contact path you sourced yourself, such as a phone number from your HRIS or an in-person check. Notify the security team and finance leadership in parallel so any wire that is in flight can be recalled. If you have already authorized a transfer, contact the originating bank within the hour and file a report with the FBI IC3 or your local cybercrime authority.
Can detection software reliably catch deepfakes?
Detection tools (Reality Defender, Pindrop, Microsoft Video Authenticator, Intel FakeCatcher) work as a probability score, not a binary judgment. Their accuracy against the latest commercial voice and video models drops every quarter as generation quality improves.
Treat detection as a defense-in-depth layer behind procedural controls (callback verification, code words, dual authorization). The reliable defense against a deepfake is not catching the synthetic media, it is refusing to act on a single-channel request to move money or release data, regardless of how convincing the face or voice appears.
Sources & further reading
Primary sources cited above and adjacent guidance.
- Deepfakes Identification Guide — CISA
- Increasing Threats of Deepfake Identities — U.S. Department of Homeland Security
- Arup confirms deepfake video meeting led to $25M loss (Hong Kong) — CNN
- Internet Crime Report 2023 — FBI Internet Crime Complaint Center (IC3)
Train Your Team Against This Threat
Book a 30-minute walkthrough. We will scope the exercise sequence and rollout timeline.