[Figure: analyst comparing a real and a fake video using a frequency graph and a metadata dashboard]

In the era of Generative AI, human perception has become a flawed diagnostic tool. In 2026, "digital cloning" has reached a level of sophistication where visual inspection is no longer sufficient for judicial or corporate environments. To uncover the truth, we must dive into the frequency domain and the mathematical structure of the underlying data.

In this article, I detail the forensic workflow I utilized to debunk a high-complexity fraudulent video using FFT, DFT, and DCT-H analysis.

1. The Invisible Layer: Deep Metadata Extraction

Forensics begins before pressing play. Every video file contains an "atomic" structure and metadata that act as a digital fingerprint.

  • Codec Inconsistency: Through header analysis, we identify whether the container (e.g., MP4) carries metadata from libraries like libavcodec (common in AI frameworks) while claiming to originate from a mobile device (iOS/Android).
  • Software Signatures: Many Deepfake tools fail to purge rendering traces, leaving behind specific XMP metadata tags that betray the file's synthetic origin.
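The header walk described above can be sketched in a few lines of Python. This is a minimal sketch, assuming an ISO-BMFF (MP4) file already read into memory; the encoder strings scanned for (`Lavf`, `Lavc`, `x264`) are common FFmpeg signatures, and a real examination would rely on a full parser such as ffprobe rather than this toy:

```python
import struct

def parse_top_level_boxes(data: bytes):
    """Walk the top-level MP4/ISO-BMFF boxes (4-byte size + 4-byte type)."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, fourcc = struct.unpack(">I4s", data[offset:offset + 8])
        if size < 8:  # 64-bit or to-end sizes; stop the simplified walk here
            break
        boxes.append((fourcc.decode("ascii", "replace"), size))
        offset += size
    return boxes

def has_encoder_signature(data: bytes, needles=(b"Lavf", b"Lavc", b"x264")) -> bool:
    """Crude scan for encoder strings (e.g., FFmpeg's libavformat) in the file body."""
    return any(needle in data for needle in needles)
```

A file whose metadata claims an iPhone origin but whose body contains a `Lavf` string is exactly the codec inconsistency described above.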

2. Audio Forensics: The Frequency Domain (DFT and FFT)

Audio is often the "Achilles' heel" of Deepfakes. While the eye is easily deceived, the mathematics of sound rarely lies.

DFT (Discrete Fourier Transform)

We use the DFT to convert the audio signal from the time domain to the frequency domain. This allows us to isolate individual vocal components and identify anomalies that are inconsistent with natural human vocal biometrics.
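That time-to-frequency conversion can be sketched with NumPy. The 440 Hz test tone below is an illustrative stand-in for a vocal component, not real evidence, and `dominant_frequency` is a hypothetical helper name:

```python
import numpy as np

def dominant_frequency(signal: np.ndarray, fs: int) -> float:
    """Return the strongest frequency component of a signal via the DFT."""
    spectrum = np.abs(np.fft.rfft(signal))          # magnitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)  # bin index -> Hz
    return float(freqs[np.argmax(spectrum)])

# Illustrative signal: a pure 440 Hz tone, 1 second at a 16 kHz sample rate.
fs = 16_000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440.0 * t)
print(dominant_frequency(tone, fs))  # ~440.0 Hz
```

In practice the same spectrum is inspected across short windows (a spectrogram) rather than over the whole recording at once.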

FFT (Fast Fourier Transform)

The FFT is the optimized algorithm that makes real-time frequency analysis possible. In a forensic context, it enables:

  • ENF (Electrical Network Frequency) Analysis: We verify the near-inaudible hum of the power grid captured in the recording. Any phase shift or break in the nominal grid frequency (typically 50 Hz or 60 Hz) provides mathematical evidence that the audio was spliced or synthesized.
  • High-Frequency Roll-off: Voice-generation AIs often struggle to reproduce frequencies above roughly 16 kHz, creating an artificial "wall" in the spectrogram visible only via FFT.

3. Advanced Visual Analysis: The Role of DCT-H

To prove image manipulation, we analyze compression artifacts where DCT (Discrete Cosine Transform) plays the leading role.

Most video codecs split each frame into small pixel blocks and apply the DCT to each block to discard perceptual redundancy. When an AI-generated face is overlaid onto an original video, the DCT coefficients become heterogeneous across regions of the frame.

  • DCT-H (DCT Heterogeneity): We analyze the high-frequency DCT coefficients block by block. If the face displays a compression noise pattern different from the background, we have mathematical evidence of "digital pasting".
  • ELA (Error Level Analysis): Built on DCT-based recompression, this technique highlights areas with different modification levels. In a Deepfake, the face often "glows" differently from the rest of the body under this filter.
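The per-block comparison behind DCT-H can be sketched as follows. This is a minimal illustration, not a production detector: the orthonormal DCT-II basis is built by hand to avoid a SciPy dependency, and the "high frequency" mask (coefficient indices with u + v ≥ 8) is a simplifying assumption:

```python
import numpy as np

def dct2_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix, as used by JPEG and most video codecs."""
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] *= 1 / np.sqrt(2)
    return M * np.sqrt(2 / n)

def block_hf_energy(img: np.ndarray, n: int = 8) -> float:
    """Mean high-frequency DCT energy per n-by-n block: a region's compression fingerprint."""
    D = dct2_matrix(n)
    mask = np.add.outer(np.arange(n), np.arange(n)) >= n  # keep u + v >= n coeffs
    energies = []
    for y in range(0, img.shape[0] - n + 1, n):
        for x in range(0, img.shape[1] - n + 1, n):
            coeffs = D @ img[y:y + n, x:x + n] @ D.T      # 2-D DCT of the block
            energies.append((coeffs[mask] ** 2).sum())
    return float(np.mean(energies))

# Toy stand-ins: a smooth "background" region vs. a noisier "pasted" region.
rng = np.random.default_rng(1)
smooth = np.tile(np.linspace(0, 255, 32), (32, 1))
noisy = smooth + rng.standard_normal((32, 32)) * 20
```

Comparing this statistic between the face region and the background is the heterogeneity test: a pasted face carries a different high-frequency compression fingerprint than the surrounding frame.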

Unmasking Deepfakes in 2026 requires a balance between cutting-edge technology and methodological rigor. Utilizing DCT, DFT, and FFT transforms a "subjective impression" into admissible forensic evidence.

If your organization or legal department is dealing with suspicious digital evidence, technical analysis is the only way to ensure the integrity of the facts.

About the Author

Thiago Vieira

Cybersecurity Keynote Speaker & Lawyer | TEDx Speaker | Digital Forensics Expert | Co-Founder Incubou | Author of Self Hack | Angel Investor