WWW2026

HAUTE: Harmonizing Action Units with Temporal-contextual Embeddings for Deepfake Detection

Chaewon Kang, Youjin Lee, Jinyoung Han

Abstract

The proliferation of highly realistic deepfake videos threatens public trust and the integrity of digital information. Detecting sophisticated deepfakes requires analysis that goes beyond surface-level visual artifacts. We propose Harmonizing Action Units with Temporal-contextual Embeddings (HAUTE), which integrates physiological facial muscle dynamics with holistic semantic context through adaptive attention mechanisms. HAUTE captures both temporal coordination patterns among Action Units and high-level contextual embeddings, allowing the model to expose synthesis-induced inconsistencies that neither modality reveals in isolation. Extensive experiments demonstrate state-of-the-art performance with strong cross-dataset adaptability, particularly on high-quality deepfakes produced with commercial tools, advancing trustworthy content verification for web ecosystems.
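To make the idea of fusing Action Unit dynamics with contextual embeddings through attention more concrete, the following is a minimal NumPy sketch. It is not the HAUTE architecture, which the abstract does not specify. The function names (`init_params`, `fuse_au_context`), the choice of 17 Action Units, the embedding sizes, the single-head cross-attention, and the sigmoid gate are all illustrative assumptions, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(d_au, d_ctx, d, seed=0):
    """Random projection weights (hypothetical; a real model would learn these)."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "w_q": rng.normal(0.0, scale, (d_au, d)),   # AU features -> queries
        "w_k": rng.normal(0.0, scale, (d_ctx, d)),  # context -> keys
        "w_v": rng.normal(0.0, scale, (d_ctx, d)),  # context -> values
        "w_g": rng.normal(0.0, scale, (2 * d, d)),  # gate over both streams
    }


def fuse_au_context(au_seq, ctx_seq, params):
    """Fuse per-frame AU intensities with per-frame contextual embeddings.

    au_seq:  (T, d_au)  AU intensities over T frames (assumed input format)
    ctx_seq: (T, d_ctx) contextual embeddings over the same T frames
    Returns a clip-level vector (d,), the attention map (T, T), and the gate (T, d).
    """
    q = au_seq @ params["w_q"]
    k = ctx_seq @ params["w_k"]
    v = ctx_seq @ params["w_v"]

    # Each AU frame attends over all context frames (scaled dot-product attention).
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    ctx_for_au = attn @ v

    # A sigmoid gate decides, per frame and channel, how much to trust each stream.
    gate_in = np.concatenate([q, ctx_for_au], axis=-1)
    gate = 1.0 / (1.0 + np.exp(-(gate_in @ params["w_g"])))
    fused = gate * q + (1.0 - gate) * ctx_for_au

    # Mean-pool over time to obtain one vector per clip.
    return fused.mean(axis=0), attn, gate


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    au = rng.random((10, 17))        # 10 frames, 17 AUs (assumed count)
    ctx = rng.normal(size=(10, 32))  # 10 frames, 32-dim context (assumed size)
    clip_vec, attn, gate = fuse_au_context(au, ctx, init_params(17, 32, 8))
    print(clip_vec.shape, attn.shape, gate.shape)
```

In a full detector, the pooled vector would feed a real/fake classifier trained end to end. The gate here stands in for the "adaptive" weighting mentioned in the abstract, under the assumption that adaptivity means input-dependent weighting of the two streams.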