CCS2023

Privacy Leakage via Speech-induced Vibrations on Room Objects through Remote Sensing based on Phased-MIMO

Cong Shi, Tianfang Zhang, Zhaoyi Xu, Shuping Li, Donglin Gao, Changming Li, Athina P. Petropulu, Chung-Tse Michael Wu, Yingying Chen



Abstract

Speech eavesdropping has long been an important threat to the privacy of individuals and enterprises. Recent research has shown that private speech information can be derived from sound-induced vibrations. Acoustic signals transmitted through a solid medium or the air may induce vibrations on solid surfaces, which can be picked up by various sensors (e.g., motion sensors, high-speed cameras, and lasers) without using a microphone. To date, these threats have been limited to scenarios where the sensor is in contact with the vibrating surface or at least within visual line-of-sight. In this paper, we revisit this important line of research and show that a remote, long-distance, and even through-the-wall speech eavesdropping attack is possible. We discover a new form of speech eavesdropping attack that remotely extracts speech from minute surface vibrations on common room objects (e.g., paper bags, plastic storage bins) via mmWave sensing, signal processing, and advanced deep learning techniques. While mmWave signals are highly sensitive to vibrations, they have a limited sensing distance and normally do not penetrate walls. We overcome this key challenge by designing and implementing a high-resolution software-defined phased-MIMO radar that integrates transmit beamforming, a virtual array, and receive beamforming. The proposed system enhances sensing directivity by focusing all the mmWave beams toward a target room object, allowing the mmWave signals to pick up minute speech-induced vibrations from a long distance and even through walls. To realize the attack, we design an object identification technique that scans the objects in a room and identifies the prominent object that is most sensitive to speech vibrations, from which vibration features are extracted. We successfully demonstrate speech privacy leakage from speech-induced vibrations by developing a deep learning framework. Our framework can leverage domain adaptation techniques to infer speech content based only
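The abstract's two technical claims, that mmWave signals are highly sensitive to vibrations and that a virtual array plus receive beamforming concentrates sensing on one object, rest on standard radar relations. The sketch below illustrates them under stated assumptions; it is not the authors' implementation. It assumes hypothetical uniform linear arrays (Rx elements at λ/2 spacing, Tx elements at n_rx·λ/2 spacing), narrowband far-field targets, and positions measured in wavelengths. The function names are illustrative only, and the transmit-beamforming stage of the phased-MIMO design, as well as any FMCW range processing, is not modeled. The displacement helper uses the usual round-trip relation d = λ·Δφ / (4π).

```python
import numpy as np


def virtual_array_positions(n_tx, n_rx):
    """Virtual element positions (in wavelengths) of a hypothetical MIMO array.

    Assumes Rx elements at lambda/2 spacing and Tx elements at n_rx * lambda/2
    spacing, so each Tx/Rx pair maps to a distinct virtual element located at
    tx_pos + rx_pos, giving a filled lambda/2 uniform linear array.
    """
    rx = 0.5 * np.arange(n_rx)
    tx = 0.5 * n_rx * np.arange(n_tx)
    return (tx[:, None] + rx[None, :]).ravel()


def steering_vector(positions, theta_rad):
    """Narrowband far-field steering vector for a linear array (broadcasts)."""
    return np.exp(2j * np.pi * positions * np.sin(theta_rad))


def beam_pattern(positions, steer_rad, scan_rad):
    """Normalized receive-beamforming response |w^H a(theta)| over scan angles.

    w is the conventional (delay-and-sum) weight vector steered at steer_rad,
    scaled so the response equals 1 in the steered direction.
    """
    w = steering_vector(positions, steer_rad) / len(positions)
    a = steering_vector(positions[:, None], scan_rad[None, :])  # shape (N, S)
    return np.abs(w.conj() @ a)


def displacement_from_phase(delta_phi_rad, wavelength_m):
    """Surface displacement implied by a round-trip radar phase change."""
    return wavelength_m * delta_phi_rad / (4.0 * np.pi)


if __name__ == "__main__":
    pos = virtual_array_positions(n_tx=3, n_rx=4)  # 12 virtual elements
    scan = np.deg2rad(np.linspace(-90.0, 90.0, 1801))
    resp = beam_pattern(pos, np.deg2rad(20.0), scan)
    print("virtual elements:", len(pos))
    print("beam peak (deg):", np.rad2deg(scan[np.argmax(resp)]))
    # Assumed wavelength of ~3.9 mm (roughly a 77 GHz carrier), for illustration.
    print("displacement for 0.01 rad (m):", displacement_from_phase(0.01, 3.9e-3))
```

With three Tx and four Rx elements, the sketch yields twelve virtual elements and a receive beam that peaks at the steered angle. The displacement helper shows why phase-based mmWave sensing can resolve micrometer-scale surface motion: at an assumed wavelength of a few millimeters, a phase change of hundredths of a radian corresponds to displacements on the order of micrometers.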