ICCV 2021

Real-Time Video Inference on Edge Devices via Adaptive Model Streaming

Mehrdad Khani Shirkoohi, Pouya Hamadanian, Arash Nasr-Esfahany, Mohammad Alizadeh

Cited 58 times

Abstract

Real-time video inference on edge devices like mobile phones and drones is challenging due to the high computation cost of Deep Neural Networks. We present Adaptive Model Streaming (AMS), a new approach to improving performance of efficient lightweight models for video inference on edge devices. AMS uses a remote server to continually train and adapt a small model running on the edge device, boosting its performance on the live video using online knowledge distillation from a large, state-of-the-art model. We discuss the challenges of over-the-network model adaptation for video inference, and present several techniques to reduce communication cost of this approach: avoiding excessive overfitting, updating a small fraction of important model parameters, and adaptive sampling of training frames at edge devices. On the task of video semantic segmentation, our experimental results show 0.4-17.8 percent mean Intersection-over-Union improvement compared to a pretrained model across several video datasets. Our prototype can perform video segmentation at 30 frames-per-second with 40 milliseconds camera-to-label latency on a Samsung Galaxy S10+ mobile phone, using less than 300 Kbps uplink and downlink bandwidth on the device.

[Figure: segmentation output by method, with device bandwidth in Kbps. No Customization: 0 Up / 0 Down; One-Time Customized: 50 Up / 80 Down; Adaptive Model Streaming (Ours): 169 Up / 206 Down; Remote Inference + Optical Flow Tracking: 1880 Up / 30 Down; Just-In-Time: 2520 Up / 3207 Down. Legend classes: Road, Building, Vegetation, Sky, Person, Car.]
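The abstract lists "updating a small fraction of important model parameters" as one way to cut downlink cost, but does not say how importance is measured. The sketch below is only an illustration of the general idea, not the authors' method: it assumes, hypothetically, that importance is the absolute change in a parameter since the last update, and it treats the model as a flat list of floats. The function names `select_sparse_update` and `apply_sparse_update` are made up for this example.

```python
from typing import Dict, List


def select_sparse_update(
    old_params: List[float],
    new_params: List[float],
    fraction: float,
) -> Dict[int, float]:
    """Server side: pick the `fraction` of parameters that changed the most.

    Returns a mapping {parameter index: new value} to send to the device.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if len(old_params) != len(new_params):
        raise ValueError("parameter vectors must have the same length")

    # Always send at least one parameter when the model is non-empty.
    k = max(1, int(len(new_params) * fraction))

    # ASSUMPTION: importance = magnitude of change since the last update.
    # The abstract does not specify the criterion actually used.
    ranked = sorted(
        range(len(new_params)),
        key=lambda i: abs(new_params[i] - old_params[i]),
        reverse=True,
    )
    return {i: new_params[i] for i in ranked[:k]}


def apply_sparse_update(params: List[float], update: Dict[int, float]) -> List[float]:
    """Device side: overwrite only the parameters named in the update."""
    patched = list(params)
    for index, value in update.items():
        patched[index] = value
    return patched


if __name__ == "__main__":
    edge_model = [0.0, 1.0, 2.0, 3.0]       # weights currently on the device
    server_model = [0.1, 1.0, 2.5, 2.0]     # weights after server-side training
    update = select_sparse_update(edge_model, server_model, fraction=0.5)
    print(apply_sparse_update(edge_model, update))
```

Sending index/value pairs for a small subset keeps each update far smaller than the full model, at the cost of leaving the untouched parameters slightly stale until a later update selects them.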