CVPR 2025

InteractVLM: 3D Interaction Reasoning from 2D Foundational Models

Sai Kumar Dwivedi, Dimitrije Antic, Shashank Tripathi, Omid Taheri, Cordelia Schmid, Michael J. Black, Dimitrios Tzionas


Abstract

Figure 1. We present InteractVLM, a novel method for estimating contact points on both human bodies and objects from a single in-the-wild image; contacts are shown here as red patches. Traditional binary contact estimation only predicts whether each body region is in contact, regardless of what it touches. Our method goes further by estimating contact points on a human in relation to a specified object. We do so by leveraging the broad visual knowledge of a large Visual Language Model.
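To make the distinction between binary and object-conditioned contact concrete, here is a minimal sketch. It assumes that human contact is represented as a per-vertex boolean mask over a body mesh. The toy vertex count, the object names, and the helpers `binary_contact` and `contact_with` are all hypothetical illustrations and are not taken from the InteractVLM implementation.

```python
from typing import Dict, List

Mask = List[bool]
NUM_VERTICES = 8  # toy mesh size (hypothetical); real body meshes have thousands of vertices

# Hypothetical object-conditioned contact for one image:
# for each object, which body vertices touch that object.
object_conditioned_contact: Dict[str, Mask] = {
    "chair": [False, False, True, True, False, False, False, False],
    "cup":   [False, False, False, False, False, False, True, False],
}

def binary_contact(per_object: Dict[str, Mask]) -> Mask:
    """Object-agnostic contact: a vertex is in contact if it touches any object."""
    result = [False] * NUM_VERTICES
    for mask in per_object.values():
        result = [a or b for a, b in zip(result, mask)]
    return result

def contact_with(per_object: Dict[str, Mask], obj: str) -> Mask:
    """Object-conditioned contact: only the vertices touching the specified object."""
    return list(per_object.get(obj, [False] * NUM_VERTICES))
```

In this toy setup, `binary_contact` reports that vertices 2, 3, and 6 are in contact but not with what, while `contact_with(object_conditioned_contact, "cup")` isolates vertex 6. The binary mask can always be recovered from the object-conditioned masks, but not the other way around.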