NeurIPS 2023

Neural-Logic Human-Object Interaction Detection

Liulei Li, Jianan Wei, Wenguan Wang, Yi Yang

46 citations

Abstract

The interaction decoder used in prevalent Transformer-based HOI detectors typically accepts pre-composed human-object pairs as inputs. Though it achieves remarkable performance, such a paradigm lacks feasibility and cannot explore novel combinations over entities during decoding. We present LOGICHOI, a new HOI detector that leverages neural-logic reasoning and a Transformer to infer feasible interactions between entities. Specifically, we modify the self-attention mechanism in the vanilla Transformer, enabling it to reason over the ⟨human, action, object⟩ triplet and constitute novel interactions. Meanwhile, this reasoning process is guided by two crucial properties for understanding HOI: affordances (the potential actions an object can facilitate) and proxemics (the spatial relations between humans and objects). We formulate these two properties in first-order logic and ground them into continuous space to constrain the learning process of our approach, leading to improved performance and zero-shot generalization capabilities. We evaluate LOGICHOI on V-COCO and HICO-DET under both normal and zero-shot setups, achieving significant improvements over existing methods.
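To make the idea of grounding a first-order rule into continuous space more concrete, the following is a minimal sketch and not the authors' implementation. It assumes an affordance rule of the form "for every action a, if a is predicted for an object, then the object affords a", relaxes the implication with the Reichenbach fuzzy implication, and relaxes the universal quantifier with a mean. The abstract specifies neither the exact rule nor the fuzzy operators, so these choices, along with all function names and the toy data, are hypothetical.

```python
# Illustrative sketch only: the rule, the fuzzy operators, and all names
# here are assumptions, not taken from the paper.

def implies(x, y):
    """Reichenbach fuzzy implication 1 - x + x*y for truth values in [0, 1]."""
    return 1.0 - x + x * y


def forall(truth_values):
    """Soft universal quantifier: the mean of the given truth values."""
    values = list(truth_values)
    return sum(values) / len(values)


def affordance_loss(action_probs, affordance):
    """Return 1 - satisfaction of: forall a. predicted(a) -> affordable(a).

    action_probs: dict mapping action name to predicted probability.
    affordance:   dict mapping action name to 1.0 if the object affords
                  the action and 0.0 otherwise.
    """
    satisfaction = forall(
        implies(action_probs[a], affordance[a]) for a in action_probs
    )
    return 1.0 - satisfaction


# Toy example: a bicycle affords "ride" but not "eat" (hypothetical data).
affordance_of_bicycle = {"ride": 1.0, "eat": 0.0}
consistent = {"ride": 0.9, "eat": 0.0}  # agrees with the affordance rule
violating = {"ride": 0.1, "eat": 0.9}   # predicts an infeasible action

loss_consistent = affordance_loss(consistent, affordance_of_bicycle)
loss_violating = affordance_loss(violating, affordance_of_bicycle)
```

Under these assumptions, the prediction that assigns high probability to an infeasible action receives the larger loss, so a term like this could be added to a detection objective as a soft constraint. A proxemics rule could be relaxed in the same way by swapping the affordance table for truth values derived from spatial relations.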