NeurIPS 2021

Independent Prototype Propagation for Zero-Shot Compositionality

Frank Ruis, Gertjan J. Burghouts, Doina Bucur

Cited 74 times

Abstract

Humans are good at compositional zero-shot reasoning; someone who has never seen a zebra before could nevertheless recognize one when we tell them it looks like a horse with black and white stripes. Machine learning systems, on the other hand, usually leverage spurious correlations in the training data, and while such correlations can help recognize objects in context, they hurt generalization. To be able to deal with underspecified datasets while still leveraging contextual clues during classification, we propose ProtoProp, a novel prototype propagation graph method. First, we learn prototypical representations of objects (e.g., zebra) that are conditionally independent w.r.t. their attribute labels (e.g., stripes) and vice versa. Next, we propagate the independent prototypes through a compositional graph to learn compositional prototypes of novel attribute-object combinations that reflect the dependencies of the target distribution. The method does not rely on any external data, such as class hierarchy graphs or pretrained word embeddings. We evaluate our approach on AO-Clevr, a synthetic and strongly visual dataset with clean labels, and UT-Zappos, a noisy real-world dataset of fine-grained shoe types. We show that in the generalized compositional zero-shot setting we outperform state-of-the-art results, and through ablations we show the importance of each part of the method and their contribution to the final results.

Compositional Zero-Shot Learning (CZSL) [6] is the problem of learning to model novel objects and their attributes as a composition of visual primitives. Previous works in CZSL [6, 7, 8] largely ignore the dependencies between classes with shared visual primitives, and the spurious correlations between attributes and objects. More recently, Atzmon et al. [9] tackle the latter by ensuring conditional independence between attribute and object representations, while Naeem et al. [10] explicitly promote dependencies between the primitives and their compositions. While the independence approach improves generalization, it hurts accuracy on seen classes by removing useful correlations.

Preprint. Under review.
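
The two-stage idea in the abstract (independent attribute and object prototypes, then propagation through a compositional graph) can be illustrated with a toy sketch. This is not the authors' implementation: the text above does not specify how the prototypes are learned or how propagation is parameterized, so the sketch assumes hand-written 4-dimensional prototypes and replaces learned propagation with a single fixed mean-aggregation step. All names (`propagate`, `classify`, the example vectors) are hypothetical.

```python
import math
from itertools import product

# Assumed toy prototypes: attribute and object prototypes occupy disjoint
# dimensions, a crude stand-in for "independent" representations.
attr_protos = {"striped": [1, 0, 0, 0], "plain": [0, 1, 0, 0]}
obj_protos = {"horse": [0, 0, 1, 0], "shoe": [0, 0, 0, 1]}

def propagate(attr_protos, obj_protos, pairs):
    """One mean-aggregation step from primitive nodes to composition nodes.

    Each composition node (attribute, object) is linked to exactly one
    attribute node and one object node in the compositional graph.
    """
    return {
        (a, o): [(x + y) / 2 for x, y in zip(attr_protos[a], obj_protos[o])]
        for a, o in pairs
    }

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

def classify(feature, comp_protos):
    """Return the composition whose prototype is most similar to the feature."""
    return max(comp_protos, key=lambda pair: cosine(feature, comp_protos[pair]))

# Generalized setting: candidate labels include seen and unseen compositions.
pairs = list(product(attr_protos, obj_protos))
comp_protos = propagate(attr_protos, obj_protos, pairs)

# A feature resembling "striped" + "horse", a pair assumed absent from training.
prediction = classify([0.9, 0.1, 0.8, 0.0], comp_protos)
print(prediction)
```

Because every composition prototype is built only from its primitive nodes, a pair never observed together during training still receives a prototype, which is what makes zero-shot prediction possible in this simplified setting.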