CVPR 2023

Model-Agnostic Gender Debiased Image Captioning

Yusuke Hirota, Yuta Nakashima, Noa Garcia

Abstract

Figure 1. Captions generated by a baseline captioning model (UpDn [2]) and by LIBRA. The baseline suffers from context → gender and gender → context biases, predicting an incorrect gender or an incorrect word (e.g., in the skateboard example, skateboard highly co-occurs with men in the training set, and the baseline incorrectly predicts boy). Our proposed framework successfully modifies those incorrect words.

(a) context → gender bias mitigation
baseline: a young boy riding a skateboard
+LIBRA: a young girl riding a skateboard
baseline: a man riding a wave on a surfboard
+LIBRA: a woman catching a wave on a surfboard

(b) gender → context bias mitigation
baseline: a man wearing a suit holding a banana
+LIBRA: a man in a jacket holding a banana
baseline: a young boy holding a baseball bat
+LIBRA: a young boy holding a plastic frisbee