ICLR 2025

PN-GAIL: Leveraging Non-optimal Information from Imperfect Demonstrations

Qiang Liu, Huiqiao Fu, Kaiqiang Tang, Chunlin Chen, Daoyi Dong

Abstract

According to Eq. 3, the equivalent weight of $x_1$ will be 1.0 relative to the others (0.2 × 5 = 1.0). This means that the discriminator will consider $x_1$ more likely to be an optimal demonstration than the others.
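The arithmetic above can be checked with a minimal sketch. It assumes that 0.2 is a weight applied to each occurrence of $x_1$ and that 5 is the number of times $x_1$ occurs; this reading is an assumption, since Eq. 3 is not reproduced here. The function name `equivalent_weight` is hypothetical and is not part of the authors' method.

```python
# Hypothetical illustration only: reading 0.2 as a per-occurrence weight and
# 5 as an occurrence count is an assumption, not taken from Eq. 3 itself.

def equivalent_weight(per_occurrence_weight: float, occurrences: int) -> float:
    """Total weight a sample contributes when every occurrence carries the same weight."""
    return per_occurrence_weight * occurrences


# Values from the text: 0.2 x 5.
x1_weight = equivalent_weight(0.2, 5)
print(f"equivalent weight of x_1: {x1_weight:.1f}")
```

Under this assumption, a sample with a small per-occurrence weight can still carry a large total weight if it occurs often enough, which is the effect the text describes.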