ICLR 2025

Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models

Ángela López-Cardona, Carlos Segura, Alexandros Karatzoglou, Sergi Abadal, Ioannis Arapakis

Abstract

Advancements in Natural Language Processing (NLP) have led to the emergence of Large Language Models (LLMs) such as GPT, Llama, Claude, and Gemini, which excel across a range of tasks but require extensive fine-tuning to align their outputs with human expectations. A widely used method for achieving this alignment is Reinforcement Learning from Human Feedback (RLHF), which, despite its success, faces challenges in accurately modelling human preferences. In this paper, we introduce GazeReward, a novel framework that integrates implicit feedback, specifically eye-tracking (ET) data, into the Reward Model (RM). In addition, we explore how ET-based features can provide insights into user preferences. Through ablation studies, we test our framework with different integration methods, LLMs, and ET generator models, demonstrating that our approach significantly improves the accuracy of the RM on established human preference datasets. This work advances the efforts to optimise AI alignment with human values and explores the potential of cognitive data to shape future NLP research.

[Figure: overview of the GazeReward framework. Recoverable labels: eye-tracking features predictor; gaze features generation from prompt and response; embeddings construction with eye-tracking features; input preparation (<eye>); combined embedding (ADD); chosen response vs. rejected response (win / loss). Example responses shown for one prompt: "To make authentic patatas bravas: 1. Cut potatoes into bite-sized cubes and parboil them." and "To make patatas bravas, just fry some potatoes and put ketchup on them. It's basically…"]
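
As a rough illustration of the idea summarised above, the following minimal sketch shows one way gaze-derived embeddings could be added to token embeddings before a reward is computed, and how a reward model might be scored on a chosen/rejected pair. It is only a sketch under stated assumptions: the function names `add_gaze_embeddings` and `pairwise_loss` are hypothetical, the element-wise sum is inferred from the "Combined embedding ADD" label and is only one of the integration methods the abstract mentions, and the pairwise logistic loss is the form commonly used for reward models rather than one the abstract states.

```python
import math


def add_gaze_embeddings(token_emb, gaze_emb):
    """Element-wise sum of token embeddings and same-shaped gaze embeddings.

    Assumption: both inputs are lists of equal-length float vectors, one per
    token. How gaze features are projected to this shape is not shown here.
    """
    if len(token_emb) != len(gaze_emb):
        raise ValueError("sequences must have the same length")
    return [
        [t + g for t, g in zip(tok, gz)]
        for tok, gz in zip(token_emb, gaze_emb)
    ]


def pairwise_loss(reward_chosen, reward_rejected):
    """Pairwise logistic loss: -log(sigmoid(r_chosen - r_rejected)).

    Assumption: this is the usual preference loss for reward models, not a
    detail taken from the abstract. Computed in a numerically stable way.
    """
    diff = reward_chosen - reward_rejected
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


if __name__ == "__main__":
    combined = add_gaze_embeddings([[1.0, 2.0]], [[0.5, -1.0]])
    print(combined)
    # The loss is lower when the chosen response receives the higher reward.
    print(pairwise_loss(2.0, 0.0), pairwise_loss(0.0, 2.0))
```

In this sketch the loss shrinks as the reward for the chosen response exceeds that of the rejected one, which matches the win / loss comparison depicted in the overview figure.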