ICLR2026
Verification and Co-Alignment via Heterogeneous Consistency for Preference-Aligned LLM Annotations
Cheng Chen, Haiyan Yin, Ivor Tsang
摘要
Large Language Models (LLMs) are increasingly expected to be culturally customisable and personally aligned for natural language understanding (NLU). However, existing methods, from supervised fine-tuning (SFT) to personalised RLHF and prompting, either require costly large-scale annotations or remain constrained by their pretraining distributions. Moreover, acquiring annotations that reflect subjective, diverse, and evolving user preferences is both expensive and labour-intensive. To address these limitations, we propose Heterogeneous-Consistency Co-Alignment (HCC), a training-free annotation paradigm that leverages two heterogeneous models: a knowledge-rich yet potentially overconfident LLM and a task-specialised lightweight model guided by a small user preference set. Together, they verify and co-align misaligned outputs over unlabelled corpora. For verification, HCC introduces the reference-free Consistent-And-Inconsistent (CAI) Ratio, an uncertainty signal derived from inter-model agreements (consistent samples) and disagreements (inconsistent samples) to determine whether refinement is necessary. For co-alignment, HCC employs a non-parametric, embedding-based preference assignment scheme to recalibrate inconsistent samples according to user preferences. Across eight NLU datasets and both open- and closed-source LLMs, HCC consistently improves annotation alignment and, in several tasks, enables Llama-3-8B to surpass GPT-3.5/4o-mini after co-alignment correction. Moreover, CAI strongly correlates with accuracy and tracks pre- and post-alignment gains, offering a reference-free signal for scaling preference-aligned annotation without ground-truth supervision.