ICLR 2025

Improving Language Model Distillation through Hidden State Matching

Sayantan Dasgupta, Trevor Cohn

Abstract

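Goal: To match the alternating hidden states between the teacher (T) and the student (S), which have different hidden dimensions.

[Figure: teacher and student models, each running from an embedding layer (dimension d_T for the teacher, d_S for the student) to an LM head that produces logits; CKA is applied between paired teacher and student hidden states.]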
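To make the idea of comparing hidden states of different widths concrete, the following is a minimal sketch of Centered Kernel Alignment between a teacher activation matrix and a student activation matrix. It assumes the linear-kernel form of CKA and a loss of the form 1 - CKA; the text above names only "CKA", so both choices are assumptions, and the function names `linear_cka` and `cka_loss`, the batch size, and the dimensions are hypothetical.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between x (n, d_T) and y (n, d_S); rows are the same n tokens.

    Assumption: linear-kernel CKA. The feature dimensions may differ because
    only the (d_S, d_T) cross-covariance and the two Gram-like matrices are used.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows")

    # Center each feature column so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def cka_loss(teacher_hidden, student_hidden):
    """Hypothetical distillation term: 0 when the representations align fully."""
    return 1.0 - linear_cka(teacher_hidden, student_hidden)


# Toy activations: 32 tokens, teacher width d_T = 16, student width d_S = 8.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(32, 16))
student = rng.normal(size=(32, 8))

score = linear_cka(teacher, student)
loss = cka_loss(teacher, student)
```

Because the score depends only on how the two sets of activations co-vary across tokens, no projection layer is needed to bring the student up to the teacher's width in this sketch. In an actual training loop the same computation would be written in an autodiff framework so that the loss can be backpropagated into the student.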