ICLR 2025
Improving Language Model Distillation through Hidden State Matching
Sayantan Dasgupta, Trevor Cohn
Abstract
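Goal: To match the alternating hidden states between the teacher (T) and the student (S) with different dimensions.
[Figure: the teacher (Embedding, hidden dimension d_T, LM head, logits) and the student (Embedding, hidden dimension d_S, LM head, logits), with CKA applied between teacher and student hidden states.]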
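To illustrate how CKA can compare hidden states of different widths, here is a minimal sketch using the standard linear form of Centered Kernel Alignment. It is an illustration under stated assumptions, not the authors' implementation: the function name `linear_cka`, the array shapes, the random inputs, and the `1 - CKA` loss are all hypothetical choices made for this example.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between x of shape (n, d_x) and y of shape (n, d_y).

    Rows are the same n tokens seen by both models; the feature
    dimensions d_x and d_y are allowed to differ.
    """
    # Center each feature (column) over the n tokens.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Hypothetical hidden states: 64 tokens, d_T = 32 and d_S = 16.
rng = np.random.default_rng(0)
teacher_hidden = rng.normal(size=(64, 32))
student_hidden = rng.normal(size=(64, 16))

score = linear_cka(teacher_hidden, student_hidden)
# Assumed form of a matching loss: minimizing it pushes CKA toward 1.
loss = 1.0 - score
```

Because the score depends only on how the tokens relate to one another within each representation, no projection layer is needed to bring d_S up to d_T. Linear CKA is also unchanged by orthogonal transformations and isotropic scaling of either input.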