Title:
Hard Gate Knowledge Distillation -- Leverage Calibration for Robust and Reliable Language Model
Authors:
Lee, Dongkyu, Tian, Zhiliang, Zhao, Yingxiu, Cheung, Ka Chun, Zhang, Nevin L.
Publication Year:
2022
Collection:
Computer Science
Subject Terms:
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
More Details:
In knowledge distillation, a student model is trained with supervision from two sources: knowledge from a teacher and observations drawn from the training data distribution. A teacher's knowledge is valued for the inter-class relations it encodes, which provide a meaningful supervision signal to the student; hence, much effort has gone into identifying what knowledge to distill. In this paper, we explore a question that has received little attention: when to distill such knowledge. We answer this question with the concept of model calibration: we view a teacher model not only as a source of knowledge but also as a gauge for detecting miscalibration in the student. This simple yet novel view leads to a hard-gate knowledge distillation scheme that switches between learning from the teacher model and learning from the training data. We verify the gating mechanism in the context of natural language generation at both the token level and the sentence level. Empirical comparisons with strong baselines show that hard-gate knowledge distillation not only improves model generalization but also significantly lowers model calibration error. Comment: EMNLP 2022
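To make the gating idea concrete, here is a minimal per-token sketch of a hard-gate distillation loss. The gate heuristic used below (compare the student's and teacher's probability on the gold token, and distill from the teacher only when the student looks over-confident relative to it) is an illustrative assumption based on the abstract, not the paper's exact rule; the function name `hard_gate_kd_loss` and the plain-list probability interface are likewise hypothetical.

```python
import math


def hard_gate_kd_loss(student_probs, teacher_probs, gold):
    """Illustrative per-token hard-gate KD loss (sketch, not the paper's exact rule).

    student_probs / teacher_probs: probability distributions over the vocabulary.
    gold: index of the ground-truth token.

    Hard gate (assumed heuristic): if the student assigns the gold token a
    higher probability than the teacher, treat the student as potentially
    miscalibrated (over-confident) and learn from the teacher's full
    distribution; otherwise, learn from the one-hot ground truth.
    """
    if student_probs[gold] > teacher_probs[gold]:
        # Gate open: cross-entropy against the teacher's soft distribution.
        return -sum(t * math.log(s)
                    for t, s in zip(teacher_probs, student_probs) if t > 0)
    # Gate closed: negative log-likelihood of the gold token (standard MLE).
    return -math.log(student_probs[gold])
```

Because the gate is hard (a discrete switch per token rather than a weighted mixture), each token's loss comes from exactly one supervision source, which is what distinguishes this scheme from soft-interpolation distillation objectives.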
Document Type:
Working Paper
Access URL:
http://arxiv.org/abs/2210.12427
Accession Number:
edsarx.2210.12427
Database:
arXiv |