Training a smaller “student” model to mimic a larger “teacher,” often improving efficiency while retaining performance.
Why It Matters
Distillation is vital for creating efficient AI models that can run on devices with limited resources, such as smartphones and embedded systems. By enabling smaller models to achieve performance close to that of larger models, distillation enhances the practicality of deploying AI in various applications, from natural language processing to computer vision.
Distillation is a model compression technique in which a smaller "student" model is trained to replicate the behavior of a larger "teacher" model. Knowledge is transferred from teacher to student by minimizing the Kullback-Leibler divergence between the teacher's output probabilities and the student's predictions. This is typically a two-step process: first, the teacher model is trained on a dataset; then the student model is trained on the soft targets produced by the teacher, which carry richer information than hard labels (for example, they reveal which incorrect classes the teacher considers plausible). The training objective can be written as minimizing L = α·CE(y_true, p_student) + (1 − α)·T²·KL(p_teacher^(T) ∥ p_student^(T)), where CE is the cross-entropy against the true labels, p^(T) denotes softmax probabilities softened with temperature T, and α balances the hard-label and soft-target terms. Distillation is related to the broader concepts of transfer learning and knowledge transfer, enabling the deployment of efficient models in practical applications while retaining high performance.
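The combined loss above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function names, the default temperature of 2.0, and the default α of 0.5 are assumptions chosen for clarity.

```python
import math

def softmax(logits, temperature=1.0):
    # Soften logits with a temperature T; higher T yields softer probabilities.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, true_label,
                      temperature=2.0, alpha=0.5):
    # Hypothetical helper for illustration; defaults are assumptions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(p_teacher || p_student), scaled by T^2 so gradients keep a
    # comparable magnitude as T grows.
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    soft_loss = (temperature ** 2) * kl
    # Standard cross-entropy against the hard (one-hot) true label.
    hard_loss = -math.log(softmax(student_logits)[true_label])
    return alpha * hard_loss + (1 - alpha) * soft_loss
```

Note that when the student's logits exactly match the teacher's, the KL term vanishes and only the hard-label cross-entropy contributes, which is one quick sanity check for an implementation like this.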
Distillation is like a younger student learning from a more experienced teacher. In AI, it means taking a big, complex model (the teacher) and training a smaller, simpler model (the student) to mimic it. The smaller model learns from the teacher's predictions, which convey more about the task than the raw answers alone. The result is a model that is faster and uses less memory but still performs well. It's similar to learning from a teacher's annotated notes instead of just the textbook.