Distillation

Intermediate

Training a smaller “student” model to mimic a larger “teacher,” often improving efficiency while retaining performance.


Why It Matters

Distillation is vital for creating efficient AI models that can run on devices with limited resources, such as smartphones and embedded systems. By enabling smaller models to achieve performance close to that of larger models, distillation enhances the practicality of deploying AI in various applications, from natural language processing to computer vision.

Distillation is a model compression technique in which a smaller “student” model is trained to replicate the behavior of a larger “teacher” model. Knowledge is transferred from teacher to student by minimizing the Kullback-Leibler divergence between the teacher's softened output probabilities and the student's predictions. This is typically a two-step process: first, the teacher model is trained on a dataset; then the student model is trained on the soft targets produced by the teacher, which carry richer information than hard labels (for example, the relative probabilities the teacher assigns to incorrect classes). The standard distillation objective can be written as L = α T² · KL(σ(z_t / T) ‖ σ(z_s / T)) + (1 − α) · CE(y, σ(z_s)), where z_t and z_s are the teacher's and student's logits, σ is the softmax function, T is a temperature that softens both distributions, and α balances the distillation term against the cross-entropy on the hard labels y. Distillation is related to the broader concepts of transfer learning and knowledge transfer, enabling the deployment of efficient models in practical applications while retaining high performance.
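The objective above can be sketched in plain Python. This is a minimal illustration of the loss computation only (not a full training loop); the function names, the example logits, and the default values T = 2 and α = 0.5 are assumptions chosen for demonstration.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.5):
    """Weighted sum of a softened KL term and hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    # Cross-entropy of the student against the one-hot hard label (T = 1)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * (T * T) * kl + (1 - alpha) * ce
```

When the student's logits exactly match the teacher's, the KL term vanishes and only the hard-label cross-entropy contributes; raising T spreads probability mass across classes, exposing more of the teacher's “dark knowledge” to the student.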

