Posts

Published Dec 3, 2022 by Aditya Mehrotra

Assume we have some log likelihood $\log(p_\theta(x))$ we want to maximize, where the parameters of our probabilistic model are denoted by $\theta$. Now, we...

Published Oct 22, 2022 by Aditya Mehrotra

This post goes over the math and mechanics of Knowledge Distillation and includes code showing how to implement it. Preliminaries:...