KL divergence, diffusion, optimal transport [WIP]
The KL divergence (relative entropy) $D(p(x) || q(x))$ quantifies the expected number of extra bits required to code samples from $p(x)$ using a code optimized for $q(x)$ rather than for $p(x)$:
\[D(p(x) || q(x)) = \sum_{x\in X} p(x) \log \frac{p(x)}{q(x)}\]

The relationship between Shannon entropy, cross-entropy, and relative entropy can be written as follows (for a discrete random variable $x$):
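A minimal sketch of this definition in code, assuming two hypothetical discrete distributions `p` and `q` on the same support and working in base 2 so the result is in bits:

```python
import numpy as np

def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) * log2(p(x) / q(x)), in bits.

    Terms with p(x) == 0 contribute nothing, by the convention 0 * log 0 = 0.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Hypothetical example: a fair coin coded with a code tuned for a biased coin.
p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # ~0.737 extra bits per sample
print(kl_divergence(p, p))  # 0.0 -- no penalty when the code matches p
```

Base 2 is used here to match the "extra bits" interpretation above; the natural logarithm (nats) is equally common.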
\[D(p(x) || q(x)) = H(p(x), q(x)) - H(p(x))\]

where $H(p(x), q(x)) = -\sum_{x\in X} p(x) \log q(x)$ is the cross-entropy and $H(p(x)) = -\sum_{x\in X} p(x) \log p(x)$ is the Shannon entropy.

(TODO)
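A minimal numerical check of this identity, reusing the same hypothetical fair-coin/biased-coin pair (no zero-probability outcomes, so all logarithms are finite):

```python
import numpy as np

p = np.array([0.5, 0.5])   # hypothetical "true" distribution (fair coin)
q = np.array([0.9, 0.1])   # hypothetical coding distribution (biased coin)

# Shannon entropy H(p), cross-entropy H(p, q), and relative entropy D(p || q), in bits.
shannon = -np.sum(p * np.log2(p))
cross = -np.sum(p * np.log2(q))
kl = np.sum(p * np.log2(p / q))

print(shannon)              # 1.0
print(cross)                # ~1.737
print(cross - shannon, kl)  # both ~0.737: D(p || q) = H(p, q) - H(p)
```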