Edward Morgan / Blog

Intuition Behind np.sum(p * np.log(p / q))

March 17, 2026 · 2 min read

Think of it as a mismatch score.

np.sum(p * np.log(p / q)) asks:

At the places where P puts probability mass, how badly does Q disagree?

Here is the intuition piece by piece.

Suppose at one x-location:

p and q are roughly equal.

Then p/q is near 1, log(p/q) is near 0, and that point contributes almost nothing. This means Q agrees with P there.

Now suppose:

p is large but q is small.

Then p/q is big, log(p/q) is large and positive, and after multiplying by p that point contributes a lot. This means P says this region matters, but Q is not paying enough attention, so the penalty is large.

Now suppose:

p is tiny (P barely cares about that region).

Then even if q is very different, the contribution is small, because you multiply by p. This means KL mostly cares about being wrong where P puts mass.
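The three cases can be checked numerically. Here is a small sketch with made-up values for p and q at a single point (the helper name is mine, not from the formula):

```python
import numpy as np

def contribution(p, q):
    """Pointwise KL term: p * log(p / q)."""
    return p * np.log(p / q)

# Case 1: Q agrees with P -> contribution near 0
print(contribution(0.30, 0.29))

# Case 2: P cares a lot, Q barely does -> large positive contribution
print(contribution(0.30, 0.01))

# Case 3: P barely cares -> tiny contribution, even though q is way off
print(contribution(0.001, 0.50))
```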

So the weighting by p is the key idea:

p_i log(p_i / q_i)

says:

How wrong is Q at point i, weighted by how much P cares about that point?

Why the log?

The log turns ratios into a sensible relative penalty. Overshooting by a factor of 2 and undershooting by a factor of 2 get penalties of the same size with opposite signs, log 2 and -log 2, and what matters is the ratio p/q, not the raw gap p - q.

So log is measuring relative surprise or relative mismatch, not just raw difference.
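One way to see "relative": the log scores the ratio, not the gap. The specific numbers below are arbitrary choices for illustration:

```python
import numpy as np

# Same absolute gap of 0.10, very different ratios:
# 0.50 vs 0.40 is a mild mismatch; 0.11 vs 0.01 is a drastic one.
print(np.log(0.50 / 0.40))   # ratio 1.25 -> small penalty
print(np.log(0.11 / 0.01))   # ratio 11   -> much larger penalty

# Doubling vs halving are symmetric: equal size, opposite sign.
print(np.log(2.0), np.log(0.5))
```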

Very Simple Picture

Imagine P is the truth, and Q is your model.

Then KL is asking:

If the world really follows P, how much extra surprise do I get by pretending it follows Q instead?
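"Extra surprise" can be made literal: it is the average surprise (negative log probability) under Q minus the average surprise under P, with both averages taken using P's probabilities. A sketch with a made-up discrete distribution:

```python
import numpy as np

p = np.array([0.6, 0.3, 0.1])   # the "truth"
q = np.array([0.2, 0.3, 0.5])   # the model

surprise_under_q = -np.sum(p * np.log(q))  # cross-entropy H(P, Q)
surprise_under_p = -np.sum(p * np.log(p))  # entropy H(P)

kl = np.sum(p * np.log(p / q))
print(kl, surprise_under_q - surprise_under_p)  # the same number
```

That identity, KL = H(P, Q) - H(P), is exactly "extra surprise from pretending the world follows Q."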

Physical Intuition With Curves

If P(x) is centered at 0 and Q(x) is shifted to the right, then right where P piles up its mass, q is small, so p/q is large there, and those heavily weighted points rack up a big penalty.

That is why KL increases as the two curves separate.
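A quick numerical check of that picture, using two discretized bell curves (the grid and the particular shifts are arbitrary choices of mine):

```python
import numpy as np

x = np.linspace(-10, 10, 2001)

def bell_pmf(mu, sigma=1.0):
    """Discretized, normalized Gaussian-shaped curve on the grid x."""
    w = np.exp(-0.5 * ((x - mu) / sigma) ** 2)
    return w / w.sum()

p = bell_pmf(0.0)
kls = []
for shift in [0.5, 1.0, 2.0, 3.0]:
    q = bell_pmf(shift)
    kls.append(np.sum(p * np.log(p / q)))
    print(shift, kls[-1])  # KL grows as the shift grows
```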

One-Sentence Intuition

np.sum(p * np.log(p / q)) is:

the average log-mismatch of Q from P, where the averaging is done using P itself.
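Putting it all together as a small helper. The normalization and the zero-mass guard are my additions; the post's one-liner assumes p and q are already valid distributions with no zeros in q where p has mass:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) for discrete distributions; assumes q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    mask = p > 0                    # convention: 0 * log(0/q) = 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # identical distributions -> 0
print(kl_divergence([0.9, 0.1], [0.1, 0.9]))  # strong disagreement -> positive
```

Note the asymmetry: kl_divergence(p, q) and kl_divergence(q, p) generally differ, because each averages the mismatch under its own first argument.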