Think of it as a mismatch score.
np.sum(p * np.log(p / q)) asks:
At places where P believes probability is important, how badly does Q disagree?
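For concreteness, the expression can be run directly on a small discrete example (the probabilities here are made up purely for illustration):

```python
import numpy as np

# Two discrete distributions over the same 3 outcomes (arbitrary values).
p = np.array([0.5, 0.3, 0.2])   # the "truth"
q = np.array([0.4, 0.4, 0.2])   # the model

# KL divergence of Q from P, in nats.
kl = np.sum(p * np.log(p / q))
print(kl)  # a small positive number: Q is close to P but not identical
```

Because Q only mildly disagrees with P, the mismatch score comes out small but still positive.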
Here is the intuition piece by piece.
Suppose at one x-location:
- p is large, q is also large

Then p/q is near 1, log(p/q) is near 0, so that point
contributes almost nothing. This means Q agrees with P there.
Now suppose:
- p is large, q is very small

Then p/q is big, log(p/q) is positive and large, and
after multiplying by p, that point contributes a lot. This means
P says this region matters but Q is not paying enough attention,
so the penalty is large.
Now suppose:
- p is tiny, q is wrong there

Then even if q is very different, the contribution is small because
you multiply by p. This means KL mostly cares about being wrong where P puts mass.
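All three cases can be seen side by side by looking at the per-point contributions before summing (again with made-up numbers):

```python
import numpy as np

# One point per case described above (illustrative values).
p = np.array([0.45, 0.45, 0.10])
q = np.array([0.45, 0.05, 0.50])

# Per-point contributions to KL, before np.sum.
contrib = p * np.log(p / q)
# index 0: p and q agree           -> contribution ~ 0
# index 1: p large, q very small   -> large positive contribution
# index 2: p tiny, q very wrong    -> contribution stays small in magnitude
print(contrib)
```

The middle point dominates the total, even though the last point is also badly mismatched: the weighting by p is doing the work.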
So the weighting by p is the key idea:
p_i log(p_i / q_i)
says:
How wrong is Q at point i, weighted by how much P cares about that point?
Why the log?
The log turns ratios into a sensible relative penalty.
- if p = q, then p/q = 1, and log(1) = 0
- if q is half of p, you get a penalty
- if q is much smaller than p, the penalty grows a lot
- if q is larger than p, the term can become negative locally, but total KL is still nonnegative
So log is measuring relative surprise or relative mismatch, not just raw difference.
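These bullet points can be checked numerically by holding p fixed at one point and varying q (the value 0.4 is arbitrary):

```python
import numpy as np

p = 0.4  # P's probability at one point (an arbitrary illustrative value)

# Contribution p * log(p/q) for several choices of q at that point.
penalties = {q: p * np.log(p / q) for q in [0.4, 0.2, 0.04, 0.8]}
for q, pen in penalties.items():
    print(f"q={q}: contribution {pen:+.3f}")
```

q equal to p gives exactly 0, halving q gives a moderate positive penalty, shrinking q further makes it grow, and q larger than p gives a locally negative term.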
Very Simple Picture
Imagine P is the truth, and Q is your model.
Then KL is asking:
If the world really follows P, how much extra surprise do I get by pretending it follows Q instead?
- small KL means Q is a good approximation of P
- large KL means Q misses important parts of P
Physical Intuition With Curves
If P(x) is centered at 0 and Q(x) is shifted to the right:
- near x = 0, P is high
- but Q may be low there
- that creates a large penalty
That is why KL increases as the two curves separate.
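This can be sketched numerically by discretizing two unit-width Gaussian curves on a grid and shifting Q further and further to the right (a rough numerical approximation; for unit Gaussians the true KL is shift²/2 nats):

```python
import numpy as np

x = np.linspace(-10, 10, 2001)          # grid covering both curves

def curve(mu, sigma=1.0):
    g = np.exp(-0.5 * ((x - mu) / sigma) ** 2)
    return g / g.sum()                  # normalize samples to sum to 1

p = curve(mu=0.0)                       # P centered at 0
kls = []
for shift in [0.5, 1.0, 2.0]:
    q = curve(mu=shift)                 # Q shifted to the right
    kls.append(np.sum(p * np.log(p / q)))
print(kls)                              # grows as the curves separate
```

The computed values track shift²/2 closely, and they grow quadratically: separating the curves twice as far costs four times the penalty.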
One-Sentence Intuition
np.sum(p * np.log(p / q)) is:
the average log-mismatch of Q from P, where the averaging is done using P itself.