Compute the KL divergence between two distributions

Do I understand correctly that there should be a scaling factor applied to the logp, presumably to keep it from getting too small? If so, could it be that my model’s scaling factor is not computed correctly, so it ends up too small? I am still digging around and not sure yet, but it looks like the logp factors that get large are the ones corresponding to the output variables, which have a high number of observations. Could that be it?
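
To illustrate what I mean, here is a minimal sketch in plain NumPy/SciPy (not using any particular PPL or its actual scaling mechanism): the summed logp of an observed variable grows with the number of observations, so variables with many observations dominate unless some scaling factor is applied. The 1/n factor here is purely an assumption for illustration, not the scaling my model actually uses.

```python
# Illustration only: summed logp grows with the number of observations,
# and a hypothetical 1/n scaling keeps it comparable across variables.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

for n_obs in (10, 1_000, 100_000):
    data = rng.normal(loc=0.0, scale=1.0, size=n_obs)
    # Sum of per-observation log-densities under a standard normal
    logp = stats.norm(loc=0.0, scale=1.0).logpdf(data).sum()
    scaled = logp / n_obs  # hypothetical scaling factor
    print(f"n_obs={n_obs:>7}  summed logp={logp:12.1f}  scaled={scaled:8.3f}")
```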