Outputting loglikelihood of each parameter set

I am currently using a custom model implemented with DensityDist. However, I am running into an issue at the end when it saves data to an arviz dataframe. What PyMC seems to be doing is re-calculating the loglikelihood of every single parameter set as it saves to the arviz dataframe, which takes a significant amount of time. My custom likelihood function takes on the order of 1 ms to run, so having to recalculate the loglikelihood of my data takes a prohibitive amount of time.

Does anyone know a workaround for this that will cause PyMC to save the loglikelihood in the data frame as it's sampling rather than recalculating it at the end?

You can try passing idata_kwargs={"log_likelihood": False} to pm.sample, i.e. pm.sample(..., idata_kwargs={"log_likelihood": False}), to suppress the computation.

The problem is that I need the loglikelihood saved as it is calculated for further data analysis. Ideally, I would like it done while the computation is occurring since it takes multiple minutes to do afterwards.

PyMC computes the element-wise log likelihood (per observation) at the end because this is generally needed for model comparison.

This quantity is not computed at all during NUTS sampling, where a combined scalar log likelihood + log prior (and gradients) is computed for all the parameters at once.

Ah, I see. I’m using DEMetropolisZ where the logp is calculated for each sample anyway. Do you know if there is a way around this? Maybe using pm.Deterministic where I can load the value calculated into a Potential or DensityDist?

I am not sure it can be done. You would need to wrap it in a Deterministic because you don’t want it to affect sampling (Potential or DensityDist would do that). Are you in V3 or V4?

You could try to add a

pm.Deterministic("loglike", obs.logp_elemwiset)

But even if that worked I am not sure it would save you time. I think the computation would be duplicated, not reused. Your best chance may be to hack the sampler to make it save the loglikelihood as a sampling stat.

Yeah, that’s the problem I’ve been running into. Whenever I use deterministic, it just re-calculates the logp, and I haven’t been able to figure out if there is a way to calculate the Deterministic, then use the value output by the Deterministic as an input to either Potential or DensityDist.

I’m still using V3.

Why do you want to do that?

For the problems I work on, I’m commonly fitting parameters to 50-100 individual data sets for multiple different models. To compare them, I might use something like a DIC calculation, which requires the loglikelihood. Since my loglikelihood function takes ~1 ms to run, I need to use a fast sampler like DEMetropolisZ, which requires ~50,000 samples to get good results back.

The end result of this is that it takes minutes to just compute my loglikelihood during parameter inference. Since PyMC isn’t saving the loglikelihood during parameter inference, I have to recalculate it later, which can take hours for all the data I’m working with. If I could save the loglikelihood values while it’s sampling (without having to calculate twice as Deterministic is doing), the problem goes away.
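To make the DIC use case concrete, here is a rough sketch of how DIC could be computed from nothing more than one saved total loglikelihood value per draw. It assumes the variance-based estimate of the effective number of parameters, p_D = var(D)/2 with D = -2 * loglikelihood, rather than the version that needs the deviance at the posterior mean. The function name dic_from_loglikes and the saved values are hypothetical, and it uses only the standard library.

```python
import statistics

def dic_from_loglikes(loglikes):
    """Return (DIC, p_D) from total log likelihood values, one per posterior draw.

    Assumption: p_D is estimated as half the sample variance of the deviance.
    """
    deviances = [-2.0 * ll for ll in loglikes]      # D = -2 * log likelihood
    mean_deviance = statistics.mean(deviances)      # posterior mean deviance
    p_d = 0.5 * statistics.variance(deviances)      # effective number of parameters
    return mean_deviance + p_d, p_d

# Hypothetical values saved during sampling (illustration only, not real results)
saved_loglikes = [-5.2, -4.9, -5.6, -5.0, -5.3]
dic, p_d = dic_from_loglikes(saved_loglikes)
print(f"DIC = {dic:.3f}, p_D = {p_d:.3f}")
```

Note that this only needs the scalar loglikelihood per draw, not the per-observation values that WAIC or LOO would require.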

The calculating twice is not because of the deterministic. It’s because the sampler does not know about it. It has nothing to do with not being in a Potential or DensityDist.

As I was saying your best chance is to hack the sampler if you need it to save that result.

Depending on how the sampler is partitioning the graph the deterministic could actually reuse the computation but I can’t say for sure without looking at the Theano graph and what the sampler is doing.
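As a rough illustration of what "saving the loglikelihood as a sampling stat" means, here is a hand-written random-walk Metropolis loop that stores the loglikelihood it already computed for each kept state. This is not PyMC code and not DEMetropolisZ: the toy Gaussian model, the function names and the step size are all assumptions, and it only shows the bookkeeping idea using the standard library.

```python
import math
import random

def log_likelihood(mu, data):
    """Total Gaussian log likelihood with unit variance (stand-in for a costly function)."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (y - mu) ** 2 for y in data)

def log_prior(mu):
    """Log density of a Normal(0, 10) prior on mu (hypothetical choice)."""
    return -0.5 * math.log(2 * math.pi * 100.0) - 0.5 * (mu / 10.0) ** 2

def metropolis(data, n_draws, step_size=0.5, seed=0):
    """Random-walk Metropolis that records the log likelihood of every kept state."""
    rng = random.Random(seed)
    mu = 0.0
    ll, lp = log_likelihood(mu, data), log_prior(mu)
    n_evals = 1  # number of likelihood evaluations so far
    draws, loglikes = [], []
    for _ in range(n_draws):
        proposal = mu + rng.gauss(0.0, step_size)
        ll_prop, lp_prop = log_likelihood(proposal, data), log_prior(proposal)
        n_evals += 1
        # Accept with probability min(1, posterior ratio); 1 - random() lies in (0, 1]
        if math.log(1.0 - rng.random()) < (ll_prop + lp_prop) - (ll + lp):
            mu, ll, lp = proposal, ll_prop, lp_prop
        draws.append(mu)
        loglikes.append(ll)  # reuse the stored value, no second evaluation
    return draws, loglikes, n_evals

data = [0.8, 1.3, 0.4, 1.9]  # hypothetical observations
draws, loglikes, n_evals = metropolis(data, n_draws=2000)
print(len(draws), len(loglikes), n_evals)
```

The likelihood is evaluated once per proposal and never again afterwards; the saved value for each draw is just the one the accept/reject step already had in hand.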

To try and clarify the remarks made by Ricardo above:

All samplers compute the logp, but as far as I know, none of them computes the pointwise log likelihood (called logp_elemwiset internally by PyMC), which is what is needed for model comparison with information criteria like WAIC or LOO cross-validation.

The logp is the sum of the total log likelihood and the log prior. The pointwise log likelihood, by contrast, is the log likelihood evaluated at a single observation, so it yields one value per posterior sample and per observation.
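A small sketch of the distinction, for a single posterior draw. The model here (y_i ~ Normal(mu, 1) with a Normal(0, 10) prior on mu), the data and the helper normal_logpdf are all made up for illustration; only the standard library is used.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * ((x - mu) / sigma) ** 2

observations = [0.8, 1.3, 0.4, 1.9]  # hypothetical data
mu = 1.0                             # one hypothetical posterior draw

# Pointwise log likelihood: one value per observation (what WAIC/LOO need)
pointwise_loglike = [normal_logpdf(y, mu, 1.0) for y in observations]

# Total log likelihood: the pointwise values summed over observations
total_loglike = sum(pointwise_loglike)

# Log prior of the draw
log_prior = normal_logpdf(mu, 0.0, 10.0)

# logp: the single scalar the sampler works with
logp = total_loglike + log_prior

print(len(pointwise_loglike), total_loglike, logp)
```

With S draws and N observations, the pointwise quantity is an S x N array, whereas the sampler only ever handles S scalars, which is why the pointwise values cannot simply be read off from what the sampler already computed.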