Nuts-rs: meaning of the parameters in the CpuLogpFunc

Yeah I meant sufficient statistics. Thanks for bringing it up

Regarding static graphs, you can always use a black-box model in PyMC; you’ll just have to handle the gradients yourself or use gradient-free methods.

Also even static graphs can be pretty “fluid”, they can vary in dimension length and can use conditionals and loops with near arbitrary stop conditions.

You can also have partially black box models. We use this for Simulator models with SMC sampling that can have arbitrary dynamic code in it.

Of course, as you mentioned, if you go there you may need to do more grunt work to maintain computational performance.

Does this mean that graphs don’t need to be recompiled anymore when the data changes?

They don’t. If your data is defined via pm.Data, it’s treated as a placeholder with a fixed number of dimensions but mutable shape (and contents), and any compiled function will take it as an input instead of treating it as a constant.

Having said that, PyMC currently does not exploit this and always recompiles the function on subsequent calls to pm.sample (or whatever sampling routine). There are plans to avoid this. Nutpie already lets you exploit it by compiling the model once and then updating the data without recompiling.

Sorry for the late reply guys.

Indeed, I did some work like this for a particular project. In that project the covariates were all categorical, and for every combination of them there were multiple observations (in some cases, hundreds of thousands).

I rewrote the logp as follows:

\log p(x_1, \dots, x_n \mid \mu, \sigma) = - n \left(\log(\sqrt{2 \pi}) + \log(\sigma)\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}{x_i^2} + \frac{\mu}{\sigma^2} \sum_{i=1}^{n}x_i - \frac{n \mu^2}{2\sigma^2}

which indeed only depends on the sufficient statistics. If you have multiple groups, you need the sum, sum of squares, and sample size per group.
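To make this concrete, here is a minimal numpy/scipy sketch (function and variable names are my own, not from any library) that evaluates the Normal log-likelihood from just the cached sufficient statistics (n, sum, sum of squares) and checks it against the direct per-observation sum:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=10_000)

# Sufficient statistics, computed once per group.
n = x.size
s1 = x.sum()          # sum of observations
s2 = (x ** 2).sum()   # sum of squares

def logp_from_stats(mu, sigma, n, s1, s2):
    """Normal log-likelihood using only (n, s1, s2)."""
    return (
        -n * (np.log(np.sqrt(2 * np.pi)) + np.log(sigma))
        - s2 / (2 * sigma**2)
        + mu * s1 / sigma**2
        - n * mu**2 / (2 * sigma**2)
    )

mu, sigma = 1.8, 1.4
fast = logp_from_stats(mu, sigma, n, s1, s2)
slow = stats.norm.logpdf(x, loc=mu, scale=sigma).sum()
print(np.isclose(fast, slow))  # the two should agree
```

The point is that after the one-time pass over the data, each logp evaluation is O(number of groups) instead of O(number of observations).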

This can also be done for other distributions in the exponential family.

I also found this resource helpful: GitHub - mike-lawrence/stancon_2023_tutorial_hieararchical_modeling, with the video here: https://www.youtube.com/watch?v=Ns30ngCFArQ&t=2660s&ab_channel=MikeLawrence.

Finally, if you’re doing non-hierarchical Gaussian regression with Gaussian priors on the regression coefficients and you really care about speed, you may want to check whether you can write your model so that it has a closed-form posterior, removing the need for a sampler.
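For reference, with a Gaussian likelihood with known noise and a Gaussian prior on the coefficients, the posterior is itself Gaussian with a closed-form mean and covariance. A minimal numpy sketch (data and names are illustrative, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: y = X @ beta_true + noise
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([0.5, -1.0, 2.0])
sigma = 0.3                      # known observation noise sd
y = X @ beta_true + rng.normal(scale=sigma, size=n)

# Prior beta ~ N(0, tau^2 I). The conjugate posterior is Gaussian:
#   cov  = (X'X / sigma^2 + I / tau^2)^{-1}
#   mean = cov @ X'y / sigma^2
tau = 10.0
precision = X.T @ X / sigma**2 + np.eye(p) / tau**2
post_cov = np.linalg.inv(precision)
post_mean = post_cov @ (X.T @ y) / sigma**2

print(post_mean)  # close to beta_true for a weak prior and enough data
```

No sampler is involved; the posterior is exact up to linear algebra.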

Yes, this is the whole point of the exponential family. But if you want to go further with optimisations, you can even use this trick for distributions that are not in the exponential family but whose logpdf can be expressed as a polynomial of fixed degree in the variable, plus some part that can’t. Something like this (where \theta is a vector of parameters):

\log P(x \mid \theta) = g(x, \theta) + \sum_{k=1}^{m} f_k(\theta) x^k

When you sum over the x_i, you can use the cached sums of the powers of x_i for the polynomial part and only have to sum g(x_i, \theta) over the data. This can reduce the size of your computational graph by a lot.
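As an illustration of the split between a cached polynomial part and a g that must still be summed, here is a numpy/scipy sketch using the Gumbel distribution (my choice of example, not from the thread): its logpdf is -\log\beta - (x-\mu)/\beta - e^{-(x-\mu)/\beta}, so the part linear in x uses the cached sum, while the exponential term is the g(x, \theta) summed each call:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.gumbel(loc=1.0, scale=2.0, size=5_000)

# Cached once: sample size and the sum of first powers of x.
n = x.size
s1 = x.sum()

def gumbel_logp_cached(mu, beta, x, n, s1):
    """Gumbel log-likelihood. The polynomial-in-x part,
    f_0(theta)*n + f_1(theta)*s1, uses cached sums; only the
    non-polynomial g(x, theta) = -exp(-(x - mu)/beta) is summed
    over the data on each call."""
    poly_part = n * (-np.log(beta) + mu / beta) - s1 / beta
    g_part = -np.exp(-(x - mu) / beta).sum()
    return poly_part + g_part

mu, beta = 0.8, 2.1
fast = gumbel_logp_cached(mu, beta, x, n, s1)
slow = stats.gumbel_r.logpdf(x, loc=mu, scale=beta).sum()
print(np.isclose(fast, slow))  # both evaluations should agree
```

Only the g term scales with the data here; with a cheap g (or none at all) the savings per logp evaluation can be substantial.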

You can use this for the Weibull distribution, which is useful if you want to do survival analysis, for example.
