Performing model fit evaluation in a Bayesian way when you don't know how to sample from a custom distribution

Hi All,

I need help understanding which methods I can use to evaluate my fitted model when I don't know how to sample from the custom distribution I implemented. In a bit more detail:

A lot of times, I have to implement a custom likelihood function because the distribution is not implemented in PyMC; e.g., even though the Wald distribution is implemented in PyMC, the recently proposed censored shifted Wald distribution is not. For such custom distributions, I don't always know how to sample from the distribution. Hence, I am not able to use functions such as pm.sample_posterior_predictive for model fit evaluation, because they throw a NotImplementedError, which is correct since I have not implemented the random function.

However, this has become a big roadblock for me to try new models/likelihood functions using Bayesian inference methods. Hence, please advise if there is a way in PyMC to do a model fit evaluation, e.g., posterior predictive analysis or some other methods, without knowing how to generate samples from the custom distribution/likelihood function.

If you can't find a way to generate random draws directly, you can use MCMC. This should work fine for the prior predictive (just remove the observed kwarg and call pm.sample).
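For example, something along these lines (a sketch; my_custom_logp stands for your own logp implementation, and the parameter values are placeholders):

import pymc as pm

# Sketch: draw from the prior predictive of a custom likelihood by leaving it
# unobserved and running MCMC on it.
with pm.Model():
    # Same CustomDist as in your model, but WITHOUT observed=...
    # (pass initval=... if the default initial point is not valid for your logp)
    rt = pm.CustomDist("rt", 0.5, 0.1, logp=my_custom_logp)
    prior_idata = pm.sample()  # the "posterior" draws of rt are prior predictive draws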

For the posterior predictive it's trickier, because you want to take a draw for every posterior draw you got. You could rewrite your model, now passing the posterior draws as constants, and do a new pm.sample, which would correspond to a posterior predictive.
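Roughly like this (a sketch, not tested; posterior_mu / posterior_sigma stand for the posterior draws you already have, and my_custom_logp for your logp):

import pymc as pm

# Sketch: one short MCMC run per posterior draw, with the parameters fixed
# as constants, gives posterior predictive draws of rt.
ppc_draws = []
for mu_i, sigma_i in zip(posterior_mu, posterior_sigma):
    with pm.Model():
        rt = pm.CustomDist("rt", mu_i, sigma_i, logp=my_custom_logp)  # no observed=
        idata_i = pm.sample(progressbar=False)
    # keep one draw of rt per posterior draw (e.g., the last draw of the first chain)
    ppc_draws.append(idata_i.posterior["rt"].values[0, -1])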

The other option is to try and see if the Censored Shifted Wald distribution is just a fancy name for a transformation of simpler known distributions. From very rough pattern matching, it sounds like a Shifted Wald is a shift + 1/Normal distribution? If that's the case, and the Censored bit in the name corresponds to what pm.Censored does, sampling from it should be pretty simple with CustomDist if you can get the right parameters in the right place.

In this snippet I try to create such a distribution, for a case with two trials: one censored RT and one non-censored RT:

import pymc as pm
import numpy as np

def shifted_wald_fn(mu, sigma, shift, size):
    # Guess: a shifted Wald as a shift plus the reciprocal of a Normal draw
    shifted_wald = shift + 1 / pm.Normal.dist(mu, sigma, size=size)
    return shifted_wald

def censored_shifted_wald_fn(mu, sigma, shift, censor_bound, size):
    shifted_wald = pm.CustomDist.dist(
        mu, sigma, shift,
        dist=shifted_wald_fn,
        size=size,
    )
    # Wrap the shifted Wald in pm.Censored to get the censored version
    censored_shifted_wald = pm.Censored.dist(
        shifted_wald,
        lower=None,
        upper=censor_bound,
        size=size,
    )
    return censored_shifted_wald

with pm.Model() as m:
    mu = 0.5
    sigma = 0.1
    shift = 0.7
    censor_bound = [1, np.inf]  # First is censored, second is not
    pm.CustomDist(
        "rt",
        mu, sigma, shift, censor_bound, 
        dist=censored_shifted_wald_fn,
        observed=[1, 2.5],
    )

m.compile_logp(sum=False)({})[0]
# array([-2.86651608e-07,  5.37522423e-02])

If the logp matches your definition (after you put the right parameters in the right place in the generative functions), then we are probably talking about the same thing, and PyMC will be happy to take random draws from the CustomDist.

If that’s the case, you don’t even need to bother providing your manual logp, unless you think you have a more efficient implementation.

Do you have a good reference on the derivation of the Censored Shifted Wald distribution?

What I wrote above was incorrect. Apparently InverseGaussian is not 1/Normal :smiley: and unfortunately we don't have one implemented. That just means we need to implement another building block (or just do everything together). Wikipedia gives a suggestion for how to take a random draw from an InverseGaussian, so things shouldn't be too bad. In that case the random method should just be a draw from the correctly specified InverseGaussian + shift?

I’ve implemented the base InverseGaussian in this gist, and then a shifted version:
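Roughly, the idea is something like this (a sketch, assuming the standard (mu, lam) parameterization and the sampling algorithm described on the Wikipedia page; not necessarily the exact implementation in the gist):

import numpy as np
import pymc as pm
import pytensor.tensor as pt

def inverse_gaussian_rng(mu, lam, rng=None, size=None):
    # Michael-Schucany-Haas algorithm for drawing from InverseGaussian(mu, lam)
    nu = rng.normal(size=size)
    y = nu**2
    x = mu + (mu**2 * y) / (2 * lam) - (mu / (2 * lam)) * np.sqrt(4 * mu * lam * y + (mu * y) ** 2)
    z = rng.uniform(size=size)
    return np.where(z <= mu / (mu + x), x, mu**2 / x)

def inverse_gaussian_logp(value, mu, lam):
    # InverseGaussian(mu, lam) log-density
    return 0.5 * pt.log(lam / (2 * np.pi * value**3)) - lam * (value - mu) ** 2 / (2 * mu**2 * value)

def shifted_inverse_gaussian_rng(mu, lam, shift, rng=None, size=None):
    # Shifted version: add the shift to a base draw
    return shift + inverse_gaussian_rng(mu, lam, rng=rng, size=size)

def shifted_inverse_gaussian_logp(value, mu, lam, shift):
    # Shifted version: evaluate the base logp at (value - shift)
    return inverse_gaussian_logp(value - shift, mu, lam)

with pm.Model():
    rt = pm.CustomDist(
        "rt", 1.0, 2.0, 0.3,  # mu, lam, shift (placeholder values)
        random=shifted_inverse_gaussian_rng,
        logp=shifted_inverse_gaussian_logp,
    )
    prior_pred = pm.sample_prior_predictive()

With the random and logp functions in place, pm.sample_prior_predictive and pm.sample_posterior_predictive can draw from the shifted version directly.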

Adding the logcdf function to the InverseGaussian should allow the shifted version to work with pm.Censored, but for the purpose of doing prior and posterior predictive checks I assume you don't want to censor the values anyway, so you can just write a model with the "uncensored" version to do predictions?

If someone would like to help with adding an InverseGaussian to pymc-experimental, check out: Implement InverseGaussian distribution · Issue #253 · pymc-devs/pymc-experimental · GitHub

Thanks for the response @ricardoV94. I was wondering if I could do something like this.

But then I realized that if I have 1000 posterior samples, I would end up running 4 MCMC chains with, say, at least 500 tuning steps and 500 draws for each of those 1000 posterior samples, and would have to verify convergence for each of the 1000 predicted datasets. That would be a lot of computational effort, I guess. But if you suggest that this option is technically correct, then I would certainly try it.

Or maybe I would just need 1 draw after, say, 500 tuning samples. But then how would I know that my chains converged to a stationary distribution and that the one draw is from the posterior predictive distribution? I don't know what would be technically correct. I will think more about this; thanks for the suggestion.

Thanks for the reply @ricardoV94.

The shifted Wald distribution, used for response times in decision-making studies, subtracts a non-decision time, expressed as a random variable, from the observed response time. The remaining time is considered the true decision time.

But the shifted Wald distribution is for correct response times only. The censored shifted Wald distribution adds a survival function to the logp to account for incorrect responses.
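In equation form, as I understand it, each trial contributes roughly log f(t) to the logp for a correct response and log S(t) = log(1 - F(t)) for an incorrect response, where f and F are the shifted Wald density and CDF evaluated at the observed response time t.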

In case you are interested, this paper might be a good read: On the Relation Between the (Censored) Shifted Wald and the Wiener Distribution as Measurement Models for Choice Response Times - PubMed (nih.gov).

Hence, I couldn't figure out how to sample from the shifted Wald or censored shifted Wald distribution. E.g., while sampling from the pm.Wald distribution to mimic the shifted Wald, should I add back the non-decision time to the mean or to the drawn sample?
And it becomes a bit more complicated for me with the censored shifted Wald distribution.

Thanks for making the notebook @ricardoV94.

My understanding was that the Wald distribution is the inverse Gaussian distribution, and PyMC has a Wald distribution. This distribution has several possible parameterizations. The shifted Wald implementation you proposed in the notebook, as I understand it, is another parameterization of the Wald distribution.

In the shifted Wald, there is an additional parameter, theta, that gets subtracted from the observed data. In case you are interested, equation 4 in this paper gives the shifted Wald equation: Psychological interpretation of the ex-Gaussian and shifted Wald parameters: A diffusion model analysis | SpringerLink.

Also, thank you for creating the request to implement the new distribution. If the community finds it interesting, I would also request looking into the Wiener distribution. This is a heavily studied distribution in decision-making. Stan-Wiener has an implementation of the Wiener distribution, and the HDDM package, built on an older version of PyMC, also has this distribution, but it's not in the latest PyMC. I tried to replicate the Stan and HDDM code for the Wiener distribution in the latest PyMC, but I could never bring the r-hat below 2-4 for several of my datasets; I have been trying since the beginning of this year :slight_smile:.

:man_facepalming: If there's a Wald, then your life is much easier. Just jump to the step where I created the shifted Wald and use pm.Wald directly.

Actually, the Wald defined in PyMC already has a shift parameter as well, so you shouldn't need to implement any distribution at all? Just use pm.Censored with pm.Wald:

https://www.pymc.io/projects/docs/en/stable/api/distributions/generated/pymc.Wald.html

https://www.pymc.io/projects/docs/en/latest/api/distributions/censored.html
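For example (a rough sketch; the priors, censoring bound, and data are placeholders):

import pymc as pm

with pm.Model() as m:
    mu = pm.HalfNormal("mu", 1.0)
    lam = pm.HalfNormal("lam", 1.0)
    shift = pm.HalfNormal("shift", 0.5)

    # Shifted Wald via pm.Wald's alpha (shift/location) parameter,
    # right-censored at the response deadline
    shifted_wald = pm.Wald.dist(mu=mu, lam=lam, alpha=shift)
    rt = pm.Censored("rt", shifted_wald, lower=None, upper=2.0, observed=[0.8, 1.3, 2.0])

    idata = pm.sample()
    idata.extend(pm.sample_posterior_predictive(idata))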

To answer your original general question, my scattered replies still apply.

Many times PyMC has the distribution, or the building blocks needed to create it. This may require using meta distributions like pm.Censored or pm.Mixture, and sometimes these need building blocks that are not provided by default but are trivial to write, in which case you can just define them inside a CustomDist.

For example, you can add a shift parameter to any existing distribution in PyMC (even one that already has one) via a CustomDist and then use that inside a Censored or Mixture class.
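For instance (a quick illustration of the pattern, with pm.Exponential standing in for any base distribution):

import pymc as pm

def shifted_exponential_fn(lam, shift, size):
    # Add a shift parameter to an existing distribution inside a CustomDist
    return shift + pm.Exponential.dist(lam, size=size)

with pm.Model():
    lam = pm.HalfNormal("lam", 1.0)
    shift = pm.HalfNormal("shift", 1.0)
    shifted_exp = pm.CustomDist.dist(lam, shift, dist=shifted_exponential_fn, size=3)
    # The resulting variable can then be used inside meta distributions like pm.Censored
    y = pm.Censored("y", shifted_exp, lower=None, upper=3.0, observed=[0.5, 1.2, 3.0])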

The point being, PyMC will be able to give you both logps and random draws without too much boilerplate code on your part (and, more importantly, with less chance of bugs).

In the cases where this is not possible, you can still use MCMC to take random draws from the prior using its logp. This should work well for distributions that don't have hard discontinuities in the logp (which Censored distributions, incidentally, do have).

Posterior predictive is the same, except it may be computationally too expensive.

Or you implement the rng function of course.

I don’t know any other way to get random draws.

I didn't know about pm.Censored. That will be quite helpful :slight_smile:

Thanks for all the discussion. This is quite helpful. I will look more into the MCMC way of getting the samples whenever I cannot use existing distributions.