Should I add pm.Deterministic to my model?

I am unsure whether I should add pm.Deterministic to my model or not. Here is the code for my existing model:

import pymc as pm
import arviz as az

with pm.Model() as newmodel:
    # hyperpriors for the varying intercepts and slopes
    mu_alpha = pm.Normal("mu_alpha", mu=0, sigma=0.1)
    sigma_alpha = pm.HalfCauchy("sigma_alpha", beta=0.5)
    mu_beta = pm.Normal("mu_beta", mu=0, sigma=0.1)
    sigma_beta = pm.HalfCauchy("sigma_beta", beta=0.5)

    alpha = pm.Normal("intercept", mu=mu_alpha, sigma=sigma_alpha, shape=2)
    beta = pm.Normal("beta", mu=mu_beta, sigma=sigma_beta, shape=2)
    noise = pm.Exponential("noise", 10)

    # linear predictor, indexed by the two category arrays
    mu_obs = beta[cat_1_data] * x + alpha[cat_2_data]

    obs = pm.Normal("obs", mu=mu_obs, sigma=noise, observed=observed_data)

    # prior predictive check
    idata = pm.sample_prior_predictive(samples=50, random_seed=rng)
    az.plot_ppc(idata, group="prior", observed=True)

    # sample the posterior
    idata.extend(pm.sample(6000, tune=2000, random_seed=rng))
    az.plot_trace(idata)

    # posterior predictive check
    pm.sample_posterior_predictive(idata, extend_inferencedata=True, random_seed=rng)
    az.plot_ppc(idata, num_pp_samples=500, group="posterior")
    print(pm.summary(idata))

Am I supposed to add the following lines to my code as well?

mu_obs_det = pm.Deterministic("mu_obs_det", mu_obs)
obs = pm.Normal("obs", mu=mu_obs_det, sigma=noise, observed=observed_data)

Apologies for the newbie question!

I am currently seeking help with the above model. The posterior predictive checks show that my model is not fitting the observed data well, and I need to make some tweaks. If anyone has expertise in Bayesian modelling (and a little bit of extra time) and can offer further guidance, I would greatly appreciate it. Please feel free to DM me for more details. Thank you in advance for your help!

Deterministics are only needed when you want to look at them after sampling. Otherwise you can skip them and save memory/computation time. They are never required for a model to “behave correctly”.
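For the model you posted, that would look something like the sketch below (the priors and the data variables cat_1_data, cat_2_data, x, observed_data and rng are assumed to be exactly as in your code; draw counts are shortened just to keep it quick). The likelihood and the posterior are unchanged; the only thing you gain is that mu_obs is stored in the trace and can be plotted afterwards:

import pymc as pm
import arviz as az

with pm.Model() as tracked_model:
    mu_alpha = pm.Normal("mu_alpha", mu=0, sigma=0.1)
    sigma_alpha = pm.HalfCauchy("sigma_alpha", beta=0.5)
    mu_beta = pm.Normal("mu_beta", mu=0, sigma=0.1)
    sigma_beta = pm.HalfCauchy("sigma_beta", beta=0.5)

    alpha = pm.Normal("intercept", mu=mu_alpha, sigma=sigma_alpha, shape=2)
    beta = pm.Normal("beta", mu=mu_beta, sigma=sigma_beta, shape=2)
    noise = pm.Exponential("noise", 10)

    # the only change: give the mean expression a name so it is recorded in the trace
    mu_obs = pm.Deterministic("mu_obs", beta[cat_1_data] * x + alpha[cat_2_data])
    obs = pm.Normal("obs", mu=mu_obs, sigma=noise, observed=observed_data)

    idata = pm.sample(1000, tune=1000, random_seed=rng)

# possible only because the expression was given a name
az.plot_trace(idata, var_names=["mu_obs"])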


In my case, I needed to use pm.Deterministic to track a new variable formed by equations that rely on my priors. Because this new expression was not tracked initially (i.e. the variable was not wrapped in pm.Deterministic), and it was ultimately my likelihood function, inference was not happening and the resulting posteriors were junk. Adding pm.Deterministic to keep track of the new variable solved the problem.

There is a good explanation of pm.Deterministic in the help files, which made it clear for me.

There is no case where you need a Deterministic for inference to work. It simply records intermediate values that would be computed anyway. You must have had some other error in your code without Deterministics.

Strange.

Taking the example from the help files:

Indeed, PyMC allows for arbitrary combinations of random variables, for example in the case of a logistic regression

with pm.Model():
    alpha = pm.Normal("alpha", 0, 1)
    intercept = pm.Normal("intercept", 0, 1)
    p = pm.math.invlogit(alpha * x + intercept)
    outcome = pm.Bernoulli("outcome", p, observed=outcomes)

but doesn’t memorize the fact that the expression pm.math.invlogit(alpha * x + intercept) has been assigned to the variable p. If the quantity p is important and one would like to track its value in the sampling trace, then one can use a deterministic node:

with pm.Model():
    alpha = pm.Normal("alpha", 0, 1)
    intercept = pm.Normal("intercept", 0, 1)
    p = pm.Deterministic("p", pm.math.invlogit(alpha * x + intercept))
    outcome = pm.Bernoulli("outcome", p, observed=outcomes)

These two models are strictly equivalent from a mathematical point of view. However, in the first case, the inference data will only contain values for the variables alpha, intercept and outcome. In the second, it will also contain sampled values of p for each of the observed points.
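A quick way to see that difference is to list what ends up in the trace for the second model. This is just a sketch, assuming x and outcomes are whatever data the snippet above is fit to:

import pymc as pm

with pm.Model():
    alpha = pm.Normal("alpha", 0, 1)
    intercept = pm.Normal("intercept", 0, 1)
    p = pm.Deterministic("p", pm.math.invlogit(alpha * x + intercept))
    outcome = pm.Bernoulli("outcome", p, observed=outcomes)
    idata = pm.sample()

# "p" shows up alongside the free variables only because it was wrapped in a Deterministic;
# without the wrapper the list would contain only "alpha" and "intercept"
print(list(idata.posterior.data_vars))  # e.g. ['alpha', 'intercept', 'p']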

This example really helped my efforts. Relating it to the example above, the value p is analogous to the quantity I calculated for my likelihood, which was subsequently compared to observations of similar size.

Without the Deterministic, the traces for the posteriors showed no convergence; the sampler was just bouncing around the space allowed by the prior distributions. Adding a Deterministic solved it and made sense to me at the time…

I’m saying you must have something else going on. It cannot be explained by adding/removing a Deterministic. Those two examples from the code will behave the same with or without the Deterministic.
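If you want to check that claim directly, here is a sketch along those lines: it fits the two docstring models with the same seed (using made-up toy data standing in for x and outcomes) and compares the summaries, which should agree; the only difference is that one trace also contains p.

import numpy as np
import pymc as pm
import arviz as az

# toy data standing in for the real x / outcomes (assumption, just for this check)
rng = np.random.default_rng(0)
x = rng.normal(size=100)
outcomes = rng.integers(0, 2, size=100)

def fit(track_p):
    with pm.Model():
        alpha = pm.Normal("alpha", 0, 1)
        intercept = pm.Normal("intercept", 0, 1)
        p = pm.math.invlogit(alpha * x + intercept)
        if track_p:
            p = pm.Deterministic("p", p)  # only adds a tracked name, same model graph
        pm.Bernoulli("outcome", p, observed=outcomes)
        return pm.sample(1000, tune=1000, chains=2, random_seed=123)

# same seed, same model graph -> the posteriors for alpha and intercept should match
print(az.summary(fit(True), var_names=["alpha", "intercept"]))
print(az.summary(fit(False), var_names=["alpha", "intercept"]))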

Ok cool - I will do some digging and see if I can uncover anything. Thanks!