Non-adaptable distribution from another model's posterior

Hi,

I saw in another post that someone wanted to sample a variable from a non-adaptable distribution (ref). They use Theano, and I saw you switched to PyTensor for PyMC >= 5. I would like to define such a non-adaptable distribution using the posterior distribution of another model.

Some context on my study:

I am trying to implement the Principal Stratification method using PyMC; below is a simplified version. Model 1 predicts whether a patient in the placebo arm will be alive at the end of the trial. Model 2 uses model 1 to estimate whether treated patients would have been alive had they been in the placebo arm, and then estimates the average of a score on the treated arm for the patients who would have been alive in the placebo arm.

import pymc as pm
import pytensor.tensor as pt

with pm.Model() as model_1:

    # Input data
    x_p = pm.Data("x_p", df_placebo_t[columns_for_pred])
    alive_p = pm.Data("alive_p", df_placebo_t["alive_obs"])

    # Create the alpha and beta parameters and assume a normal distribution
    alpha_p = pm.Normal('alpha_p', mu=0.0, tau=non_informative_std)
    beta_p = pm.Normal('beta_p', mu=0.0, tau=non_informative_std, shape=len(columns_for_pred))

    # The probability of being alive is modeled as a logistic function
    p_p = pm.Deterministic('p_p', 1. / (1. + pt.exp(pt.dot(x_p, beta_p) + alpha_p)))

    # Bernoulli likelihood informed by the observed survival data
    observed_p = pm.Bernoulli('obs', p_p, observed=alive_p)

    # Sample
    trace_placebo_survival = pm.sample(
        n_samples,
        tune=n_tune,
        target_accept=0.99,
        cores=n_cores,
        random_seed=RANDOM_SEED,
    )
    
with pm.Model() as model_2:

    # Input data
    x_treat = pm.Data("x_treat", df_treated_t[columns_for_pred])
    y = pm.Data("y", df_treated_t["score"])

    # Fixed ("non-adaptable") alpha and beta taken from model_1's posterior --
    # this is the part I don't know how to write
    alpha_static = ???
    beta_static = ???

    # Probability that a treated patient would have been alive under placebo
    mask = pm.Deterministic('mask', 1. / (1. + pt.exp(pt.dot(x_treat, beta_static) + alpha_static)))

    # Score model, weighted by the (fixed) survival probability
    mean_trend = pm.Normal("mean_trend", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)  # a scale parameter must be positive
    average = pm.Potential('likelihood', pm.logp(pm.Normal.dist(mu=mean_trend, sigma=sigma), y) * mask)

    # Sample
    trace_treated_score = pm.sample(
        n_samples,
        tune=n_tune,
        target_accept=0.99,
        cores=n_cores,
        random_seed=RANDOM_SEED,
    )

I hope this is clear… Note that I also found a way to use a posterior as a new prior (ref), but since the variable is included in the likelihood, it gets updated, and I don't want that to happen.
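For concreteness, the most naive thing I can think of would be to plug point estimates from model_1's posterior into model_2 as plain constants, so they are not part of the sampled graph at all. A sketch of that idea (assuming trace_placebo_survival is the InferenceData returned by model_1's pm.sample; names are illustrative):

alpha_hat = float(trace_placebo_survival.posterior["alpha_p"].mean())
beta_hat = trace_placebo_survival.posterior["beta_p"].mean(("chain", "draw")).values

with pm.Model() as model_2_frozen:
    x_treat = pm.Data("x_treat", df_treated_t[columns_for_pred])
    # alpha_hat / beta_hat are plain numbers here, so they enter the graph as
    # fixed constants: no prior, and nothing the sampler does can update them
    mask = pm.Deterministic('mask', 1. / (1. + pt.exp(pt.dot(x_treat, beta_hat) + alpha_hat)))
    # ... rest of model_2 as above

But that collapses the whole posterior of model_1 into a single point, which is exactly what I would like to avoid.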

Many thanks for your help!

I suggest modelling everything together. People now and then ask for “frozen” distributions but I haven’t seen any work suggesting this is a good strategy.
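To make that concrete, here is a sketch of what a joint model could look like, reusing your variable names; note that the score likelihood now also informs alpha_p and beta_p through the mask, which is the point:

with pm.Model() as joint_model:
    # Placebo-arm survival data
    x_p = pm.Data("x_p", df_placebo_t[columns_for_pred])
    alive_p = pm.Data("alive_p", df_placebo_t["alive_obs"])
    # Treated-arm covariates and score
    x_treat = pm.Data("x_treat", df_treated_t[columns_for_pred])
    y = pm.Data("y", df_treated_t["score"])

    # Shared survival parameters
    alpha_p = pm.Normal('alpha_p', mu=0.0, tau=non_informative_std)
    beta_p = pm.Normal('beta_p', mu=0.0, tau=non_informative_std, shape=len(columns_for_pred))

    # Survival likelihood on the placebo arm
    p_p = pm.Deterministic('p_p', 1. / (1. + pt.exp(pt.dot(x_p, beta_p) + alpha_p)))
    pm.Bernoulli('obs', p_p, observed=alive_p)

    # Score likelihood on the treated arm, weighted by predicted placebo survival
    mask = pm.Deterministic('mask', 1. / (1. + pt.exp(pt.dot(x_treat, beta_p) + alpha_p)))
    mean_trend = pm.Normal("mean_trend", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)
    pm.Potential('likelihood', pm.logp(pm.Normal.dist(mu=mean_trend, sigma=sigma), y) * mask)

    idata = pm.sample()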

If you really wanted to, you could implement a Gibbs-style sampler where one of the variables is updated following its own logic. This is, however, saying that new observations don't tell you anything about the previously inferred variable, which is quite unnatural. It can also lead to random-walk behavior, because you won't be updating everything in sync, and render sampling less efficient.

The approach you may have seen, putting a random variable in the graph, isn't sound. For instance, it would make the logp and gradient for the remaining sampler (presumably NUTS) stochastic, which would concern me much more.


Thanks, ricardoV94, for your answer!

Indeed, my new observations don't tell me anything about the previously inferred variable: I first infer whether you are dead, and then, if you are not, I estimate your quality of life. Your quality of life at time t does not help me infer whether you are alive at time t, but I should not infer your quality of life at time t if you are dead at time t. Your past quality of life could help, but then it's another setup, I guess.

Regarding the risk of wanting to estimate such a thing, I am not sure that I understand your point, but there are indeed existing warnings about the properties of what I am trying to estimate: “With a Bayesian approach, because of lack of point identification, the length of posterior intervals will not shrink to 0 as the sample size increases to infinity and the posterior will depend on the prior even as the sample size tends to infinity (Richardson et al., 2011).” (reference).

I ended up doing a loop: drawing a single fixed value from the posterior of my first model, inserting it into my new model, and repeating. It seems to give me the expected results. Anyway, thanks for your help!
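In case it helps anyone, a rough sketch of that loop (n_outer_draws is an illustrative name; I rebuild model_2 once per frozen draw and pool the runs at the end):

import arviz as az
import numpy as np

rng = np.random.default_rng(RANDOM_SEED)
post = trace_placebo_survival.posterior.stack(sample=("chain", "draw"))
idx = rng.choice(post.sizes["sample"], size=n_outer_draws, replace=False)

traces = []
for k, i in enumerate(idx):
    # Freeze one posterior draw of model_1's parameters as plain numbers
    alpha_fix = float(post["alpha_p"][..., i])
    beta_fix = post["beta_p"][..., i].values

    with pm.Model():
        x_treat = pm.Data("x_treat", df_treated_t[columns_for_pred])
        mask = 1. / (1. + pt.exp(pt.dot(x_treat, beta_fix) + alpha_fix))
        mean_trend = pm.Normal("mean_trend", 0, 1)
        sigma = pm.HalfNormal("sigma", 1)
        pm.Potential('likelihood',
                     pm.logp(pm.Normal.dist(mu=mean_trend, sigma=sigma),
                             df_treated_t["score"].values) * mask)
        traces.append(pm.sample(n_samples, tune=n_tune, random_seed=RANDOM_SEED + k))

# Pool all runs into one InferenceData along the chain dimension
pooled = az.concat(traces, dim="chain")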

Why not sample everything together?

I ended up doing a loop sampling in the posterior of my first estimate to get one fixed value

You're discarding information: you didn't get a single posterior value, you got a distribution.

I guess this requires a bit of context on what I am doing. This is causal-inference related (what I am trying to estimate is a Principal Stratification estimand).

For a single patient, whom I observe at time t, I have two options: 0 = placebo, 1 = treated. Then, still for my patient, she can be dead under placebo, D_i,t(0)=1, or not, D_i,t(0)=0 (similarly, D_i,t(1)=1 dead under treatment, D_i,t(1)=0 not dead under treatment). Then, if D_i,t(0)=0, Y_i,t(0) exists, and if D_i,t(1)=0, Y_i,t(1) exists. My aim is, if both values exist, to compute Y_i,t(1) - Y_i,t(0).
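Written out, if I understand the literature correctly, the estimand is the average effect in the stratum of patients who would survive under either arm (the "survivor average causal effect" in the Principal Stratification literature):

E[ Y_i,t(1) - Y_i,t(0) | D_i,t(0) = 0, D_i,t(1) = 0 ]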

Unfortunately, my patient was only observed in the placebo arm 0, but fortunately she didn't die. So I first have to estimate whether she would die under treatment 1, to make sure Y_i,t(1) exists. Then, if it exists, I estimate Y_i,t(1) using the values Y_j,t(1) of all the patients from arm 1 whom I have predicted not to die in the placebo arm (otherwise they differ too much from the patient I am interested in, who didn't die under placebo).

Thus, to get the subsample of observations I am interested in, Y_j,t(1), I need the death prediction. But I need that subsample predictor not to be updated when I estimate the parameters of Y, which I did not succeed in doing, because the likelihood of Y depends on the parameters of D through the subselection, which in turn updates the parameters of D, and I don't want that…

I am sorry, this might be a bit confusing…

In the end, I concatenate all the posteriors I got and use the resulting distribution as my posterior. Is that an issue?