I want to implement two-stage Bayesian regression, where one model's posterior feeds into a second, separate regression. It appears that PyMC only allows free random variables (to be inferred) or observed data, and cannot take a "fixed distribution" to be sampled as an input.
What I want to achieve in the real-world problem:
My attempt at a solution (with a toy example):
In a previous related post @junpenglao suggested using a random stream to sample from a "fixed/specified" normal distribution: [Prevent prior from updating? - #5 by mathiasp]
Following a similar approach, I created a custom Aesara Op to sample from a fixed Kernel Density Estimate (simplified to a t-distribution for the toy example) of the posterior from the first regression step. Here is a toy example that works to enforce a t-distribution with loc=5, scale=1, df=10:
```python
import numpy as np
import pymc as pm
import scipy.stats as st
import aesara.tensor as at


class MyRV(at.Op):
    """Custom Op that returns a draw from a fixed t-distribution."""
    itypes = [at.dscalar]
    otypes = [at.dscalar]

    def perform(self, node, inputs_storage, output_storage):
        (dummy_tensor_var,) = inputs_storage  # input is ignored; it only triggers re-evaluation
        output_storage[0][0] = np.asarray(st.t.rvs(loc=5, scale=1, df=10))
        # ...or resample a KDE of the posterior from regression step 1, e.g.
        # kde = st.gaussian_kde(samples)
        # output_storage[0][0] = kde.resample(1)


my_rv = MyRV()  # instantiate the Op
srng = at.random.utils.RandomStream(1234)
dummy_tensor_var = srng.normal(10, 1)

with pm.Model() as bayesModel:
    # custom-Op t-dist (or KDE) fed by the random stream
    post_from_regression_step1 = pm.Deterministic(
        'post_from_regression_step1', my_rv(dummy_tensor_var))

    # inferred free variables
    sigma_likelihood = pm.HalfNormal('sigma_likelihood', 1)
    mu = pm.Normal('mu', 0, 3)

    # some hierarchical function using our enforced custom distribution
    hierarchical_calc_result = pm.Deterministic(
        'hierarchical_calc_result', mu - post_from_regression_step1)

    # likelihood with data centred at zero, so mu must become equal to
    # post_from_regression_step1 to minimise the error
    y = pm.Normal('y', hierarchical_calc_result, sigma_likelihood,
                  observed=[-0.2, -0.1, 0.0, 0.1, 0.2])

    trace = pm.sample(draws=10_000, chains=1)

pm.plot_trace(trace)
```
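In the real-world case the fixed t-distribution in `perform` would be replaced by a KDE of the stage-1 posterior samples. As a standalone scipy-only sanity check (where `samples` below are synthetic t-draws standing in for real posterior samples), `gaussian_kde.resample` does reproduce the source distribution's moments:

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(1234)
# stand-in for posterior samples from regression step 1
samples = st.t.rvs(loc=5, scale=1, df=10, size=10_000, random_state=rng)

# fit a KDE to the samples and draw new values from it
kde = st.gaussian_kde(samples)
resampled = kde.resample(10_000, seed=rng).ravel()

# resamples should roughly match the t-dist: mean ~5, std ~sqrt(10/8) ~ 1.12
# (the KDE bandwidth inflates the std slightly)
print(resampled.mean(), resampled.std())
```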
The graph of the toy example:
The trace of the toy example gives the expected result, with our custom "enforced" t-distribution `post_from_regression_step1` having a mean of 5 and a std of 1. Further calculations in the hierarchical model based on these samples also work.
However, in the real-world hierarchical model, the NUTS sampler fails with a "zero gradient" error.
Are we allowed to inject custom random samples into a PyMC graph without NUTS knowing the logp and gradient info?
How can I input a fixed distribution from one regression stage into the next?