Hey Guys,
I’m quite new to PyMC and have a question regarding the combination of different posterior distributions in PyMC3. I haven’t found a Thread which helped
The task seems to be easy: I have a distribution of experimental parameters and I would like infer certain properties about this distribution (e.g. mu and sigma). However there are certain beliefs about the parameters uncertainty. Each time I draw a new parameter for my data collection I do not know the source of its uncertainty. There are different environmental conditions as sources which may have caused them and therefore there are several distributions as uncertainty candidates. In my example I assume that all these candidates are just normal distributions with a mu_# and a sigma sd_#.
My idea was to model a number of distributions reflecting the belief of the parameter to take a certain value. After that I determine the posteriors given the data. I further combine all traced distributions and form (via interpolation) a new prior. This prior serves for the determination of a combined posterior which should give me the desired parameter distribution from which I should be able to “read of” my uncertainty. The followig excerpts are the major parts that make up the inference functionality.
Interpolate data set
def from_posterior(self, param, samples):
smin, smax = np.min(samples), np.max(samples)
width = smax - smin
x = np.linspace(smin, smax, 10000)
y = stats.gaussian_kde(samples)(x)
x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
y = np.concatenate([[0], y, [0]])
return pm.Interpolated(param, x, y)
Tracer
def trace_model(self, param, mu_, sd_):
with pm.Model() as model:
mu = pm.Normal(param+'_mu', mu=mu_, sd=sd_)
sd = pm.HalfNormal(param+'_sd', sd=self.data_sd)
Y_obs = pm.Normal(param+'_obs', mu=mu, sigma=sd, observed=self.data)
step = pm.NUTS()
trace = pm.sample(1000, step=step)
ppc = pm.sample_posterior_predictive(trace, samples=1000)
return trace, np.asarray(ppc[param+'_obs'])
Program
def test(self):
print(f"Standard deviation: {self.data_sd}.")
print("Creating model...")
trace_1, ppc_1 = self.trace_model('dist_1', mu_1, sd_1)
trace_2, ppc_2 = self.trace_model('dist_2', mu_2, sd_2)
trace_3, ppc_3 = self.trace_model('dist_3', mu_3, sd_3)
data_comb = np.hstack((trace_1['dist_1'][:], trace_2['dist_2'][:], trace_3['dist_3']))
print(np.mean(data_comb))
print(np.std(data_comb))
# Combine data set and do second iteration with this data set as prior
with pm.Model() as model:
comb_mu = self.from_posterior('comb_mu', data_comb)
Y_comb = pm.Normal('Y_comb', mu=comb_mu, sigma=self.data_sd, observed=self.data)
step = pm.NUTS()
trace_comb = pm.sample(1000, step=step)
ppc = pm.sample_posterior_predictive(trace_comb, samples=1000)
_, ax = plt.subplots()
sns.distplot(data_comb, label='New prior distribution', ax=ax)
sns.distplot(trace_comb['comb_mu'][:], ax=ax, label='Combined posterior')
ax.legend()
plt.show()
return 0
I am not sure whether this is the right procedure fulfilling the needs of my use case.
Many thanks in advance for any help.
Greez, flex