Hi,
I’m trying to use the interpolated distribution provided by Pymc3 to fit observed data. I’m having a poor convergence.
For the sake of simplicity, I’m made a toy example to show the problem.
This is my dataset : a classic gaussian data.
import pymc3 as pm
import numpy as np
from scipy import stats
data = np.random.normal(size=10000) # default is mu=0, sd=1
To compute interpolated pdf, I use the following function found here : https://docs.pymc.io/notebooks/updating_priors.html
def from_posterior(samples):
smin, smax = np.min(samples), np.max(samples)
width = smax - smin
x = np.linspace(smin, smax, 100)
y = stats.gaussian_kde(samples)(x)
# what was never sampled should have a small probability but not 0,
# so we'll extend the domain and use linear approximation of density on it
x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
y = np.concatenate([[0], y, [0]])
return pm.Interpolated.dist(x, y)
Then this is my model : (basically I’m trying to find the “hidden parameter” mu & sd)
with pm.Model() as model:
mu = pm.Normal('mu',mu=0,sd=5) # mu prior
sd = pm.HalfNormal('sd',sd=1) # sd prior
s = pm.Normal('s',mu=mu,sd=sd)
obs =from_posterior(data) # creating interpolated dist
pot = pm.Potential('pot', obs.logp(s).sum()) # using dist to create a "potential" in order to fit the normal distribution (and hoping to have an good estimate of mu & sd)
trace = pm.sample(draws=1000, tune=1000)
This is the sampling log :
And the trace plot :
These are quite bad estimation of mu & sd ! (and the mean of sd trace is around 0.8 instead of 1)
I say bad because my reference is the sampling of a model directly using a normal distribution from which I get far better results.
This is the ‘direct’ model :
with pm.Model() as model:
mu = pm.Normal('mu',mu=0,sd=5)
sd = pm.HalfNormal('sd',sd=1)
obs = pm.Normal('obs',mu=mu,sd=sd,observed=data)
trace = pm.sample(draws=1000, tune=1000)
This is the sampling log …
And the trace plot :
We see here far more accurate estimation of mu & sd !
Question : is the accuracy difference only because it’s an interpolated pdf ?
My guess is no, because I raised the number of samples to built a better interpolation, but it did’nt help.
I have no other clue, and augmenting ‘target_accept’ or tuning phase does not help. I’m certainly doing something wrong but I don’t know what…
Any help appreciated.