I would not like to ask such questions unduly, but, despite my research and efforts, I would appreciate some advice validated by experience on how to optimally change priors to improve a model’s fit.
For the following sine wave data, I tried to fit a basic GP with a Periodic kernel, with all the results (under many possible sampler settings) producing good-looking posterior plots:
However, having generated the data, I knew that the estimate of period
, in the periodic kernel, should have been “around 6”, yet I consistently received traceplots that were multi-modal/possessed multiple peaks:
I ended up fixing this situation by specifying a tighter pm.HalfNormal()
prior on period
, and the following results seem to be good:
So my general question - that I would extremely appreciate any advice on - is two-fold:
- In which situations might parameters genuinely possess multiple peaks (apart from explicit mixture models), and, if multiple peaks are just a symptom of MCMC difficulties,
- what is the optimal heuristic on how to modify priors to get rid of multiple peaks? Is there even one?
For 2, I simply naively specified a prior closer to 0 and that ended up working: what if I accidentally made my prior more concentrated in the other direction, and the original secondary peak around 12.5 became the only one the chains picked up?
Code:
## Data generation
x = np.linspace(0,40, num=300)
noise1 = np.random.normal(0,0.3,300)
y = np.sin(x) + noise1
temp = x[150:]
noise2 = 0.004*temp**2 + np.random.normal(0,0.1,150)
y[150:] = y[150:] + noise2
true_line = np.sin(x)
true_line[150:] = true_line[150:] + 0.004*temp**2
x_sin = x[:150]
y_sin = y[:150]
X_sin = np.expand_dims(x, axis=1)
Y_sin = np.expand_dims(y, axis=1)
test_X_sin_1dim = np.linspace(-20,40,500)
test_X_sin_2dim = np.expand_dims(test_X_sin_1dim, axis=1)
plt.plot(x_sin, y_sin)
## Model with correct prior on period
with pm.Model() as gp_model:
## Yearly periodic component x long term trend
period = pm.HalfNormal("period", sd=5)
ℓ_psmooth = pm.Gamma("ℓ_psmooth ", alpha=4, beta=3)
cov_seasonal = pm.gp.cov.Periodic(1, period, ℓ_psmooth)
gp_seasonal = pm.gp.Marginal(cov_func=cov_seasonal)
σ = pm.HalfNormal("σ", sd=10, testval=5)
gp = gp_seasonal
y_ = gp.marginal_likelihood("y", X=x_sin[:, None], y=y_sin, noise = σ, shape=1)