Hello everyone.
I’m a junior data analyst using Bayesian modeling.
I have a question about how to choose a model and how to make it more accurate.
I’m currently trying to use Gaussian mixture modeling to derive the posterior distribution.
Below is the status of the problem I’m facing.
Status
- Each explanatory variable has 365 data points.
- The distribution of the target variable has two peaks, so I set up two prior distributions per variable (one per mixture component).
- One of the posterior distributions of ‘beta’ ranges from about -20 to just under 20, with a mean around zero.
- The other posterior distribution is also around zero.
For this problem, I’ve tried a Gaussian mixture regression model with pymc3.
The code is below.
import numpy as np
import pymc3 as pm

# X_1 and PRTIMES are the DataFrames holding the predictor and the target.
with pm.Model() as model:
    # Mixture weights for the two components
    weight = pm.Dirichlet('weight', a=np.array([1, 1]), shape=(1, 2))
    # Per-component intercept, slope, and noise scale
    alpha = pm.Normal('alpha', mu=0, sigma=10, shape=(1, 2))
    beta1 = pm.Normal('beta1', mu=0, sigma=10, shape=(1, 2))
    sigma = pm.HalfNormal('sigma', sigma=10, shape=2)
    # Linear predictor, one column per component: shape (n_obs, 2)
    mu = alpha + beta1 * X_1['last_year_pb_mean'].values[:, None]
    pb_obs = pm.NormalMixture('pb_obs', w=weight, mu=mu, sigma=sigma,
                              observed=np.log(PRTIMES['pb_rounded']))
    trace = pm.sample(2000, tune=1000, target_accept=0.99, cores=1)

pm.plot_trace(trace, compact=True)
pm.plot_posterior(trace, var_names=['beta1'], hdi_prob=0.95)
y_1 = pm.sample_posterior_predictive(trace, samples=1000, model=model)
y_pred_1 = y_1['pb_obs']
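For reference, here is a minimal pure-Python sketch of the per-observation log-likelihood that a two-component normal mixture regression like the one above evaluates. It is only an illustration, not how pymc3 implements it: the helper names `normal_logpdf`, `component_terms`, and `mixture_logp` are made up, the parameter values are the posterior means from the summary table, and `x = 0.0`, `y = 3.3` are assumed values chosen for the example.

```python
import math

def normal_logpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def component_terms(y, x, w, alpha, beta, sigma):
    """log(w_k) + log N(y | alpha_k + beta_k * x, sigma_k) for each component k."""
    return [
        math.log(w_k) + normal_logpdf(y, a_k + b_k * x, s_k)
        for w_k, a_k, b_k, s_k in zip(w, alpha, beta, sigma)
    ]

def mixture_logp(y, x, w, alpha, beta, sigma):
    """Log density of one observation under the mixture (log-sum-exp over components)."""
    terms = component_terms(y, x, w, alpha, beta, sigma)
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

# Posterior means taken from the summary table; x and y are assumed example values.
w = [0.003, 0.997]
alpha = [0.262, 3.329]
beta = [-0.139, 0.058]
sigma = [7.891, 0.171]
x, y = 0.0, 3.3

terms = component_terms(y, x, w, alpha, beta, sigma)
total = mixture_logp(y, x, w, alpha, beta, sigma)
print("per-component terms:", terms)
print("mixture log density:", total)
```

With these values the second component's term is much larger than the first, so for an observation like this one the mixture density comes almost entirely from a single component.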
But the result is poor: the posterior is far from the actual distribution (figure2: the left is the posterior and the right is the actual).
I keep searching for information, but I can’t find a solution.
How can this problem be solved?
Here are the results and figures. The WAIC of this model is 2.529136938170937.
| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| alpha[0, 0] | 0.262 | 9.994 | -18.358 | 18.348 | 0.174 | 0.158 | 3313.0 | 2881.0 | 1.0 |
| alpha[0, 1] | 3.329 | 0.009 | 3.312 | 3.345 | 0.000 | 0.000 | 2765.0 | 2383.0 | 1.0 |
| beta1[0, 0] | -0.139 | 9.907 | -17.974 | 18.936 | 0.190 | 0.161 | 2739.0 | 2448.0 | 1.0 |
| beta1[0, 1] | 0.058 | 0.009 | 0.040 | 0.073 | 0.000 | 0.000 | 3536.0 | 2517.0 | 1.0 |
| weight[0, 0] | 0.003 | 0.003 | 0.000 | 0.008 | 0.000 | 0.000 | 2776.0 | 1736.0 | 1.0 |
| weight[0, 1] | 0.997 | 0.003 | 0.992 | 1.000 | 0.000 | 0.000 | 2776.0 | 1736.0 | 1.0 |
| sigma[0] | 7.891 | 6.236 | 0.006 | 19.008 | 0.112 | 0.079 | 1480.0 | 654.0 | 1.0 |
| sigma[1] | 0.171 | 0.006 | 0.159 | 0.183 | 0.000 | 0.000 | 3503.0 | 2466.0 | 1.0 |
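One possible reading of this table, offered as a rough check rather than a diagnosis: the posteriors of `alpha[0, 0]` and `beta1[0, 0]` have sd close to 10 and intervals close to ±18, which is about what the `Normal(0, 10)` prior alone would give. The sketch below computes the central 94% interval of that prior (for a symmetric, unimodal distribution this coincides with the 94% HDI) and the expected number of observations assigned to component 0. It assumes the data set has 365 observations, as suggested by the status list.

```python
from statistics import NormalDist

# Prior used for alpha and beta1 in the model: Normal(mu=0, sigma=10)
prior = NormalDist(mu=0.0, sigma=10.0)

# Central 94% interval of the prior, comparable to the hdi_3% / hdi_97% columns
lo, hi = prior.inv_cdf(0.03), prior.inv_cdf(0.97)
print("prior 94% interval:", round(lo, 2), round(hi, 2))

# Expected number of observations in component 0
n_obs = 365        # assumption: one observation per data point in the status list
w0_mean = 0.003    # posterior mean of weight[0, 0] from the table
expected_in_component_0 = n_obs * w0_mean
print("expected observations in component 0:", expected_in_component_0)
```

If the prior interval is close to the reported intervals for `alpha[0, 0]` and `beta1[0, 0]`, and component 0 is expected to hold only about one observation, that would be consistent with the data carrying almost no information about the first component.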
figure0
figure1
figure2