# How to choose a model when working with a small dataset

Hello everyone.

I’m a junior data analyst using Bayesian modeling.
I have a question about how to choose a model and build a more accurate one.

I’m currently trying to use a Gaussian mixture model to derive the posterior distribution.
Below is the status of the problem I’m facing.

Status

• each explanatory variable has 365 observations
• the distribution of the target variable has two peaks, so I set up two prior distributions per variable
• one posterior distribution for `beta` ranges from about -20 to 20, with a mean around zero
• the other also ranges around zero

To address this, I tried a Gaussian mixture regression model with PyMC3.
Here is the code:

```python
import numpy as np
import pymc3 as pm

with pm.Model() as model:
    weight = pm.Dirichlet('weight', a=np.array([1, 1]), shape=(1, 2))
    alpha = pm.Normal('alpha', mu=0, sigma=10, shape=(1, 2))
    beta1 = pm.Normal('beta1', mu=0, sigma=10, shape=(1, 2))
    sigma = pm.HalfNormal('sigma', sigma=10, shape=2)

    # One regression line per mixture component
    mu = alpha + beta1 * X_1['last_year_pb_mean'].values[:, None]

    pb_obs = pm.NormalMixture('pb_obs', w=weight, mu=mu, sigma=sigma,
                              observed=np.log(PRTIMES['pb_rounded']))
    trace = pm.sample(2000, tune=1000, target_accept=0.99, cores=1)

pm.plot_trace(trace, compact=True)
pm.plot_posterior(trace, var_names=['beta1'], hdi_prob=0.95)

with model:
    y_1 = pm.sample_posterior_predictive(trace, samples=1000)
y_pred_1 = y_1['pb_obs']
```

But the result is quite bad; the posterior is far from the actual distribution (figure2: the left is the posterior, the right is the actual).

I kept searching for information, but I couldn’t find a solution.

How can this problem be solved?

Here are the results and figures; the WAIC of this model is 2.529136938170937.

| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| alpha[0, 0] | 0.262 | 9.994 | -18.358 | 18.348 | 0.174 | 0.158 | 3313.0 | 2881.0 | 1.0 |
| alpha[0, 1] | 3.329 | 0.009 | 3.312 | 3.345 | 0.000 | 0.000 | 2765.0 | 2383.0 | 1.0 |
| beta1[0, 0] | -0.139 | 9.907 | -17.974 | 18.936 | 0.190 | 0.161 | 2739.0 | 2448.0 | 1.0 |
| beta1[0, 1] | 0.058 | 0.009 | 0.040 | 0.073 | 0.000 | 0.000 | 3536.0 | 2517.0 | 1.0 |
| weight[0, 0] | 0.003 | 0.003 | 0.000 | 0.008 | 0.000 | 0.000 | 2776.0 | 1736.0 | 1.0 |
| weight[0, 1] | 0.997 | 0.003 | 0.992 | 1.000 | 0.000 | 0.000 | 2776.0 | 1736.0 | 1.0 |
| sigma[0] | 7.891 | 6.236 | 0.006 | 19.008 | 0.112 | 0.079 | 1480.0 | 654.0 | 1.0 |
| sigma[1] | 0.171 | 0.006 | 0.159 | 0.183 | 0.000 | 0.000 | 3503.0 | 2466.0 | 1.0 |

figure0

figure1

figure2

Welcome!

Do you have any simulated data to work with? It is often suggested to develop models (at least initially) using data where you have full control (or knowledge) over the underlying generative process.
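As a minimal sketch of that suggestion: simulated data for a two-component mixture regression can be generated with known parameters, so you can check whether the model recovers them. The parameter values below are illustrative assumptions, not values from this thread:

```python
import numpy as np

rng = np.random.default_rng(42)

n = 365  # same size as the real dataset
x = rng.normal(0.0, 1.0, size=n)  # stand-in for last_year_pb_mean

# Known generative parameters (illustrative values only)
true_weights = np.array([0.3, 0.7])
true_alpha = np.array([0.0, 3.3])
true_beta = np.array([-0.5, 0.06])
true_sigma = np.array([1.0, 0.2])

# Assign each observation to a component, then draw y from that
# component's regression line plus noise
z = rng.choice(2, size=n, p=true_weights)
y = rng.normal(true_alpha[z] + true_beta[z] * x, true_sigma[z])
```

Passing `x` and `y` into the mixture model in place of the real data lets you see whether the posteriors concentrate near `true_alpha`, `true_beta`, etc., before worrying about the real dataset.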


No, I don’t have simulated data.

That sounds like a meaningful suggestion to me.