Hello there,
I am building a multivariate mixture model for predicting signal decay.
I need help predicting on data whose dimensions differ from those of the training data.
I am collecting data on signal reduction over time. Given partial data, I aim to predict the y value at which the function flattens. The training data are composed of 40 groups. In each group, we collected the signal reduction at 80 time points (1 to 80 sec). For the test data, I reduced the time resolution tenfold (8 time points: 1, 10, 20…80) and used five samples (groups).
Eventually, the goal is to discriminate between the two groups from which the samples originate: standard (*norm) vs. abnormal (*phenotype).
Here is an example of several groups:
Here is the model I am training on:
y \sim \mathrm{MVN}(\mu, S)
\mu = a + e^{-c t}
Here t is a one-dimensional predictor with 80 observations per sample. The data contain 50 samples: 40 are used for training and 10 for prediction. For testing, I will reduce the time resolution.
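To make the mean function concrete, here is a minimal sketch in plain Python of mu = a + exp(-c * t) evaluated on the 1 to 80 sec grid. The values a = 5.0 and c = 0.1 and the helper name `mean_signal` are illustrative assumptions, not fitted results. Under this form the curve flattens toward a, which is the plateau value I am after.

```python
import math


def mean_signal(t, a, c):
    """Mean of the decay model: mu(t) = a + exp(-c * t)."""
    return a + math.exp(-c * t)


# Illustrative values only (assumed, not estimated from my data).
a, c = 5.0, 0.1

times = range(1, 81)  # 1 to 80 sec, as in the training data
curve = [mean_signal(t, a, c) for t in times]

# Distance between the last point of the curve and the plateau a.
gap_at_end = curve[-1] - a
print(f"mu(1) = {curve[0]:.4f}, mu(80) = {curve[-1]:.4f}, gap to a = {gap_at_end:.6f}")
```

With a positive c the curve decreases at every step and the gap to a shrinks toward zero, so the plateau corresponds to the intercept parameter.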
The code that I used is below:
import numpy as np
import pymc3 as pm
import theano.tensor as tt

id_ = len(np.unique(table.image_name_cls.values))  # number of groups (40)
vector_id_ = table.image_group.values  # group index per observation, 40*80 (3200)
opening_opr = table.raius_list.values  # predictor, 40*80
signal = table.precen_sig.values  # observed signal, 40*80

with pm.Model() as Multivariate_varying_effect:
    chol, Rho_, sigma_cafe = pm.LKJCholeskyCov(
        "chol_cov", n=2, eta=2, sd_dist=pm.Exponential.dist(1.0), compute_corr=True
    )
    a = pm.Normal("a", mu=5.0, sd=2.0)  # prior for average intercept
    b = pm.Normal("b", mu=-1.0, sd=0.5)  # prior for average decay rate
    # population of varying effects; shape is (number of ids (40), 2)
    # because we get back both a and b for each image
    ab_gren = pm.MvNormal("ab_gren", mu=tt.stack([a, b]), chol=chol, shape=(id_, 2))
    opening_opr = pm.Data("opening_opr", opening_opr)
    y = pm.Data("y", signal)
    # exponential decay mean: intercept + exp(rate * predictor)
    mu = ab_gren[vector_id_, 0] + tt.exp(ab_gren[vector_id_, 1] * opening_opr)
    sigma_within = pm.Exponential("sigma_within", 1.0)  # prior stddev within image
    signal = pm.Normal("signal", mu=mu, sigma=sigma_within, observed=y)  # likelihood
    trace_Multivariate_varying_effect = pm.sample(4000, tune=4000, target_accept=0.9)
First, I tried to predict just the ten test groups, without reducing the time resolution:
with Multivariate_varying_effect:
    pm.set_data({"opening_opr": np.array(table_test_pm.raius_list), "y": np.array(table_test_pm.precen_sig)})
    p_post = pm.sample_posterior_predictive(trace_Multivariate_varying_effect, random_seed=RANDOM_SEED)
I got this message:
ValueError: Input dimension mis-match.
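One possible reading of this error, sketched in plain NumPy rather than PyMC: the group index `vector_id_` keeps its training length (3200), while `set_data` only swaps in the new predictor and `y`. The test length of 10 * 80 = 800, the stand-in parameter values, and all variable names below are assumptions for illustration. This is a guess at the cause, not a confirmed diagnosis.

```python
import numpy as np

n_train_groups, n_test_groups, n_times = 40, 10, 80

# Group index built once from the training table (length 40 * 80 = 3200).
train_group_idx = np.repeat(np.arange(n_train_groups), n_times)

# Stand-in for the per-group parameters ab_gren, shape (40, 2); values are made up.
group_params = np.column_stack(
    [np.full(n_train_groups, 5.0), np.full(n_train_groups, -1.0)]
)

# Test predictor as it would be swapped in (assumed length 10 * 80 = 800).
test_times = np.tile(np.arange(1, n_times + 1), n_test_groups).astype(float)

# The index still has 3200 entries, the predictor only 800: shapes disagree.
try:
    mu = group_params[train_group_idx, 0] + np.exp(
        group_params[train_group_idx, 1] * test_times
    )
    mismatch = False
except ValueError as err:
    mismatch = True
    print("shape mismatch:", err)

# An index with the same length as the new predictor makes the shapes agree.
test_group_idx = np.repeat(np.arange(n_test_groups), n_times)
mu_test = group_params[test_group_idx, 0] + np.exp(
    group_params[test_group_idx, 1] * test_times
)
print(mu_test.shape)
```

If this is indeed the cause, one thing to try would be registering the index as a `pm.Data` container as well and updating it in the same `pm.set_data` call. Note that groups not seen during training would have no fitted row in `ab_gren`, so the index alone may not be the whole answer.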
I am a Bayesian newbie, so apologies if I have not communicated my question well.
I hope it was sufficiently clear.