First, thank you for the amazing work on PyMC3.

I’m trying to build a linear model where the independent variables are not all the same length. Using the actual values, or creating ObservedRVs, gives me an `Input dimension mis-match` error.

As a workaround, a colleague suggested creating two parameters per variable (one observed and one unobserved) that share hyper-parameters, then using the UnobservedRVs in the linear model. This works to some extent: the trace for each UnobservedRV matches the sample_ppc for the corresponding ObservedRV, but I can’t get the hyper-parameters to converge and I get some pretty wonky traceplots.
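For context, the error seems to come from combining different-length arrays element-wise in the model. A minimal NumPy sketch of the same shape rule (the sizes 100 and 80 are just placeholders, not my actual data):

```python
import numpy as np

# Two independent variables of different lengths
# (placeholder sizes, not the real datasets)
x1 = np.random.randn(100)
x2 = np.random.randn(80)

# A linear-model term like b1*x1 + b2*x2 needs aligned lengths,
# so the element-wise combination fails with a shape error,
# analogous to Theano's "Input dimension mis-match".
try:
    _ = x1 + x2
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```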

Is this the correct way to go about building a linear model with different length independent variables? Is there another option I can try?

Here’s a snippet of how I’m sharing parameters between the RVs; I can provide the full code if that would help.

```
import pymc3 as pm

with pm.Model() as model:
    # Priors for gene expression
    hyper_shape = len(genes) * len(datasets)
    hyper_sigma = pm.InverseGamma('hyper_sigma', 2.1, 1, shape=hyper_shape)
    hyper_mu = pm.Normal('hyper_mu', 0, 1, shape=hyper_shape)

    # RVs for gene expression
    # X is observed and Y isn't, but they share priors
    X, Y = {}, {}
    i = 0
    for gene in genes:
        for name, dataset in datasets:
            key = f'{gene}-{name}'
            X[key] = pm.Normal(f'X-{key}', mu=hyper_mu[i], sd=hyper_sigma[i],
                               observed=dataset[gene])
            Y[key] = pm.Normal(f'Y-{key}', mu=hyper_mu[i], sd=hyper_sigma[i])
            i += 1
```

I then use the UnobservedRVs in `Y` when building the linear model.

Here’s an example of a traceplot I get with just one independent variable.