Observed RVs of Different Lengths in a Linear Model

jvivian · November 2, 2018, 2:15am

First, thank you for the amazing work on PyMC3.

I’m trying to build a linear model where the independent variables are not the same length. Using the actual values or creating ObservedRVs gives me an Input dimension mis-match. As a workaround, a colleague suggested creating two parameters, one that is observed and one that isn’t that share hyper-parameters, then use the UnobservedRVs in the linear model. This works to some extent, as the trace for the UnobservedRV matches the sample_ppc for the ObservedRV, but I can’t get the hyper-parameters to converge and I get some pretty wonky traceplots.

Is this the correct way to go about building a linear model with different length independent variables? Is there another option I can try?

Here’s a snippet of how I’m sharing parameters between the RVs, I can provide the full code if that will help

with pm.Model() as model:
    # Priors for gene expression
    hyper_shape = len(genes) * len(datasets)
    hyper_sigma = pm.InverseGamma('hyper_sigma', 2.1, 1, shape=hyper_shape)
    hyper_mu = pm.Normal('hyper_mu', 0, 1, shape=hyper_shape)
        
    # RVs for Gene expression
    # X is observed and Y isn't, but they share priors
    X, Y = {}, {}
    i = 0
    for gene in genes:
        for name, dataset in datasets:
            key = f'{gene}-{name}'
            X[key] = pm.Normal(f'X-{key}', mu=hyper_mu[i], sd=hyper_sigma[i], observed=dataset[gene])
            Y[key] = pm.Normal(f'Y-{key}', mu=hyper_mu[i], sd=hyper_sigma[i])
    i += 1

I then use the UnobservedRV Y when building the linear model.

Here’s an example of a traceplot I get with just one independent variable.

junpenglao · November 2, 2018, 5:45pm

Usually the recommended way is to unpack/reshape the observed into a 1d vector:
Say your observed is

y = [[1, 2, 4, 6], 
     [0, 1],
     [5] ...]

you turn it into a vector, with the index:

y_new = [1, 2, 4, 6, 0, 1, 5, ...]
y_idx = [0, 0, 0, 0, 1, 1, 2, ...]

And in the model, you do:

pm.Normal('observed', hyper_mu[y_idx], sd=hyper_sigma[y_idx], observed=y_new)

Hope this helps.

jvivian · November 2, 2018, 5:51pm

Ah, thank you for the reply Junpeng. I should’ve realized that’s the method used in the radon GLM hierarchical model in the PyMC3 docs for county codes.

Topic		Replies	Views
Is This The Proper Way to Use Observed RVs of Different Lengths? Questions	4	706	January 8, 2019
Posterior and data mismatch in linear model with observed predictors Questions	0	471	August 13, 2020
Observation with different shape	4	515	June 16, 2022
Proper way to model several variables v5	4	1259	August 8, 2022
How do I define mulitple RVs with a joint liklihood? Questions modeling	2	721	February 9, 2022

Observed RVs of Different Lengths in a Linear Model

Related topics