Multilevel model: how to model outcome at the group level

Suppose I want to fit the following model to data:

Group level: Beta1 ~ Zeta1 X variable2

data level: y[i, t] ~ Intercept + Beta1[i] X [t] + Beta2 X variable1

Where [i] indexes person, [t] indexes time, and variable2 is a time invariant characteristic of a person that inform the growth rate of y[i, t] via Beta1.

Once I fit the model, I want to use it on new data. However, I will observe y[i,t], [t],[i] and variable1. What I really want to do is use this observed information to infer what variable2 is for a given person [i] at a given time [t].

I can set up the model and understand how to predict y[i,t]. How would I use the same model to predict variable2?

Place a prior distribution on variable2 within the model, pass it as a masked array so that PyMC3 knows that it is a missing observation and then run the sampler as before. Here’s an example of how it’s done for a multiple linear regression:

import numpy as np
import pymc3 as pm
import matplotlib.pyplot as lt
%matplotlib inline

# Specify number of individuals and covariates
n = 200
p = 2

# Create a fake dataset
true_x = np.random.randn(n,p)
true_intercept = 3
true_coefficients = np.asarray([1.,-2.])
y = true_intercept + np.dot(true_x,true_coefficients) + np.random.randn(n) * 0.1

# Create a mask to apply to fake dataset
mask = np.zeros([n,p],dtype=bool)
mask[0:5,0] = True
print('Missing values of x are:',true_x[0:5,0])
observed_x = np.ma.masked_array(data = true_x, mask=mask)

# Model with prior placed on 'x'
with pm.Model() as model:
    x = pm.Normal('x',observed = observed_x)
    intercept    = pm.Normal('intercept',sd=10)
    coefficients = pm.Normal('coefficients',sd=10,shape = p)
    mean = intercept + pm.math.dot(x,coefficients)
    response = pm.Normal('response',mu=mean,observed=y)

    trace = pm.sample()

# Visualize posterior estimates of missing data
pm.plot_posterior(trace,varnames=['x_missing'])```

Edit: I just realized that I misspecified the error variance in the PyMC3 model, but the code still works.

1 Like

I think that you could find gaussian mixture models really helpful for your problem. I imagine that zeta is a categorical variable, so it will be hard to sample it directly. I recommend you read this for a brief explanation on how to write a different parametrization of your model, called the marginalized parametrization, which would help you to draw samples from Variable2's posterior.

Thankyou lucian and ckrapu. I will try your suggestions and report back