Building a model where model parameter depends on independent data

#1

Hi,

I am trying to build a simple example model with the observed data X and Y, that looks like the following:

\mu \sim \mathcal{ N }(x,1)
y \sim \mathcal{ N }(\mu,1)

where y is the random variable that generated Y and x represents a value for the covariate X. How can I do this in PyMC3? The best I could come up with is the following:

## Generate data
X = np.linspace(0, 100, num=100)
MU = np.random.normal(loc=X)
Y = np.random.normal(loc=MU)

## Create model
with pm.Model() as basic_model:
    mu = pm.Normal('mu', mu=X, sd=1)
    y = pm.Normal('y', mu=mu, sd=1, observed=Y)

But that does not work, since arrays are not accepted as parameters for the distribution.

Any help is much appreciated. Thank you.

#2

Distributions do accept arrays as input, so your example should works.

A more formal way to do so is to model the covariates as coming from some distribution as well. You can have a look at this for an example: https://junpenglao.xyz/Blogs/posts/2017-10-23-OOS_missing.html

#3

Thanks for the example, I did not know that modeling covariates as coming from distributions is a valid way. However, with the code above, I get the following error:

Traceback (most recent call last):
File "/home/guet_jn/Desktop/modelbase/mb_modelbase/models_core/tests/create_PyMC3_testmodels.py", line 292, in <module>
mu = pm.Normal('mu', mu=X, sd=1)
File "/home/guet_jn/anaconda3/envs/modelbase/lib/python3.5/site-packages/pymc3/distributions/distribution.py", line 42, in __new__
return model.Var(name, dist, data, total_size)
File "/home/guet_jn/anaconda3/envs/modelbase/lib/python3.5/site-packages/pymc3/model.py", line 808, in Var
total_size=total_size, model=self)
File "/home/guet_jn/anaconda3/envs/modelbase/lib/python3.5/site-packages/pymc3/model.py", line 1208, in __init__
self.logp_elemwiset = distribution.logp(self)
File "/home/guet_jn/anaconda3/envs/modelbase/lib/python3.5/site-packages/pymc3/distributions/continuous.py", line 447, in logp
return bound((-tau * (value - mu)**2 + tt.log(tau / np.pi / 2.)) / 2.,
File "/home/guet_jn/anaconda3/envs/modelbase/lib/python3.5/site-packages/theano/tensor/var.py", line 147, in __sub__
return theano.tensor.basic.sub(self, other)
File "/home/guet_jn/anaconda3/envs/modelbase/lib/python3.5/site-packages/theano/gof/op.py", line 615, in __call__
node = self.make_node(*inputs, **kwargs)
File "/home/guet_jn/anaconda3/envs/modelbase/lib/python3.5/site-packages/theano/tensor/elemwise.py", line 482, in make_node
DimShuffle, *inputs)
File "/home/guet_jn/anaconda3/envs/modelbase/lib/python3.5/site-packages/theano/tensor/elemwise.py", line 438, in get_output_info
['x'] * difference + list(range(length)))(input))
File "/home/guet_jn/anaconda3/envs/modelbase/lib/python3.5/site-packages/theano/gof/op.py", line 625, in __call__
storage_map[ins] = [self._get_test_value(ins)]
File "/home/guet_jn/anaconda3/envs/modelbase/lib/python3.5/site-packages/theano/gof/op.py", line 562, in _get_test_value
ret = v.type.filter(v.tag.test_value)
File "/home/guet_jn/anaconda3/envs/modelbase/lib/python3.5/site-packages/theano/tensor/type.py", line 178, in filter
data.shape))
TypeError: For compute_test_value, one input test value does not have the requested type.

The error when converting the test value to that variable type:
Wrong number of dimensions: expected 0, got 1 with shape (100,).

Process finished with exit code 1

Any idea where this might come from? I thought this was because the X is not a scalar, that’s why I thought arrays are not accepted in distributions. Thanks for the help!