PPC for data with known measurement uncertainty

Hi all,

I would like to fit some data with measurement X_obs, Y_obs, and known measurement error of Y_err,
and I tried to use ppc to check the my fitting results.

Here is my model

X_shared=shared(X_obs)
Y_err_shared=shared(Y_err)

with model:
    a = pm.HalfNormal('a', 6)
    b = pm.HalfNormal('b', 30)
    Y_max = pm.Uniform('Y_max', upper = X_shared * a + b, lower=0,shape=len(X_obs))
    model_y = pm.Normal('Model_Y', mu=real_Y, sd=Y_err_shared, observed=Y_obs)

The measurement error follows the Gaussian distribution, and the real Y follows the uniform distribution.
If I directly replace the shared variable X_shared and Y_err_shared and run sample_ppc,
it will show “incompatible shape” error.
As I found in pymc3 documents, this is probably because the shape of shared variable cannot be updated when the shared variable involves the shape of another variable (Y_max).

My first question is if there is an alternative way to run ppc on this model ?

I currently use a dumb solution that I add e.g. 1000 fake data points, with X and Y not likely to affect the posterior, to run the pm.sample. So I can have sufficient shape to put my predicted X, and I can just updated the shared variable with the same shape. But this solution would be useless for non-Uniform likelihood.

The other possible way is to use the Uniform convolve with Normal directly as likelihood, so I don’t need the Y_max variable. I am wondering if pymc3 provides some easier way to convolve two distribution? or we could only use a custom likelihood?

My third question is: The default sample_ppc only calculate the observed variable (model_y), but what I interested is Ymax. I tried to set vars=model.free_RVs in sample_ppc, but it only return something called “‘a_log__’”, “‘b_log__’”, and ‘Y_max_interval__’. I am wondering what is the correct way to ask sample_ppc to sample all variable?

Any suggestions would be appreciated.

Yes you are right, however i dont see why you need to specify the shape of Y_max, maybe try:

Y_max = pm.Uniform('Y_max', lower=0, upper = X_shared * a + b)
    

Sorry, maybe I didn’t say it clearly.
The measurement error is not a constant,
and here X_obs, Y_obs, and Y_err are all array with the same shape.
In model_y, I need to pass the individual measurement error (Y_err) value to each sample,
that’s why I need to use shape=len(X_obs).

If I use

Y_max = pm.Uniform('Y_max', lower=0, upper = X_shared * a + b)

the line will return the error message

TypeError: For compute_test_value, one input test value does not have the requested type.

The error when converting the test value to that variable type:
Wrong number of dimensions: expected 0, got 1 with shape (50,).

In addition, I also tried another model without the use of shape parameter:

with model 2:
    a = pm.HalfNormal('a', 6)
    b = pm.HalfNormal('b', 30)
    f = pm.Uniform('f', lower = 0, upper = 1)
    Y_max =  X_shared * a + b
    model_y = pm.Normal('Model_Y', mu= Y_max*f, sd=Y_err_shared, observed=Y_obs)

I tested both model 1 and 2 on generated data, which I know a = 3 is correct.
The previous Model 1 can return quite close answers a ~ 3; however, the model 2 always return a= 5 ~ 6.
I don’t really understand why the model 2 have such different results, but I guess it is probably related to the use of shape parameter.