I am working with sensor data that has limited precision which I believe to be important to incorporate into my model. However, I’m struggling with how to implement this. I feel like this is really obvious and I’m just missing something.
I found this section in the Stan user guide: Measurement Error and Meta-Analysis but I’m having difficulty converting it to PyMC. They have two versions; the one that I think is closest to PyMC is
# y = (10, 10, 12, 11, 9)
data {
int<lower=0> N;
vector[N] y;
}
parameters {
real mu;
real<lower=0> sigma;
vector<lower=y-0.5, upper=y+0.5>[N] z;
}
model {
sigma ~ normal(0, 1);
z ~ normal(mu, sigma);
}
trying to convert this to PyMC, I was thinking something like this:
y = np.array([10, 10, 12, 11, 9])
with pm.Model() as rounding_error_model:
mu = pm.Flat("mu")
sigma = pm.HalfNormal('sigma', sigma=1)
z = pm.Normal('z', mu=mu, sigma=sigma, shape=y.shape)
lower = pm.math.floor(z)
upper = pm.math.ceil(z)
y_obs = pm.Uniform('y_obs', observed=y, lower=lower, upper=upper)
pm.sample()
but this won’t sample because the logp
of y_obs
will be -inf
. Are there any ways that I could rework this?
Thank you for your time!
PyMC can give you the probability of a rounded distribution directly. The equivalent to the first model in Stan is something like this (disclaimer: I didn’t run this code):
import pymc as pm
def rounded_normal_dist(mu, sigma, size):
return pm.math.round(pm.Normal.dist(mu, sigma, size=size))
with pm.Model() as m:
mu = pm.Flat("mu")
sigma = pm.HalfNormal("sigma")
y_obs = pm.CustomDist("y_obs", mu, sigma, dist=rounded_normal_dist, observed=[10, 10, 12, 11, 9])
Docstrings on CustomDist, and the dist
param specifically: pymc.CustomDist — PyMC v5.13.1 documentation
Otherwise, for the second approach is a bit harder to translate. I guess you can do something by specifying an interval transform to z between y±0.5, and you don’t need a likelihood. But I’m not 100% sure on what that model means exactly in Stan.
Yup… I figured it would be something simple like this. Thank you for your help!
By default, Stan samples uniformly on values that satisfy the constraint, so this
vector<lower=y-0.5, upper=y+0.5>[N] z;
implies the following in more generative terms,
for (n in 1:N) {
z[n] ~ uniform(y - 0.5, y + 0.5);
}
So it’s just saying the prior is that the value rounded to y
is uniformly distributed between y - 0.5
and y + 0.5
. This is assuming the only noise is measurement error and the rounding is to units.
1 Like