Hello everyone,
I’ve previously benefited from this forum’s expertise when modeling censored data with standard censored distributions in PyMC. I’m now working on a slightly more complex problem and would appreciate your insights.
I’m working on a model where the quantity of interest is derived from two measurements: a measured amount X_i (which is subject to censoring) and the mass analyzed Z_i (fully observed). I compute the ratio Y_i = X_i / Z_i, which is the intensive property of interest, and assume that Y_i is lognormally distributed. However, since X_i scales with Z_i (doubling Z_i doubles X_i), I am unsure about the best way to model X_i itself.
My current approach is to parametrize the relation as:
\log(X_i) = \log(Z_i) + \log(Y_i)
with
\log(Y_i) \sim \mathcal{N}(\mu, \sigma).
Equivalently, X_i follows a lognormal distribution with parameters \mu + \log(Z_i) and \sigma. To handle the censoring in X_i, I would use PyMC’s pm.Censored.
This is a sketch of my proposed implementation:
import pymc as pm
import numpy as np
# Assume we have our observed data arrays:
# Zi: mass analyzed (fully observed)
# Xi: measured quantity (left-censored below lower_limit)
# lower_limit: censoring threshold for Xi
with pm.Model() as model:
    # Priors for the lognormal parameters of Yi
    # (these prior values are just for the example)
    mu = pm.Normal('mu', mu=0, sigma=2)
    sigma = pm.HalfNormal('sigma', sigma=2)

    # Expected log(Xi): log(Yi) shifted by the known offset log(Zi)
    log_Xi_expected = mu + np.log(Zi)
    log_lower_limit = np.log(lower_limit)

    # Left-censored normal likelihood on log(Xi); censored entries of Xi
    # are recorded at lower_limit, so pm.Censored treats observations equal
    # to `lower` as censored
    log_Xi = pm.Censored('log_Xi',
                         pm.Normal.dist(mu=log_Xi_expected, sigma=sigma),
                         lower=log_lower_limit,
                         observed=np.log(Xi))
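Sampling would then just be the standard call (nothing non-standard here; I keep the result in idata for later use):

with model:
    idata = pm.sample()  # posterior draws of mu and sigma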
I’d appreciate any feedback or suggestions on this approach. In particular:
- Is this reparameterization and use of pm.Censored appropriate for handling the dependency between X_i and Z_i?
- Are there any improvements or alternative strategies that would better capture the censoring in X_i while keeping the lognormal assumption on Y_i?
- To assess the distribution of Y_i across different experimental settings, would you recommend imputing the censored values of X_i and then reconstructing Y_i, or using the posterior estimates of \mu and \sigma directly to construct the Y_i distribution? (A rough sketch of what I mean by the second option is below.)
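To make the second option concrete, here is roughly what I have in mind (just a sketch, using the idata object returned by pm.sample() above):

# Reconstruct the distribution of Y_i directly from the posterior of mu and
# sigma, without imputing the censored X_i
post = idata.posterior
mu_draws = post['mu'].values.ravel()
sigma_draws = post['sigma'].values.ravel()

rng = np.random.default_rng(42)
# one draw of Y per posterior (mu, sigma) draw: Y ~ LogNormal(mu, sigma)
y_draws = rng.lognormal(mean=mu_draws, sigma=sigma_draws)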
Thank you in advance for your insights.
Best regards,
Andrea