I would like to model censored data in PyMC v4. I am aware of the pm.Censored() functionality; however, it doesn't work for me, for two reasons:

- I am working with multivariate data, and pm.Censored() only works for univariate distributions.
- Each of my data points is censored at a different level.

So, I am using the SciPy library, which helps with issue 1 because it can evaluate the logcdf of a multivariate distribution, and pm.Potential(), which helps with issue 2.
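For concreteness, here is a toy check (a standard bivariate normal with independent components, an assumed example rather than my actual data) showing the SciPy call I mean — `scipy.stats.multivariate_normal` can evaluate a multivariate logcdf directly:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Standard bivariate normal with independent components.
# P(X1 <= 0, X2 <= 0) = 0.25, so the logcdf at the origin should be log(0.25).
mvn = multivariate_normal(mean=np.zeros(2), cov=np.eye(2))
logcdf_at_origin = mvn.logcdf(np.zeros(2))
print(logcdf_at_origin)
```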

However, the model seems to have trouble estimating the parameters. I have done the usual checks, and also:

- Found the MLEs for the equivalent frequentist model to check that they are what I expect.
- Checked the prior predictive.
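The frequentist MLE check in the first bullet can be sketched like this (on synthetic left-censored data, not MRE.csv): maximize a censored-normal log-likelihood, with logpdf terms for uncensored points and logcdf terms for left-censored ones, mirroring the likelihood in the model below.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
y_true = rng.normal(loc=1.0, scale=0.8, size=2000)
cens_levels = rng.uniform(-1.0, 0.5, size=y_true.size)  # per-point censoring levels
uncensored = y_true > cens_levels
y_obs = np.where(uncensored, y_true, cens_levels)  # left-censored observations

def neg_log_lik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # optimize log(sigma) to keep sigma positive
    ll = stats.norm.logpdf(y_obs[uncensored], mu, sigma).sum()
    ll += stats.norm.logcdf(y_obs[~uncensored], mu, sigma).sum()
    return -ll

res = optimize.minimize(neg_log_lik, x0=[0.0, 0.0])
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)
```

With enough data, the recovered `mu_hat` and `sigma_hat` land close to the true values (1.0 and 0.8 here).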

```
import arviz as az
import numpy as np
import pandas as pd
import pymc as pm

print(pm.__version__)

df = pd.read_csv("MRE.csv")

## construct (censored) likelihood function for each individual
def censored_log_likelihood_name(standardized, uncensored):
    uncensored = np.asarray(uncensored).astype(bool)
    llh = pm.math.sum(pm.logp(pm.Normal.dist(0, 1), standardized[uncensored]))
    if uncensored.sum() != len(uncensored):  ## if some times are censored
        left_censored_name = ~uncensored
        llh += pm.math.sum(pm.logcdf(pm.Normal.dist(0, 1), standardized[left_censored_name]))
    return llh

def construct_model(df, fix_sigma=False, fix_mu=False):
    IDs_idx, IDs = pd.factorize(df.full_name_computed)
    coords = {"IDs": IDs}
    ## starting values if required
    sigma_in = df.groupby("full_name_computed").Gaus.var().mean()
    mu_in = df.groupby("full_name_computed").Gaus.mean()[IDs].values
    with pm.Model(coords=coords) as model:
        if fix_mu:
            mu = mu_in
        else:
            a = pm.Normal("a", 2)
            b = pm.Exponential("b", 1)
            mu = pm.Normal("mu", a, b, dims=["IDs"], initval=mu_in)
        if fix_sigma:
            sigma = sigma_in
        else:
            tau = pm.Gamma("tau", alpha=10, beta=5)
            sigma = pm.Deterministic("sigma", 1 / tau**2)
        llh_name = []
        for i, id_ in enumerate(IDs):
            df_id = df[df.full_name_computed == id_]
            standardized = (df_id.Gaus_censored.values - mu[i]) / sigma  ## standardize values to be Normal(0, 1)
            uncensored = df_id.uncensored
            llh_name.append(censored_log_likelihood_name(standardized, uncensored))
        llh = pm.Potential("llh", pm.math.sum(llh_name))
    return model
```

The problem seems to be with estimating `sigma`, the common variance for all individuals. I know this to be (empirically) `df.groupby("full_name_computed").Gaus.var().mean() = 0.664`; however, when I run the above model, even fixing all the `mu` parameters like so:

```
model_fixmu = construct_model(df, False, True)
pm.model_to_graphviz(model_fixmu)
```

```
with model_fixmu:
    model_fixmu_idata = pm.sample(chains=1, tune=1000, draws=1000)
```
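(For reference, I compare the posterior of `sigma` against the empirical value with `az.summary`. A minimal, self-contained sketch of that check — with fake draws standing in for the real `model_fixmu_idata` trace — looks like this:

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(0)
# Fake posterior draws standing in for the real trace; shape is (chains, draws).
idata = az.from_dict(posterior={"sigma": rng.normal(0.66, 0.05, size=(1, 1000))})
summ = az.summary(idata, var_names=["sigma"])
print(summ[["mean", "hdi_3%", "hdi_97%"]])
```

In my actual run I call `az.summary(model_fixmu_idata, var_names=["sigma"])` on the sampled trace.)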

the posterior estimate for `sigma` is very different from both the prior and the likelihood. See the posterior estimate below:

[posterior plot of `sigma`]

Whereas the prior for `sigma` is given as:

[prior plot of `sigma`]

And the MLE and empirical estimates of `sigma` are `0.680` and `0.664`, respectively.

Can anyone explain why the model's posterior estimate of `sigma` is so poor, and how I can improve it?

Note: if I fix `sigma` at the MLE using `model_fixsigma = construct_model(df, True, False)`, the posterior estimates of `mu` are bang on.

Many thanks,

Harry

p.s., I am aware there are ways I could simplify the model; however, for more complex models I will need this format.

MRE.csv (54.7 KB)