Hello everyone,
I am new to pymc. I am stuck with a problem with my model and need help.
Context of data: protein intensities’ dataframe of shape (proteins x experimental trials). The goal is to model posterior distribution for true intensity level for each protein.
Context of chosen model: Flat priors for mu and k, as well as Jeffrey’s prior are set by the design of the project I’m working on (don’t ask why no hierarchical model, I wonder why as well).
Sampling from posterior fails: Large number of divergencies, tiny step sizes, >1000 grad evals and takes >1 hours to evaluate. Is it failing because of chosen priors? See the code below. Thank you for your help in advance.
def sample_params_lin_model(data, draws=2000, tune=1000, chains=4, cores=None,
target_accept=0.5, seed=42, progressbar=True):
data = np.where(data > 0, data, np.nan)
log_data = np.log2(data)
nprot, nrep =log_data.shape
i_idx, j_idx = np.where(~np.isnan(log_data))
obs_flat = log_data[i_idx, j_idx]
with pm.Model():
# Define mu. Halfflat to constrain on positive values only
mu = pm.HalfFlat("mu", shape=nprot)
# Jeffrey's prior is biasing the whole joint posterior
# For likelihood "sigma" is used. Std cannot be negative
log_sigma = pm.Flat("log_sigma", shape=nprot)
sigma = pm.Deterministic("sigma", pt.exp(log_sigma))
# Defining k factors
#log_k_free = pm.Flat("log_k_free", shape=nrep-1)
#log_k = pt.concatenate([log_k_free, -log_k_free.sum(keepdims=True)])
#pm.Deterministic("k", pt.exp(log_k))
log_k = pm.ZeroSumNormal("log_k", sigma=0.2, shape=nrep)
# Likelihood
pm.Normal("obs", mu=mu[i_idx] - log_k[j_idx],
sigma=sigma[i_idx],
observed=obs_flat)
# Sampling from posterior
idata = pm.sample(draws=draws, tune=tune, chains=chains, cores=cores,
target_accept=target_accept, seed=seed,
progressbar=progressbar)
post = idata.posterior.stack(sample=("chain", "draw"))
mu_s = post["mu"].values.T
sigma_s = post["sigma"].values.T
k_s = post["k"].values.T
return mu_s, sigma_s, k_s, idata

