Hello everyone!
I am having a bit of the problem. I am running a Hierarchical Bayesian task (Windows 10, pymc 5.10.3, Python 3.11), where the goal is to inveresely estimate a set of parameters that are predictors for arterial displacements, around 101 datapoints per subject in time. The displcements are modeled on the basis of a Gaussian Process defined in the custom op. Generally there are a lot of parameters possible, but for now I am working with subject and group-level Young’s modulus (E, independent) and mean shift of pressure and flow (shift_mvn), which are correlated (the correlation is based on the LKJ prior from: Tomi Capretto - Hierarchical modeling with the LKJ prior in PyMC). The model worked perfectly fine with just the Young’s modulus - it managed to estimate subject level and group level Young’s moduli very well with high ESS (more than 1k), Rhat ~ 1.01 for all params etc. However when I introduced the correlated pressure and flow something started to break down: when running on multiple (20) cores I got the ‘Pipe Broken Error’ at around 50% mark (running 1k tune, 3k samples, 4 chains):
Multiprocess sampling (4 chains in 6 jobs)
CompoundStep
NUTS: [mu_E, sigma_E, sig_S, Corr_triu, mu_S, sig]
CompoundStepMetropolis: [E]
Metropolis: [shift_mvn]
BrokenPipeError: [WinError 109] … EOFError…
Then out of desperation (and reading that error is possibly due to Windows’ way of handling multiprocessing) I got this after 2 chains were completed:
Sequential sampling (4 chains in 1 job)
CompoundStep
NUTS: [mu_E, sigma_E, sig_S, Corr_triu, mu_S, sig]
CompoundStepMetropolis: [E]
Metropolis: [shift_mvn]
|-----------| 4.34% [152/3500 3:48:28<83:52:36 Sampling chain 2, 0 divergences]
Fatal Python error: Aborted
Thread 0x00004ecc (most recent call first):
File “C:\Users\asinek\anaconda3\envs\pymc_env\Lib\site-packages\zmq\utils\garbage.py”, line 47 in run
File “C:\Users\asinek\anaconda3\envs\pymc_env\Lib\threading.py”, line 1045 in _bootstrap_inner
File “C:\Users\asinek\anaconda3\envs\pymc_env\Lib\threading.py”, line 1002 in _bootstrap
Main thread:
Thread 0x00002a68 (most recent call first):
The graph for the model as it is looks like this:
Sorry for the long setup. My question is: by looking at the problem description and the graph, can someone tell me that there is an error in my model (most likely related to the way I handle the correlation matrix prior of parameter shfit_mvn; it’s my primary suspect as it’s my first time doing that and the model worked perfectly fine before, when it was only E). Are there other possibilities for the failure? That’s what I want to know before I install Linux
Thanks for anyone’s input in advance!
Cheers,
Alex
####### CODE BELOW #############
The general code for the model and custom op is provided below, although I can set it up on github if someone really wants to dig in:
if __name__ == '__main__': #since I'm on windows
with pm.Model(coords=coords) as model: #coords is a dict that assigns indices to subjects
# Subjects
exp_index = pm.Data("exp_idx",exp_idx ,dims = "obs_id", mutable=False)
#nomalized priors
##Youngs Modulus
mu_E = pm.Normal("mu_E", mu = 0, sigma = 1, initval=0)
sigma_E = pm.HalfNormal('sigma_E', sigma = 1)
##Joint distribution parameters
S_sigma = pm.HalfNormal('sig_S',sigma = np.array([1,1]))
Corr_triu = pm.LKJCorr('Corr_triu',eta=10,n=2)
Corr = pm.Deterministic('Corr',pytensor.tensor.extra_ops.fill_diagonal(Corr_triu[np.zeros((2,2),dtype=np.int64)],1))
S_diagonal = pm.Deterministic("sig_diag", pytensor.tensor.eye(2) * S_sigma)
Cov = pm.Deterministic('Cov',pytensor.tensor.nlinalg.matrix_dot(S_diagonal, Corr, S_diagonal))
L = pm.Deterministic("L",pytensor.tensor.slinalg.cholesky(Cov))
mu_S = pm.Normal('mu_S',mu = np.array([0,0]), sigma = np.array([1,1]), initval = np.array([0,0]) )
E = pm.Normal("E", mu=mu_E,sigma=sigma_E,dims='exp', initval = theta_scaled[:,0])
#theta scaled are just randomly generated samples from similar distributions
#to start somewhere reasonable
shift_Vp = pm.MvNormal('shift_mvn', mu = mu_S, chol = L,dims=('exp','corrs'), initval=theta_scaled[:,1:])
sigma = pm.HalfNormal('sig',1)
new_inputs =pt.math.stack([E[exp_index], shift_Vp[exp_index,0],shift_Vp[exp_index,1]],axis=1)#.reshape([10,2])
#Gaussian Process predictions
test = pm.Deterministic('gp_sol',pytensor_forward_model_matrix(new_inputs ))
# Likelihood
Y_obs = pm.Normal("Y_obs", mu=test, sigma=sigma, observed=simulated_data)
trace_M = pm.sample(tune=1000, draws=3000, chains = 4, cores=20)
Custom op is defined in a separate script file (otherwise it doesn’t work):
#these are just some local variables to scale & rescale the data for the Gaussian Process
COV_MX = np.array([[0.0004,-0.04],[-0.04,100]])
mean = np.array([0.2,90])
L_std = np.linalg.cholesky(COV_MX)
L_inv = np.linalg.inv(L_std)
@as_op(itypes=[pt.dmatrix], otypes=[pt.dmatrix])
def pytensor_forward_model_matrix(inputs):
# given that parameter values repeat with time for a given case, get a unique number of subjects
unique = np.unique(inputs,axis=0)
#this is a helper function, everything's ok here
f_temp_mcmc, p_temp_mcmc= gsmc.BC_generator_MCMC(flow_p,pres_p, unique ,plot = False)
#same as here
train = gsmc.design_IP_multiple(f_temp_mcmc, unique)
#rescale the sampled inputs from stand'd distributions
train[:,1] = train[:,1]*200_000 + 700_000 #E
train[:,2:] = gsmc.inv_std_mvn(train[:,2:], mean, L_std) #mvn pressure and flow
#copy the generated inputs for a GP
X_input_mcmc_scaled = copy.deepcopy(train)
# column 0 is for time
# scale the inputs using the scaler on which the GP was trained
X_input_mcmc_scaled[:,1:]= gs.transform_data(X_input_mcmc_scaled[:,1:], scaler_for_mcmc)
#return the predictions
gp_solution, _ = m_sgpr_mcmc.predict_y(X_input_mcmc_scaled,full_cov=False)
return gp_solution.numpy()