Hello,
I am fairly new to pymc3 and I have been using it to model geochemical data.
Overall, my current model looks like:
```python
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model2:
    data_arr = geochem_df.iloc[row].to_numpy()
    b_eq = data_arr * 100

    # one truncated-normal prior per component, bounded to [0, 100]
    def create_priors(tup):
        name, mu, sigma = tup[0], tup[1], tup[2]
        return pm.TruncatedNormal(name, mu, sigma, lower=0, upper=100)

    x_v = []
    for z in range(len(tup_main)):
        x_v.append(create_priors(tup_main[z]))
    x = tt.as_tensor_variable(list(x_v))

    CP = pm.Deterministic('CP_val', tt.sum(x * -CP_arr[:-15]))
    pm.Cauchy('CP', alpha=eet, beta=1, observed=CP)

    # soft constraint that the components should sum to ~100
    total = pm.Deterministic('total', tt.sum(x))
    total_cap = pm.Cauchy('total_cap', alpha=100, beta=.1, observed=total)

    eq = pm.Deterministic('eq_val', tt.dot(a_eq[:, :-15], x))
    diff = pm.Deterministic('diff', eq - b_eq)  ## difference
    pm.Cauchy('eq', alpha=0, beta=1, observed=diff, shape=data_arr.shape[0])  ## diff distribution

    trace2 = pm.sample(1500, tune=500)
```
The commented lines are meant to treat the difference between the calculated and observed values as a Cauchy distribution centred at zero (so that sampling effectively minimizes the difference). This model takes about 10 minutes to run per row of my data, but in some cases we have around 10,000 rows, and we re-run the sampler for every row because each row has its own priors. I wonder whether the sampler would speed up if, instead, we constrained the difference to a range of values, say [0, 5]. Does anyone know how to do this properly? I've run some experiments, but I don't seem to be coding it correctly, since the results aren't sensible. A rough sketch of the kind of thing I have in mind is below.
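For concreteness, this is the sort of thing I have been experimenting with: replacing the zero-centred Cauchy on `diff` with a hard [0, 5] window expressed as a `pm.Potential`. It is only a sketch of the idea, and I am not at all sure it is the right way to encode the constraint:

```python
import numpy as np

# Hypothetical replacement for the 'eq' Cauchy likelihood above:
# the logp contribution is 0 when every element of diff lies in [0, 5],
# and -inf otherwise, i.e. a hard interval constraint on the residuals.
# (Goes inside the `with pm.Model()` block; a hard constraint like this
# may also break NUTS initialization if the start point falls outside.)
in_window = tt.all(tt.and_(diff >= 0, diff <= 5))
pm.Potential('diff_window', tt.switch(in_window, 0.0, -np.inf))
```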
Additionally, is it possible to “force” the sampler to run on all available cores? Say I have an AWS instance with 30-something cores; I can't figure out how to get the sampler to use all of them. I'm guessing it involves the cores and chains arguments of pm.sample, but I'm not sure that's the intended way.
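This is roughly the call I have in mind (a guess on my part, not something I know to be correct):

```python
import multiprocessing

# Run one chain per available core so that every core has work to do.
n_cores = multiprocessing.cpu_count()  # e.g. 32 on the AWS instance
trace2 = pm.sample(1500, tune=500, cores=n_cores, chains=n_cores)
```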
Many thanks for any help!