Increasing sampling speed by changing the distribution of the difference between measured and calculated values to an interval

Hello,

I am fairly new to pymc3 and I have been using it to model geochemical data.
Overall, my current model looks like:

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model2:

    data_arr = geochem_df.iloc[row].to_numpy()
    b_eq = data_arr * 100

    def create_priors(tup):
        name, mu, sigma = tup[0], tup[1], tup[2]
        return pm.TruncatedNormal(name, mu, sigma, lower=0, upper=100)

    # one truncated-normal prior per component
    x_v = [create_priors(tup_main[z]) for z in range(len(tup_main))]
    x = tt.as_tensor_variable(x_v)

    # soft constraint tying the computed CP value to eet
    CP = pm.Deterministic('CP_val', tt.sum(x * -CP_arr[:-15]))
    pm.Cauchy('CP', alpha=eet, beta=1, observed=CP)

    # soft constraint keeping the total of the components near 100
    total = pm.Deterministic('total', tt.sum(x))
    total_cap = pm.Cauchy('total_cap', alpha=100, beta=.1, observed=total)

    eq = pm.Deterministic('eq_val', tt.dot(a_eq[:, :-15], x))

    diff = pm.Deterministic('diff', eq - b_eq)  ## difference
    pm.Cauchy('eq', alpha=0, beta=1, observed=diff, shape=data_arr.shape[0])  ## diff distribution

    trace2 = pm.sample(1500, tune=500)

The commented lines are meant to assert that the difference between calculated and observed values follows a Cauchy distribution centred at zero (in order to minimize the difference). This model takes about 10 minutes to run per row of my data, but in some cases we have about 10,000 rows, and the sampler has to be run separately for each row because each row has different priors. I wonder whether constraining the difference to an interval, say [0, 5], would speed up the sampler? Does anyone know how to do this properly? I've run some experiments, but I don't seem to be coding it correctly, as the results are not sensible.

Additionally, is it possible to “force” the sampler to run on all available cores? Say I have an AWS instance with 30-something cores; I can’t figure out how to make the sampler use all of them.

Many thanks for any help!

Vectorization may give you a little bit of a speedup:

mus, sigmas = [t[1] for t in tup_main], [t[2] for t in tup_main]
x = pm.TruncatedNormal('priors', mu=mus, sigma=sigmas, lower=0, upper=100, shape=len(tup_main))

It looks like you have a “soft” constraint on the total, and are trying to get the sampler to enforce it. This is probably causing a significant slow-down. It’s probably faster to rescale, along the lines of:

total_cap = pm.Cauchy('total_cap', alpha=100, beta=.1)
x_scaled = (x / tt.sum(x)) * total_cap

This will have a weird impact on your posteriors, since the x values are happy being quite large and then getting rescaled; if you don’t want that, you may need to do something a little cleverer to constrain the total. But I suspect this soft constraint is the source of the slow-down. You could, for instance, use a Dirichlet to represent proportions of total_cap, and use pm.Deterministic('x_cap', total_cap * dirich_proportions) to pull out the values for analysis.
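For what it’s worth, a minimal sketch of that Dirichlet idea might look like this (assuming a_eq, b_eq and tup_main from your model; the flat Dirichlet prior and the names here are placeholders, and I’ve dropped the CP term for brevity):

import numpy as np
import pymc3 as pm
import theano.tensor as tt

n = len(tup_main)  # number of components, as in the original model

with pm.Model() as model_dirichlet:
    # proportions of the total; a flat Dirichlet prior is just a placeholder choice
    proportions = pm.Dirichlet('proportions', a=np.ones(n))

    # soft prior on the total, mirroring the original total_cap
    total_cap = pm.Cauchy('total_cap', alpha=100, beta=0.1)

    # component values, which sum to total_cap by construction
    x_cap = pm.Deterministic('x_cap', total_cap * proportions)

    # Cauchy misfit likelihood centred on the model prediction
    pm.Cauchy('eq', alpha=tt.dot(a_eq[:, :-15], x_cap), beta=1, observed=b_eq)

    trace = pm.sample(1500, tune=500)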

It would also probably help simplify the graph to do:

pm.Cauchy('eq', alpha=b_eq, beta=1, observed=tt.dot(a_eq[:,:-15], x))
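
On the multi-core question: pm.sample parallelises across chains, so to occupy a 30-odd-core machine you need at least that many chains, e.g. something like the following (the chain count here is just for illustration):

trace2 = pm.sample(1500, tune=500, chains=32, cores=32)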