Weak PYMC3 Estimates for Large Datasets

I generated a dataset from a known Weibull distribution: Weibull( alpha= A.SI^-n, beta) where A=1800, n=0.5, Beta=1.5, and SI=1000. Here is the link of the dataset (DF1). I tried to estimate the parameters of the distribution in a Bayesian analysis using PYMC3 and below is my code. The Bayesian estimates are very good when the size of the dataset is small (100 data points) but they get away from the true values when the dataset is larger (300 data points). For the larger dataset I tried to get better estimates by increasing number of samples to 10000, tune to 10000, and target_accept to 0.99 but the estimates did not significantly change and still were far from the true values. I was wondering if anyone knows how to define the parameters of the pm.sample() to get better estimates for the larger dataset?


ns1 =round(DF1['biased drops'],2)

def logp(SIs,ns1,SI):
    summ1 = 0
    for i in range(0,len(DF1)): 
        PDF = (B*nu**(B-1))/(A*SI**-n)**B
        R = np.exp(-(nu/(A*SI**-n))**B)
        logLik = (np.log ((PDF**F)*R))
        summ1 += logLik

with pm.Model() as model_ss1:

    MuB = pm.Uniform('MuB', lower=1, upper=3)
    SigmaB= pm.HalfNormal("SigmaB", 2/3)
    B = pm.Normal('B', mu=MuB, sigma=SigmaB)

    MuA = pm.Uniform('MuA', lower=400, upper=2000)
    SigmaA= pm.HalfNormal("SigmaA", 400)
    A = pm.Normal('A', mu=MuA, sigma=SigmaA)

    Mun = pm.Uniform('Mun', lower=0.2, upper=0.8)
    Sigman= pm.HalfNormal("Sigman", 0.16)
    n = pm.Normal('n', mu=Mun, sigma=Sigman) 

    y = pm.DensityDist('y', logp, observed={ 'SI': SI,'SIs': SIs.values.astype(int), 'ns1': ns1.values.astype(int)})
    trace_ss1 = pm.sample(1000, tune=1000, chains = 2)

Bi = pm.summary(trace_ss1, var_names=['B'])['mean'][0]
Ai = pm.summary(trace_ss1, var_names=['A'])['mean'][0]
ni = pm.summary(trace_ss1, var_names=['n'])['mean'][0]
az.plot_trace(trace_ss1, var_names=['B','A','n'])

Can you write your logp in Theano without the for-loop, i.e. vectorized?

Actually, I am not familiar with Theano. I don’t know how to do that.

Isn’t that constant and independent of the model parameters?

Also, if I understand correctly, both A and n here (but not SI) are parameters of the model. Are you sure the model is not degenerate (multiple pairs of A and n give the same result)?

A and n are the parameters of the model and SI is constant (SI=1000).
Actually, A and n are dependent and there are multiple pairs of A and n that give the same result. I bounded the priors to prevent selecting the wrong A and n values but it resulted in truncated posteriors that were not good. I am not sure how to deal with this problem.