Weak PyMC3 Estimates for Large Datasets

I generated a dataset from a known Weibull distribution: Weibull(alpha = A * SI^-n, beta), where A = 1800, n = 0.5, beta = 1.5, and SI = 1000. Here is a link to the dataset (DF1). I tried to estimate the parameters of the distribution in a Bayesian analysis using PyMC3; my code is below. The Bayesian estimates are very good when the dataset is small (100 data points), but they drift away from the true values when the dataset is larger (300 data points). For the larger dataset I tried to get better estimates by increasing the number of samples to 10000, tune to 10000, and target_accept to 0.99, but the estimates did not change significantly and were still far from the true values. Does anyone know how to set the parameters of pm.sample() to get better estimates for the larger dataset?
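(For reference, a basic draw from Weibull(alpha = A * SI^-n, beta) can be sketched in NumPy as follows; this shows only the distribution itself, while the real DF1 additionally records stress levels 'SIs' and censoring flags 'failure' that the sketch does not reproduce.)

import numpy as np

A_true, n_true, B_true, SI = 1800, 0.5, 1.5, 1000
scale = A_true * SI ** -n_true                        # Weibull scale: alpha = A * SI**-n
draws = scale * np.random.weibull(B_true, size=300)   # shape parameter is beta = B_true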

import numpy as np
import pymc3 as pm
import arviz as az

SI = 1000

ns1 = round(DF1['biased drops'], 2)
SIs = round(DF1['SIs'], 2)

# Censored-Weibull log-likelihood. F is the failure indicator: F = 1
# contributes the full Weibull pdf, F = 0 (censored) contributes only the
# survival term R. B, A, and n are looked up as globals: they are the model
# variables defined in the `with pm.Model()` block below, so calling this
# inside that block builds a single Theano graph.
def logp(SIs, ns1, SI):
    summ1 = 0
    for i in range(len(DF1)):
        F = DF1['failure'][i]
        nu = ns1[i] * (SIs[i] / SI) ** n
        PDF = (B * nu ** (B - 1)) / (A * SI ** -n) ** B
        R = np.exp(-(nu / (A * SI ** -n)) ** B)
        logLik = np.log((PDF ** F) * R)
        summ1 += logLik
    return summ1

with pm.Model() as model_ss1:

    # Hierarchical priors: each Weibull parameter gets a Normal prior whose
    # mean and spread are themselves sampled.
    MuB = pm.Uniform('MuB', lower=1, upper=3)
    SigmaB = pm.HalfNormal('SigmaB', 2/3)
    B = pm.Normal('B', mu=MuB, sigma=SigmaB)

    MuA = pm.Uniform('MuA', lower=400, upper=2000)
    SigmaA = pm.HalfNormal('SigmaA', 400)
    A = pm.Normal('A', mu=MuA, sigma=SigmaA)

    Mun = pm.Uniform('Mun', lower=0.2, upper=0.8)
    Sigman = pm.HalfNormal('Sigman', 0.16)
    n = pm.Normal('n', mu=Mun, sigma=Sigman)

    # astype(float), not astype(int): casting to int would truncate the
    # values that were just rounded to two decimals.
    y = pm.DensityDist('y', logp,
                       observed={'SI': SI,
                                 'SIs': SIs.values.astype(float),
                                 'ns1': ns1.values.astype(float)})
    trace_ss1 = pm.sample(1000, tune=1000, chains=2)

# Posterior means of the three parameters, then trace plots
Bi = pm.summary(trace_ss1, var_names=['B'])['mean'][0]
Ai = pm.summary(trace_ss1, var_names=['A'])['mean'][0]
ni = pm.summary(trace_ss1, var_names=['n'])['mean'][0]
az.plot_trace(trace_ss1, var_names=['B', 'A', 'n'])
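Since the question is about pm.sample() settings, it is worth checking convergence diagnostics before pushing target_accept any higher; a minimal sketch with ArviZ (which pm.summary already wraps):

with model_ss1:
    idata = az.from_pymc3(trace_ss1)  # convert the trace for ArviZ diagnostics

# Low ess_bulk or r_hat far from 1.0 indicates the chains have not mixed.
print(az.summary(idata, var_names=['B', 'A', 'n'])[['mean', 'ess_bulk', 'r_hat']])

# Many divergent transitions usually signal bad posterior geometry
# (e.g. a correlated A-n ridge) rather than too few samples.
print(int(idata.sample_stats['diverging'].values.sum()))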

Can you write your logp in Theano without the for-loop, i.e. vectorized?

Actually, I am not familiar with Theano. I don’t know how to do that.
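For illustration, a vectorized version of the same log-likelihood in Theano could look like the sketch below. It assumes the failure indicators are pulled out once as a NumPy array (called F_arr here, an illustrative name) and that the observed arrays are floats:

import theano.tensor as tt

F_arr = DF1['failure'].values  # 0/1 failure indicators, fixed data

def logp_vec(SIs, ns1, SI):
    # Elementwise Theano ops over whole arrays: one vector expression, no Python loop.
    nu = ns1 * (SIs / SI) ** n
    scale = A * SI ** -n
    log_pdf = tt.log(B) + (B - 1) * tt.log(nu) - B * tt.log(scale)  # log Weibull pdf
    log_R = -(nu / scale) ** B                                      # log survival term
    return tt.sum(F_arr * log_pdf + log_R)

This computes the same sum as the loop, since log(PDF**F * R) = F * log_pdf + log_R, but lets Theano compile a single vector expression, which also scales far better to 300+ points.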

Isn’t that constant and independent of the model parameters?

Also, if I understand correctly, both A and n here (but not SI) are parameters of the model. Are you sure the model is not degenerate (multiple pairs of A and n give the same result)?

A and n are the parameters of the model and SI is a constant (SI=1000).
Actually, A and n are dependent, and there are multiple pairs of A and n that give the same result. I bounded the priors to keep the sampler away from the wrong A and n values, but that resulted in truncated posteriors that were not good. I am not sure how to deal with this problem.
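One standard remedy for this kind of non-identifiability, sketched below rather than taken from the thread, is to reparameterize: put the prior on the combination the data most directly constrain, here the Weibull scale theta = A * SI**-n, and recover A afterwards as a deterministic. The model name, theta bounds, and F_arr are illustrative, reusing the names from the vectorized sketch above:

import theano.tensor as tt
import pymc3 as pm

with pm.Model() as model_reparam:
    B = pm.Uniform('B', lower=1, upper=3)
    n = pm.Uniform('n', lower=0.2, upper=0.8)
    # Prior on the identified scale theta = A * SI**-n (the true value here
    # is 1800 * 1000**-0.5, roughly 57); the bounds are illustrative.
    theta = pm.Uniform('theta', lower=10, upper=200)
    A = pm.Deterministic('A', theta * SI ** n)  # recover A for reporting

    def logp_theta(SIs, ns1, SI):
        nu = ns1 * (SIs / SI) ** n
        log_pdf = tt.log(B) + (B - 1) * tt.log(nu) - B * tt.log(theta)
        log_R = -(nu / theta) ** B
        return tt.sum(F_arr * log_pdf + log_R)

    y = pm.DensityDist('y', logp_theta,
                       observed={'SI': SI, 'SIs': SIs.values, 'ns1': ns1.values})
    trace_reparam = pm.sample(1000, tune=1000, chains=2)

Because the sampler then moves in (theta, n) space, where the likelihood is much less correlated, hard bounds on A are no longer needed to rule out the degenerate pairs, which avoids the truncated posteriors.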