Hi all,
I am currently implementing a Bayesian regression model to identify 57 unknown parameters.
My dataset consists of over 500,000 observations.
In order to identify the parameters using Bayesian regression with PyMC3, I created a design matrix A with over 500,000 rows and 57 columns, and an observation vector d with over 500,000 rows.
As a first attempt, I implemented this in Matlab using conjugate priors, and the results look promising (very similar to the conventional least-squares solution, in fact).
Nonetheless, some of the identified parameters are physically infeasible, since the conjugate assumptions rely purely on normal priors. Thus, I am now moving to PyMC3.
As my knowledge about the individual parameters varies, I placed different forms of priors on them, e.g. normal priors, normal priors bounded to be negative, or inverse-gamma priors.
I have the following issues:
- I hope the model itself is specified correctly: I stacked the individual priors into a single coefficient vector using `tt.stack`. Since the model at least runs, I assume this is fine (a vectorized alternative I have been wondering about is sketched at the end of this post).
- As I have so many possible observations (taken from time series), I run into memory issues, so for now I only use a very limited subset (about 1,000-10,000 observation rows). Nonetheless, Matlab is able to process the whole dataset in only a few seconds with a 'lasso' model whose posterior is not analytically tractable, so I am wondering whether PyMC3 can even handle such a big dataset with so many parameters (see the minibatch sketch below this list for one idea I had).
- The model repeatedly crashes after 305 or 405 samples, even when I change the number of stochastic parameters or the number of observations. Any idea why it so often crashes at the same number of samples?
- When it crashes, the following error appears:

  ```
  ValueError: Mass matrix contains zeros on the diagonal.
  The derivative of RV `p1`.ravel()[0] is zero.
  The derivative of RV `p2`.ravel()[0] is zero.
  The derivative of RV `p3_log__`.ravel()[0] is zero.
  The derivative of RV `p4_log__`.ravel()[0] is zero.
  ...
  ```

  I also tried using the MAP estimate as the starting point, as well as initializing with `advi+adapt_diag`.
- The sampling gets VERY slow after a few samples, even with only 1,000 observations (about 1-5 seconds per sample). Is this normal with so many parameters, or should PyMC3 be able to process this many parameters and observations on a regular laptop in reasonable time?
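Regarding the memory issue: I was wondering whether minibatch ADVI would be a better direction than subsetting the data by hand. This is only a rough sketch of what I have in mind, not something I claim works for my model; the batch size of 500 is a placeholder, and the single vector prior on `beta` is a stand-in for my real mixed priors:

```python
import pymc3 as pm

# A and d are the full design matrix and observation vector from above.
# Minibatch draws random row batches; using the same random_seed for both
# keeps the slices of A and d aligned.
A_batch = pm.Minibatch(A, batch_size=500, random_seed=42)
d_batch = pm.Minibatch(d, batch_size=500, random_seed=42)

with pm.Model() as mb_model:
    beta = pm.Normal('beta', mu=0.0, sigma=10.0, shape=57)  # placeholder prior
    e = pm.InverseGamma('e', alpha=3, beta=1)
    # total_size rescales the minibatch likelihood to the full dataset
    pm.Normal('d', mu=pm.math.dot(A_batch, beta), sigma=e,
              observed=d_batch, total_size=A.shape[0])
    approx = pm.fit(n=20000, method='advi')  # minibatching is meant for ADVI, not NUTS
```

Would that be the recommended way to deal with 500,000 rows?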
Unfortunately, I haven't yet found a solution to this in any of the other posts.
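One debugging step I came across for mass-matrix errors is printing the log-probability of every free variable at the initial point, to see whether any prior starts out at -inf. Would something along these lines help narrow it down?

```python
with model:
    # one logp entry per free RV at the initial point; -inf or nan entries
    # would point to the variables behind the mass-matrix error
    print(model.check_test_point())
```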
Below, I have attached my (cleaned) code.
I'd be very grateful for any support on finding where my issue lies!
```python
import numpy as np
import pymc3 as pm
import scipy.io as sio
import theano.tensor as tt

data = sio.loadmat('measurements.mat')
A = data['A']
d = data['d'].ravel()  # loadmat returns an (N, 1) column; flatten so mu and observed align elementwise

with pm.Model() as model:
    # normal distribution bounded above at zero for the parameters known to be negative
    NegNormal = pm.Bound(pm.Normal, lower=-np.inf, upper=0)

    # each RV needs its own unique name
    p1 = pm.Normal('p1', mu=1.0, sigma=1.0)
    p2 = NegNormal('p2', mu=-10.0, sigma=1.0)
    p3 = NegNormal('p3', mu=-10.0, sigma=1.0)
    p4 = pm.Normal('p4', mu=1.0, sigma=1.0)
    p5 = pm.Normal('p5', mu=1.0, sigma=1.0)
    p6 = pm.InverseGamma('p6', alpha=3, beta=1)
    p7 = pm.InverseGamma('p7', alpha=3, beta=1)
    p8 = pm.InverseGamma('p8', alpha=3, beta=1)
    p9 = pm.InverseGamma('p9', alpha=3, beta=1)
    p10 = pm.InverseGamma('p10', alpha=3, beta=1)
    p11 = NegNormal('p11', mu=-10.0, sigma=1.0)
    p12 = NegNormal('p12', mu=-10.0, sigma=1.0)
    p13 = pm.Normal('p13', mu=1.0, sigma=1.0)
    p14 = pm.Normal('p14', mu=1.0, sigma=1.0)
    p15 = pm.Normal('p15', mu=1.0, sigma=1.0)
    p16 = pm.InverseGamma('p16', alpha=3, beta=1)
    p17 = pm.InverseGamma('p17', alpha=3, beta=1)
    p18 = pm.InverseGamma('p18', alpha=3, beta=1)
    p19 = pm.InverseGamma('p19', alpha=3, beta=1)
    p20 = pm.InverseGamma('p20', alpha=3, beta=1)
    p21 = NegNormal('p21', mu=-10.0, sigma=1.0)
    p22 = NegNormal('p22', mu=-10.0, sigma=1.0)
    p23 = pm.Normal('p23', mu=1.0, sigma=1.0)
    p24 = pm.Normal('p24', mu=1.0, sigma=1.0)
    p25 = pm.Normal('p25', mu=1.0, sigma=1.0)
    p26 = pm.InverseGamma('p26', alpha=3, beta=1)
    p27 = pm.InverseGamma('p27', alpha=3, beta=1)
    p28 = pm.InverseGamma('p28', alpha=3, beta=1)
    p29 = pm.InverseGamma('p29', alpha=3, beta=1)
    p30 = pm.InverseGamma('p30', alpha=3, beta=1)
    p31 = pm.InverseGamma('p31', alpha=3, beta=1)
    p32 = NegNormal('p32', mu=-10.0, sigma=1.0)
    p33 = NegNormal('p33', mu=-10.0, sigma=1.0)
    p34 = pm.Normal('p34', mu=1.0, sigma=1.0)
    p35 = pm.Normal('p35', mu=1.0, sigma=1.0)
    p36 = pm.Normal('p36', mu=1.0, sigma=1.0)
    p37 = pm.InverseGamma('p37', alpha=3, beta=1)
    p38 = pm.InverseGamma('p38', alpha=3, beta=1)
    p39 = pm.InverseGamma('p39', alpha=3, beta=1)
    p40 = pm.InverseGamma('p40', alpha=3, beta=1)
    p41 = pm.InverseGamma('p41', alpha=3, beta=1)
    p42 = pm.InverseGamma('p42', alpha=3, beta=1)
    p43 = NegNormal('p43', mu=-10.0, sigma=1.0)
    p44 = NegNormal('p44', mu=-10.0, sigma=1.0)
    p45 = pm.Normal('p45', mu=1.0, sigma=1.0)
    p46 = pm.Normal('p46', mu=1.0, sigma=1.0)
    p47 = pm.Normal('p47', mu=1.0, sigma=1.0)
    p48 = pm.InverseGamma('p48', alpha=3, beta=1)
    p49 = pm.InverseGamma('p49', alpha=3, beta=1)
    p50 = pm.InverseGamma('p50', alpha=3, beta=1)
    p51 = pm.InverseGamma('p51', alpha=3, beta=1)
    p52 = pm.InverseGamma('p52', alpha=3, beta=1)
    p53 = pm.InverseGamma('p53', alpha=3, beta=1)
    p54 = NegNormal('p54', mu=-10.0, sigma=1.0)
    p55 = NegNormal('p55', mu=-10.0, sigma=1.0)
    p56 = pm.Normal('p56', mu=1.0, sigma=1.0)
    p57 = pm.Normal('p57', mu=1.0, sigma=1.0)

    # stack the 57 scalar RVs into one coefficient vector
    beta = tt.stack([p1, p2, p3, p4, p5, p6, p7, p8, p9, p10,
                     p11, p12, p13, p14, p15, p16, p17, p18, p19, p20,
                     p21, p22, p23, p24, p25, p26, p27, p28, p29, p30,
                     p31, p32, p33, p34, p35, p36, p37, p38, p39, p40,
                     p41, p42, p43, p44, p45, p46, p47, p48, p49, p50,
                     p51, p52, p53, p54, p55, p56, p57])

    # observation noise
    e = pm.InverseGamma('e', alpha=3, beta=1)

    # linear model: d_est = A @ beta
    d_est = pm.math.dot(A, beta)
    likelihood = pm.Normal('d', mu=d_est, sigma=e, observed=d)

    trace = pm.sample(1000, init='adapt_diag', progressbar=True, cores=1, chains=1)
```
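And here is the vectorized alternative to `tt.stack` I mentioned above, in case it matters for performance: one vector-valued RV per prior family, concatenated and permuted back into the column order of A. The index arrays are just the 0-based positions of each family, read off from the model above; the names `p_norm`, `p_neg`, `p_ig`, and `model_vec` are only illustrative. Would this variant be expected to sample faster?

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

# 0-based columns of A for each prior family, read off from the model above
idx_norm = np.array([0, 3, 4, 12, 13, 14, 22, 23, 24, 33, 34, 35,
                     44, 45, 46, 55, 56])
idx_neg = np.array([1, 2, 10, 11, 20, 21, 31, 32, 42, 43, 53, 54])
idx_ig = np.array([5, 6, 7, 8, 9, 15, 16, 17, 18, 19, 25, 26, 27, 28,
                   29, 30, 36, 37, 38, 39, 40, 41, 47, 48, 49, 50, 51, 52])

NegNormal = pm.Bound(pm.Normal, lower=-np.inf, upper=0)

with pm.Model() as model_vec:
    # one vector RV per prior family instead of 57 scalar RVs
    p_norm = pm.Normal('p_norm', mu=1.0, sigma=1.0, shape=len(idx_norm))
    p_neg = NegNormal('p_neg', mu=-10.0, sigma=1.0, shape=len(idx_neg))
    p_ig = pm.InverseGamma('p_ig', alpha=3, beta=1, shape=len(idx_ig))

    # concatenate the three blocks, then permute back into the column order of A
    perm = np.argsort(np.concatenate([idx_norm, idx_neg, idx_ig]))
    beta = tt.concatenate([p_norm, p_neg, p_ig])[perm]

    e = pm.InverseGamma('e', alpha=3, beta=1)
    likelihood = pm.Normal('d', mu=pm.math.dot(A, beta), sigma=e, observed=d)
```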