This is my implementation of a model in PyMC3. I am trying to measure how long the program takes to run for different numbers of observations. With around 3000 or 5000 observations, the program runs fine. With around 7000 or more, it runs fine until the pm.sample() line. On reaching the sampling statement, the program crashes with a segmentation fault (core dumped) error. Is this normal, and how do I fix it? Also, why does this happen?
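For context, the timing comparison can be sketched as a small harness like the one below. It is an illustration only: `time_runs` and `fake_fit` are hypothetical names, and `fake_fit` stands in for building the model shown next on the given number of observations and calling pm.sample().

```python
import time
from typing import Callable, Dict, Iterable


def time_runs(fit: Callable[[int], object], sizes: Iterable[int]) -> Dict[int, float]:
    """Return the wall-clock seconds taken by fit(n) for each n in sizes."""
    timings = {}
    for n in sizes:
        start = time.perf_counter()  # monotonic clock, suited to measuring intervals
        fit(n)
        timings[n] = time.perf_counter() - start
    return timings


def fake_fit(n_obs: int) -> int:
    """Hypothetical stand-in for building the model on n_obs observations and sampling."""
    return sum(range(n_obs))


if __name__ == "__main__":
    for n, seconds in time_runs(fake_fit, [3000, 5000, 7000]).items():
        print(f"{n} observations: {seconds:.4f} s")
```

The model being timed is: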
with pm.Model() as hotDINA:
    # Priors: theta, bk, ak, learn_k, ones, ss_k, g_k
    theta = pm.Normal('theta', mu=0.0, sd=1.0, shape=(I, 1))
    lambda0 = pm.Normal('lambda0', mu=0.0, sd=1.0, shape=(K, 1))  # bk
    lambda1 = pm.Uniform('lambda1', 0.0, 2.5, shape=(K, 1))  # ak
    learn = pm.Beta('learn', alpha=1, beta=1, shape=(K, 1))
    ones = pm.Bernoulli('known', p=1, shape=(K, 1))
    ss = pm.Uniform('ss', 0.5, 1.0, shape=(K, 1))
    g = pm.Uniform('g', 0, 0.5, shape=(K, 1))

    for i in range(I):
        print("STUDENT", i+1, " out of", I)
        # t = 0
        for k in range(K):
            prob[i][0][k] = pm.math.invlogit((1.7) * lambda1[k,0] * (theta[i,0] - lambda0[k,0]))
            alpha_name = 'alpha[' + str(i) + ',0,' + str(k) + ']'
            alpha[i][0][k] = pm.Bernoulli(alpha_name, prob[i][0][k])

        for s in range(MAXSKILLS):
            idx = int(idxY[i][0][s] - 1)
            if idx >= K: continue
            py[i][0][idx] = pow(ss[idx,0], alpha[i][0][idx]) * pow(g[idx,0], (1-alpha[i][0][idx]))

        # t = 1,2...T[i]-1
        for t in tqdm(range(1, T[i])):
            for k in range(K):
                alpha[i][t][k] = pm.math.switch(alpha[i][t-1][k], ones[k,0], learn[k,0])
            for s in range(MAXSKILLS):
                idx = int(idxY[i][t][s] - 1)
                if idx >= K: continue
                py[i][t][idx] = pow(ss[idx,0], alpha[i][t][idx]) * pow(g[idx,0], (1-alpha[i][t][idx]))

        for t in tqdm(range(T[i])):
            for s in range(MAXSKILLS):
                idx = int(idxY[i][t][s] - 1)
                if idx >= K: continue
                obsData = pm.Minibatch(observed_data[i][idx][t], batch_size=batch_size)
                Y[i][t][idx] = pm.Bernoulli(f'y_{i}_{t}_{idx}', p=py[i][t][idx], observed=obsData)

    start = time.time()
    print(start)
    trace = pm.sample(2500, tune=2500)
    end = time.time()
    print("TIME: ", end - start)
    pm.save_trace(trace=trace, directory=".pymc_1.trace", overwrite=True)
    print("SAVED")
    summary_df = pm.stats.summary(trace)
    summary_df.to_excel("summary.xlsx")
    print("TOTAL END: ", time.time() - total_start)
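For reference, the per-skill quantities that the loops above build symbolically can be written in plain Python. This is only a sketch of the same three formulas outside Theano, evaluated on ordinary floats; the function names are made up for illustration and do not appear in the model code.

```python
import math


def invlogit(x: float) -> float:
    """Logistic function, the quantity pm.math.invlogit computes symbolically."""
    return 1.0 / (1.0 + math.exp(-x))


def p_known_at_start(theta_i: float, lambda0_k: float, lambda1_k: float) -> float:
    """Probability used for alpha[i][0][k] at t = 0."""
    return invlogit(1.7 * lambda1_k * (theta_i - lambda0_k))


def p_known_next(alpha_prev: int, learn_k: float) -> float:
    """Mirror of pm.math.switch(alpha_prev, 1, learn_k): 1 if already known, else learn_k."""
    return 1.0 if alpha_prev else learn_k


def p_correct(ss_k: float, g_k: float, alpha: int) -> float:
    """ss_k ** alpha * g_k ** (1 - alpha): ss_k when alpha is 1, g_k when alpha is 0."""
    return (ss_k ** alpha) * (g_k ** (1 - alpha))


if __name__ == "__main__":
    # Illustrative values only, chosen inside the prior ranges used in the model.
    print(p_known_at_start(theta_i=0.0, lambda0_k=0.0, lambda1_k=1.0))
    print(p_known_next(alpha_prev=0, learn_k=0.3))
    print(p_correct(ss_k=0.9, g_k=0.2, alpha=1))
```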
Is this more likely a Theano problem than a PyMC3 one?
Versions:
PyMC3 Version: 3.9.2
Theano Version: 1.0.4
Python Version: 3.7.4
Operating system: Ubuntu 16
How did you install PyMC3: pip
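The version details above can also be collected programmatically, which helps when comparing machines. A small sketch, assuming only the standard library; `module_version` and `environment_report` are hypothetical helper names, and a module that cannot be imported is reported as such rather than raising.

```python
import importlib
import platform


def module_version(name: str) -> str:
    """Return the module's __version__ string, or a short note if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return "not installed"
    except Exception:
        # Some packages fail at import time for reasons other than being absent.
        return "import failed"
    return str(getattr(module, "__version__", "unknown"))


def environment_report(modules=("pymc3", "theano", "numpy")) -> dict:
    """Collect the Python version, the OS description, and each module's version."""
    report = {"python": platform.python_version(), "os": platform.platform()}
    for name in modules:
        report[name] = module_version(name)
    return report
```

Calling `environment_report()` and printing the resulting dictionary gives the same information as the list above in one step.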