Segmentation fault (core dumped) on running pm.sample in Ubuntu

This was my implementation of a model in PyMC3. I was trying to find out how long the program takes to run for different amount of observations. When I give around 3000 or 5000 observations, the program runs fine. When I increase it to around 7000 or above, it runs fine till the pm.sample() line. Upon reaching the sampling statement, the program gives me a segmentation fault (core dumped) error. Is this normal/how do I fix this? Also, why does this happen?

with pm.Model() as hotDINA:    
    # Priors: theta, bk, ak, learn_k, ones, ss_k, g_k
    theta   = pm.Normal('theta', mu=0.0, sd=1.0, shape=(I, 1))
    lambda0 = pm.Normal('lambda0', mu=0.0, sd=1.0, shape=(K, 1))    #bk
    lambda1 = pm.Uniform('lambda1', 0.0, 2.5, shape=(K, 1))    #ak
    learn   = pm.Beta('learn', alpha=1, beta=1, shape=(K, 1))
    ones    = pm.Bernoulli('known', p=1, shape=(K, 1))
    ss      = pm.Uniform('ss', 0.5, 1.0, shape=(K, 1))
    g       = pm.Uniform('g', 0, 0.5, shape=(K, 1))
    for i in range(I):
        print("STUDENT", i+1, " out of", I)
        
        # t = 0
        for k in range(K):
            prob[i][0][k] = pm.math.invlogit((1.7) * lambda1[k,0] * (theta[i,0] - lambda0[k,0]))
            alpha_name = 'alpha[' + str(i) + ',0,' + str(k) + ']'
            alpha[i][0][k] = pm.Bernoulli(alpha_name, prob[i][0][k])
            
        for s in range(MAXSKILLS):
            idx = int(idxY[i][0][s] - 1)
            if idx >= K: continue
            py[i][0][idx] = pow(ss[idx,0], alpha[i][0][idx]) * pow(g[idx,0], (1-alpha[i][0][idx]))
        
        # t = 1,2...T[i]-1
        for t in tqdm(range(1, T[i])):
            for k in range(K):
                alpha[i][t][k] = pm.math.switch(alpha[i][t-1][k], ones[k,0], learn[k,0])
                
            for s in range(MAXSKILLS):
                idx = int(idxY[i][t][s] - 1)
                if idx >= K: continue
                py[i][t][idx] = pow(ss[idx,0], alpha[i][t][idx]) * pow(g[idx,0], (1-alpha[i][t][idx]))
            
        for t in tqdm(range(T[i])):
            for s in range(MAXSKILLS):
                idx = int(idxY[i][t][s] - 1)
                if idx >= K: continue
                obsData = pm.Minibatch(observed_data[i][idx][t], batch_size=batch_size)
                Y[i][t][idx] = pm.Bernoulli(f'y_{i}_{t}_{idx}', p=py[i][t][idx], observed=obsData)

    start = time.time()
    print(start)
    trace = pm.sample(2500, tune=2500)
    end = time.time()
    print("TIME: ", end - start)
    pm.save_trace(trace=trace, directory=".pymc_1.trace", overwrite=True)
    print("SAVED")
    summary_df = pm.stats.summary(trace)
    summary_df.to_excel("summary.xlsx")
    print("TOTAL END: ", time.time() - total_start)

Does this have more to do with theano rather than pymc3?
Versions:

PyMC3 Version: 3.9.2
Theano Version: 1.0.4
Python Version: 3.7.4
Operating system: Ubuntu 16
How did you install PyMC3: pip

I’m not overly familiar with either theano or pymc3, but when I’ve seen segfaults in python it’s been because I have too much in memory - maybe check your memory usage

Yes, my RAM seems to be full. However, when I run this on the server by submitting a .job script, my .out file gives the final output for small amounts of data but doesnt give an output for even slightly larger data. It just keeps running forever (for a day, when I expect it to be completed in 2-3 hours max). But my .out file doesn’t contain a segfault error statement, is this normal if I’m using SLURM for Job scheduling? Anyway my guess is that it runs out of RAM too (when I request a single 128GB node). Going by your previous comment, does this mean it is supposed to work properly once I request more computational nodes (say 4 instead of 1)?

Out of my depth there, sorry. It looks like SLURM should throw an error if you use too much memory - you could try querying the memory usage with sacct, as mentioned in that link, to see what’s going on.

Are you able to vectorise the computations rather than use python for loops? Also try without tqdm.

The start may look something like:

I, K = 16, 4
with pm.Model() as hotDINA:    
    # Priors: theta, bk, ak, learn_k, ones, ss_k, g_k
    theta   = pm.Normal('theta', mu=0.0, sd=1.0, shape=(I, 1))
    lambda0 = pm.Normal('lambda0', mu=0.0, sd=1.0, shape=(K, 1))    #bk
    lambda1 = pm.Uniform('lambda1', 0.0, 2.5, shape=(K, 1))    #ak
    learn   = pm.Beta('learn', alpha=1, beta=1, shape=(K, 1))
    ones    = pm.Bernoulli('known', p=1, shape=(K, 1))
    ss      = pm.Uniform('ss', 0.5, 1.0, shape=(K, 1))
    g       = pm.Uniform('g', 0, 0.5, shape=(K, 1))
    
    prob = pm.math.invlogit((1.7) * lambda1.T * (theta - lambda0.T))
    alpha = pm.Bernoulli('alpha', p=prob, shape=(I, K))