Sampling gets stuck with more than one core

perone · May 29, 2020, 2:15pm

Why is it that the following simple model:

with pm.Model() as linear_model:
    data_x = pm.Data('data_x', matrix_train)
    data_y = pm.Data('data_y', matrix_positive.flatten())
    
    beta = pm.Normal('beta', mu=0, sd=20, shape=nfeatures)
    sigma = pm.HalfNormal('sigma', sd=20)
    mu = pm.Deterministic("mu", pm.math.dot(data_x, beta))
    
    y_obs = pm.Normal('y', mu=mu, sd=sigma,
                       observed=data_y)

works with the following sampling:

trace = pm.sample(10000, tune=2000, cores=1)

But gets stuck with the following sampling:

trace = pm.sample(10000, tune=2000, cores=2)

Is this a known issue? It seems to be related to multiprocessing. I’m using the master branch on macOS.

AlexAndorra · May 29, 2020, 7:53pm

Hi,
What do you mean by “gets stuck” exactly? Do you get an error message somewhere?

perone · May 29, 2020, 8:09pm

Nothing, it just gets stuck. I’m running it in a Jupyter notebook. It’s very weird because I don’t have this issue with other models. However, for this model I have around 5k samples, so maybe it’s a memory issue with 2 cores? But then it would throw an error or something.

junpenglao · May 30, 2020, 6:46am

Looks like a memory issue to me as well. Sometimes pm.math.dot has weird memory issues. Could you try tt.sum(data_x * beta[None, ...], axis=-1) instead?
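For readers following along, here is a minimal sketch of why the suggested expression is a drop-in replacement for the dot product. It uses NumPy arrays as a stand-in for the Theano tensors in the model (an assumption made so the snippet runs without PyMC3), and the array sizes are hypothetical. Broadcasting beta across the rows and summing over the last axis gives the same vector as the matrix-vector product.

```python
import numpy as np

# Hypothetical small sizes; the thread mentions roughly 5k rows.
rng = np.random.default_rng(0)
n_obs, nfeatures = 6, 3
data_x = rng.normal(size=(n_obs, nfeatures))
beta = rng.normal(size=nfeatures)

# Matrix-vector product, analogous to pm.math.dot(data_x, beta).
mu_dot = data_x.dot(beta)

# Elementwise multiply with beta broadcast over rows, then sum each row.
# beta[None, ...] has shape (1, nfeatures).
mu_mul = np.sum(data_x * beta[None, ...], axis=-1)

# Both give one value per observation, equal up to floating-point error.
assert np.allclose(mu_dot, mu_mul)
```

In the model above, the corresponding change would presumably be to import theano.tensor as tt and pass tt.sum(data_x * beta[None, ...], axis=-1) to pm.Deterministic in place of the pm.math.dot call.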

AlexAndorra · May 30, 2020, 10:35am

Yeah, looks like a memory issue. I think I remember that when using multiple cores, the data actually need to be copied to each core.
To verify this hypothesis, you can try reducing the number of data samples you give your model, to see whether it runs.
This won’t fix the issue though, contrary to Junpeng’s suggestion. Hope this helps
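One way to run that check is to draw a random subset of rows and feed it to the model. The sketch below is only an illustration: subsample_rows is a hypothetical helper, and the arrays stand in for matrix_train and matrix_positive.flatten() from the original post.

```python
import numpy as np

def subsample_rows(x, y, n_rows, seed=0):
    """Return the same random subset of rows from x and y (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(x.shape[0], size=n_rows, replace=False)
    return x[idx], y[idx]

# Stand-in data with hypothetical shapes (about 5k rows, as in the thread).
x_full = np.random.default_rng(1).normal(size=(5000, 10))
y_full = np.random.default_rng(2).normal(size=5000)

x_small, y_small = subsample_rows(x_full, y_full, n_rows=500)
print(x_small.shape, y_small.shape)

# With the model from the first post, the subset could then be swapped in
# before sampling (not executed here, since it needs PyMC3):
#   with linear_model:
#       pm.set_data({'data_x': x_small, 'data_y': y_small})
#       trace = pm.sample(10000, tune=2000, cores=2)
```

If sampling with cores=2 completes on the smaller subset but hangs on the full data, that would be consistent with the memory hypothesis, though it would not prove it.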

perone · May 30, 2020, 10:40am

It was indeed the tt.mul version that fixed it, so it seems to be related to memory issues. Thanks a lot @AlexAndorra and @junpenglao!
