Sampling gets stuck with more than one core

The following simple model:

with pm.Model() as linear_model:
    data_x = pm.Data('data_x', matrix_train)
    data_y = pm.Data('data_y', matrix_positive.flatten())
    
    beta = pm.Normal('beta', mu=0, sd=20, shape=nfeatures)
    sigma = pm.HalfNormal('sigma', sd=20)
    mu = pm.Deterministic("mu", pm.math.dot(data_x, beta))
    
    y_obs = pm.Normal('y', mu=mu, sd=sigma,
                       observed=data_y)

works with the following sampling:

trace = pm.sample(10000, tune=2000, cores=1)

but gets stuck with the following sampling:

trace = pm.sample(10000, tune=2000, cores=2)

Is this a known issue? It seems to be related to multiprocessing. I’m using the master branch on macOS.

Hi,
What do you mean by “gets stuck” exactly? Do you get an error message somewhere?

Nothing, it just gets stuck. I’m running it in a Jupyter notebook. It’s very weird because I don’t have this issue with other models. However, for this model I have around 5k samples, so maybe it’s a memory issue with 2 cores? But then it would throw an error or something.

Looks like a memory issue to me as well. Sometimes pm.math.dot has weird memory issues. Could you try tt.sum(data_x * beta[None, ...], axis=-1) instead?
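For readers wondering whether the suggested expression really computes the same thing as the dot product, here is a minimal sketch. It uses NumPy arrays as stand-ins for the Theano tensors, on the assumption that broadcasting and axis semantics match for this expression; the array sizes and the names X, mu_dot and mu_sum are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: X plays the role of data_x with shape
# (n_samples, nfeatures), beta the role of the coefficient vector.
X = rng.normal(size=(5, 3))
beta = rng.normal(size=3)

# Matrix-vector product, as in pm.math.dot(data_x, beta).
mu_dot = np.dot(X, beta)

# Elementwise product with beta broadcast across rows, then summed
# over the feature axis, as in the suggested tt.sum(...) expression.
mu_sum = np.sum(X * beta[None, ...], axis=-1)

print(np.allclose(mu_dot, mu_sum))  # True
```

In the model from the first post, the replacement would go on the line that defines mu, with theano.tensor imported as tt.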


Yeah, looks like a memory issue. I think I remember that when using multiple cores, the data actually needs to be copied to each core.
To verify this hypothesis, you can try reducing the number of samples you give your model and see whether it runs.
Unlike Junpeng’s suggestion, this won’t fix the issue, though. Hope this helps :vulcan_salute:
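One way to run that check is to draw a random subset of rows before building the model. The sketch below is only an illustration: subsample_rows is a hypothetical helper, and the array sizes are invented to mimic the roughly 5k samples mentioned above.

```python
import numpy as np

def subsample_rows(X, y, n_keep, seed=0):
    """Return a random subset of n_keep matching rows of X and entries of y."""
    rng = np.random.default_rng(seed)
    n_keep = min(n_keep, X.shape[0])
    # Sample row indices without replacement so no observation is duplicated.
    idx = rng.choice(X.shape[0], size=n_keep, replace=False)
    return X[idx], y[idx]

# Hypothetical stand-ins for matrix_train and matrix_positive.flatten().
X_full = np.random.default_rng(1).normal(size=(5000, 10))
y_full = np.random.default_rng(2).normal(size=5000)

X_small, y_small = subsample_rows(X_full, y_full, n_keep=500)
print(X_small.shape, y_small.shape)
```

The reduced arrays can then be passed to pm.Data in place of the full ones. If sampling with cores=2 runs on the smaller set, that points toward memory as the cause.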


The tt.mul version did indeed fix it, so it seems to be related to memory issues. Thanks a lot @AlexAndorra and @junpenglao!
