Only one chain running, three are stuck

Hi all,

This problem is a bit of a strange one: when running my current model with pm.sample(), only one of the four chains is exploring the posterior, while the other three are stuck at their initial values and don’t change.
About the model:
I am trying to infer parameters from a multidimensional dataset (4x2500) using a model I have written in PyTensor. A custom PyTensor function (Op) is part of the model. There are multiple combinations of parameters that could explain the data, so the posterior may be rather flat in some regions.

Everything runs fine if I set chains=1.
Is there anything that could lead to this effect of only one chain running? Maybe I need to account for the parallel chains in the custom PyTensor function?

Any help or ideas are highly appreciated…

Update: This issue disappears when I use pm.Metropolis(). Therefore I think it is related to one of two things:
a) the ‘flatness’ of the posterior, which somehow makes the NUTS sampler struggle,
or
b) an error in the grad() function inside my custom PyTensor function, since the Metropolis sampler doesn’t use it.

For now I’ll stick with Metropolis, but if you have any other suggestions, I’d be happy to hear them.
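For reference, this is roughly what the fallback looks like (just a sketch; the priors here are placeholders for the real model):

import pymc as pm

with pm.Model() as model:
    # placeholder priors; the real model uses the custom RootFinder Op below
    D = pm.HalfNormal("D", sigma=1.0)
    a = pm.HalfNormal("a", sigma=1.0)
    b = pm.HalfNormal("b", sigma=1.0)
    # ... likelihood built from x_list = rootfinder(D, a, b) ...

    # gradient-free step method instead of the default NUTS
    idata = pm.sample(step=pm.Metropolis(), chains=4)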

Is this issue unique to this model? Do other models sample well with multiple chains?

Yes, it seems to be unique to this model with the custom PyTensor function.
Other models run fine with multiple chains.

Can you expand on that? Maybe share some code?

Yes, this function is the result of another Discourse post:

import pytensor.tensor as at
from pytensor.graph.basic import Apply
from pytensor.graph.op import Op


class RootFinder(Op):

    def __init__(self, t):
        self.t = t

    def make_node(self, D, a, b):
        D, a, b = map(at.as_tensor_variable, (D, a, b))
        outputs = [at.vector(dtype='float64')]
        return Apply(self, [D, a, b], outputs)

    def perform(self, node, inputs, outputs_storage):
        D, a, b = inputs
        # numerically compute the first n roots for these parameter values
        outputs_storage[0][0] = root_finder_loop_perform(D, a, b, self.t)

    def grad(self, inputs, output_gradients):
        D, a, b = inputs
        x_list = self(D, a, b)

        # implicit differentiation: partial derivatives of the root condition
        x_grad_list = a*D/((a**2*b)/4 - D**2*x_list**2) + 2*a*D**3*x_list**2/((a**2*b)/4 - D**2*x_list**2)**2
        D_grad_list = 4*a*(a**2*b*x_list + 4*D**2*x_list**3)/(a**2*b - 4*D**2*x_list**2)**2
        a_grad_list = -4*D*x_list*(a**2*b + 4*D**2*x_list**2)/(a**2*b - 4*D**2*x_list**2)**2
        b_grad_list = -4*a**3*D*x_list/(a**2*b - 4*D**2*x_list**2)**2

        # dx/dparam = -(dF/dparam)/(dF/dx), contracted with the incoming output gradient
        grad_D = at.dot(-D_grad_list/x_grad_list, output_gradients[0])
        grad_a = at.dot(-a_grad_list/x_grad_list, output_gradients[0])
        grad_b = at.dot(-b_grad_list/x_grad_list, output_gradients[0])

        return [grad_D, grad_a, grad_b]

where root_finder_loop_perform() is a function that returns the first n roots of a function, and D, a, and b are random scalar variables. The grad() function is expected to return three zero-dimensional outputs. I have confirmed that it works with verify_grad(), and it runs smoothly with a single chain.
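In case it helps, this is roughly how I checked the gradient (a sketch; the test values for D, a and b are made up, and t is whatever the Op was built with):

import numpy as np
from pytensor.gradient import verify_grad

# made-up test point; the real values come from the model's priors
test_point = [np.array(1.0), np.array(2.0), np.array(0.5)]

verify_grad(RootFinder(t=t), test_point, rng=np.random.RandomState(42))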

I then call the function via:

rootfinder = RootFinder(t=t)
x_list = rootfinder(D, a, b)

and use the n-dimensional x_list values in the next steps.
As I said previously, this works fine for a single chain, but doesn’t seem to like multiple chains.
Any help is highly appreciated!

What gets called in root_finder_loop_perform, and does it use multiple cores? There might be some multiprocessing scheduling issues resulting from each PyMC worker spawning lots of workers that don’t know about each other.
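One way to test that hypothesis (just a sketch, independent of the shgo settings) would be to keep four chains but run them sequentially, so only one PyMC worker ever spawns subprocesses; model here is your existing model context:

with model:
    # chains still 4, but run one at a time,
    # so nested multiprocessing pools can't collide
    idata = pm.sample(chains=4, cores=1)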

It calls scipy.optimize.shgo() to find the n roots of a function. Is that known to use multiple cores?

Very possibly?


Does the problem persist when you set workers=1 on shgo? Might be barking up the wrong tree entirely, but that would be the test.
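Something along these lines inside root_finder_loop_perform, just to show where the keyword goes (the objective and bounds here are placeholders, and workers on shgo needs a fairly recent SciPy):

from scipy.optimize import shgo

# placeholder objective and bounds; the real root search lives in root_finder_loop_perform
def objective(x):
    return (x[0] - 1.0) ** 2

result = shgo(objective, bounds=[(-5.0, 5.0)], workers=1)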


Yes, I just tried that and it doesn’t solve the issue. But maybe the shgo algorithm is less effective with workers=1 and hence leads to more problems. Maybe using pm.Metropolis() is the best way to go in this case…