Unable to find maximum with NUTS. Too many local maxima maybe?

I’m currently trying to sample from my model using the default NUTS sampler but I have problems finding the the maximum that I want (currently testing on known data).

Here’s my model:

amplitude_model = pymc3.Model()
with amplitude_model:
#     amplitude = pymc3.Uniform('amplitude', 1, 40) # uniform prior for A
    l = pymc3.Uniform('l', 0, 127) 
#     m = pymc3.Uniform('m', 0, 64)
    source_A = pointSourceExt_theano2(128,25,l,55) 

    vis_A = theano_RIME_simple(source_A, uvstack, 128) # calculate visibilites using RIME

    vis_A_flat = nonzeroElements(vis_A) # only need nonzero elements
    vis_A_flat = vis_A[vis.nonzero()[0],vis.nonzero()[1],:]
    
    vis_flat = vis[vis.nonzero()] # same here
    vis_flat_stack = np.stack((vis_flat.real,vis_flat.imag), axis=1)

    
    cov = 0.5*np.eye(vis_flat_stack.shape[0]*2,vis_flat_stack.shape[0]*2)


    sigma=0.1
    pymc3.MvNormal('visR', mu=vis_A_flat.flatten(), cov=sigma*cov ,observed=vis_flat_stack.flatten())#vis_A_flat.flatten()    

The purpose is to find the location of an point source in a sky model on the l-axis. The data points that I measure are called visibilities and they result from the fourier transform of the sky model.

My likelihood is a multivariate normal distribution with dimension 2*N, since I have N measurements in complex space.

The sampled posterior looks like this:
posteriot

My chains seem to freeze on local maxima. Any ideas how to fix this? The true maximum should be at 45.

Without a fully reproducible example, it’s challenging to figure out where the pathologies in this problem come from. Some thoughts:

  • Can you let sigma have a prior distribution like pm.HalfNormal('sigma', sd=0.1) instead?

  • If you are using a diagonal covariance matrix, you can just as easily use pm.Normal('visR', mu=vis_A_flat.flatten(), sigma=sigma ,observed=vis_flat_stack.flatten())) since it’s the same likelihood, just faster to evaluate.

  • Using the pm.Metropolis() backup as the sampler, can you get any of your chains to find the maximum? You could try using some large number of chains while sampling to test this out.

  • This is probably a parameterization issue for your model; issues like these are not terribly uncommon when a primarily physics-driven / engineering model is plugged into and MCMC framework - it tends to reveal when the model has sharp posterior gradients or has identifiability issues without the use of strong priors.

2 Likes

This actually solved my problem! I think what helped the most was switching from an MvNormal with a fixed value for sigma to and a Normal with a prior for sigma.