Difficulty calculating model gradients, shared variables with static values, and sparse matrices


#1

I am having difficulty implementing a model involving Fourier transforms and interpolation from a regular grid. I can get a model working in Theano that calculates the correct values and gradients, but when I go to put this in a PyMC3 model context and sample it, an error about dimension mismatch is raised ValueError: shapes (100,) and (8320,) not aligned: 100 (dim 0) != 8320 (dim 0). The full code for this example is in a notebook here

I’ve repasted the relevant model declaration here:

with pm.Model() as model:

  # the image-plane grid over which the sky model will be evaluated
  # NOTE that these must be `fftshifted` already.
  # add an extra dimension for the later packing into the rfft
  alpha = theano.shared(XX[np.newaxis,:])
  dalpha = abs(alpha[0,0,1] - alpha[0,0,0])
  delta = theano.shared(YY[np.newaxis,:])
  ddelta = abs(delta[0,1,0] - delta[0,0,0])

  # Define the PyMC3 model parameters, which are just for the image plane model
  a = pm.Uniform("a", lower=0.0, upper=10.0)
  delta_alpha = pm.Uniform("delta_alpha", lower=-1*arcsec, upper=2*arcsec)
  delta_delta = pm.Uniform("delta_delta", lower=-1*arcsec, upper=2*arcsec)
  sigma_alpha = pm.Uniform("sigma_alpha", lower=0.5*arcsec, upper=1.5*arcsec)
  sigma_delta = pm.Uniform("sigma_delta", lower=0.5*arcsec, upper=1.5*arcsec)

  # Calculate the sky-plane model
  I = a * tt.exp(-(alpha - delta_alpha)**2/(2 * sigma_alpha**2) - (delta - delta_delta)**2/(2 * sigma_delta**2))
  # since the input coordinates were already shifted, then this is too
  # I shape should be (1, N_dec, N_alpha)

  # taper the image with the gridding correction function
  # this should broadcast OK, since the trailing image dimensions match
  corrfun = theano.shared(corrfun_mat)
  I_tapered = I * corrfun

  # use the FFT to transform the image sky model to the Fourier plane
  # output from the RFFT is (1, N_delta, N_alpha//2 + 1, 2)
  rfft = dalpha * ddelta * fft.rfft(I_tapered, norm=None)

  # Store the interpolation matrices as theano shared variables, make sure it's sparse
  C_real_sparse = theano.sparse.CSC(C_real.data, C_real.indices, C_real.indptr, C_real.shape)
  C_imag_sparse = theano.sparse.CSC(C_imag.data, C_imag.indices, C_imag.indptr, C_imag.shape)

  # flatten the RFFT output appropriately for the interpolation, taking the real and imag parts separately
  vis_real = rfft[0, :, :, 0].flatten() # real values
  vis_imag = rfft[0, :, :, 1].flatten() # imaginary values

  # interpolate the RFFT to the baselines by a sparse matrix multiply
  interp_real = theano.sparse.dot(C_real_sparse, vis_real)
  interp_imag = theano.sparse.dot(C_imag_sparse, vis_imag)

  # condition on the real and imaginary observations
  pm.Normal("obs_real", mu=interp_real, sd=noise, observed=real_data)
  pm.Normal("obs_imag", mu=interp_imag, sd=noise, observed=imag_data)

My setup comes from radio astronomy, where we are trying to infer the parameters of an astrophysical source on the sky (“the image plane”). However, radio interferometers measure the Fourier transform of the image plane model at discrete (u, v) spatial frequencies, which are called visibilities. So, fitting a sky-plane model to a radio astronomical dataset requires evaluating the image-plane model, using the FFT to get a dense measurement of the visbility plane, and then interpolating this dense measurement to the exact (u,v) locations of all the datapoints in your dataset. I’ve implemented this interpolation operation as a sparse matrix multiply. In the field it is common to use a convolutional kernel—it’s not too dissimilar from other smooth interpolation schemes that use the nearest ~few points in each dimension.

In this simple example I’ve created a fake dataset using a two-dimensional Gaussian with amplitude, x-location, y-location, x-width, and y-width. Then, I add noise to the visibilities and try to infer the model parameters.

So, for this example there are only the 5 model parameters of the 2D Gaussian. However, to evaluate the visibility model properly, I also need to define several constant grids that will be used in the calculation. First, there are the x,y coordinate values of the image plane model (e.g., grids of 128 x 128 pixels with x=0,1,2... y=0,1,2. In this model they’re called alpha (RA) and delta (DEC) for astronomy conventions). Then there are the interpolation matrices, which have a number of rows equal to the number of datapoints (in this case 100) and a number of columns equal to the total number of points in the flattened RFFT output (in this case 8320). Both the image input grids and the interpolation matrices retain fixed values regardless of the values of the Gaussian model parameters. I suspect that how I am setting these variables up is causing me some error, so I could use some guidance on how best to implement “static” variables like these in a model setup.

I have a few questions

1.) I’m not yet sure if I need to declare the image plane coordinate grid and the interpolation matrices as theano shared variables, or if I can just leave them as plain numpy variables. Either way, I can create a full Theano function and PyMC3 model to calculate the model visibilities without problem. However, when it comes time to calculate the gradients of the model, I can only get the implementation using shared variables to calculate the gradient (see example here at the bottom), and only using Theano in a standalone fashion (meaning the model doesn’t function within a PyMC3 model context with HMC).
If I try to calculate the gradients of the model using the grid variables as plain numpy objects in a standalone fashion with Theano, I get an error

'You cannot drop a non-broadcastable dimension.', ((False, True, True), [])

2.) I’m vaguely suspicious that some of the errors may be related to my use of sparse matrices, or at least the handling of dimensionality between the image plane, the array packing required for the RFFT operation, and the array flattening required for the interpolation operation. These are the two dimensions that it quotes in the PyMC3 error. Is it possible that PyMC3 is trying calculate the gradient w.r.t. individual values of these fixed arrays, and running into trouble? I tried browsing through some of the gradient docs about considering variables constant, but I suppose this compounds my confusion about whether these unchanging grid variables need to be declared as Theano objects in the first place.

Thanks for any help!


#2

I cant really run your code without gridding, maybe you can save the output matrix so others can reproduce your notebook easier.
On a highlevel, it is indeed likely related to sparse matrix. I remembered we discussed it here on this discourse a while ago - not sure what is the solution at the time.


#3

I think that your error comes from where you define the observations. interp_real and interp_imag are both arrays, from what I could follow, so your random variables should have a shape compatible with the supplied parameter mu. This means that you should add a shape=something to the pm.Normal("obs_...) RV creation. Regrettably, we can’t handle symbolic shapes, so shape=interp_real.shape wont work. You will have to write down the proper shape tuple that depends on the shapes of your shared arrays’ alpha and delta.


#4

Thank you @junpenglao and @lucianopaz for the replies!

I updated the notebooks to use saved npy files of the arrays produced by gridding, so there’s no dependency. The updated notebook “Vis2DPyMC3” and npy save files are in this directory here: https://github.com/iancze/million-points-of-light/tree/master/notebooks . The gridding file is also in the directory above that if you’re curious.

As far as the shape goes, I think that interp_real, interp_image, noise, real_data, and imag_data are all 1D vectors of length N_vis, which is 100 (confirmed via tt.printing statements). Note that I can actually get the sampler to run a few test samples with Metropolis Hastings, it’s when the calculation of the gradients are required that things start to go awry.

Thanks for your help!


#5

Dan Foreman-Mackey provided a solution to this problem, which turned out to stem from the way the sparse matrix dot operation was interacting with vector shape.

Changing lines in the model definition to add and then remove a dimension

# interpolate the RFFT to the baselines by a sparse matrix multiply
interp_real = theano.sparse.dot(C_real_sparse, vis_real[:, None])[:, 0]
interp_imag = theano.sparse.dot(C_imag_sparse, vis_imag[:, None])[:, 0]

now has everything working within the PyMC3 model context (before it only worked correctly in a Theano-only context). Thank you all for your help!