I am having difficulty implementing a model involving Fourier transforms and interpolation from a regular grid. I can get a model working in Theano that calculates the correct values and gradients, but when I go to put this in a PyMC3 model context and sample it, an error about dimension mismatch is raised ValueError: shapes (100,) and (8320,) not aligned: 100 (dim 0) != 8320 (dim 0).
The full code for this example is in a notebook here
I’ve repasted the relevant model declaration here:
with pm.Model() as model:
# the image-plane grid over which the sky model will be evaluated
# NOTE that these must be `fftshifted` already.
# add an extra dimension for the later packing into the rfft
alpha = theano.shared(XX[np.newaxis,:])
dalpha = abs(alpha[0,0,1] - alpha[0,0,0])
delta = theano.shared(YY[np.newaxis,:])
ddelta = abs(delta[0,1,0] - delta[0,0,0])
# Define the PyMC3 model parameters, which are just for the image plane model
a = pm.Uniform("a", lower=0.0, upper=10.0)
delta_alpha = pm.Uniform("delta_alpha", lower=-1*arcsec, upper=2*arcsec)
delta_delta = pm.Uniform("delta_delta", lower=-1*arcsec, upper=2*arcsec)
sigma_alpha = pm.Uniform("sigma_alpha", lower=0.5*arcsec, upper=1.5*arcsec)
sigma_delta = pm.Uniform("sigma_delta", lower=0.5*arcsec, upper=1.5*arcsec)
# Calculate the sky-plane model
I = a * tt.exp(-(alpha - delta_alpha)**2/(2 * sigma_alpha**2) - (delta - delta_delta)**2/(2 * sigma_delta**2))
# since the input coordinates were already shifted, then this is too
# I shape should be (1, N_dec, N_alpha)
# taper the image with the gridding correction function
# this should broadcast OK, since the trailing image dimensions match
corrfun = theano.shared(corrfun_mat)
I_tapered = I * corrfun
# use the FFT to transform the image sky model to the Fourier plane
# output from the RFFT is (1, N_delta, N_alpha//2 + 1, 2)
rfft = dalpha * ddelta * fft.rfft(I_tapered, norm=None)
# Store the interpolation matrices as theano shared variables, make sure it's sparse
C_real_sparse = theano.sparse.CSC(C_real.data, C_real.indices, C_real.indptr, C_real.shape)
C_imag_sparse = theano.sparse.CSC(C_imag.data, C_imag.indices, C_imag.indptr, C_imag.shape)
# flatten the RFFT output appropriately for the interpolation, taking the real and imag parts separately
vis_real = rfft[0, :, :, 0].flatten() # real values
vis_imag = rfft[0, :, :, 1].flatten() # imaginary values
# interpolate the RFFT to the baselines by a sparse matrix multiply
interp_real = theano.sparse.dot(C_real_sparse, vis_real)
interp_imag = theano.sparse.dot(C_imag_sparse, vis_imag)
# condition on the real and imaginary observations
pm.Normal("obs_real", mu=interp_real, sd=noise, observed=real_data)
pm.Normal("obs_imag", mu=interp_imag, sd=noise, observed=imag_data)
My setup comes from radio astronomy, where we are trying to infer the parameters of an astrophysical source on the sky (“the image plane”). However, radio interferometers measure the Fourier transform of the image plane model at discrete (u, v) spatial frequencies, which are called visibilities. So, fitting a sky-plane model to a radio astronomical dataset requires evaluating the image-plane model, using the FFT to get a dense measurement of the visbility plane, and then interpolating this dense measurement to the exact (u,v) locations of all the datapoints in your dataset. I’ve implemented this interpolation operation as a sparse matrix multiply. In the field it is common to use a convolutional kernel—it’s not too dissimilar from other smooth interpolation schemes that use the nearest ~few points in each dimension.
In this simple example I’ve created a fake dataset using a two-dimensional Gaussian with amplitude, x-location, y-location, x-width, and y-width. Then, I add noise to the visibilities and try to infer the model parameters.
So, for this example there are only the 5 model parameters of the 2D Gaussian. However, to evaluate the visibility model properly, I also need to define several constant grids that will be used in the calculation. First, there are the x,y coordinate values of the image plane model (e.g., grids of 128 x 128 pixels with x=0,1,2... y=0,1,2
. In this model they’re called alpha (RA) and delta (DEC) for astronomy conventions). Then there are the interpolation matrices, which have a number of rows equal to the number of datapoints (in this case 100) and a number of columns equal to the total number of points in the flattened RFFT output (in this case 8320). Both the image input grids and the interpolation matrices retain fixed values regardless of the values of the Gaussian model parameters. I suspect that how I am setting these variables up is causing me some error, so I could use some guidance on how best to implement “static” variables like these in a model setup.
I have a few questions
1.) I’m not yet sure if I need to declare the image plane coordinate grid and the interpolation matrices as theano shared variables, or if I can just leave them as plain numpy variables. Either way, I can create a full Theano function and PyMC3 model to calculate the model visibilities without problem. However, when it comes time to calculate the gradients of the model, I can only get the implementation using shared variables to calculate the gradient (see example here at the bottom), and only using Theano in a standalone fashion (meaning the model doesn’t function within a PyMC3 model context with HMC).
If I try to calculate the gradients of the model using the grid variables as plain numpy objects in a standalone fashion with Theano, I get an error
'You cannot drop a non-broadcastable dimension.', ((False, True, True), [])
2.) I’m vaguely suspicious that some of the errors may be related to my use of sparse matrices, or at least the handling of dimensionality between the image plane, the array packing required for the RFFT operation, and the array flattening required for the interpolation operation. These are the two dimensions that it quotes in the PyMC3 error. Is it possible that PyMC3 is trying calculate the gradient w.r.t. individual values of these fixed arrays, and running into trouble? I tried browsing through some of the gradient docs about considering variables constant, but I suppose this compounds my confusion about whether these unchanging grid variables need to be declared as Theano objects in the first place.
Thanks for any help!