Matrix-based predictions and complex variables

Hello there,

I am new to pymc and have an issue with a toy model that computes a value Z given a vector w and a matrix M.

Z = wM

Ultimately I would like this to be a complex expression

Z = jwM

however, I only need the absolute value of this expression, and I am quite happy to flatten the matrix so that I have a single vector of observations to compare against a similarly shaped likelihood.

In the following code, I am trying to infer the values m1 and m2 in the matrix M. The model and the inference run successfully (I have to separate the two into different cells, as the sampler hangs forever if I don’t - any ideas why?), but the traces indicate that the sample space is not being explored: m1 and m2 take nearly identical values and sigma ends up absurdly high.

This will ultimately form a basis for more complex models, but I hope this gets me over this current issue and gets me moving.

import numpy as np
import pymc as pm
import pytensor
import pytensor.tensor as pt
import matplotlib.pyplot as plt
import arviz as az
plt.close('all')

#Generate synthetic data
f = np.linspace(0.01, 10, 1000)         # frequencies for calculation
w = 2*np.pi*f                           # angular frequencies
m1_true = 1
m2_true = 2
M_true = np.array([[m1_true,0],[0,m2_true]])

#observation data Z = jwm
Z_true = np.zeros((2,2,len(f)))
for i in range(len(f)):
    Z_true[:,:,i] = np.abs(1 * w[i] * M_true)#.reshape(-1, 1)

#add noise to the observation   
n_std = 5 
Z_obs = np.abs(Z_true + np.random.normal(0, n_std, Z_true.shape))

# Create a PyMC model to sample masses and sigma
with pm.Model() as model:
    # Priors for unknown model parameters
    mass1 = pm.Uniform("mass1", 0, 5)
    mass2 = pm.Uniform("mass2", 0, 5)
    sigma = pm.HalfNormal("sigma", 10)
    
    #symbolic variables and matrix creation
    m1 = pt.dscalar('m1')
    m2 = pt.dscalar('m2')
    mass_mat = pt.stacklists([[mass1, 0], [0, mass2]])
    
    #loop step function over w to calculate/predict Z with priors
    def step(w, mass_mat):
        return np.abs(1* w *mass_mat)#/(1j*w_vals))
    
    # store results using scan instead of for looping
    results, _ = pytensor.scan(fn=step, sequences=[w], outputs_info=None, non_sequences=[mass_mat])
       
    # Likelihood term
    likelihood = pm.Normal("likelihood", mu=results.flatten(), sigma=sigma, observed=Z_obs.flatten())

#%%
with model: # note - I have to run the sampling in a separate cell, otherwise the sampler hangs indefinitely
    trace = pm.sample(1000, tune=1000)

#%%
# Plotting
az.plot_trace(trace, combined = True)

Welcome!

The first thing that sticks out to me is that your model+priors seems to indicate that your observed data is nearly as likely to be <0 as it is to be >0, despite your data being strictly positive. This, combined with the giant values of sigma in your posterior, seems to suggest some sort of mis-specification to me. But that’s just an observation.

Also, your code works fine if I wrap the entire thing in a single model context. What version of PyMC and PyTensor are you using? If you are on Windows, have you included an if __name__ == "__main__": guard?
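In case that guard is unfamiliar, it looks roughly like this (a generic sketch, not your model; the point is that the sampling call only runs in the main process, which matters for multiprocessing on Windows):

import pymc as pm

def run_model():
    # any toy model; replace with your priors, scan, and likelihood
    with pm.Model():
        mu = pm.Normal("mu", 0, 1)
        return pm.sample(1000, tune=1000)

# without this guard, the worker processes spawned for parallel chains
# on Windows re-import the script and can hang or re-launch sampling
if __name__ == "__main__":
    idata = run_model()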

Hi Cluhmann, thank you for the warm welcome :slight_smile:

Appreciate the pointer on the observation data. You are correct in saying the values take a wide range. I have tweaked the zero-mean random noise value n_std to a much lower value. The problems persist: the traces and distributions don’t make sense, particularly the way mass1 and mass2 seem to be confined to a narrow range, and the sigma values are far higher than the scale given in the HalfNormal prior.

The version of PyMC I am using is v5.8.2.

Regarding the point about wrapping everything into a single model context: I have tried to run it that way (assuming I understood correctly - this is new to me :thinking:).

After some googling etc, I have the following:

import numpy as np
import pymc as pm
import pytensor
import pytensor.tensor as pt
import matplotlib.pyplot as plt
import arviz as az

def generate_synthetic_data():
    # Generate synthetic data
    f = np.linspace(0.01, 10, 1000)  # frequencies for calculation
    w = 2 * np.pi * f  # angular frequencies
    m1_true = 1
    m2_true = 2
    M_true = np.array([[m1_true, 0], [0, m2_true]])

    # observation data Z = jwm
    Z_true = np.zeros((2, 2, len(f)))
    for i in range(len(f)):
        Z_true[:, :, i] = np.abs(1 * w[i] * M_true)

    # add noise to the observation
    n_std = 0.05
    Z_obs = np.abs(Z_true + np.random.normal(0, n_std, Z_true.shape))

    return w, Z_obs  # Return w and Z_obs for later use

def create_pymc_model(w, Z_obs):
    # Create a PyMC model to sample masses and sigma
    with pm.Model() as model:
        # Priors for unknown model parameters
        mass1 = pm.Uniform("mass1", 0, 5)
        mass2 = pm.Uniform("mass2", 0, 5)
        sigma = pm.HalfNormal("sigma", 1)

        # symbolic variables and matrix creation
        m1 = pt.dscalar('m1')
        m2 = pt.dscalar('m2')
        mass_mat = pt.stacklists([[mass1, 0], [0, mass2]])

        # loop step function over w to calculate/predict Z with priors
        def step(w, mass_mat):
            return np.abs(1 * w * mass_mat)

        # store results using scan instead of for looping
        results, _ = pytensor.scan(fn=step, sequences=[w], outputs_info=None, non_sequences=[mass_mat])

        # Likelihood term
        likelihood = pm.Normal("likelihood", mu=results.flatten(), sigma=sigma, observed=Z_obs.flatten())

        # Run the MCMC sampling
        trace = pm.sample(1000, tune=1000)
        
        return trace 

def main():
    plt.close('all')
    
    #main section
    w, Z_obs = generate_synthetic_data()
    trace = create_pymc_model(w, Z_obs)
   
    #plotting
    az.plot_trace(trace, combined = True)
    
if __name__ == "__main__":
    main()

This runs without crashing; however, the returned traces remain as shown in the figure above. I’d be grateful if you can spot anything or see any other issues.

:slight_smile: mmn.

My guess is that the issue has more to do with the form of your likelihood than the value of sigma. The likelihood suggests that your observations are normally distributed around the empirical mean. But your data doesn’t appear to be normally distributed (to me). Have you considered other possibilities?
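For example (purely an illustration of one alternative, not necessarily the right choice for your data), a likelihood that places no mass below zero could look something like this, with made-up data:

import numpy as np
import pymc as pm

# strictly positive toy observations
y_obs = np.abs(np.random.normal(5.0, 1.0, size=200))

with pm.Model():
    mu = pm.Uniform("mu", 0, 10)
    sigma = pm.HalfNormal("sigma", 1)
    # truncated normal likelihood: support restricted to y >= 0
    pm.TruncatedNormal("y", mu=mu, sigma=sigma, lower=0, observed=y_obs)
    idata = pm.sample(1000, tune=1000)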

Hi again,

I think I have resolved the issue. The scan function iterates over the first dimension, whereas my synthesised observation was built along the third. Changing this in the observation resulted in correct inference of the values for mass1 (10), mass2 (20), and the standard deviation of the additive noise (100). Lovely stuff, thank you for all the pointers.
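For anyone hitting the same thing, the change just amounts to building the synthetic observation with the scanned axis first; a sketch of the reordered generator (the vectorised construction is shorthand for the earlier for loop):

import numpy as np

f = np.linspace(0.01, 10, 1000)
w = 2 * np.pi * f
M_true = np.array([[1, 0], [0, 2]])

# scan iterates over the FIRST axis of each sequence, so build Z with
# shape (len(f), 2, 2) rather than (2, 2, len(f))
Z_true = np.abs(w[:, None, None] * M_true)

n_std = 0.05
Z_obs = np.abs(Z_true + np.random.normal(0, n_std, Z_true.shape))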

In my original post, I mentioned I would like to extend this to using complex values in calculating the likelihood function (Z = jwm); however, I only require the absolute value of Z, i.e. ‘mu’. Despite this, it would seem that NUTS does not accept complex numbers on account of gradient issues. Admittedly, I need to read up on this a bit more, but I did see the post “Using PyMC3 in a model with complex variables inside”, which seems to suggest this is a no-go, as I have found.

Using Metropolis does get around this issue for the toy example chosen. I do wonder, though, whether you have any thoughts on a workaround whereby I use complex numbers but only take the absolute value, such that the NUTS sampler could be used. Something like this perhaps - Using a “black box” likelihood function?

Many thanks once again.
:slight_smile:

Hi! I think that you will have problems using the absolute value too. The main issue is related to the fact that no internal variable inside the function can be complex.

As long as all the complex-valued operations happen inside an Op that takes in real inputs and outputs a real value, it shouldn’t be a problem. Be aware that you will have to define the gradients manually and implement those as well (the gradients should also be \mathbb R \mapsto \mathbb R).
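To make that concrete, here is a minimal sketch of such an Op for the toy model above (the name AbsZ, the forward function |jwM| for a scalar w, and its use inside scan are my assumptions; treat this as a starting point rather than tested code):

import numpy as np
import pytensor.tensor as pt
from pytensor.graph.basic import Apply
from pytensor.graph.op import Op
from pytensor.gradient import grad_undefined

class AbsZ(Op):
    # |j * w * M| for a real scalar w and a real matrix M.
    # The complex arithmetic is confined to perform(), so the graph
    # only ever sees real inputs and a real output.

    def make_node(self, w, M):
        w = pt.as_tensor_variable(w)
        M = pt.as_tensor_variable(M)
        return Apply(self, [w, M], [M.type()])

    def perform(self, node, inputs, output_storage):
        w, M = inputs
        output_storage[0][0] = np.abs(1j * w * M)

    def grad(self, inputs, output_gradients):
        w, M = inputs
        (g_out,) = output_gradients
        # elementwise: d|j*w*M|/dM = |w| * sign(M), which stays real -> real
        g_M = g_out * pt.abs(w) * pt.sign(M)
        # w is observed data here, so its gradient is not needed
        g_w = grad_undefined(self, 0, w)
        return [g_w, g_M]

Inside the model you would then call something like AbsZ()(w_i, mass_mat) for each frequency (e.g. as the body of the scan step), and pm.sample() should be able to use its default NUTS sampler because the gradient is defined.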

Thanks for that ( jessegrabowski and verderis).

The manually defined gradients would be for NUTS sampling, right? I managed to get it working using Metropolis-Hastings, though it is noticeably slower.

I’ve not defined a gradient manually myself before in pymc. Would something along the lines of this suffice → Using a “black box” likelihood function (numpy) — PyMC example gallery ?

If so, a steer as to how to force this into the NUTS sampler would be ideal.

Thanks,

mmn :slight_smile:

Yeah, gradients are mostly for NUTS. The example you linked is a great place to get started. I can also take a look if you post your likelihood function. Basically you just need to get out a pen and paper and work out the derivatives for your function with respect to its inputs, translate that result to numpy code, then swap out the normal_gradients from the example with your new function. Pretty much everything else is boilerplate.

Hi there,

Apologies for the delay in response.

The function I was performing inference on is simply

Z = j\omega M

where M is a diagonal 2 x 2 matrix, comprising unknowns m_1 and m_2 .

Code thus far is as follows:

import numpy as np
import pymc as pm
import pytensor
import pytensor.tensor as pt
import matplotlib.pyplot as plt
import arviz as az
plt.close('all')

def generate_synthetic_data():
    #function for observed data

    # Define the shared variable for the frequencies 'w'
    f = np.linspace(0.1, 1.0, 100)
    w_values = 2*np.pi*f
    
    # Define the true 2x2 matrix M; the non-zero values are to be inferred by this routine.
    M_shared = np.array([[8,0],[0,2]])
    
    #Define Z function (true values)
    Z_true = np.zeros((len(f),2,2),dtype = complex)
    for i in range(len(f)):
        Z_true[i,:,:] = 1j*w_values[i]*M_shared
    
    # Add noise to create the observed data
    n_std = 0.01
    Z_observed_values = np.abs(Z_true) + np.random.normal(0, n_std, Z_true.shape)
    
    return f, w_values, np.abs(Z_true), Z_observed_values  # Return w and Z_obs for later use    
  
def create_pymc_model(w, Z_obs):
    # Create a PyMC model to sample masses and sigma
    with pm.Model() as model:
        # Priors for unknown model parameters
        mass1 = pm.Uniform("mass1", 0.1, 10)
        mass2 = pm.Uniform("mass2", 0.1, 10)
        sigma = pm.HalfNormal("sigma", 1)
      
        mass_mat = pt.stacklists([[mass1, 0], [0, mass2]])

        # loop step function over w to calculate/predict Z with priors
        def step(w, mass_mat):
            Z_pred = 1j*w*mass_mat
            return np.abs(Z_pred)
                
        # store results using scan instead of for looping
        results, _ = pytensor.scan(fn=step, sequences=[w], outputs_info=None, non_sequences=[mass_mat])
        
        # Likelihood term
        likelihood = pm.Normal("likelihood", mu=results.flatten(), sigma=sigma, observed=Z_obs.flatten())
        
        # Run the MCMC sampling
        trace = pm.sample(1000, tune=1000, step=pm.Metropolis(), chains=4, cores=4)
        
        return trace
                      
def main():
    plt.close('all')
    
    #main section and plotting
    f, w, Z_true, Z_obs = generate_synthetic_data()
    trace = create_pymc_model(w, Z_obs)
    az.plot_trace(trace);

     
if __name__ == "__main__":
    main()

Based on that, I hope I am right in saying that

\frac{dZ}{dM} = \begin{bmatrix} j\omega & 0 \\ 0 & j\omega \end{bmatrix}

From here, would it be best to take the absolute value and leave it in matrix form, or to take the absolute value and flatten/vectorise, so that there is a single scalar value at each point in the flattened vector (though there will be gradients of zero for half the points)?
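Writing the elementwise version out explicitly (my own working, so worth sanity checking): each entry of Z is purely imaginary, so

|Z_{ij}| = |\omega| \, |M_{ij}| \quad\Rightarrow\quad \frac{\partial |Z_{ij}|}{\partial M_{kl}} = |\omega| \, \operatorname{sign}(M_{ij}) \, \delta_{ik} \, \delta_{jl}

i.e. after flattening, the Jacobian is diagonal with entries |\omega| \, \operatorname{sign}(M_{ij}) (and since the masses are constrained positive, the sign terms are all +1).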

Also, as an aside (and sorry for squeezing this in here :heart:) - there will be a need for me in the future to add more lines to the scan element of the code so that I can use more complicated functions involving matrix inversions. An example extending the scan function in my previous code block is provided below…

        def step(w, mass_mat):
            Z_pred = 1j*w*mass_mat
            Y_pred = pt.linalg.inv(Z_pred)
            return pt.abs(Y_pred)

        results, _ = pytensor.scan(fn=step, sequences=[w], outputs_info=None, non_sequences=[mass_mat])

It would appear that the inference goes out the window when I define the step function in this way. If I do it all in one line, then the inference works OK. NOTE - I realise this is applicable for MCMC-MH type algorithms and not gradient-based ones.

        def step(w, mass_mat):
            Y_pred = pt.linalg.inv(1j*w*mass_mat)
            return pt.abs(Y_pred)

I could code something like this up easily via ‘for’ loops, but I am trying to stick to the rules of using scan. I’m at a loss as to why the added intermediate step ceases to work while executing it in one line does not. Is there a way to use multiple lines as intended? I hope it is a simple error on my part.

Many thanks once again.

M.

As an edit to the above, note that the issue around multiple lines in the scan step function, namely

        def step(w, mass_mat):
            Z_pred = 1j*w*mass_mat
            Y_pred = pt.linalg.inv(Z_pred)
            return pt.abs(Y_pred)

        results, _ = pytensor.scan(fn=step, sequences=[w], outputs_info=None, non_sequences=[mass_mat])

was resolved by ensuring the observation was reformatted as a tensor variable.
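In code terms, the reformatting amounted to something like this (a sketch of my understanding; pt.as_tensor_variable is the generic way to wrap a numpy array as a symbolic tensor):

import numpy as np
import pytensor.tensor as pt

# stand-in arrays with the shapes used above
w = 2 * np.pi * np.linspace(0.1, 1.0, 100)
Z_obs = np.abs(np.random.normal(size=(100, 2, 2)))

# wrap the raw numpy arrays as tensor variables before they enter the
# scan graph / likelihood
w_pt = pt.as_tensor_variable(w)
Z_obs_pt = pt.as_tensor_variable(Z_obs)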

Issue with gradient implementation remains!