Average loss for ADVI never decreases for my model

Hi all. I am new to ADVI. When I use ADVI to fit the parameters of my model, I find that the average loss never decreases, and the posterior values of the parameters are exactly the midpoints of my prior distributions. The ELBO and trace figures are as follows:

[ELBO loss and trace plots]
And here is my model:

[screenshot of the model code]
I am sure there must be a better set of parameters for the fit, since I can even adjust the parameters manually and get a better-fitting model.

What could be the reason that ADVI doesn't work for me? Thanks!


Can you post the code directly instead of a screenshot?

Does the model sample correctly with NUTS (i.e. pm.sample())? I suspect it's not an ADVI issue but a model issue.
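For reference, roughly how I'd run that comparison (`model` here stands in for your model context; ADVI via `pm.fit`):

with model:
    idata_nuts = pm.sample()            # NUTS by default for continuous models
    approx = pm.fit(method='advi')      # mean-field ADVI
    idata_advi = approx.sample(1000)    # draw from the fitted approximation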

Hi,

Thanks for the reply. You are correct. I forgot to mention that sampling my model with NUTS takes an extremely long time, but it works fine with Metropolis-Hastings.
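(By Metropolis-Hastings I mean something like this:)

with BIP:
    idata = pm.sample(step=pm.Metropolis())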

Here is the code.

import numpy as np
import pymc as pm
import pytensor
import pytensor.tensor as pt

# Amax, w_max, w_min, bias_update, nTrials, nAverage, nUnits, acti_seq, sign_seq,
# bias, weight, trials_p_block, block_correct, and the block index arrays
# (IL, IN, IH, CL, CN, CH) are defined elsewhere.

def update_decisions(acti_normalized, noise_d_seq, sign_seq, bias, weight, learning_rate, bias_strength):
    # Compute the sign of the decision variable
    decision = pt.sum(acti_normalized * weight, axis=1) - bias_strength * bias + noise_d_seq
    decision = decision / pt.abs(decision)

    # Update bias and weight
    bias = bias_update * decision + (1 - bias_update) * bias
    delta = acti_normalized * learning_rate * sign_seq * Amax
    weight = weight + (w_max - weight) * pt.switch(delta >= 0, delta, 0) + (weight - w_min) * pt.switch(delta < 0, delta, 0)

    return bias, weight, decision

BIP = pm.Model()
lower = [0, 0, 0, 0, 0]
upper = [0.01, 3, 3, 0.5, 0.5]

with BIP:
    test = pm.Uniform('test', lower=lower, upper=upper)

    # Noise drawn once with numpy, then scaled by the noise parameters
    noise_d_seq = pt.as_tensor_variable(np.random.normal(0, 1, size=(nTrials, nAverage))) * test[3]
    noise_r_seq = pt.as_tensor_variable(np.random.normal(0, 1, size=(nTrials, nAverage, nUnits))) * test[4]

    # Rectify, squash through a sigmoid-like nonlinearity, and scale to Amax
    acti_withnoise = acti_seq + noise_r_seq
    acti_rect = pt.switch(acti_withnoise > 0, acti_withnoise, 0)
    acti_normalized = ((1 - pt.exp(-test[2] * acti_rect)) / (1 + pt.exp(-test[2] * acti_rect))) * Amax

    # Iterate the decision/learning update over trials
    results, _ = pytensor.scan(fn=update_decisions,
                               sequences=[acti_normalized, noise_d_seq, sign_seq],
                               outputs_info=[bias, weight, None],
                               non_sequences=[test[0], test[1]])

    # Fraction of correct decisions per block, clipped away from 0 and 1
    correctness = (results[2].sum(axis=1) / nAverage) * sign_seq
    block_correctness_prob = pt.clip(
        (pt.stack([correctness[IL].sum(axis=1), correctness[IN].sum(axis=1), correctness[IH].sum(axis=1),
                   correctness[CL].sum(axis=1), correctness[CN].sum(axis=1), correctness[CH].sum(axis=1)],
                  axis=0) + 50) / 100,
        0.001, 0.999)
    nCorrect = pm.Binomial('nCorrect', n=trials_p_block, p=block_correctness_prob, observed=block_correct)

OK, that means your model is mis-specified. Unfortunately I won't be able to dig into why, but I recommend you start with the simplest model you can, get that working with NUTS, and then slowly add back complexity, always making sure NUTS samples well and fast.
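For instance, a minimal starting point might fit the Binomial likelihood alone with a placeholder prior on the per-block success probabilities (a sketch: the Beta prior and shape=6 are assumptions, so adjust them to match block_correct):

with pm.Model() as simple_model:
    # Placeholder Beta prior; one success probability per block
    p = pm.Beta('p', alpha=2, beta=2, shape=6)
    pm.Binomial('nCorrect', n=trials_p_block, p=p, observed=block_correct)
    idata = pm.sample()  # confirm NUTS is fast here before adding the scan back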


Can you explain a little bit what 'mis-specified' means? I have checked the calculation line by line and found no errors. Thanks!

I would describe what you are seeing as the model, the data, and the sampler not getting along very nicely. Uniform distributions tend to be tough on the NUTS sampler, for example. And incorporating random elements into your model (e.g., the calls to np.random.normal) is generally not a good idea in the context of a PyMC model.
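For example, the fixed numpy draws could be expressed as latent PyMC random variables instead, so the noise is part of the probabilistic model rather than constants baked in at build time (a sketch reusing the names and shapes from your code):

with BIP:
    # Latent standard-normal noise, scaled by the corresponding parameters
    noise_d_raw = pm.Normal('noise_d_raw', 0, 1, shape=(nTrials, nAverage))
    noise_r_raw = pm.Normal('noise_r_raw', 0, 1, shape=(nTrials, nAverage, nUnits))
    noise_d_seq = noise_d_raw * test[3]
    noise_r_seq = noise_r_raw * test[4]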

Instead of debugging your current model, I would suggest stepping back and describing what you are trying to do. Once you and others here have a handle on that, you can do as @twiecki suggests and start with a simple (possibly too-simple) version of the model, gradually adding pieces to it so that you know when (if) sampling starts to be problematic.


Thank you for the suggestions! I will definitely try them!