Average loss for ADVI never decrease for my model

Shanglin_Yang · August 31, 2023, 4:42pm

Hi all. I am new to ADVI sampling. When I use ADVI to fit parameters for my model. I find that the average loss never decreases and the posterior value of the parameters is exactly the value at the middle of my prior distribution. The ELBO and trace figures are as follows:

And here is my model:

I am sure that there must be some sets of better parameters for the fitting process since I can even manually adjust the parameters to get a better fitting model.

What can be the possible reason that ADVI doesn’t work for me? Thanks!

twiecki · August 31, 2023, 5:15pm

Can you post the code directly instead of a screenshot?

Does the model sample correctly with NUTS (i.e. pm.sample())? I suspect it’s not an ADVI but a model issue.

Shanglin_Yang · August 31, 2023, 5:33pm

Hi,

Thanks for the reply. You are correct. I forgot to mention that using NUTS for my model takes an extremely long time but it’s okay to use metropolis hasting.

Here is the code.

def update_decisions(acti_normalized, noise_d_seq, sign_seq, bias, weight,learning_rate,bias_strength):
    # Calculate sign of decision
    decision=pt.sum(acti_normalized*weight,axis=1) - bias_strength * bias + noise_d_seq
    decision=decision/pt.abs(decision)

    # Update bias and weight
    bias= bias_update*decision+(1-bias_update)*bias
    delta=acti_normalized*learning_rate*sign_seq*Amax
    weight=weight+(w_max-weight)*pt.switch(delta>=0,delta,0)+(weight-w_min)*pt.switch(delta<0,delta,0)
    
    return bias, weight, decision    

BIP=pm.Model()
lower=[0,0,0,0,0]
upper=[0.01,3,3,0.5,0.5]

with BIP:
    test=pm.Uniform('test', lower=lower, upper=upper)
    
    noise_d_seq =  pt.as_tensor_variable(np.random.normal(0, 1, size=(nTrials,nAverage)))*test[3]
    noise_r_seq =  pt.as_tensor_variable(np.random.normal(0, 1, size=(nTrials,nAverage,nUnits)))*test[4]
    
    acti_withnoise = acti_seq +noise_r_seq
    acti_normalized =((1-pt.exp(-test[2]*pt.switch(acti_withnoise>0,acti_withnoise,0)))/(1+pt.exp(-test[2]*pt.switch(acti_withnoise>0,acti_withnoise,0))))*Amax
    
    results, _ = pytensor.scan(fn=update_decisions,
                         sequences=[acti_normalized, noise_d_seq, sign_seq],
                         outputs_info=[bias,weight, None],
                         non_sequences=[test[0],test[1]])
    
    correctness=(results[2].sum(axis=1)/nAverage)*sign_seq
    block_correctness_prob=pt.clip((pt.stack([correctness[IL].sum(axis=1),correctness[IN].sum(axis=1),correctness[IH].sum(axis=1)
                                 ,correctness[CL].sum(axis=1),correctness[CN].sum(axis=1),correctness[CH].sum(axis=1)],axis=0)+50)/100,0.001,0.999)
    nCorrect=pm.Binomial('nCorrect', n=trials_p_block, p=block_correctness_prob, observed=block_correct)

twiecki · August 31, 2023, 6:41pm

OK that means your model is mis-specified. Unfortunately I won’t be able to dig into why but I recommend you start with the simplest model you can, get that working on NUTS, and then slowly add back complexity while always making sure NUTS samples well and fast.

Shanglin_Yang · September 1, 2023, 1:45am

Can you explain a little bit about what ‘mis-specified’ means? I have checked the calculation line by line and no error is found. Thanks!

cluhmann · September 1, 2023, 2:08am

I would describe what you are seeing as the model and the data and the sampler not getting along very nicely. Uniform distributions tend to be tough with the NUTS sampler, for example. And the fact that you are incorporating random elements into your model (e.g., the calls to np.random.normal) is generally not a good idea in the context of a PyMC model.

Instead of debugging your current model, I would suggest stepping back and describing what you are trying to do. Once you and others here have a handle on that, you can do as @twiecki suggest and start with a simple (possibly too-simple) version of the model and gradually add pieces to it so that you know when (if) sampling starts to be problematic.

Shanglin_Yang · September 3, 2023, 9:46am

Thank you for the suggestion! I will definitely try them!

Topic		Replies	Views
BNN regression using ADVI Questions	10	1231	May 14, 2019
Average Loss in optimization output Questions	20	4733	July 20, 2018
What are the differences between NUTS and ADVI? Questions	3	1816	July 4, 2023
ADVI result systematically different to NUTS Questions	2	633	January 29, 2020
Inf Average loss ADVI in Correlated Topic Model Questions	9	1537	July 25, 2018

Average loss for ADVI never decrease for my model

Related topics