How to perform ADVI only on some of the variables in a model?

I am trying to build a model (M1) in which some parameters are taken from the posterior of another model (M2), while the remaining parameters are estimated with ADVI. Is it possible to use this kind of framework? One option would be to use the posterior of M2 as the prior of M1, but I want to keep those parameters fixed rather than learn them again. Any pointers in this direction would be useful.

If you don’t want to estimate scalar values, you can just plug them into your model equations as numbers, or wrap them in pm.Deterministic. For example, if you write the following regression:

with pm.Model():
    x = pm.MutableData('x', ...)
    z = pm.MutableData('z', ...)
    y = pm.MutableData('y', ...)
    a = pm.Normal('a')
    b = pm.Normal('b')
    # the coefficient on z is a plain constant, so it is not estimated
    mu = pm.Deterministic('mu', a + b * x + 4 * z)
    sigma = pm.Exponential('sigma', 1)

    y_hat = pm.Normal('y_hat', mu=mu, sigma=sigma, observed=y)

The coefficient “4” associated with covariate z will not be “learned”.

Is your situation more complex than that?

The model (M2) in my question is a neural network, so I am estimating the distributions of the weights of a neural network, which are in matrix and vector form. I want to keep the distributions of those parameters fixed in the new model. In your solution you use the coefficient “4”, which is a scalar. How can I use a distribution instead of 4? Also, in my case the coefficients are matrices and scalars, and they are distributions. Does that make sense?

I think you want something similar to this discussion? If you want to sample from a fixed distribution I think you have to use a custom step function. You could use pm.Interpolated to convert your posteriors to distributions, then use a custom sampler like this one to draw from them without updating. Is that correct @ricardoV94 ?
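A rough sketch of the pm.Interpolated conversion I have in mind, along the lines of the usual “updating priors” recipe (Interpolated is univariate, so matrix-valued weights would have to be converted entry by entry, and you would still need the custom step on top of this):

import numpy as np
import pymc as pm
from scipy import stats

def from_posterior(name, samples, points=100):
    # smooth the posterior samples of one scalar parameter into a density
    smin, smax = samples.min(), samples.max()
    width = smax - smin
    x = np.linspace(smin, smax, points)
    y = stats.gaussian_kde(samples)(x)
    # pad the support so the density drops to zero outside the sampled range
    x = np.concatenate([[smin - 3 * width], x, [smax + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return pm.Interpolated(name, x, y)

Inside M1 you would call from_posterior once per scalar entry of the weight matrices (using the flattened posterior samples from M2) and attach the custom step to those variables so they are drawn but never updated.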

That “trick” works for MCMC sampling, but wouldn’t be applicable to VI (there aren’t even step samplers there).

I am not sure whether VI allows you to infer the posterior of some variables, while keeping the prior for others. I don’t even know if it makes conceptual sense in that context.

If it does make sense, perhaps Minibatching across the histogram of the “fixed” variables achieves something like what you want?

Maybe @ferrine or @fonnesbeck can weigh in?

So for now I can just use the mean value of the posterior distribution as a fixed value in the next model.
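For example, something like this rough sketch, where “w” and “b” are placeholder names for M2’s weight and bias variables and idata is the trace from fitting M2:

import pymc as pm

# posterior means from M2's trace, pulled out as plain arrays
w_fixed = idata.posterior["w"].mean(dim=("chain", "draw")).values
b_fixed = idata.posterior["b"].mean(dim=("chain", "draw")).values

with pm.Model() as M1:
    ann_input = pm.MutableData("ann_input", u_scale_train[index_train])
    # M2's parameters enter as plain constants, so they are not learned again
    hidden = pm.math.tanh(pm.math.dot(ann_input, w_fixed) + b_fixed)
    # ... the rest of M1, with its own (p1) parameters defined as usual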

Let’s say I now relax the condition: model M1 should be trained starting from the end point of model M2. Would the procedure below work?

with M2:
    inference = pm.ADVI(random_seed=seed)

    pm.set_data({"ann_input": u_scale_train[index_train]})
    tracker = pm.callbacks.Tracker(
        mean=inference.approx.mean.eval,  # callable that returns the mean
        std=inference.approx.std.eval,    # callable that returns the std
    )

    approx = pm.fit(
        n=10000,
        random_seed=seed,
        callbacks=[pm.callbacks.CheckParametersConvergence(tolerance=1e-2)],
        method=inference,
        obj_optimizer=adam(learning_rate=0.05, decay_iter=4000),
    )
    idata = approx.sample(2000, random_seed=seed)

with M1:
    # inference = pm.ADVI(random_seed=seed)

    pm.set_data({"ann_input": u_scale_train[index_train]})
    tracker = pm.callbacks.Tracker(
        mean=inference.approx.mean.eval,  # callable that returns the mean
        std=inference.approx.std.eval,    # callable that returns the std
    )

    inference.refine(n=4000)
    idata_servicer = approx.sample(2000, random_seed=seed)

In the above procedure, the parameters (p2) of M2 get trained, and in the second part I reuse that same set of parameters and refine model M1, which contains some additional parameters (p2 + p1). So some parameters (p2) of M1 are trained, but not from the very start, while the other parameters (p1) are trained from scratch. Would the above code solve the relaxed problem?
I think I am missing something, because the parameters p1 are not getting updated.