Problem using previous samples for new inference in a new model context

I am trying to use a two-step procedure: first estimate some ancillary variables, then use these estimated values in a real run.

The procedure is as follows:

import pymc as pm
import pytensor as pt

with pm.Model(coords=coords) as baseModel:

    # input data
    NDVI_ = pm.MutableData("NDVI_", NDVI)
    T_ = pm.MutableData("T_", T_obs)
    sm_ = pm.MutableData("sm_", sm_obs)  # ONLY AVAILABLE FOR CALIBRATION

    #priors
    sigma_obs = 0.05
 
    A_ = pm.Uniform('A_', 0, 1)
    B_ = pm.Uniform('B_', 0, 1)

    x1_ = pm.Deterministic('x1_', A_ + B_*NDVI_, dims='timeFrame')
    
    def f(x1=x1_, sm=sm_):
        out = T(x1, sm)  # external complex function
        # stack the channels returned by T into a (2, n_timeFrame) tensor
        a = pt.tensor.zeros((2, sm.shape[0]))
        for i in range(len(out)):
            a = pt.tensor.set_subtensor(a[i], out[i])
        return a

    function_pm = pm.Deterministic('f', f(), dims=('obs_dim', 'timeFrame'))

    obs = pm.Normal('obs', mu=function_pm, sigma=sigma_obs, observed=T_, dims=('obs_dim', 'timeFrame'))

with baseModel:
    idatabaseModel = pm.sample()

This produces reasonably good ‘calibrated’ samples of both A_ and B_. Now, using the procedure described here, I want to use these learned samples of A_ and B_ to estimate sm_toEst in a real run:

with pm.Model(coords=coordsNew) as predModel:

    # NEW input data
    NDVI_ = pm.MutableData("NDVI_", NDVI_new)
    T_ = pm.MutableData("T_", T_new)
    
    #priors
    sm_toEst = pm.Uniform('sm_toEst', 0, 1, dims='timeFrame') #target variable
    sigma_obs = 0.05

    A_ = pm.Flat('A_')
    B_ = pm.Flat('B_')

    x1_new = pm.Deterministic('x1_new', A_ + B_*NDVI_, dims='timeFrame')

    def f_new(x1=x1_new, sm=sm_toEst):
        out = T(x1, sm)  # external complex function
        # stack the channels returned by T into a (2, n_timeFrame) tensor
        a = pt.tensor.zeros((2, sm.shape[0]))
        for i in range(len(out)):
            a = pt.tensor.set_subtensor(a[i], out[i])
        return a

    # Likelihood (sampling distribution) of observations
    function_pm = pm.Deterministic('f_new', f_new(), dims=('obs_dim', 'timeFrame'))

    obsNew = pm.Normal('obsNew', mu=function_pm, sigma=sigma_obs, observed=T_, dims=('obs_dim', 'timeFrame'))

    # as stated in the post
    pp_pred = pm.sample_posterior_predictive(
        idatabaseModel,  
        var_names=['obsNew', 'sm_toEst', 'x1_new'], 
        # predictions=True, 
        random_seed=rng,
    )

The problem is that the estimated values of both sm_toEst and obsNew have a lot of variance. In fact, if I relearn the values of A_ and B_ while estimating sm_toEst at the same time, the estimates are much better.

I am afraid I am not using the sample_posterior_predictive var_names keyword correctly. But if I don’t include it in the function call, the variable is not sampled. What am I doing wrong?

One thing that is different in the out-of-sample predictions tutorial you linked, compared to what you are doing, is that for the posterior predictions they don’t use observed. For instance, this is from a block used with sample_posterior_predictive:

y_pred = pm.Normal("y_pred", mu=beta * x, sigma=0.1, shape=x.shape)

Are you trying to train sm_toEst using sample_posterior_predictive? As far as I know, sample_posterior_predictive is not meant for such purposes; I don’t even know what a variable with observed defined does during sample_posterior_predictive. sample_posterior_predictive is only used to make predictions using the posteriors already sampled for the model. For instance, in the page you linked above, they fit a linear regression and then predict what the outcome would have been if the constant input x had changed. This process simply takes the posteriors sampled from your model and combines them with the new x to see how y would come out in that context.
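
A minimal sketch of that pattern (x_train, y_train, and x_new are illustrative names, not from the tutorial):

with pm.Model() as train_model:
    beta = pm.Normal("beta", 0, 1)
    pm.Normal("y", mu=beta * x_train, sigma=0.1, observed=y_train)
    idata = pm.sample()

with pm.Model() as pred_model:
    # same free-variable name, so its draws are taken from idata's posterior
    beta = pm.Flat("beta")
    # no observed= here: this is the quantity being predicted
    y_pred = pm.Normal("y_pred", mu=beta * x_new, sigma=0.1, shape=x_new.shape)
    pp = pm.sample_posterior_predictive(idata, var_names=["y_pred"])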

So my guess is that if you define a new prior that does not depend on anything else, it just samples from that prior (you can test this by making the prior tight, say a normal with sd=0.1). sample_posterior_predictive, as far as I would guess, will have nothing to do with the observed (but I might be wrong, so I will defer that to the devs). For that you would have to use sample. Any reason you don’t include sm as a random variable in your first model? I notice you used a constant input for it instead, so it could perhaps get normal priors centred around those constant values.


Thank you for the answer. The general scheme I was trying to implement was:

  1. to estimate the posterior of some ancillary variables (A_, B_) given T and NDVI (both input data) in the case where I know sm_toEst (calibration).
  2. then to use these posteriors of A_ and B_ to better estimate sm_toEst as a function of T and NDVI.

I understand now that if I define a new prior that does not depend on anything else, the model samples from this new prior and all calibration is lost. I just don’t understand how to use the posteriors of A_ and B_ in a new inference problem.

If you look for “prior from posterior” in this forum you’ll find a couple of threads on the topic.

There are some topics where I think this was discussed extensively. I am aware of the following two, for instance:

I think in both cases the discussion is around how to build a distribution that best approximates the posterior sampled from the “previous model”, so that it can be used as the prior for another model trained on new data with those priors-from-posteriors. This will rely on certain assumptions about the shape of the posterior (for instance, that it is sufficiently normal). If the posteriors for A_ and B_ don’t seem to be correlated, you could perhaps simply model them as a pair of normals with the estimated mean and sd of the posterior samples, as sketched below. Note, however, that in your new model A_ and B_ will also be updated to new posteriors in the presence of the new data.
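
A minimal sketch of that idea, assuming roughly normal, uncorrelated posteriors and the calibration trace idatabaseModel from above:

post = idatabaseModel.posterior
A_mu, A_sd = float(post["A_"].mean()), float(post["A_"].std())
B_mu, B_sd = float(post["B_"].mean()), float(post["B_"].std())

with pm.Model(coords=coordsNew) as predModel:
    # priors from posteriors
    A_ = pm.Normal("A_", mu=A_mu, sigma=A_sd)
    B_ = pm.Normal("B_", mu=B_mu, sigma=B_sd)
    sm_toEst = pm.Uniform("sm_toEst", 0, 1, dims="timeFrame")
    # ... same deterministic f and likelihood as before, then pm.sample()
    # (not sample_posterior_predictive), so the new data can inform sm_toEst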

Alternatively, the standard and safer approach is to just resample the whole model with the bigger dataset, instead of doing it sequentially. Sometimes this is not even that much slower.


I ended up doing exactly this, as in my use-case (a rather large model) sampling everything together (3h30) ended up being faster than doing it sequentially using the empirical distribution (5h30) discussed in these other threads and this pymc-experimental issue. However, that might be due to implementation inefficiencies of the proposed empirical distribution, or something else. (I used pymc 5.9 with python 3.11 because the model breaks with subsequent pymc versions for some reason.) I keep thinking a proper (optimized) tool for this would be great to have in a toolbox. It would certainly be great for teaching Bayesian statistics!


Thank you all for your feedback. I was aware of the “histogram approximation” approach, but I have also had problems with it in the past. I was under the impression that using the posterior predictive was a way around having to approximate the posterior with a custom approach (I was wrong).

About @ricardoV94’s suggestion: when I have a time-series problem I usually do this (i.e. resample the whole dataset or a moving window). But in this case I have a variable that in the first period is given (is data) and in the second period is what I want to estimate. I don’t understand how I can build a retrieval scheme in which a variable is data in one period and is to be estimated in the next.

What kind of problem do you want to solve with “first is data, then it’s inferred”?

I will try to clarify my problem. I have observations (NDVI, T) that I want to use to estimate sm. However, I only have a complex physical model that links sm and T:

T = f(x1, sm)

Luckily, there is evidence that x1 can be approximated as x1 = A_ + B_*NDVI, but I need to infer values for the coefficients. To do this, I have several sites and periods in which I also have data on the variable I want to estimate (sm); let’s call these data sm_obs and these periods calibration. Once the distributions for A_ and B_ are obtained, I want to use them to estimate sm (real run time).

Running everything together seems a great idea. I just need to write a model in which sm is first data (sm_obs) and then a random variable.

How about leaving sm in T=f(x1, sm) as a parameter but then adding a likelihood to the system of the form

pm.Normal("sm_obs", sm, sd, observed=sm_obs) 
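
Fleshed out, that suggestion might look roughly like this (cal_idx, sm_sd, and the single 'timeFrame' dim are illustrative; f wraps the external model T as in the original post):

with pm.Model(coords=coords) as joint_model:
    A_ = pm.Uniform("A_", 0, 1)
    B_ = pm.Uniform("B_", 0, 1)
    sm = pm.Uniform("sm", 0, 1, dims="timeFrame")  # latent over all periods
    x1 = pm.Deterministic("x1", A_ + B_ * NDVI_, dims="timeFrame")
    pm.Normal("obs", mu=f(x1, sm), sigma=0.05, observed=T_obs)
    # calibration: tie the known slice of sm to its measured values
    pm.Normal("sm_obs", mu=sm[cal_idx], sigma=sm_sd, observed=sm_obs)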

Is there any reason why you want to estimate sm as a parameter only after you have constructed the posteriors for A_ and B_? In the “experimental setup”, are the calibration measurements of sm giving you likely values for sm, or are they fixed (like how x is assumed to be fixed in a linear regression)?

That being said, there is also something called an error-in-variables model, where you create latent variables for something that would normally be fixed but might have error in its measurement (again, like the x variable in a linear regression). That might also be useful to you; see for instance:

https://mc-stan.org/docs/2_21/stan-users-guide/bayesian-measurement-error-model.html
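
Translated to PyMC, the measurement-error model from that page looks roughly like this (all names are illustrative; tau is the known measurement noise):

with pm.Model() as meas_error_model:
    # latent "true" covariate, observed only through noisy measurements
    x_true = pm.Normal("x_true", mu=0, sigma=10, shape=N)
    pm.Normal("x_meas", mu=x_true, sigma=tau, observed=x_meas)
    # regression on the latent covariate
    alpha = pm.Normal("alpha", 0, 10)
    beta = pm.Normal("beta", 0, 10)
    sigma = pm.HalfNormal("sigma", 5)
    pm.Normal("y", mu=alpha + beta * x_true, sigma=sigma, observed=y_obs)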

Thank you again for all the feedback. Based on @iavicenna’s idea, I just found a solution. First, I want to apologize because I didn’t make clear from the beginning that sm is a vector. In fact, it is a time series whose value is known for one period (calibration) and must be estimated in the second period (run time). The solution is inelegant (I divided the inference into the two periods and then combined them), but it works. The results, compared to the benchmark (simply ignoring the values of sm available for calibration), are not much better, but they could be, depending on the actual trends of sm and NDVI. In any case, thank you all again.

The solution is:

coordsNew = {
    'calTimeFrame': np.arange(len(sm_old)),
    'runTimeFrame': np.arange(len(sm_new)),
    'totalTimeFrame': np.arange(T_obs.shape[1]),
    'obs_dim': ['Channel0', 'Channel1']
}
with pm.Model(coords=coordsNew) as predModel:

    # input data
    NDVI_new_ = pm.MutableData("NDVI_new_", NDVI_new)
    T_obs_ = pm.MutableData("T_obs_", T_obs)

    NDVI_old_ = pm.MutableData("NDVI_old_", NDVI_old)
    T_old_ = pm.MutableData("T_old_", T_old)
    sm_old_ = pm.MutableData("sm_old_", sm_old)

    #priors
    sm_toEst = pm.Uniform('sm_toEst', 0, 1, dims='runTimeFrame')
    A_ = pm.Uniform('A_', 0, 1)
    B_ = pm.Uniform('B_', 0, 1)
    
    sigma_obs = 0.05

    x1_old_ = pm.Deterministic('x1_old_', A_ + B_*NDVI_old_, dims='calTimeFrame')
    x1_new_ = pm.Deterministic('x1_new_', A_ + B_*NDVI_new_, dims='runTimeFrame')
    
    def f_old(x1=x1_old_, sm=sm_old_):
        out = T(x1, sm)  # external complex function
        # stack the channels returned by T into a (2, n_calTimeFrame) tensor
        a = pt.tensor.zeros((2, sm.shape[0]))
        for i in range(len(out)):
            a = pt.tensor.set_subtensor(a[i], out[i])
        return a

    def f_new(x1=x1_new_, sm=sm_toEst):
        out = T(x1, sm)  # external complex function
        # stack the channels returned by T into a (2, n_runTimeFrame) tensor
        a = pt.tensor.zeros((2, sm.shape[0]))
        for i in range(len(out)):
            a = pt.tensor.set_subtensor(a[i], out[i])
        return a

    # Likelihood (sampling distribution) of observations
    function_pm_old = pm.Deterministic('f_old', f_old(), dims=('obs_dim', 'calTimeFrame'))
    function_pm_new = pm.Deterministic('f_new', f_new(), dims=('obs_dim', 'runTimeFrame'))

    mu_total = pt.tensor.concatenate([function_pm_old, function_pm_new],axis=1)
 
    obsTotal = pm.Normal('obsTotal', mu=mu_total, sigma=sigma_obs, observed=T_obs_, dims=('obs_dim', 'totalTimeFrame'))
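
For completeness (this sampling call was not in the original post), the model would then be run and the run-time estimates inspected with something like:

with predModel:
    idataPred = pm.sample()

import arviz as az
az.summary(idataPred, var_names=["A_", "B_", "sm_toEst"])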