How to create Bayesian data fusion in python with pymc3?

I have been looking into data fusion methods and what caught my eyes is the idea of using Kalman filter which looks into data fusion data which looks into mean and variance of Gaussian distribution and implements the prediction and correction from weak sensor to stronger/more accurate sensor. I looked into the following GitHub links to get a further understanding of fusion techniques:

So, in the first link, I found they were talking about the discrete Bayesian filter, but, they didn’t mention about the continuous Bayesian filter. So, I thought to do the same steps with the idea from Kalman filter to implement a continuous Bayesian filter with the help of PyMC3 package. The steps involved can be found in the second link and code is below.

mean = mean0
var = var0

def predict(var, mean, varMove, meanMove):
    new_var = var + varMove
    new_mean= mean+ meanMove
    return new_var, new_mean

def correct(var, mean, varSensor, meanSensor):
    new_mean=(varSensor*mean + var*meanSensor) / (var+varSensor)
    new_var = 1/(1/var +1/varSensor)
    return new_var, new_mean

# Predict
var, mean = predict(var, mean, varMove, distances[m])
#print('mean: %.2f\tvar:%.2f' % (mean, var))
plt.plot(x,mlab.normpdf(x, mean, var), label='%i. step (Prediction)' % (m+1))
        
# Correct
var, mean = correct(var, mean, varSensor, positions[m])
print('After correction:  mean= %.2f\tvar= %.2f' % (mean, var))
plt.plot(x,mlab.normpdf(x, mean, var), label='%i. step (Correction)' % (m+1))

So, from the data that I am using, I implemented the following link,

Fitting empirical distribution to theoretical ones with Scipy

from the link above and the distribution list, Hypersecant distribution fits with my data. To implement the data fusion idea is to be able to get the initial data and then use the parameters from the first model and implement them into the prediction models and collect parameters from the prediction model to the correction model. The first thing I had to do was to create the custom distribution to create the Hypersecant distribution. For the custom hypersecant distribution, it can be found in the following question:

Creating a Custom Hypersecant distribution

After creating the custom distribution, then the bayesian fusion can be done. Here is my approach to the fusion (I apologise for the lengthy code),

def S2_0_Bayesian_Interface(data):
########################################################################################################################
    with pm.Model() as model_0:
            # Prior Distributions for unknown model parameters:
            sigma_0 = pm.HalfNormal('sigma_0', sd=5)
            mu_0 = pm.Normal('mu_0', mu=0, sd=5)

            # Observed data is from a Likelihood distributions (Likelihood (sampling distribution) of observations):
            observed_data_0 = Hypsecant('observed_data_0', mu=mu_0, sigma=sigma_0, observed=data)

            # draw 5000 posterior samples
            trace_0 = pm.sample(draws=1000, tune=500, target_accept = 0.9, chains=3, cores=1, progressbar=True)
    
            # Obtaining Posterior Predictive Sampling:
            post_pred_0 = pm.sample_posterior_predictive(trace_0, samples=1000)
            print(post_pred_0['observed_data_0'].shape)
            print('\nSummary: ')
            print(pm.stats.summary(data=trace_0))
            print(pm.stats.summary(data=post_pred_0))
########################################################################################################################
    return trace_0, post_pred_0

def S2_P_Bayesian_Interface(mu, sigma, data, trace):
########################################################################################################################
    with pm.Model() as model_P:
            # Prior Distributions for unknown model parameters from posterior distribution:
            sigma_P = pm.HalfNormal('sigma_P', sigma=np.std(sigma))
            mu_P = pm.Normal('mu_P', mu=np.mean(mu), sigma=np.std(mu))

            # Observed data is from a Likelihood distributions (Likelihood (sampling distribution) of observations):
            observed_data_P = Hypsecant('observed_data_P', mu=mu_P, sigma=sigma_P, observed=data)

            # draw 5000 posterior samples
            trace_P = pm.sample(draws=1000, tune=500, target_accept = 0.9, chains=3, cores=1, progressbar=True)
    
            # Obtaining Posterior Predictive Sampling:
            post_pred_P = pm.sample_posterior_predictive(trace_P, samples=1000)
            # post_pred_P = pm.sample_posterior_predictive(trace_P, samples=1500)
            print(post_pred_P['observed_data_P'].shape)
            print('\nSummary: ')
            print(pm.stats.summary(data=trace_P))
            print(pm.stats.summary(data=post_pred_P))
########################################################################################################################
    return trace_P, post_pred_P

def S1_C_Bayesian_Interface(mu, sigma, data, trace):
########################################################################################################################
    with pm.Model() as model_C:
            # Prior Distributions for unknown model parameters from posterior distribution:
            sigma_C = pm.HalfNormal('sigma_C', sigma=np.std(sigma))
            mu_C = pm.Normal('mu_C', mu=np.mean(mu), sigma=np.std(mu))

            # Observed data is from a Likelihood distributions (Likelihood (sampling distribution) of observations):
            observed_data_C = Hypsecant('observed_data_C', mu=mu_C, sigma=sigma_C, observed=data)

            # draw 5000 posterior samples
            trace_C = pm.sample(draws=1000, tune=500, target_accept = 0.9, chains=3, cores=1, progressbar=True)
    
            # Obtaining Posterior Predictive Sampling:
            post_pred_C = pm.sample_posterior_predictive(trace_C, samples=1000)
            print(post_pred_C['observed_data_C'].shape)
            print('\nSummary: ')
            print(pm.stats.summary(data=trace_C))
            print(pm.stats.summary(data=post_pred_C))
########################################################################################################################
    return trace_C, post_pred_C

# Initializing the PDF Distribution:
print('#######################################################################################################')
print('Designing the Bayesian PDF for initial Sensor 2:')
trace_S2_0, post_pred_S2_0 = S2_0_Bayesian_Interface(data=S2_data_list[0])
best_dist = getattr(st, 'hypsecant')
param_S2_0 = [np.mean(trace_S2_0['mu_0']), np.mean(trace_S2_0['sigma_0'])]
print('Parameters values = ', param_S2_0)
pdf_S2_0, B_x_axis_pdf_S2_0, y_axis_pdf_S2_0 = make_pdf(data=S2_data_list[0], dist=best_dist, params=param_S2_0, size=len(trace_S2_0))
print('#######################################################################################################')
plt.plot(B_x_axis_pdf_S2_0, y_axis_pdf_S2_0, label='%i. step (Initial[S2])' % (m + 1))

for m in range(len(S2_data_list)):
    print('\n')
    print('Acessing list number: ', m)
    # Predict:
    print('#######################################################################################################')
    print('Designing the Bayesian PDF for predictive Sensor 2:')
    trace_S2_P, post_pred_S2_P = S2_P_Bayesian_Interface(mu=trace_S2_0['mu_0'], sigma=trace_S2_0['sigma_0'], data=S2_data_list[m])
    best_dist = getattr(st, 'hypsecant')
    param_S2_P = [np.mean(trace_S2_P['mu_P']), np.mean(trace_S2_P['sigma_P'])]
    print('Parameters values = ', param_S2_P)
    pdf_S2_P, B_x_axis_pdf_S2_P, y_axis_pdf_S2_P = make_pdf(data=S2_data_list[m], dist=best_dist, params=param_S2_P, size=len(trace_S2_P))
    print('#######################################################################################################')
    plt.plot(B_x_axis_pdf_S2_P, y_axis_pdf_S2_P, label='%i. step (Prediction[S2])' % (m + 1))

    # Correct:
    print('#######################################################################################################')
    print('Designing the Bayesian PDF for correction Sensor 1:')
    trace_S1_C, post_pred_S1_C = S1_C_Bayesian_Interface(mu=trace_S2_P['mu_P'], sigma=trace_S2_P['sigma_P'], data=S1_data_list[m])
    best_dist = getattr(st, 'hypsecant')
    param_S1_C = [np.mean(trace_S1_C['mu_C']), np.mean(trace_S1_C['sigma_C'])]
    print('Parameters values = ', param_S1_C)
    pdf_S1_C, B_x_axis_pdf_S1_C, y_axis_pdf_S1_C = make_pdf(data=S1_data_list[m], dist=best_dist, params=param_S1_C, size=len(trace_S1_C))
    print('#######################################################################################################')
    plt.plot(B_x_axis_pdf_S1_C, y_axis_pdf_S1_C, label='%i. step (Correction[S1])' % (m + 1))
    
    corr_pdf_S1 = y_axis_pdf_S1_C
    corr_pdf_S1_list.append(corr_pdf_S1)
    print('Corrected PDF: ', corr_pdf_S1)
    print('Length of Corrected PDF list = ', len(corr_pdf_S1_list))
    x_corr_pdf_list.append(B_x_axis_pdf_S1_C)

However, I am not sure if I am doing the steps correctly. Here are my questions:

  • Am I approaching the idea correctly? If not, how can I modify the code? Also, if possible share a tutorial explaining the idea of bayesian fusion/bayesian updating?
  • Everytime I re-run the code, I keep on getting different distributions when fusing the data together. Could the problem lie within the def random section in my custom distribution? or could it be because everytime I run the code I would get different values in mu and sigma which could cause the reason for different fused distribution?
  • Could it be because of the mu code line, e.g. mu=np.mean(mu), sigma=np.std(mu) where it should be mu=np.mean(mu), sigma=np.std(sigma)?

P.S:

for the make_pdf code it can be found in the Fitting empirical distribution to theoretical ones with Scipy.

Other possibilities of having different results in the bayesian fusion:

  • Could it be because I am using weakly priors in the pymc3 models where the weakly priors would cause the bayesian fusion to have different results, everytime I re-run the code?
  • Could it be because the assigned draws and tune parameters in the trace code? as I assign the trace code to,
    trace = pm.sample(draws=1000, tune=500,......)
    Which would possible lead to different prior values?

I have a repo of Kalman Filter implementations for PyMC here, perhaps it can be helpful for seeing how to implement the filter, but these are standard filters that assume normally distributed innovations. I see you’re using a non-normal distribution, so I think you would need some kind of extended filter.

To be honest I am having trouble following your code. The loop over the data should be inside the model (using an aesara/theano Scan Op), because the filter works by iteratively updating its belief about how much each sensor should be trusted. I see that you are fitting each data point then using the estimates to parameterize the priors of the next iteration, but this discards information about the higher order moments of your posterior estimates (skew etc). I also don’t see you calling your predict or correct functions anywhere in the loop, so where is the Kalman filtering happening?

I’d go through the Kalman and Bayesian Filters in Python book you linked. This is an excellent resource that I would have linked you had you not already. You will need to pay attention to the final chapters on non-gaussian filtering, since that seems to be your use case.

Thank you for the reply.

In my code, I have two sensors where S2 is the weak sensor and S1 is the strong sensor, and what I am dealing with is same case as the Kalman Filter in 1 Dimension since I am dealing with 1D data.

So, I created Bayesian Fusion to follow the steps below:

  1. get the first data of S2 and getting their trace of the priors. ('Designing the Bayesian PDF for initial Sensor 2:')
  2. Use the priors trace to create a predictive priors based on the next data of S2. ('Designing the Bayesian PDF for predictive Sensor 2:')
  3. Update the distribution by using the predictive priors from the previous step into S1 and obtain the corrected custom distribution. ('Designing the Bayesian PDF for correction Sensor 1:')

I created this code following the same style as the in Kalman Filter in 1 Dimension since I am dealing with 1D data and changed the approach to fit in the Bayesian package (which is PyMC3).

mean = mean0
var = var0

plt.figure(figsize=(fw,5))
for m in range(len(positions)):
    
    # Predict
    var, mean = predict(var, mean, varMove, distances[m])
    #print('mean: %.2f\tvar:%.2f' % (mean, var))
    plt.plot(x,mlab.normpdf(x, mean, var), label='%i. step (Prediction)' % (m+1))
    
    # Correct
    var, mean = correct(var, mean, varSensor, positions[m])
    print('After correction:  mean= %.2f\tvar= %.2f' % (mean, var))
    plt.plot(x,mlab.normpdf(x, mean, var), label='%i. step (Correction)' % (m+1))
    
plt.ylim(0, 0.1);
plt.xlim(-20, 120)
plt.legend();

Based on your experience with pymc3, could any of the reasons that I mentioned would affect my results in the attempt of creating Bayesian fusion?

There seem to be some misunderstandings. Your data is 1d, but you have two sensors, so you are actually in a multivariate case. If you specifically want to do Kalman filtering, you need to write down a linear system that explains how the senor measurements (the states of the system) are combined into a single observation. The code I linked shows how to implement this, and the book you linked explains how to write down a linear state space. That said, Kalman filtering is very restrictive. The state system must be linear, and all the noise must be Gaussian. You appear to want non-Gaussian noise, which means you need to use an extended Kalman filter. This is non-trivial.

I think your problem is actually much simpler. You can estimate a single weighting parameter to join the measurements of the two sensors, and use the resulting fusion as the mean of your distribution:

 # mean of 0.75, for a strong prior bias towards the "strong" sensor
rho = pm.Beta('senor_1_weight', alpha=6, beta=2)
mu = rho * sensor_1_data + (1 - rho) * sensor_2_data

sigma = pm.HalfNormal('obs_sigma')
obs = Hypsecant('obs', mu=mu, sigma=sigma, observed=data)

You could further refine this by adding sensor noise, for example by adding likelihood functions for sensor_1_data and sensor_2_data, maybe as guassian random walks. You could then estimate noise parameters for each of those.

My case is different, I am not using the sensors as for localization problem, I am using the 1D Kalman and Bayesian filter to combine/fuse multiple distributions together.

The data I am getting is based on the article PDDR: a statistical time-domain damping parameter for structural damage identification. So, I get the probability distributions of several cases and I tend to fuse each distribution together of similar case. I managed to achieve that with the 1D Kalman filter, but I am now trying to achieve the same with Bayesian filter. My aim is concentrating on the sigma parameter since this is important parameter.

I tried to create the Bayesian inference based on the information that I found in the following links:

I think I created the mu as Generic weakly informative prior and same for sigma (hopefully :sweat_smile:), but when it comes to creating the fusion part, I keep on coming back to the priors as I include the observed data as distribution data from sensors. Am I right to think and assume that difference in the reuslts is due to my setting up the model for Bayesian fusion? Like the parametrization like target_accept, draws, tune and the priors.

The arguments to pm.sample have to do with helping the MCMC sampler converge to the true posterior. Are your models converging? What are the r_hat values for each variable you estimate? Are there divergences in the sampler? Asking these questions will guide you to an answer about sampler settings.

Well, when I run my code, I always get the rhat to be at 1.0 for the priors and yes, they do converge, however, sometimes there are divergences, but it sometimes be between 2 to 15 for each chain and in total, from last run, like 35 and lower. Furthermore, when it comes to the posterior predictive sampling in the prediction, it like 1.2 to 3.11 while correction 2.5 to 6.0 for the rhat.

r_hat doesn’t make any sense for a prior, because you already know the target distribution. From what you report about the posterior, your sampler is not converging. The generic solution for divergences is to increase target_accept to be closer to 1.0, but an r_hat of 2.6 to 6 suggests that there is no mixing at all happening between your chains, and there is something more fundamentally wrong in your modeling approach.

Okay, do you mean in this line (for example), or in general my approach of designing the model?