Alternatives to Bayesian updating for non-shareable data?

Hi all. I’ve got a more theoretical question this time. I want to work with some federated data (I don’t want to use federated learning), where observed data cannot be shared across labs. The simplest and most effective approach thus far is doing good old Bayesian updating, and it works generally well (besides being theoretically sound). But some information is still lost in the process (e.g. variation across labs, which could be modelled hierarchically). In short, is there an appropriate way to combine likelihoods/posteriors from different labs? Each lab would have different observed data (i.e. locally observed), but of the same type (theoretically from the same generative distribution), and the model used to analyse each local dataset would be the same. The idea would be either to generate new posteriors, some sort of weighted average, or maybe some posterior predictions. If valid, what would be the most straightforward way to implement one of these approaches in PyMC?

(P.S. I imagine that combining posterior predictives from each local model and using them as observed data for a global model is highly inappropriate, even if the uncertainty of the predictions is incorporated in the model, though I couldn’t fully articulate a mathematical argument for this intuition.)

Sorry for the wordy question. Thanks!

You could try combining the individual posteriors using inverse variance weighting.
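For illustration, here’s a minimal sketch of what that could look like, assuming each lab shares only the posterior mean and standard deviation of the parameter of interest (the numbers below are made up):

import numpy as np

# posterior summaries reported by each lab (illustrative values)
post_means = np.array([8.7, 9.2])
post_sds = np.array([0.6, 0.4])

# inverse variance (precision) weights
weights = 1.0 / post_sds**2
combined_mean = np.sum(weights * post_means) / np.sum(weights)
combined_sd = np.sqrt(1.0 / np.sum(weights))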

If you can sample in a distributed compute environment, you can just send parameter settings from the sampler to the individual sites and have them return the logp and dlogp.

I wouldn’t be surprised if there was a better answer than mine.

1 Like

Many thanks! This is a very good idea; maybe there are other meta-analysis techniques that could help as well (they didn’t occur to me before).

The plan is indeed to have the same set-up per lab and share posteriors and log-likelihoods (i.e. idata minus observations), so we can share logp and dlogp. Sorry if I’m missing something: if the idea is to apply inverse variance weighting to the posteriors, how do logp and dlogp come into play?

Yeah, reading your initial post I was thinking this sounds very much like a Bayesian meta-analysis. This is potentially just a simple hierarchical regression. It mostly depends on the exact form of the data that you have, or in other words, the kinds of summary stats you have from each study, like Cohen’s d or R² etc.
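As a rough sketch (not necessarily your exact case, since your summary stats will differ), a random-effects Bayesian meta-analysis in PyMC could look something like this, assuming each lab reports only a point estimate and its standard error (the names and values below are purely illustrative):

import numpy as np
import pymc as pm

y_est = np.array([8.7, 9.2])  # per-lab estimates (illustrative)
se = np.array([0.6, 0.4])     # per-lab standard errors (illustrative)

with pm.Model() as meta:
    mu = pm.Normal("mu", 0, 10)    # overall (pooled) effect
    tau = pm.HalfNormal("tau", 1)  # between-lab heterogeneity
    theta = pm.Normal("theta", mu=mu, sigma=tau, shape=len(y_est))  # lab-level effects
    pm.Normal("y", mu=theta, sigma=se, observed=y_est)              # within-lab sampling error
    idata_meta = pm.sample()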

I have an old (nearly-)package, metaBayes, which implements a Bayesian meta-analysis, with a notebook showing its use here: https://github.com/drbenvincent/Veillard-Vincent/blob/master/bayesian%20meta%20analysis%20MCQ.ipynb

But note that this is specifically for the case where we extract Pearson correlation coefficients and sample sizes from different studies/papers. Oh, now that I think about it, I also made it work with case/control-type studies (i.e. operating on Cohen’s d data).

If this is vaguely close to what you want, let me know. I always thought that I should put the work in and release this as a specific package, but it never made it to the top of the priority list.

1 Like

These are two separate, independent approaches you could take.

@michaelosthege has told me about some work he did to compute parts of the logp on different machines.

Many thanks for the reply. This is very helpful. My goal would be to use the shape and scale parameters (both with Gamma priors) of a Gamma likelihood, so maybe the summary stats are akin to a mean and std. I haven’t managed to look at your code in detail yet, but I think this is somewhat close to your approach. Maybe the tricky part would be working with non-Gaussian distributions?

Many thanks again. It’d be great if @michaelosthege could give me some advice. I’m not quite sure how I could retrieve posterior information (as in my post above, for the scale and shape parameters) from combined logps, or how to combine them appropriately.

Hi @Simon and thanks for tagging me, Thomas,

yes indeed, and actually I took this as the motivation to finally make the repository public: GitHub - michaelosthege/pytensor-federated: Federated differentiable graph computation using PyTensor

pytensor-federated enables you to asynchronously (and thereby in parallel) perform the compute of certain Ops on other machines.
The README.md has several diagrams explaining how it works, and there are demo_model.py and demo_node.py with a toy example.
Check it out and let me know if this works for you. Feel free to ask questions here, or in a discussion on the project. (But tag me, and if I don’t reply within a few days contact me. Sometimes I overlook notifications from Discourse.)

cheers

2 Likes

Many thanks for sharing this, @michaelosthege . I’m not quite familiar with federated analysis, so there are a few things I’m finding hard to understand. My main concern is that the labs I’m working with cannot share observations (in any way), and probably won’t allow port access. How could pytensor-federated work in this case?

Would it be okay if the labs share only the logp and gradient?
Then you can launch a worker that receives parameters, calculates the logp/dlogp, optionally adds an arbitrary constant to the logp, and returns a message containing only those numbers.
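To make that concrete, here is a rough sketch of what such a lab-side worker could compute, independent of pytensor-federated’s actual API (the model and data below are placeholders):

import numpy as np
import pymc as pm

local_obs = np.random.gamma(shape=9, scale=1, size=20)  # raw data, stays inside the lab

with pm.Model() as local_model:
    m = pm.Gamma("m", 1, 1)
    s = pm.Gamma("s", 1, 1)
    pm.Gamma("y", mu=m, sigma=s, observed=local_obs)

logp_fn = local_model.compile_logp()    # point dict -> scalar logp
dlogp_fn = local_model.compile_dlogp()  # point dict -> gradient vector

def evaluate(point, offset=0.0):
    # return (logp + optional constant offset, dlogp); only these numbers leave the lab
    return float(logp_fn(point)) + offset, dlogp_fn(point)

point = local_model.initial_point()  # an example point in the model's internal (transformed) space
lp, grad = evaluate(point)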

You might be working on exactly the use case I had in mind when I created the tools! Feel free to contact me at m.osthege@fz-juelich.de, then we can do a quick video chat about it

Sounds great, thank you! I’ll send you an email.

Also, digging around for methods for combining summary stats of estimates (i.e. mean and SD) I found this oldish paper: An Approach to Combining Results From Multiple Methods Motivated by the ISO GUM - PMC
I don’t know how appropriate the method is with respect to the specific use cases in your package, but in the annex of the paper they provide a Bayesian method which is easy to translate to PyMC and gives pretty much the same results as using a hierarchical (across labs) model on estimated means (e.g. each participant/patient’s estimated mean from each lab). Let me know if it’s appropriate/relevant and I can share the code, maybe with some simulated data.

Hi @Simon. I think this thread is quite vague at the moment. What is the form of the data you have? Is your goal a Bayesian meta analysis / hierarchical model? If you just want to get a Bayesian meta analysis done and you don’t want to implement it yourself, then I’d recommend JASP. If you want to build a PyMC model yourself to do your own custom Bayesian meta analysis, then I think the next step is to boil the problem down to the specifics. I think that will unlock the next steps for yourself and/or enable people to point you in the right direction. Hope that helps?

Sorry I wasn’t clearer before and didn’t provide more concrete examples. My question was intended to ask for general guidelines rather than to solve a specific modelling issue. The main goal is to have analyses that can be implemented in a federated context. However, we may face several problems regarding data sharing restrictions or labs unable to implement our software/models. In that case, a meta-analytic approach where we can combine the estimates from each lab would be very useful. In short, there’s no single solution to our problem (no single right direction), but hopefully many that can be implemented together and/or alternatively. So, all ideas related to this topic are welcome.

But to give you a more concrete idea, suppose I want to estimate incubation periods of a disease. The data are exposure start times and symptom onset times, and the observed data to input into the model would be onset - start (here I represent that simply by sampling randomly from a Gamma distribution):

# -*- coding: utf-8 -*-

import pymc as pm
import arviz as az
import numpy as np
import pytensor.tensor as pt


#lab 1 estimate
N1 = 20
mu1 = 9
sig1 = 1
# symptom onset minus exposure start
obs1 = np.random.gamma(shape=mu1, scale=sig1, size=N1).round(0)

with pm.Model() as mod1:
    a = pm.Gamma("a", 1, 1)
    b = pm.HalfNormal("b", 1, shape=N1)
    m = pm.Deterministic("m", a + b)  # per-observation mean
    s = pm.Gamma("s", 1, 1) # standard deviation parameter
    y = pm.Gamma("y", mu=m, sigma=s, observed=obs1)
    mu = pm.Deterministic("mu", m.mean())  # overall mean (plotted below)
    idata1 = pm.sample()

m1 = az.extract(idata1.posterior)['m'].values.mean(axis=1)
s1 = az.extract(idata1.posterior)['s'].values.mean()

az.plot_posterior(idata1, var_names=["mu"], hdi_prob=0.9)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [a, b, s]
 |████████████| 100.00% [8000/8000 00:31<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 53 seconds.

[posterior plot of mu, 90% HDI]

#lab 2 estimate
N2 = 40
mu2 = 9
sig2 = 1
# symptom onset minus exposure start
obs2 = np.random.gamma(shape=mu2, scale=sig2, size=N2).round(0)

with pm.Model() as mod2:
    a = pm.Gamma("a", 1, 1)
    b = pm.HalfNormal("b", 1, shape=N2)
    m = pm.Deterministic("m", a + b)  
    s = pm.Gamma("s", 1, 1) #standard deviation parameter
    y = pm.Gamma("y", mu=m, sigma=s, observed=obs2)
    mu = pm.Deterministic("mu", m.mean())
    idata2 = pm.sample()

m2 = az.extract(idata2.posterior)['m'].values.mean(axis=1)
s2 = az.extract(idata2.posterior)['s'].values.mean()

az.plot_posterior(idata2, var_names=["mu"], hdi_prob=0.9)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [a, b, s]
 |████████████| 100.00% [8000/8000 00:33<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 55 seconds.

[posterior plot of mu, 90% HDI]

Now, supposing that lab 1 and lab 2 cannot share the observed data, what would be an appropriate meta-analytic approach to combine the estimates for parameters m and s? As mentioned in my previous post (reference there), I attempted this:


## approach from Levenson et al (2000)
n = np.array([N1, N2])
x = np.array([m1.mean(), m2.mean()])
s = np.array([s1, s2])

with pm.Model() as mod:
    t = pm.StudentT("t", mu=0, sigma=1, nu=n-1, shape=2)
    m = pm.Deterministic("m", x + (s/pt.sqrt(n))*t)  # reconstruct each lab's mean from its summary stats
    l = m.sum()/len(x) - pt.abs(m.max()-m.min())/2   # lower bound for the combined mean
    u = m.sum()/len(x) + pt.abs(m.max()-m.min())/2   # upper bound for the combined mean
    sigma = pm.Deterministic("sigma", pt.abs(m.max()-m.min())/2)  # half-range as a spread estimate
    mu = pm.Uniform("mu", lower=l, upper=u)          # combined mean estimate
    idata_a = pm.sample()
    
az.plot_posterior(idata_a, var_names=["mu"], hdi_prob=0.9)

Multiprocess sampling (4 chains in 4 jobs)
CompoundStep
>Metropolis: [t]
>NUTS: [mu]
 |████████████| 100.00% [8000/8000 00:24<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 46 seconds.

[posterior plot of combined mu, 90% HDI]

Which gave results not too different from a hierarchical model like this one:

## hierarchical model using estimates as observed
n = np.array([N1, N2])
x = np.concatenate([m1, m2])  # per-observation posterior means from each lab
s = np.array([s1, s2])        # posterior mean of each lab's sd parameter
j = np.concatenate([np.repeat(0,N1), np.repeat(1,N2)])  # lab index for each estimate

with pm.Model() as mod:
    ms = pm.HalfNormal("ms", sigma=1)
    ml = pm.HalfNormal("ml", sigma=1)
    m = pm.Gamma("m", mu=ml, sigma=ms, shape=len(n))
    sigma = pm.HalfNormal("sigma", s.mean())
    y = pm.Gamma("y", mu=m[j], sigma=sigma, observed=x)
    mu = pm.Deterministic("mu", m.mean())
    idata_b = pm.sample()
    
az.plot_posterior(idata_b, var_names=["mu"], hdi_prob=0.9)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [ms, ml, m, sigma]
 |████████████| 100.00% [8000/8000 00:35<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 56 seconds.

[posterior plot of combined mu, 90% HDI]

Are these approaches appropriate? Are they outside the context/scope of a meta-analytic approach? Thanks.

Ah ok. And apologies, I seemed to go into autopilot and just assumed the goal was a model, rather than discussion and input :slight_smile: It sounds like a cool problem. Of all the things mentioned in this thread I only really know about meta-analysis. That approach could be useful if the data from each lab are basically summary statistics (sample size and some mean and spread parameters and/or effect sizes). The specifics of how to deal with that do depend on the form of the data though. I don’t have any immediate “aha” about the solution, but I’ll keep an eye on this thread as I’m interested in what route you take :slight_smile:

1 Like

I have an example here that might be relevant: Grid algorithms for hierarchical models — Think Bayes

It uses a grid algorithm rather than PyMC, but I think it does what you are asking about: a federated implementation of a hierarchical model.

1 Like

Many thanks. I’ve only given it a quick look, but it seems to be a very good option as well. If I understand correctly, the parallel updates could work asynchronously as long as I add each new likelihood to the hyper_likelihood array before updating?

Yes, it’s a two-phase update. In phase 1, everyone computes a likelihood based on their data and sends it to the central organizer. The organizer does an update, for each participant, that includes all likelihood functions except their own, and sends the results to each participant. Then each participant does an update based on the all-but-one prior they received.
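In case it helps, here is a toy numpy sketch of that two-phase scheme (not the actual code from the book; the normal likelihoods and data are made up purely for illustration):

import numpy as np

grid = np.linspace(5, 15, 101)          # shared grid over the parameter of interest
prior = np.ones_like(grid) / len(grid)  # flat prior on the grid

def local_likelihood(obs, grid, sd=1.0):
    # each participant evaluates its own likelihood on the grid; raw data never leave the lab
    return np.prod(np.exp(-0.5 * ((obs[:, None] - grid[None, :]) / sd) ** 2), axis=0)

liks = [local_likelihood(np.array([8.5, 9.1, 9.4]), grid),
        local_likelihood(np.array([9.8, 8.9]), grid)]

# phase 1 at the organizer: for each participant, combine the prior with all likelihoods except their own
all_but_one = []
for i in range(len(liks)):
    p = prior.copy()
    for j, lik in enumerate(liks):
        if j != i:
            p = p * lik
    all_but_one.append(p / p.sum())

# phase 2 at each participant: update the all-but-one prior with their own likelihood
posterior_0 = all_but_one[0] * liks[0]
posterior_0 /= posterior_0.sum()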

2 Likes