In August '22, I presented Chainsail, an internal project that my colleagues and I over at Tweag have been working on for quite a while. It is a cloud-based web service that implements Parallel Tempering / Replica Exchange to help with sampling multimodal probability distributions. You can learn more about Chainsail in our announcement blog post.
Back then, we invited users to participate in a beta test, but it was a closed-source project. Today we’re happy to announce that Chainsail is now open source!
The Chainsail development team encourages all kinds of contributions to the Chainsail code base and is looking forward to seeing parts of the Chainsail code reused in other projects. A blog post on the occasion of the open-source release outlines the service architecture, points to relevant parts of the source code, and proposes a couple of future extensions that we at Tweag would enjoy working on together with interested community members.
Neat idea. Do you think the architecture allows for easy message passing within chains to accommodate large models?
Swapping state elements for parallel tempering makes sense, but for models with okay mixing but very, very large datasets & parameter sets, it might be nice to give each worker a subset of the data and then use message passing for the Gibbs updates across workers.
@ckrapu I’m not sure I fully understand what you would like to do. Off the top of my head, I’d say that the existing architecture (and here I think this really only concerns the MPI Replica Exchange implementation) could be adapted fairly easily into a kind of distributed Gibbs sampler. MPI is only used to pass around the states and the log-probabilities. For a distributed Gibbs sampler, you would just need to modify that part to pass the state in one direction only (from, say, replica A to B, rather than both A→B and B→A as in Replica Exchange) and get rid of the log-probability passing.
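For context, here's a minimal sketch of the swap step being discussed. This is not Chainsail's actual code, just the standard Replica Exchange acceptance criterion: two replicas at inverse temperatures β_i and β_j exchange states, and the only quantities that need to be communicated are the states and their log-probabilities (the function and variable names below are made up for illustration):

```python
import math
import random

def swap_acceptance(beta_i, beta_j, logp_i, logp_j):
    """Probability of accepting a state swap between two replicas.

    Replica i targets p(x)^beta_i, replica j targets p(x)^beta_j.
    The Metropolis ratio for exchanging states x_i and x_j is
    exp((beta_i - beta_j) * (log p(x_j) - log p(x_i))).
    """
    return min(1.0, math.exp((beta_i - beta_j) * (logp_j - logp_i)))

def maybe_swap(beta_i, beta_j, state_i, state_j, logp_i, logp_j, rng=random):
    """Attempt a swap; return the (possibly exchanged) states."""
    if rng.random() < swap_acceptance(beta_i, beta_j, logp_i, logp_j):
        return state_j, state_i  # swap accepted
    return state_i, state_j      # swap rejected
```

In the MPI implementation this acceptance test is what the exchanged log-probabilities feed into; for the distributed Gibbs variant sketched above, you would drop the acceptance step entirely and only forward the updated state components in one direction.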
But given the sequential nature of Gibbs sampling, wouldn’t the sampler for one variable always have to wait for a new value of a different variable to arrive? It seems like this wouldn’t allow for actual parallel sampling then.
Also, how does splitting up the data lead to an algorithm that yields samples from the desired distribution (of all parameters given the full data)?
I’d love to understand this better - I’d also be happy to hop on a call to discuss, so that I can give a more useful answer here.