Reconciling two datasets using PyMC

ffgg · November 21, 2023, 6:19pm

This is a specific example of a more general question: how to reconcile data from different datasets.

I have 2 datasets related to fishing. One tells me how many hours of fishing were done by each vessel in each area, and the other tells me how much fish each vessel caught in each area. I also know the gear type and size of most of these vessels.

Both datasets are dirty, with various types of bias and error. For example, I know which types of areas and vessels to trust less than others.

I would like to reconcile these two datasets, come up with an estimate of how much fish each vessel has caught, in how many hours, and compute a metric of how much each vessel can catch in 1 hour, which depends on gear and size.
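
To make the target concrete, here is a minimal sketch of the naive, non-Bayesian version of that metric: join the two datasets on (vessel, area), then pool catch and hours per gear type. All vessel IDs, areas, gear labels, and numbers are made up for illustration, and the function names are hypothetical. It assumes one record per vessel/area pair in each dataset, ignores vessel size, and treats every record as equally trustworthy, which is exactly the part a PyMC model would need to improve on.

```python
from collections import defaultdict

# Hypothetical records: every ID, label, and number here is invented.
effort = [  # (vessel, area, hours fished)
    ("V1", "A", 10.0),
    ("V1", "B", 5.0),
    ("V2", "A", 20.0),
    ("V3", "B", 8.0),   # no matching catch record
]
catch = [  # (vessel, area, catch in kg)
    ("V1", "A", 200.0),
    ("V1", "B", 50.0),
    ("V2", "A", 900.0),
    ("V4", "A", 120.0),  # no matching effort record
]
gear = {"V1": "longline", "V2": "trawl", "V3": "longline"}  # V4 gear unknown


def join_on_vessel_area(effort, catch):
    """Pair hours and catch by (vessel, area); report keys found in only one dataset."""
    hours = {(v, a): h for v, a, h in effort}
    kg = {(v, a): c for v, a, c in catch}
    matched = {k: (hours[k], kg[k]) for k in hours.keys() & kg.keys()}
    unmatched = sorted(hours.keys() ^ kg.keys())
    return matched, unmatched


def catch_per_hour_by_gear(matched, gear):
    """Pooled catch per hour for each gear type (total kg divided by total hours)."""
    totals = defaultdict(lambda: [0.0, 0.0])  # gear -> [kg, hours]
    for (vessel, _area), (h, c) in matched.items():
        g = gear.get(vessel)
        if g is None:  # skip vessels whose gear type is unknown
            continue
        totals[g][0] += c
        totals[g][1] += h
    return {g: kg_sum / h_sum for g, (kg_sum, h_sum) in totals.items() if h_sum > 0}


matched, unmatched = join_on_vessel_area(effort, catch)
rates = catch_per_hour_by_gear(matched, gear)
print(unmatched)  # vessel/area pairs present in only one dataset
print(rates)      # naive kg-per-hour estimate per gear type
```

The unmatched pairs are the first place where the two datasets disagree, so they are worth inspecting before fitting anything.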

I have a feeling I am in the right place with PyMC, but I have very little experience. I looked for tutorials on your website; for example, I am pretty sure this one is relevant for me: "A Primer on Bayesian Methods for Multilevel Modeling" (PyMC example gallery).

But I need a nudge in the right direction. Many thanks!

Tyler_Biggs · November 21, 2023, 6:37pm

It sounds like you should start with the binomial regression example. If you need more advice, I would recommend posting the header of your data, or a description of it if sharing is off the table.
