Hello there,
Excuse me if this wasn’t posted under the right forum/tag.
I am currently using the PyMC built-in Metropolis-Hastings algorithm to estimate parameters of a given exponential model.
I would like to use data from multiple datasets to compute the likelihood of a given parameter set. In the tutorial this is shown for a single dataset as follows:
Y_obs = pm.Normal("Y_obs", mu=mu, sigma=sigma, observed=Y)
My question now is how to combine this for multiple, independent datasets, especially when they have different numbers of rows. Does anyone have a clue?
You'll need to provide some more details; your post is too vague about exactly what you need. Do you want the same parameters for both datasets, or different ones? Do the datasets have the same features? What have you tried?
Thank you for your reply! 
I am indeed trying to infer the same set of parameters for all the datasets, but want to include all of their likelihoods. The datasets have the same features.
I have read elsewhere that by adding multiple likelihood lines of code, the combined likelihood gets computed behind the scenes. So I have tried the following:
y_like2 = pm.Normal("y_like2", mu=mu, sigma=sigma, observed=data2)
y_like3 = pm.Normal("y_like3", mu=mu, sigma=sigma, observed=data3)
However, this raises an error saying that the input shapes (of data2 and data3) are uneven. This is true, because the observed datasets have different numbers of records.
My question now is how to make it work despite the uneven shapes. It isn't uncommon for datasets modeling the same thing to have different numbers of records, right?
I guess it's because mu has the same shape as the data (it comes from a regression or something?)
Is there any reason why you can’t just concatenate the datasets together? Otherwise you will need to compute a separate mu for each observed outcome distribution.
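For concreteness, here is a minimal sketch of the concatenation idea (the priors are placeholders, and I am assuming data2 and data3 are 1-d NumPy arrays):

import numpy as np
import pymc as pm

# Stack the two observed arrays into one
data_all = np.concatenate([data2, data3])

with pm.Model():
    # placeholder priors -- substitute your actual model here
    mu = pm.Normal("mu", mu=0, sigma=10)
    sigma = pm.HalfNormal("sigma", sigma=1)
    # a single likelihood over the combined data
    y_like = pm.Normal("y_like", mu=mu, sigma=sigma, observed=data_all)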
The mu comes from the following:
# Priors for parameters
alpha = pm.HalfNormal("alpha", sigma=2) # initial capacity > 0
b = pm.HalfNormal("b", sigma=1) # decay rate > 0
sigma = pm.HalfNormal("sigma", sigma=0.1) # observation noise
# degradation model
mu = alpha * pm.math.exp(-b * t)
And the datasets are separate time series, which is why I wanted to include them separately… Do you think adding an extra column to uniquely identify the datasets might work if I concatenate them?
Where does t come from? I assume it's just a column of feature values that happens to represent time. In this case, you can indeed just concatenate everything, because all the rows of your data are conditionally independent once you know the value of t for that row. You don't have a recursive time series function like x_t = f(x_{t-1}), so there's no need to handle time in a special way.
You can shuffle the rows of a single dataset and check that you get the same answer to convince yourself this is the case.
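Concretely, something like this should work (a sketch; I am assuming t2/t3 are the time columns of your two datasets and data2/data3 the corresponding observations, all 1-d NumPy arrays):

import numpy as np
import pymc as pm

# Concatenate times and observations in the same order, so row i of
# t_all still lines up with row i of data_all
t_all = np.concatenate([t2, t3])
data_all = np.concatenate([data2, data3])

# Sanity check: shuffling the rows (keeping t and y paired) should
# give the same posterior, since rows are conditionally independent
# given t
perm = np.random.permutation(len(t_all))
t_all, data_all = t_all[perm], data_all[perm]

with pm.Model():
    alpha = pm.HalfNormal("alpha", sigma=2)    # initial capacity > 0
    b = pm.HalfNormal("b", sigma=1)            # decay rate > 0
    sigma = pm.HalfNormal("sigma", sigma=0.1)  # observation noise

    # mu now has one entry per row of the combined data
    mu = alpha * pm.math.exp(-b * t_all)
    y_like = pm.Normal("y_like", mu=mu, sigma=sigma, observed=data_all)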
In this case, does it not matter if, for example, t = 10 appears in multiple records? I am worried the data will look like it represents some form of periodicity.
Ah sorry, to clarify: I am not querying; I meant that at a given time step you will have values from both datasets, and I was wondering whether that is problematic.
You can add a column to identify which dataset the row is from.
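Something like this, for example (a sketch assuming pandas DataFrames df2 and df3 with columns t and y; the names are made up):

import pandas as pd

# Tag each row with the dataset it came from, then concatenate;
# the model only uses t and y, the dataset column is just bookkeeping
df2["dataset"] = 2
df3["dataset"] = 3
df_all = pd.concat([df2, df3], ignore_index=True)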
Dear Jesse,
Excuse me for my late reply; some urgent personal matters this week took up all my attention.
And thank you for your swift response! This perfectly resolved my issue, and I can now continue with my thesis.
Best regards!