I have 10 datasets (let's say each contains measurements of an X and a Y value, and each dataset has a different length).
I want to use PyMC to infer the parameters of a model. Let's assume I do a simple linear regression as in the GLM: linear regression notebook (glm-linear).
The envisioned result is one estimation of the slope and one estimation of the intercept based on the model and 10 datasets.
But fitting the model to each dataset separately gives me a list of 10 "idatas" with 10 estimates, and thus 10 slopes and 10 intercepts.
What methodology could I use to combine the results into a single best estimate of the parameters?
Side note: I was looking into the named dimensions with data containers section (named-dimensions-with-data-containers) to see if I could adapt that approach to fit my problem into this format, but my datasets have different sizes and no overlapping X values.
pm.set_data is not what you want (as you already found). If you just have tabular data, you should just stack it all up and run a single model. If you don’t, I’d need more information to recommend an approach.
Thanks for your response.
This is what I already thought the solution would be.
About "stacking it all up": is this the right way to go about it?
Assume the model for which I do a parameter fit is nonlinear and looks like this: Y = some_non_linear_model(X, param_a, param_b)
and the code to set up the PyMC model looks something like this:
model = pm.Model()
with model:
    param_a = pm.Normal("param_a", mu=1, sigma=2)
    param_b = pm.Normal("param_b", mu=1, sigma=2)
    # Expected value of outcome
    mu = some_non_linear_model(x_obs, param_a, param_b)
    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal("Y_obs", mu=mu, sigma=0.5, observed=y_obs)
I can get this to work with a single x_obs and a list of y's: y_obs = [y_1, ..., y_10].
But I am not sure how to handle x_obs (being a list of x_1 up to x_10, all of different lengths), and thus mu, to work with multiple observations.
You are going to have to be more concrete about the shapes involved in your problem. If you have two datasets (X_1, y_1) and (X_2, y_2) with shapes (n_1 \times k), (n_1,) and (n_2 \times k), (n_2,), then you simply form X = np.concatenate([X_1, X_2], axis=0), y = np.concatenate([y_1, y_2], axis=0), and then mu = f(X, a, b) has shape (n_1 + n_2,).
You can also just keep the logic separate and add 10 observed variables without trying to concatenate or stack them. If they use the same unobserved variables, the information will flow the same way and you get the posterior corresponding to the combined evidence.
That is actually the easiest way to make this work. And with only about 10 datasets it is not much effort either.
Thanks for your help
This is how I interpreted your last suggestion:
# Expected value of outcome
mu1 = some_non_linear_model(X_obs[0], a, b)
mu2 = some_non_linear_model(X_obs[1], a, b)
# Likelihood (sampling distribution) of observations
Y_obs = pm.Normal("Y_obs", mu=mu1, sigma=1, observed=observed_data[0])
Y_obs2 = pm.Normal("Y_obs2", mu=mu2, sigma=1, observed=observed_data[1])