Model fitting using data sets from different observation instruments

Hello,

I would like to fit a model to two data sets coming from two different instruments.
Take linear regression as an example. My code is as follows:

test.py (1.4 KB)

The model is simply
y = alpha + beta*X
with two observation data sets (X0, Y0, err0) and (X1, Y1, err1)
and 4 parameters (alpha, beta, s0, s1)
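
For reference, data of this shape could be simulated like this (just a sketch: the sizes and "true" values are made up):

import numpy as np

np.random.seed(0)
alpha_true, beta_true = 1.0, 2.0
X0 = np.linspace(0.0, 1.0, 50)
X1 = np.linspace(0.0, 1.0, 80)
err0 = 0.1 * np.ones_like(X0)  # reported measurement error, instrument 0
err1 = 0.2 * np.ones_like(X1)  # reported measurement error, instrument 1
s0_true, s1_true = 0.05, 0.10  # extra scatter of each instrument
Y0 = alpha_true + beta_true * X0 + np.random.normal(0.0, np.sqrt(err0**2 + s0_true**2))
Y1 = alpha_true + beta_true * X1 + np.random.normal(0.0, np.sqrt(err1**2 + s1_true**2))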

I would like the log-likelihood of my (Normally distributed) observations to be:

$$-\frac{1}{2}\left[\sum\left(\frac{(Y_0-\mu_0)^2}{\mathrm{err}_0^2+s_0^2}+\log\!\left(2\pi\,(\mathrm{err}_0^2+s_0^2)\right)\right)+\sum\left(\frac{(Y_1-\mu_1)^2}{\mathrm{err}_1^2+s_1^2}+\log\!\left(2\pi\,(\mathrm{err}_1^2+s_1^2)\right)\right)\right]$$
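
Equivalently, this is just the sum of two independent Normal log-likelihoods with inflated variances:

$$\log p = \sum_i \log\mathcal{N}\!\left(Y_{0,i}\,\middle|\,\mu_{0,i},\ \mathrm{err}_{0,i}^2+s_0^2\right)+\sum_j \log\mathcal{N}\!\left(Y_{1,j}\,\middle|\,\mu_{1,j},\ \mathrm{err}_{1,j}^2+s_1^2\right)$$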

How should I define
pm.Normal('Y_obs', mu=??, sd=??, observed=??)

Thank you very much

(I edited your post so that the formula is more compact with LaTeX display.)

The easiest way would be to define two observed variables, whose log-likelihoods are internally added together.

But your model setup is a bit unconventional - in linear regression you don't usually observe the error explicitly, but here you are assuming you observe the error of each measurement?
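
For example, a minimal sketch of the two-observed version (reusing the Uniform priors and the arrays X0, Y0, err0, X1, Y1, err1 from your script; note that sd takes a standard deviation, hence the square root of err^2 + s^2):

with pm.Model() as model:
    alpha = pm.Uniform('alpha', 0.0, 3.0)
    beta = pm.Uniform('beta', 0.0, 3.0)
    s0 = pm.Uniform('s0', 0.0, 3.0)
    s1 = pm.Uniform('s1', 0.0, 3.0)
    # one observed variable per instrument; PyMC3 adds their
    # log-likelihoods into the joint model log-probability
    Y_obs0 = pm.Normal('Y_obs0', mu=alpha + beta * X0,
                       sd=(err0**2 + s0**2)**0.5, observed=Y0)
    Y_obs1 = pm.Normal('Y_obs1', mu=alpha + beta * X1,
                       sd=(err1**2 + s1**2)**0.5, observed=Y1)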


I tried to use pymc3.DensityDist to define a new probability density:

with basic_model:
    # Priors for unknown model parameters
    alpha  = pm.Uniform('a',  0.0, 3.0)
    beta   = pm.Uniform('b',  0.0, 3.0)
    s0     = pm.Uniform('s0', 0.0, 3.0)
    s1     = pm.Uniform('s1', 0.0, 3.0)
    # Expected value of outcome
    mu0    = alpha + beta * X0
    mu1    = alpha + beta * X1
    def logp(mu0,mu1):
        return -0.5* (
        ( (Y0 - mu0) ** 2 / (err0 ** 2 + s0 ** 2)  + np.log( 2*np.pi * (err0 ** 2 + s0 ** 2) )).sum()
        + ( (Y1 - mu1) ** 2 / (err1 ** 2 + s1 ** 2)  + np.log( 2*np.pi * (err1 ** 2 + s1 ** 2) )).sum()
        )
    Y_obs = pm.DensityDist('Y_obs', logp(mu0,mu1), observed=(Y0, Y1) )

But I get a ValueError: setting an array element with a sequence.
How should I correct it?
Or should I rewrite the expressions with theano.tensor?

A hierarchical model would be the way to go here, since the instruments are different. For instance:

alpha = pm.Normal('alpha', 0, sd=100, shape=2)
beta = pm.Normal('beta', 0, sd=100, shape=2)

mu0 = alpha[0] + beta[0] * X0
mu1 = alpha[1] + beta[1] * X1

# sd expects a standard deviation, hence the square root of the total variance
results0 = pm.Normal('results0', mu=mu0, sd=(err0**2 + s0**2)**0.5, observed=Y0)
results1 = pm.Normal('results1', mu=mu1, sd=(err1**2 + s1**2)**0.5, observed=Y1)

I forgot the priors on s_1 and s_0, but you get the idea. If you really want to make it work with DensityDist, I found that you need to use the following trick when passing two variables:

import theano.tensor as tt  # use theano's log on tensor quantities

def logp(Y):
    Y_0 = Y[0]
    Y_1 = Y[1]
    return -0.5 * (
        ((Y_0 - mu0) ** 2 / (err0 ** 2 + s0 ** 2) + tt.log(2 * np.pi * (err0 ** 2 + s0 ** 2))).sum()
        + ((Y_1 - mu1) ** 2 / (err1 ** 2 + s1 ** 2) + tt.log(2 * np.pi * (err1 ** 2 + s1 ** 2))).sum()
    )

Y_obs = pm.DensityDist('Y_obs', logp, observed=[Y0, Y1])

I don't know the codebase well enough to know why your solution does not work (it is more intuitive). Let me know if this works.

For more information on using DensityDist you can also see this post:

The above solution will probably work because of the way Python deals with scope AND because your function is defined after \mu_0 and \mu_1. However, this will break if for some reason (cleaner code, usually) you decide to define the log-likelihood first. Do the following instead; it is more explicit:

import functools as ft
import theano.tensor as tt  # use theano's log on tensor quantities

def logp(Y, mu0, mu1):
    Y_0 = Y[0]
    Y_1 = Y[1]
    return -0.5 * (
        ((Y_0 - mu0) ** 2 / (err0 ** 2 + s0 ** 2) + tt.log(2 * np.pi * (err0 ** 2 + s0 ** 2))).sum()
        + ((Y_1 - mu1) ** 2 / (err1 ** 2 + s1 ** 2) + tt.log(2 * np.pi * (err1 ** 2 + s1 ** 2))).sum()
    )

# bind mu0 and mu1 explicitly so logp no longer relies on the enclosing scope
likelihood = ft.partial(logp, mu0=mu0, mu1=mu1)
Y_obs = pm.DensityDist('Y_obs', likelihood, observed=[Y0, Y1])

More generally, read about functools if you don’t know it yet. It’s awesome.
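
For instance, a tiny self-contained illustration of partial (made-up names, nothing PyMC3-specific):

import functools as ft

def power(base, exponent):
    return base ** exponent

# partial returns a new callable with exponent already filled in
square = ft.partial(power, exponent=2)
print(square(5))  # 25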


Nice trick! It never occurred to me that I could use partial this way - it is way cleaner :wink:

Try:
Y_obs = pm.DensityDist('Y_obs', likelihood, observed=dict(Y=[Y0, Y1]))
(with a dict, the entries are passed to logp as keyword arguments, so the key has to match the argument name Y)

^ I have tried this line, but it does not seem to work.

Thank you in advance for the information on using DensityDist!

It has a problem on my side too. I’ll look into it when I have time today!

@junpenglao
@rlouf

Thank you for your help!
I have successfully solved the problem!
I cannot remember who mentioned the hierarchical model, but that was indeed the solution.
It was a great hint!


Hello,
I have a similar problem, but for me the two data sets have different lengths, and using a hierarchical model gives this error:
ValueError: Input dimension mis-match. (input[0].shape[0] = 26, input[1].shape[0] = 140)
What can I do?
For clarity, the problem is

$$Y_1 = \alpha + \beta_0 X_0 + \beta_1 X_1 + \beta_2 X_2 + \mu$$

where Y_1, X_0, X_1, X_2, and \mu are known, and the length is 26; and

$$Y_2 = \alpha + \beta_0 X_0 + \beta_1 X_1 + \beta_2 X_2 + \gamma Z$$

where Y_2, all the Xs, and Z are known, and I need to estimate \alpha, the \betas, and \gamma all together. The length here is 140. The term \gamma Z is the equivalent of \mu, but in the first part \mu is known, while in the second part I want to estimate \gamma so as to compute it. \mu in the first model basically acts as a calibrator/hinge for pinning down \alpha and the \betas.

I would like to do this in one step, estimating all of \alpha, the \betas, and \gamma simultaneously, but how do I include \mu or \gamma Z for the two different samples?
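
To make it concrete, here is a sketch of the structure I think I need, following the two-observed-variables idea from above (hypothetical names: X0_a, X1_a, X2_a are the length-26 covariates, X0_b, X1_b, X2_b the length-140 ones, and sigma is a noise scale I would still have to choose), but I am not sure it is right:

with pm.Model() as model:
    alpha = pm.Normal('alpha', 0, sd=10)
    beta = pm.Normal('beta', 0, sd=10, shape=3)
    gamma = pm.Normal('gamma', 0, sd=10)
    sigma = pm.HalfNormal('sigma', sd=1.0)
    # first sample (length 26): mu is known data here
    mean1 = alpha + beta[0]*X0_a + beta[1]*X1_a + beta[2]*X2_a + mu
    obs1 = pm.Normal('obs1', mu=mean1, sd=sigma, observed=Y1)
    # second sample (length 140): gamma*Z plays the role of mu
    mean2 = alpha + beta[0]*X0_b + beta[1]*X1_b + beta[2]*X2_b + gamma * Z
    obs2 = pm.Normal('obs2', mu=mean2, sd=sigma, observed=Y2)
    # keeping each sample in its own observed variable avoids mixing
    # arrays of length 26 and 140 in a single expression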

Please make my life easier
Thank you so much.