Problem model definition PyMC3

Hi, I have a custom likelihood and I’m trying to determine the posterior probabilty for my parameters , given Uniform priors. I’ve written (trying to follow https://arxiv.org/pdf/1507.08050.pdf):

with Model() as paper:
#priors
    m =  Uniform('m',0.,1.)
    q =  Uniform('q',0.,5.)
    sigmay = Uniform('sigmay',0.,1.)

# deterministic rule
    logEpeak_i=(q)+(m*logEiso)

#likelihood definition   
    def likelihoodMCMC(m,q,sigmay,logEiso,logEpeak_i,logErrEiso,logErrEpeak_i):
return np.exp(0.5*np.sum(np.log(np.divide(1,2*np.pi*(sigmay**2 + logErrEpeak_i**2 + (m*logErrEiso)**2))) - np.divide((logEpeak_i - m*logEiso - q)**2,sigmay**2 + logErrEpeak_i**2 + (m*logErrEiso)**2)))

Data likelihood

    obs_peaks=DensityDist('obs_peak', likelihoodMCMC,observed=(logEiso,logEpeak_i,logErrEiso,logErrEpeak_i))

Obtaining the following error:

TypeError: likelihoodMCMC() missing 6 required positional arguments: 'q', 'sigmay', 'logEiso', 'logEpeak_i', 'logErrEiso', and 'logErrEpeak_i'

while, changing the last part in:

obs_peaks=DensityDist('obs_peak', likelihoodMCMC(m,q,sigmay,logEiso,logEpeak_i,logErrEiso,logErrEpeak_i),observed=(logEiso,logEpeak_i,logErrEiso,logErrEpeak_i))

I obtain the following:

TypeError: sum() got an unexpected keyword argument 'out'

Could you suggest me something, please?

Thank you in advance

You need to define the observed in a python dict, eg:

Otherwise, you need to wrap it in another function, eg:
http://docs.pymc.io/notebooks/gaussian-mixture-model-advi.html?highlight=densitydist

1 Like

Ok, but now I obtain the following error:

Traceback (most recent call last):
File "MCMC.py", line 105, in <module>
obs_peaks=DensityDist('obs_peak', likelihoodMCMC,observed=   {'logEiso':logEiso,'logEpeak_i':logEpeak_i,'logErrEiso':logErrEiso,'logErrEpeak_i':logErrEpeak_i})
 File "/Users/renatomartone/anaconda/lib/python3.6/site-packages/pymc3/distributions/distribution.py", line 31, in __new__
return model.Var(name, dist, data)
File "/Users/renatomartone/anaconda/lib/python3.6/site-packages/pymc3/model.py", line 294, in Var
model=self)
File "/Users/renatomartone/anaconda/lib/python3.6/site-packages/pymc3/model.py", line 623, in __init__
self.logp_elemwiset = distribution.logp(**self.data)
TypeError: likelihoodMCMC() missing 3 required positional arguments: 'm', 'q', and 'sigmay'

While, specifying all the arguments, I obtain:

Traceback (most recent call last):
File "MCMC.py", line 105, in <module>
obs_peaks=DensityDist('obs_peak', likelihoodMCMC(m,q,sigmay,logEiso,logEpeak_i,logErrEiso,logErrEpeak_i),observed={'logEiso':logEiso,'logEpeak_i':logEpeak_i,'logErrEiso':logErrEiso,'logErrEpeak_i':logErrEpeak_i})
File "MCMC.py", line 102, in likelihoodMCMC
return np.exp(0.5*np.sum(np.log(np.divide(1,2*np.pi*(sigmay**2 + logErrEpeak_i**2 + (m*logErrEiso)**2))) - np.divide((logEpeak_i - m*logEiso - q)**2,sigmay**2 + logErrEpeak_i**2 + (m*logErrEiso)**2)))
File "/Users/renatomartone/anaconda/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 1812, in sum
return sum(axis=axis, dtype=dtype, out=out, **kwargs)
TypeError: sum() got an unexpected keyword argument 'out'

In terms of the first error, that’s because you need to also specify m,q,sigmay in observed:

observed={'m':m, 'q':q, 'sigmay':sigmay, 
          'logEiso':logEiso, 'logEpeak_i':logEpeak_i, 
          'logErrEiso':logErrEiso, 'logErrEpeak_i':logErrEpeak_i})

As for the second error, you need to rewrite likelihoodMCMC into a Theano function:

def likelihoodMCMC(m,q,sigmay,logEiso,logEpeak_i,logErrEiso,logErrEpeak_i):
    return tt.exp(0.5*tt.sum(
            tt.log(1/(2*np.pi*(sigmay**2 + logErrEpeak_i**2 + (m*logErrEiso)**2))) -
            (logEpeak_i - m*logEiso - q)**2/(sigmay**2 + logErrEpeak_i**2 + (m*logErrEiso)**2)))

(BTW, I did not run the code above, so please double check the likelihoodMCMC function)

Thanks, but I don’t understand why I should consider m, q and sigmay as observed since that are not data. The data are represented by ‘logEiso’, ‘logEpeak’, logErrEiso and logErrEpeak_i. It is not clear to me if I should (at this stage) write my likelihood as function of both fit parameters and data or function of only data or only fit parameters.

In pm.DensityDist, observed are the parameters and data you pass to the custom log-likelihood function.

If you want to make it more clear, you can also rewrite the logp function like this:

# Log likelihood function
def logp_function(parameters):
    def logp(observed):
        some_scaler = some_function(parameters, observed)
        return some_scaler

    return logp # returns a function

In your case, it would be:

def likelihoodMCMC(m, q, sigmay):
    def logp(value):
        return tt.exp(0.5*tt.sum(
            tt.log(1/(2*np.pi*(sigmay**2 + value[3]**2 + (m*value[2])**2))) -
            (value[1] - m*value[0] - q)**2/(sigmay**2 + value[3]**2 + (m*value[2])**2)))
    return logp

where value = (logEiso, logEpeak_i, logErrEiso, logErrEpeak_i)

Then you can call it as:

with pm.Model() as model:
    ...
    obs_peaks = pm.DensityDist('obs_peak', likelihoodMCMC(m,q,sigmay), 
               observed=(logEiso, logEpeak_i, logErrEiso, logErrEpeak_i))
1 Like

Thank you! As you kindly suggested, I’ve modified my code as follows:

def likelihoodMCMC(m,q,sigmay):
    def logp(logEiso,logEpeak_i,logErrEiso,logErrEpeak_i):
        return 0.5*tt.sum(tt.log((1/(2*np.pi*(sigmay**2 + logErrEpeak_i**2 + (m*logErrEiso)**2)))) - ((logEpeak_i - m*logEiso - q)**2)/(sigmay**2 + logErrEpeak_i**2 + (m*logErrEiso)**2))
    return logp

with Model() as paper:
#priors
 m =  Uniform('m',0.,1.)
 q =  Uniform('q',0.,5.)
 sigmay = Uniform('sigmay',0.,1.)

#deterministic relation
logEpeak_i=(q)+(m*logEiso)

obs_peaks=pm.DensityDist('obs_peak', likelihoodMCMC(m,q,sigmay),observed=(logEiso,logEpeak_i,logErrEiso,logErrEpeak_i))

but now I return the following:

Traceback (most recent call last):
File "MCMC.py", line 124, in <module>
obs_peaks=pm.DensityDist('obs_peak', likelihoodMCMC(m,q,sigmay),observed=(logEiso,logEpeak_i,logErrEiso,logErrEpeak_i))
File "/Users/renatomartone/anaconda/lib/python3.6/site-packages/pymc3/distributions/distribution.py", line 31, in __new__
return model.Var(name, dist, data)
File "/Users/renatomartone/anaconda/lib/python3.6/site-packages/pymc3/model.py", line 303, in Var
distribution=dist, model=self)
File "/Users/renatomartone/anaconda/lib/python3.6/site-packages/pymc3/model.py", line 583, in __init__
data = as_tensor(data, name, model, distribution)
File "/Users/renatomartone/anaconda/lib/python3.6/site-packages/pymc3/model.py", line 537, in as_tensor
data = pandas_to_array(data).astype(dtype)
ValueError: setting an array element with a sequence.

Could you please provide the data logEiso,logEpeak_i,logErrEiso,logErrEpeak_i?

Also, if the logEpeak_i is deterministic, you should just computed it within the logp function

1 Like

I imported data in the following way:

name, z, Eiso, ErrEiso, Epeak_i, ErrEpeak_i = np.genfromtxt('epeiso.dat', unpack=True,comments='#')

And calculated those quantities as follows:

logEpeak_i=np.log10(Epeak_i)
logEiso=np.log10(Eiso)
logErrEpeak_i=ErrEpeak_i/(Epeak_i*np.log(10))
logErrEiso=ErrEiso/(Eiso*np.log(10))

“Also, if the logEpeak_i is deterministic, you should just computed it within the logp function”

To be clearer, logEiso and logEpeak_i are the quantities connected by the deterministic formula and for which I have observations with relative errors (logErrEiso and logErrEpeak_i).
I’m now trying to get the posterior distribution for m, q and sigmay, that represent the parameters of the linear relationship between them (they are three because sigmay represents an additional parameter connected with the fact that I’m handling logarythms).

What I meant is could you provide (a subset) of data here, as it is usually difficult to see where the error comes from without seeing the actual data.

If you are doing logEpeak_i=(q)+(m*logEiso) after with pm.Model(): it basically overwrite logEpeak_i=np.log10(Epeak_i) you were doing after you import the data.

Again, it would be much easier if you could provide the code and data, so we can run and reproduce the error.

1 Like

Sorry, that’s my subset and data.

share.txt (738 Bytes)
MCMC_share.py (2.4 KB)

Thank you!

OK, given that I dont know much of what you are trying to model, I suggest the following:

def likelihoodMCMC(m,q,sigmay,logEiso,logEpeak_i,logErrEiso,logErrEpeak_i):
    return tt.exp(0.5*tt.sum(
            tt.log(1/(2*np.pi*(sigmay**2 + logErrEpeak_i**2 + (m*logErrEiso)**2))) -
            (logEpeak_i - m*logEiso - q)**2/(sigmay**2 + logErrEpeak_i**2 + (m*logErrEiso)**2)))
    
with pm.Model() as paper:
    #priors
    m =  pm.Uniform('m',0.,1.)
    q =  pm.Uniform('q',0.,5.)
    sigmay = pm.Uniform('sigmay',0.,1.)

    #deterministic relation
    # logEpeak_i=(q)+(m*logEiso)

    obs_peaks = pm.DensityDist('obs_peak', likelihoodMCMC, 
               observed={'m':m, 'q':q, 'sigmay':sigmay, 
                          'logEiso':logEiso, 'logEpeak_i':logEpeak_i, 
                          'logErrEiso':logErrEiso, 'logErrEpeak_i':logErrEpeak_i})
    # obtain starting values via MAP
    start = pm.find_MAP()
    # draw 2000 posterior samples
    trace = pm.sample(2000, start=start, njobs=4)
pm.traceplot(trace);

By writing the logp function this way, the code would run even with logEpeak_i=(q)+(m*logEiso) uncomment.

Again, dont be too hung up on the observed=, they are just inputs into your logp function. You can define the logp inside with pm.Model() as paper: and remove the m,q,sigmay in the function input.

with pm.Model() as paper:
    #priors
    m =  pm.Uniform('m',0.,1.)
    q =  pm.Uniform('q',0.,5.)
    sigmay = pm.Uniform('sigmay',0.,1.)

    #deterministic relation
    # logEpeak_i=(q)+(m*logEiso)
    def likelihoodMCMC(logEiso,logEpeak_i,logErrEiso,logErrEpeak_i):
        return tt.exp(0.5*tt.sum(
                tt.log(1/(2*np.pi*(sigmay**2 + logErrEpeak_i**2 + (m*logErrEiso)**2))) -
                (logEpeak_i - m*logEiso - q)**2/(sigmay**2 + logErrEpeak_i**2 + (m*logErrEiso)**2)))

    obs_peaks = pm.DensityDist('obs_peak', likelihoodMCMC, 
               observed={'logEiso':logEiso, 'logEpeak_i':logEpeak_i, 
                         'logErrEiso':logErrEiso, 'logErrEpeak_i':logErrEpeak_i})
    # obtain starting values via MAP
    start = pm.find_MAP()
    # draw 2000 posterior samples
    trace = pm.sample(2000, start=start, njobs=4)
pm.traceplot(trace);
1 Like

Perfect, the code seems to work well now. Thank yout for your precious help!

1 Like