How to set up a custom likelihood function for two variables


#1

In my case, likelihood function was defined as f(theta_1, theta_2), the expression is already known not a standard distribution. Prior of theta_1 and theta_2 are following lognormal distribution. How can I get the posterior distribution with certain sampling method?

My code

with pm.Model() as model:
    
    theta_1 = pm.Lognormal('theta_1', mu=mu_1, sd=sigma_1)
    theta_2 = pm.Lognormal('theta_2', mu=mu_2, sd=sigma_2)     
    like = pm.DensityDist('like', likelihood(theta_1,theta_2),shape=(2,))
                           
    start = pm.find_MAP(model = model)
    step = pm.Metropolis()
    trace = pm.sample(10000, step = step, start = start)

I got the error: ‘TensorVariable’ object is not callable. I haven’t found a proper solution yet. How can I define two variables for a joint likelihood in pymc3


How to use a Deterministic in a mixture?
pm.Potential without Theano?
Error while using DensityDist
#2

pm.DensityDist needs to evaluate on an observed value, you should pass a function to it instead of a tensor. For example, in this case you should pass likelihood instead of the output of likelihood(theta_1, theta_2):

exp_surv = pm.DensityDist('like', likelihood, observed={'theta_1':theta_1, 'theta_2':theta_2})

Otherwise, you can also add the custom logp directly to the logp as a potential:

like = pm.Potential('like', likelihood(theta_1,theta_2))

#3

Thanks for the information. But still it seems like it skips the likelihood function, just samples from the prior distribution.
The whole problem is summarized like: theta_1,theta_2 are the control parameters, given by lognormal prior distribution, they somehow determine the observations f_1, f_2 through a function. Given a pair of theta, f_1 and f_2 could be calculated.

Now, I want to infer theta given by an observation of fm_1,fm_2. Likelihood function is set up by a custom function.

Based on your advise, I tried both of them, I don’t know where is wrong:
(1) The effect of the observation has already been taken into account. The likelihood function just simply loop through all the parameters. pm.Potential was used.

with pm.Model() as model:
    
    theta_1 = pm.Lognormal('theta_1', mu=mu_1, sd=sigma_1)
    theta_2 = pm.Lognormal('theta_2', mu=mu_2, sd=sigma_2) 

    def likelihood(theta_1,theta_2):
        a = m_1*m_2
        b = m_1*k_2*theta_2+m_2*k_2*theta_2+m_2*k_1*theta_1
        c = theta_1*theta_2*k_1*k_2
        
        # Controlled parameter
        i0 = 9
        
        w_1_square = (b-np.sqrt(b**2-4*a*c))/(2*a)
        w_2_square = (b+np.sqrt(b**2-4*a*c))/(2*a)
        
        f_1_square = w_1_square/(4*np.pi**2)
        f_2_square = w_2_square/(4*np.pi**2)
        
        return np.exp(-2**(i0-2)*((f_1_square/fm_1**2-1)**2+
                            (f_2_square/fm_2**2-1)**2))
        
    
    like = pm.Potential('like',likelihood(theta_1,theta_2))
        
    
    start = pm.find_MAP(model = model)
    step = pm.Metropolis()
    trace = pm.sample(50000, step = step,start=start)
    
pm.traceplot(trace)
print(pm.summary(trace))

(2) The second method is to consider the observation separately: using pm.DensityDist

with pm.Model() as model:
    
    theta_1 = pm.Lognormal('theta_1', mu=mu_1, sd=sigma_1)
    theta_2 = pm.Lognormal('theta_2', mu=mu_2, sd=sigma_2) 

    def likelihood(f_1, f_2):
        a = m_1*m_2
        b = m_1*k_2*theta_2+m_2*k_2*theta_2+m_2*k_1*theta_1
        c = theta_1*theta_2*k_1*k_2
        
        # Controlled parameter
        i0 = 9
           
        f_1_square = (b-np.sqrt(b**2-4*a*c))/(2*a)/(4*np.pi**2)
        f_2_square = (b+np.sqrt(b**2-4*a*c))/(2*a)/(4*np.pi**2)
        
        return np.exp(-2**(i0-2)*((f_1_square/f_1**2-1)**2+
                            (f_2_square/f_2**2-1)**2))
        
    like = pm.DensityDist('like',likelihood, observed={'f_1':fm_1,'f_2':fm_2})
        
    
    start = pm.find_MAP(model = model)
    step = pm.Metropolis()
    trace = pm.sample(10000, step = step,start=start)

Apparently, they are sampling from the prior distribution. I just start to use PyMc3 this week and fail to figure out what went wrong.


#4

Your set up is not quite right in both cases. Put it this way, your logp should be evaluted based on the input of both the observed data and the parameters. So if you think of your logp as a function, it should has (f_1, f_2, theta_1, theta_2) as inputs. The differences between Potential and Densitydist is that, in Potential you dont distinguish between data and parameters (i.e., everything is just input to the logp), whereas in Densitydist you set up the parameterization, and evaluate on the data (the logp conditioned on the parameters).

(1) Using Potential:

def likelihood(theta_1, theta_2, f_1, f_2):
    a = m_1*m_2
    b = m_1*k_2*theta_2+m_2*k_2*theta_2+m_2*k_1*theta_1
    c = theta_1*theta_2*k_1*k_2

    # Controlled parameter
    i0 = 9

    f_1_square = (b-tt.sqrt(b**2-4*a*c))/(2*a)/(4*np.pi**2)
    f_2_square = (b+tt.sqrt(b**2-4*a*c))/(2*a)/(4*np.pi**2)

    return tt.exp(-2**(i0-2)*((f_1_square/f_1**2-1)**2+
                        (f_2_square/f_2**2-1)**2))

with pm.Model() as model:
    
    theta_1 = pm.Lognormal('theta_1', mu=mu_1, sd=sigma_1)
    theta_2 = pm.Lognormal('theta_2', mu=mu_2, sd=sigma_2) 

    like = pm.Potential('like', likelihood(theta_1, theta_2, fm_1, fm_2))
    
    trace = pm.sample()

(1) Using Densitydist:

def likelihood(theta_1, theta_2):
    
    def logp_(value):
        a = m_1*m_2
        b = m_1*k_2*theta_2+m_2*k_2*theta_2+m_2*k_1*theta_1
        c = theta_1*theta_2*k_1*k_2

        # Controlled parameter
        i0 = 9

        f_1_square = (b-tt.sqrt(b**2-4*a*c))/(2*a)/(4*np.pi**2)
        f_2_square = (b+tt.sqrt(b**2-4*a*c))/(2*a)/(4*np.pi**2)

        return tt.exp(-2**(i0-2)*((f_1_square/value[0]**2-1)**2+
                            (f_2_square/value[1]**2-1)**2))
    
    return logp_
    

with pm.Model() as model:
    
    theta_1 = pm.Lognormal('theta_1', mu=mu_1, sd=sigma_1)
    theta_2 = pm.Lognormal('theta_2', mu=mu_2, sd=sigma_2) 

    like = pm.DensityDist('like', likelihood(theta_1, theta_2), observed=[fm_1, fm_2])
    
    trace = pm.sample()

Also notices that I am sampling using the default setting. It is much better than Metropolis with find_MAP, which we now strongly discouraged.

You can see more information from other post on the discourse: Difference between DensityDist and Potential?, Problem with pymc3.DensityDist() #2846, Likelihood evaluation and DensityDist


Frequently Asked Questions
Model Fitting using datasets of different observation instruments
#5

Thanks for the response. However, I still encountered some errors. I took the method like you suggested, but get a error like this:

CustomizablePickler(buffer, self._reducers).dump(obj)

AttributeError: Can't pickle local object 'likelihood.<locals>.obs'

Here is my complete code.

# Underlying parameters for prior 1
sigma_1 =  0.49786792462
mu_1 = 0.510236734833

# Underlying parameters for prior 2
sigma_2 = 0.62667474617
mu_2 = 0.169577686173

# Physical properties
# Unit as default(KG)
k_1 = 29.7*10**(6)
k_2 = 29.7*10**(6)

m_1 = 16.5*10**(3)
m_2 = 16.1*10**(3)

# Frequency observations
fm_1 = 3.13
fm_2 = 9.83 
     
with pm.Model() as model:    
    theta_1 = pm.Lognormal('theta_1', mu=mu_1, sd=sigma_1)
    theta_2 = pm.Lognormal('theta_2', mu=mu_2, sd=sigma_2) 

    def likelihood(theta_1,theta_2):        
        def obs(f):
            a = m_1*m_2
            b = m_1*k_2*theta_2+m_2*k_2*theta_2+m_2*k_1*theta_1
            c = theta_1*theta_2*k_1*k_2
    
        # Contour parameter
            i0 = 9
        
            f_1_square = (b-tt.sqrt(b**2-4*a*c))/(2*a)/(4*np.pi**2)
            f_2_square = (b+tt.sqrt(b**2-4*a*c))/(2*a)/(4*np.pi**2)
    
            return tt.exp(-2**(i0-2)*((f_1_square/f[0]**2-1)**2+
                            (f_2_square/f[1]**2-1)**2))
        return obs

    like = pm.DensityDist('like', likelihood(theta_1,theta_2), observed=[fm_1,fm_2])

    start = pm.find_MAP(model = model)
    step = pm.Metropolis()
    trace = pm.sample(10000,step=step,start=start)

pm.traceplot(trace)

Another potential method seems not working as well. Still only sampling from prior, likelihood function doesn’t play a role at all. As this whole posterior can be integrated, the marginalized distribution of theta_1 should have two peaks, which is not in my script.

def likelihood(theta_1, theta_2, fm_1, fm_2):
    a = m_1*m_2
    b = m_1*k_2*theta_2+m_2*k_2*theta_2+m_2*k_1*theta_1
    c = theta_1*theta_2*k_1*k_2

# Pattern parameter
    i0 = 9
   
    f_1_square = (b-tt.sqrt(b**2-4*a*c))/(2*a)/(4*np.pi**2)
    f_2_square = (b+tt.sqrt(b**2-4*a*c))/(2*a)/(4*np.pi**2)

    return tt.exp(-2**(i0-2)*((f_1_square/fm_1**2-1)**2+
                        (f_2_square/fm_2**2-1)**2))  

with pm.Model() as model: 
    theta_1 = pm.Lognormal('theta_1', mu=mu_1, sd=sigma_1)
    theta_2 = pm.Lognormal('theta_2', mu=mu_2, sd=sigma_2)             
    like = pm.Potential('like',likelihood(theta_1,theta_2,fm_1,fm_2))

#6

The pickling issue is because it cannot pickle a function for parallel chain sampling. Try this instead:

def likelihood(theta_1, theta_2, f_1, f_2):
    a = m_1*m_2
    b = m_1*k_2*theta_2+m_2*k_2*theta_2+m_2*k_1*theta_1
    c = theta_1*theta_2*k_1*k_2

    # Contour parameter
    i0 = 9

    f_1_square = (b-tt.sqrt(b**2-4*a*c))/(2*a)/(4*np.pi**2)
    f_2_square = (b+tt.sqrt(b**2-4*a*c))/(2*a)/(4*np.pi**2)

    return tt.exp(-2**(i0-2)*((f_1_square/f_1**2-1)**2 +
                              (f_2_square/f_2**2-1)**2))


with pm.Model() as model:
    theta_1 = pm.Lognormal('theta_1', mu=mu_1, sd=sigma_1)
    theta_2 = pm.Lognormal('theta_2', mu=mu_2, sd=sigma_2)

    like = pm.DensityDist('like', likelihood, observed=dict(theta_1=theta_1,
                                                            theta_2=theta_2,
                                                            f_1=fm_1,
                                                            f_2=fm_2))

I will have a quick look at your model.


#7

You are right: looks like your model is just sampling from the prior. I would suggest you to double check your implementation of the logp function, as I dont know much of what you are trying to implement here.

I would take a guess here that you should return the logp instead of the pdf:

    return -2**(i0-2)*((f_1_square/fm_1**2-1)**2 +
                       (f_2_square/fm_2**2-1)**2)

which gives:

pm.traceplot(trace, combined=True,
            priors=[theta_1.distribution, theta_2.distribution]);


#8

Yes, indeed. I didn’t pay attention to the log probability thing. And I’ve check previous pm.Potential syntax, it works now. Thanks for the elaboration.


#9

No problem! Also, FYI, sampling multi-modal posterior is somewhat difficult, especially if the modes are well separated. You should use lots of chains with different starting points. Also I recommend you to try the SMC sampler http://docs.pymc.io/notebooks/SMC2_gaussians.html