How to model product of multiple random variables

My question is to build a super-likelihood function from several random variables. The distribution of each variable is standard. Using two random variables as an example, the target super-likelihood is like this:
S = F1^w1 * F2^w2 (w1 + w2 = 1)
and
logS = w1 logF1 + w2 log F1 (w1 + w2 = 1)

Where F1 ~ Normal distribution and F2 ~ Bernoulli distribution

I use the following codes

data = <load my data> 

[w1,w2] = [0.5,0.5]

with Model() as model:
    mu = pm.Uniform('mu',lower=0,upper=1)
    sd = pm.Uniform('sd',lower=0,upper=1)
    p = pm.Uniform('p',lower=0,upper=1)

    F1 = pm.Normal("F1", mu = mu, sigma = sd)
    F1 = pm.Bernoulli("F2",p)

    S = pm.Deterministic('S',F1**w1*F2**w2, observed=data)

    step = Metropolis()
    trace = pm.sample(2000, step=step)

But it does not work.

My question is: how to implement such a weighted model in pyMC3?

See eg Observed Deterministic for related discussion.

Hi junpenglao,
Maybe my description of the question is misleading. I do not want to do observed deterministic inference, but try to write codes to build customized likelihood from a Normal and a Bernoulli variable by

logp_ = w1 * ln( LL(feat1) ) + w2 * ln( LL(feat2) )

Where feat1 ~ Bernoulli distribution and feat2 ~ Normal distribution

Some one told me about using Poisson zeros trick (Bernoulli ones trick). But I want to know how to implement it in pymc3 (and want to know how these tricks work).

I am new to python and pymc3, it would be good to have an example that provide detailed annotations to help me learning how to do it and how to expand it to more than two random variables.

Really appreciate if you can help on this.

The following are my complete codes.

When running them, I got errors:
RuntimeError: The communication pipe between the main process and its spawned children is broken.
In Windows OS, this usually means that the child process raised an exception while it was being spawned, before it was setup to communicate to the main process.
The exceptions raised by the child process while spawning cannot be caught or handled from the main process, and when running from an IPython or jupyter notebook interactive kernel, the child’s exception and traceback appears to be lost.
A known way to see the child’s error, and try to fix or handle it, is to run the problematic code as a batch script from a system’s Command Prompt. The child’s exception will be printed to the Command Promt’s stderr, and it should be visible above this error and traceback.
Note that if running a jupyter notebook that was invoked from a Command Prompt, the child’s exception should have been printed to the Command Prompt on which the notebook is running.

Could you please check what is incorrect?

import pymc3 as pm
import numpy as np
import scipy.stats as stats
#simulate samples from a bernoulli variable: feat1
p0 = 0.5 #true p of the Bernoulli
obs_feat1 = stats.bernoulli.rvs(0.5, size=1000)

#simulate samples from a normal variable: feat2
mu0 = 0 #true mu of Gaussian
sigma0 = 1 #true sigma of Gaussian

obs_feat2 = stats.norm.rvs(loc=mu0, scale=sigma0, size=1000)

#use Densitydist to get customized 'joint' distribution for the two random varaibles
#logp_ = w1 * ln( LL(feat1) ) + w2 * ln( LL(feat2) )

(w1,w2) = (0.5,0.5)

import theano.tensor as tt

def log_likelihood(p_1, mu_2, sigma_2, f1, f2, w1, w2):
    #p_1: parameter for bernoulli variable (1st variable)
    #mu_2, sigma_2: parameters for normal variable (2nd varaiable)
    #f1: an observation of feat1
    #f2: an observation of feat2
    #w1, w2: weight for building likelihood

    LL1 = tt.switch( f1, tt.log(p_1), tt.log(1 - p_1) )

    tau_2 = 1. / (sigma_2**2)
    LL2 = (-tau_2 * (f2 - mu_2)**2 + tt.log(tau_2 / np.pi / 2.)) / 2.

    logP = sum(w1*LL1 + w2*LL2)

    return(logP)

with pm.Model() as model:

    p_1 = pm.Uniform('p_1', lower=0, upper=1)
    mu_2 = pm.Uniform('mu_2', lower=-1, upper=1)
    sigma_2 = pm.Uniform('mu_2', lower=0, upper=2)

    like = pm.DensityDist('like', log_likelihood, 
                                                observed=dict(p_1=p_1,
                                                            mu_2=mu_2,
                                                            f1=obs_feat1,
                                                            f2=obs_feat2,
                                                            w1 = w1,
                                                            w2 = w2,)
                     )
    step = pm.Metropolis()
    trace = pm.sample(10000,step=step)

So, if I am gathering everything correctly, you want to create a model whose error (I do not like such Frequentist terms, but I don’t know the appropriate Bayesian way to say it) is distributed according to some custom product of standard distributions (in this case, a Bernoulli and Normal distribution)?

I have never considered such a statistical problem, and I have not seen anyone else try doing this in Pymc3 (though other people may chime in at some point).

How would you do this outside of Pymc3 first? Maybe that can help us figure out how to do this in Pymc3.

Hi Gon_F,

Someone told me that Poisson zeros ones trick can do this. And I found this can be done using JAGS. Please see the link below. I want to do similar thing - using several standard distribution to construct a new likelihood.

http://www.medicine.mcgill.ca/epidemiology/Joseph/courses/common/Tricks.html

Seems like you are looking for a zero inflated poisson distribution.
https://docs.pymc.io/api/distributions/discrete.html#pymc3.distributions.discrete.ZeroInflatedPoisson

1 Like

Could be. But I do not know how to use it. Can you tell me how to use it into my question?

Also, can you take a look at the codes? I draft these codes based on a question you have answered before, but it does not work:

try to write codes to build customized likelihood from a Normal and a Bernoulli variable by

logp_ = w1 * ln( LL(feat1) ) + w2 * ln( LL(feat2) )

It sounds like you are dealing with a mixture model, which means the data is generated from multiple distribution and mixed together.

You should instead have a look at how to model mixture model (zero inflated poisson distribution is actually one type of mixture model), you can also check related post like: