How to model product of multiple random variables

George · June 24, 2019, 5:32pm

My question is to build a super-likelihood function from several random variables. The distribution of each variable is standard. Using two random variables as an example, the target super-likelihood is like this:
S = F1^w1 * F2^w2 (w1 + w2 = 1)
and
logS = w1 logF1 + w2 log F1 (w1 + w2 = 1)

Where F1 ~ Normal distribution and F2 ~ Bernoulli distribution

I use the following codes

data = <load my data> 

[w1,w2] = [0.5,0.5]

with Model() as model:
    mu = pm.Uniform('mu',lower=0,upper=1)
    sd = pm.Uniform('sd',lower=0,upper=1)
    p = pm.Uniform('p',lower=0,upper=1)

    F1 = pm.Normal("F1", mu = mu, sigma = sd)
    F1 = pm.Bernoulli("F2",p)

    S = pm.Deterministic('S',F1**w1*F2**w2, observed=data)

    step = Metropolis()
    trace = pm.sample(2000, step=step)

But it does not work.

My question is: how to implement such a weighted model in pyMC3?

junpenglao · June 24, 2019, 6:04pm

See eg Observed Deterministic for related discussion.

George · June 27, 2019, 1:56pm

Hi junpenglao,
Maybe my description of the question is misleading. I do not want to do observed deterministic inference, but try to write codes to build customized likelihood from a Normal and a Bernoulli variable by

logp_ = w1 * ln( LL(feat1) ) + w2 * ln( LL(feat2) )

Where feat1 ~ Bernoulli distribution and feat2 ~ Normal distribution

Some one told me about using Poisson zeros trick (Bernoulli ones trick). But I want to know how to implement it in pymc3 (and want to know how these tricks work).

I am new to python and pymc3, it would be good to have an example that provide detailed annotations to help me learning how to do it and how to expand it to more than two random variables.

Really appreciate if you can help on this.

The following are my complete codes.

When running them, I got errors:
RuntimeError: The communication pipe between the main process and its spawned children is broken.
In Windows OS, this usually means that the child process raised an exception while it was being spawned, before it was setup to communicate to the main process.
The exceptions raised by the child process while spawning cannot be caught or handled from the main process, and when running from an IPython or jupyter notebook interactive kernel, the child’s exception and traceback appears to be lost.
A known way to see the child’s error, and try to fix or handle it, is to run the problematic code as a batch script from a system’s Command Prompt. The child’s exception will be printed to the Command Promt’s stderr, and it should be visible above this error and traceback.
Note that if running a jupyter notebook that was invoked from a Command Prompt, the child’s exception should have been printed to the Command Prompt on which the notebook is running.

Could you please check what is incorrect?

import pymc3 as pm
import numpy as np
import scipy.stats as stats
#simulate samples from a bernoulli variable: feat1
p0 = 0.5 #true p of the Bernoulli
obs_feat1 = stats.bernoulli.rvs(0.5, size=1000)

#simulate samples from a normal variable: feat2
mu0 = 0 #true mu of Gaussian
sigma0 = 1 #true sigma of Gaussian

obs_feat2 = stats.norm.rvs(loc=mu0, scale=sigma0, size=1000)

#use Densitydist to get customized 'joint' distribution for the two random varaibles
#logp_ = w1 * ln( LL(feat1) ) + w2 * ln( LL(feat2) )

(w1,w2) = (0.5,0.5)

import theano.tensor as tt

def log_likelihood(p_1, mu_2, sigma_2, f1, f2, w1, w2):
    #p_1: parameter for bernoulli variable (1st variable)
    #mu_2, sigma_2: parameters for normal variable (2nd varaiable)
    #f1: an observation of feat1
    #f2: an observation of feat2
    #w1, w2: weight for building likelihood

    LL1 = tt.switch( f1, tt.log(p_1), tt.log(1 - p_1) )

    tau_2 = 1. / (sigma_2**2)
    LL2 = (-tau_2 * (f2 - mu_2)**2 + tt.log(tau_2 / np.pi / 2.)) / 2.

    logP = sum(w1*LL1 + w2*LL2)

    return(logP)

with pm.Model() as model:

    p_1 = pm.Uniform('p_1', lower=0, upper=1)
    mu_2 = pm.Uniform('mu_2', lower=-1, upper=1)
    sigma_2 = pm.Uniform('mu_2', lower=0, upper=2)

    like = pm.DensityDist('like', log_likelihood, 
                                                observed=dict(p_1=p_1,
                                                            mu_2=mu_2,
                                                            f1=obs_feat1,
                                                            f2=obs_feat2,
                                                            w1 = w1,
                                                            w2 = w2,)
                     )
    step = pm.Metropolis()
    trace = pm.sample(10000,step=step)

Gon_F · June 27, 2019, 6:34pm

So, if I am gathering everything correctly, you want to create a model whose error (I do not like such Frequentist terms, but I don’t know the appropriate Bayesian way to say it) is distributed according to some custom product of standard distributions (in this case, a Bernoulli and Normal distribution)?

I have never considered such a statistical problem, and I have not seen anyone else try doing this in Pymc3 (though other people may chime in at some point).

How would you do this outside of Pymc3 first? Maybe that can help us figure out how to do this in Pymc3.

George · June 27, 2019, 7:02pm

Hi Gon_F,

Someone told me that Poisson zeros ones trick can do this. And I found this can be done using JAGS. Please see the link below. I want to do similar thing - using several standard distribution to construct a new likelihood.

http://www.medicine.mcgill.ca/epidemiology/Joseph/courses/common/Tricks.html

junpenglao · June 27, 2019, 8:14pm

Seems like you are looking for a zero inflated poisson distribution.
https://docs.pymc.io/api/distributions/discrete.html#pymc3.distributions.discrete.ZeroInflatedPoisson

George · June 27, 2019, 9:13pm

Could be. But I do not know how to use it. Can you tell me how to use it into my question?

George · June 27, 2019, 9:16pm

Also, can you take a look at the codes? I draft these codes based on a question you have answered before, but it does not work:

junpenglao · June 28, 2019, 8:04am

try to write codes to build customized likelihood from a Normal and a Bernoulli variable by

logp_ = w1 * ln( LL(feat1) ) + w2 * ln( LL(feat2) )

It sounds like you are dealing with a mixture model, which means the data is generated from multiple distribution and mixed together.

You should instead have a look at how to model mixture model (zero inflated poisson distribution is actually one type of mixture model), you can also check related post like:

Topic		Replies	Views
Product of two probabilities distributions Questions	9	3966	June 8, 2021
How to set up a custom likelihood function for two variables Questions	8	13273	March 9, 2018
Likelihood given by product of two distributions? Questions	6	2199	August 4, 2018
Can not correctly sample high dimensional RVs from customized likelihood Questions	5	569	October 24, 2022
Sampling many dimensions, with different PDF in each dimension version agnostic modeling	13	698	March 30, 2023

How to model product of multiple random variables

Related topics