Condition Categorical Variable on Bernoulli Parents

I recently asked a question about how to condition a normally distributed variable on Bernoulli parents. After looking at the code, and after some discussion, I understood what was going on.

In my setting, the dependent variable either follows a particular distribution given its parents, or it should take an NA value (or its equivalent). So, in this case, suppose that I have two Bernoulli variables, On and Triangle, and one Categorical variable, Name, that depends on both of them. If On is true, then Name follows a categorical distribution determined by On. If Triangle is true, Name follows another categorical distribution determined by Triangle. In the case where both On and Triangle are false, Name should essentially remain undefined.

The following is mock code:

import numpy
import pymc3

tri_names_given_on = numpy.array([.1, .5, .2, .2])
tri_names_given_triangle = numpy.array([.1, .1, .1, .1, .1, .3, .3])

block_names_given_on = numpy.array([.3, .4, .3])
block_names_given_block = numpy.array([.1, .2, .3, .4])

NA_ENCODING = -10.

with pymc3.Model() as model:

   # pOn, pTri_given_not_on, tri_delta_on, pBlock_given_not_on, and
   # block_delta_on stand in for known constants
   on = pymc3.Bernoulli('on', pOn)
   triangle = pymc3.Bernoulli('triangle', pTri_given_not_on + on * tri_delta_on)
   block = pymc3.Bernoulli('block', pBlock_given_not_on + on * block_delta_on)
   arg1 = NA_ENCODING + on * (-NA_ENCODING + tri_names_given_on) + (1 - on) * triangle * (-NA_ENCODING + tri_names_given_triangle)
   arg2 = NA_ENCODING + on * (-NA_ENCODING + block_names_given_on) + (1 - on) * block * (-NA_ENCODING + block_names_given_block)

   triangle_name = None
   block_name = None

   # Intended behavior only: I realize a Python-level if cannot branch on a
   # symbolic value like this, but it expresses what I want to happen
   if arg1 != NA_ENCODING:
      triangle_name = pymc3.Categorical('triangle_name', arg1)
   else:
      triangle_name = pymc3.Deterministic('triangle_name', NA_ENCODING)

   if arg2 != NA_ENCODING:
      block_name = pymc3.Categorical('block_name', arg2)
   else:
      block_name = pymc3.Deterministic('block_name', NA_ENCODING)

I’ve been having difficulty specifying my model in PyMC3, so I think I’m missing some fundamental knowledge. I’ve looked at some of the tutorials, but they don’t seem comprehensive enough to fill the gaps in my understanding. So, any help is greatly appreciated.

You should have a look at marginalized mixture models - whenever you have discrete variables in your model, you should first try to find a way to marginalize them out :slight_smile:

The Frequently Asked Questions page is a good place to start.
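
For instance, here is a minimal sketch of marginalizing a single Bernoulli parent out of a categorical child - every name and probability table below is made up, it is only meant to show the pattern:

import numpy
import pymc3

p_given_on = numpy.array([.1, .5, .2, .2])        # made-up P(name | on)
p_given_off = numpy.array([.25, .25, .25, .25])   # made-up P(name | not on)

with pymc3.Model() as marginal_model:
   # continuous weight instead of a discrete Bernoulli draw
   p_on = pymc3.Beta('p_on', alpha=1., beta=1.)
   # P(name) = P(on) * P(name | on) + P(not on) * P(name | not on)
   name_p = p_on * p_given_on + (1. - p_on) * p_given_off
   name = pymc3.Categorical('name', p=name_p, observed=numpy.array([0, 1, 1, 3]))

NUTS can then sample p_on efficiently, and you recover P(on) from its posterior instead of from discrete draws.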

Ok, I will look into this! Thank you


@junpenglao, I came up with this. Please let me know what you think.

tri_name_giv_tri_on_dist = numpy.array([.4, .4, .2])
tri_name_giv_tri_not_on_dist = numpy.array([.2, .3, .5])

block_name_giv_block_on_dist = numpy.array([.3, .3, .4])
block_name_giv_block_not_on_dist = numpy.array([.1, .3, .6])

NA_ENCODING = -10.

with pymc3.Model() as model:

   # Internal NA placeholder (a deterministic constant)
   NA = pymc3.Deterministic('NA', NA_ENCODING)
   
   on = pymc3.Bernoulli('on', pOn)
   triangle = pymc3.Bernoulli('triangle', pTri_given_not_on + on * tri_delta_on)
   block = pymc3.Bernoulli('block', pBlock_given_not_on + on * block_delta_on)

   triangle_mixture_weights = numpy.array([on * triangle, (1 - on) * triangle, (1 - on) * (1 - triangle)])
   tri_name_given_tri_and_on = pymc3.Categorical.dist(tri_name_giv_tri_on_dist)
   tri_name_given_tri_and_not_on = pymc3.Categorical.dist(tri_name_giv_tri_not_on_dist)
   triangle_name = pymc3.Mixture('triangle_name', w=triangle_mixture_weights, comp_dists=[tri_name_given_tri_and_on, tri_name_given_tri_and_not_on, NA])

   block_mixture_weights = numpy.array([on * block, (1 - on) * block, (1 - on) * (1 - block)])
   block_name_given_block_and_on = pymc3.Categorical.dist(block_name_giv_block_on_dist)
   block_name_given_block_and_not_on = pymc3.Categorical.dist(block_name_giv_block_not_on_dist)
   block_name = pymc3.Mixture('block_name', w=block_mixture_weights, comp_dists=[block_name_given_block_and_on, block_name_given_block_and_not_on, NA])

You are getting there! You should rewrite the following to use continuous variables - either use the parameters (i.e., pOn and pTri_given_not_on + on * tri_delta_on here) directly, or wrap them in a Beta distribution:

   on = pymc3.Bernoulli('on', pOn)
   triangle = pymc3.Bernoulli('triangle', pTri_given_not_on + on * tri_delta_on)
   block = pymc3.Bernoulli('block', pBlock_given_not_on + on * block_delta_on)

Into:

   on = pOn
   triangle = pTri_given_not_on + on * tri_delta_on
   block = pBlock_given_not_on + on * block_delta_on
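
If you want on itself to be learned rather than fixed, a rough sketch of the Beta-wrapped version looks like this (the hyperparameters and constants are placeholders, not values from your problem):

import pymc3

# placeholder constants standing in for your known quantities
pTri_given_not_on, tri_delta_on = .2, .3
pBlock_given_not_on, block_delta_on = .3, .2

with pymc3.Model() as model:
   # "on" is now a continuous probability with a Beta prior,
   # not an unobserved discrete draw
   on = pymc3.Beta('on', alpha=1., beta=1.)
   triangle = pymc3.Deterministic('triangle', pTri_given_not_on + on * tri_delta_on)
   block = pymc3.Deterministic('block', pBlock_given_not_on + on * block_delta_on)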

@junpenglao, So, you’re saying I should do something like this?

   # Internal x,y,z position variable to be transformed
   pos = pymc3.Normal('pos', 0., 1., shape=3)

   # Internal NA placeholder (a deterministic constant)
   NA = pymc3.Deterministic('NA', NA_ENCODING)
   
   # On 
   pOn = pymc3.Beta('pOn', alpha=on_count, beta=schema_count - on_count)
   on = pymc3.Bernoulli('on', pOn)

   # Triangle
   triangle_mixture_weights = np.array([on, (1 - on)])
   tri_giv_on = pymc3.Bernoulli.dist(pTri_given_not_on + tri_delta_on)
   tri_giv_not_on = pymc3.Bernoulli.dist(pTri_given_not_on)
   triangle = pymc3.Mixture('triangle', w=triangle_mixture_weights, comp_dists=[tri_giv_on, tri_giv_not_on])
   
   triangle_name_mixture_weights = np.array([on * triangle, (1 - on) * triangle, (1 - on) * (1 - triangle)])   
   tri_name_given_tri_and_on = pymc3.Categorical.dist(tri_name_giv_tri_on_dist)
   tri_name_given_tri_and_not_on = pymc3.Categorical.dist(tri_name_giv_tri_not_on_dist)
   triangle_name = pymc3.Mixture('triangle_name', w=triangle_name_mixture_weights, comp_dists=[tri_name_given_tri_and_on, tri_name_given_tri_and_not_on, NA])

   x1 = pymc3.Deterministic('x1', NA_ENCODING + on * (-NA_ENCODING + pos[0] * std_x1_on + mu_x1_on) + \
                            (1 - on) * triangle * (-NA_ENCODING + pos[0] * std_x1_tri + mu_x1_tri))
   y1 = pymc3.Deterministic('y1', NA_ENCODING + on * (-NA_ENCODING + pos[1] * std_y1_on + mu_y1_on) + \
                            (1 - on) * triangle * (-NA_ENCODING + pos[1] * std_y1_tri + mu_y1_tri))
   z1 = pymc3.Deterministic('z1', NA_ENCODING + on * (-NA_ENCODING + pos[2] * std_z1_on + mu_z1_on) + \
                            (1 - on) * triangle * (-NA_ENCODING + pos[2] * std_z1_tri + mu_z1_tri))

And similarly for the block?

Nope - I mean try to avoid something like on = pymc3.Bernoulli('on', pOn) as it is an unobserved discrete variable.

Ah, Ok, I see… You are suggesting that I marginalize it out, right?

yep :slight_smile:

So, I’m not sure what to do about marginalization. In my setting, I will want to infer the values of these hidden variables. If they’re marginalized out like that, I won’t be able to do this, right?

You can infer the continuous mixture weights instead - if you want an explicit latent label, you can sample from the mixture weights with a categorical, or apply argmax.
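
Something along these lines - a toy sketch where the data, the two components, and the priors are all made up, just to show how to recover a label after marginalizing:

import numpy
import pymc3

# fake data from two components
data = numpy.concatenate([numpy.random.normal(-2., 1., 100),
                          numpy.random.normal(3., 1., 100)])

with pymc3.Model() as mix:
   w = pymc3.Dirichlet('w', a=numpy.ones(2))   # continuous mixture weights
   mu = pymc3.Normal('mu', 0., 5., shape=2)
   obs = pymc3.NormalMixture('obs', w=w, mu=mu, sigma=1., observed=data)
   trace = pymc3.sample(1000, tune=1000)

# hard label: argmax over the posterior mean weights
print(numpy.argmax(trace['w'].mean(axis=0)))

# or draw explicit labels from the posterior weights with a categorical
labels = numpy.array([numpy.random.choice(2, p=w_draw) for w_draw in trace['w']])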

If you have some resources you could point me to, I’d appreciate that. I’m not sure I follow 100% what you’re suggesting that I do.

Also, in my setting, it is possible to mark some of the hidden variables as observed. For example, someone might say that on is true, while being interested in the values for triangle and block.

You have seen the answer in FAQ right? You can also have a look at this notebook: https://github.com/junpenglao/advance-bayesian-modelling-with-PyMC3/blob/master/Notebooks/Code10%20-%20Schizophrenic_case_study.ipynb
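
On your point about marking some variables as observed: that part is fine - an observed discrete variable causes no trouble, it is only the unobserved ones you need to marginalize. A toy sketch (the Beta prior is just a placeholder):

import pymc3

with pymc3.Model() as m:
   p_on = pymc3.Beta('p_on', alpha=1., beta=1.)
   on = pymc3.Bernoulli('on', p=p_on, observed=1)   # "on" is told to be true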

Hope it gives you some inspirations!

Thanks for the link!

For the most part I understood your example, but I couldn’t grasp what was happening when you wrote:

Z_latent = pm.Uniform('Z_latent', 0., 1., shape=(6, Nt))
Z = pm.Deterministic('Z',
                         pm.theanof.tt_rng().binomial(
                             n=1, p=Z_latent, size=(6, Nt)))

My guess is that you’re marginalizing Z, but it’s not clear to me. Z_latent will always be 1, and you’re feeding that into some theano binomial distribution parameterized by one trial and probability of success, p=Z_latent=1?

I looked up pm.theanof.tt_rng(), but wasn’t able to find clear information on it.

Yeah, that example is a bit convoluted :sweat_smile: - it is more to show what is possible, but I would not recommend it - the other two ways of modeling it are probably what you should focus on.

Hey, I think I’ve made a lot of progress on my problem. Do you have any examples you can point me to about inferring the mixture weights?

Do you mean doing something like this:

# Mixture weights
pOns = pymc3.Dirichlet('pOns', numpy.array(on_alphas))

on = pymc3.Categorical('on', pOns)

But for the mixture model, I would parameterize it using pOns, rather than on? Like:

triangle = pymc3.Mixture('triangle', w=pOns, \
                            comp_dists=[tri_giv_not_on, tri_giv_on], \
                            testval=1, dtype="int64", observed=1)

Thanks!

Yes! That’s in general the idea of marginalization.

As for doing inference on mixtures, there are a few posts on the Discourse you can look at. You can start with the discussion here: Properly sampling mixture models
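
For example - assuming your snippet above is wrapped in a pymc3.Model() context called model - you would then inspect the posterior of pOns directly instead of relying on discrete draws of on:

with model:
   trace = pymc3.sample(1000, tune=1000)

# posterior over the marginalized weights, i.e. P(on) and P(not on)
print(trace['pOns'].mean(axis=0))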