Yes, use shape=1, or just leave shape out of the mixture's construction entirely.
If you wouldn't mind, I'd like to ask you another question. I've extended this model to have a triangle_name variable that comes from a mixture of categorical variables. I've defined this mixture very similarly to the triangle mixture. To make things simple, I made triangle = 0., rather than defining it as a random variable. I expected everything to go well, but I get an error about bad initial energy for that value of triangle. When triangle = 1., the sampler works.
Here is some sample code.
import numpy
import pymc3

schema_count = 10  # counts defined elsewhere in the model
on_count = 3
pTri_given_on = 1.
pTri_given_not_on = .7
tri_delta_on = pTri_given_on - pTri_given_not_on
tri_name_giv_tri_on_dist = numpy.array([.4, .4, .2])
tri_name_giv_tri_not_on_dist = numpy.array([.2, .3, .5])
n = 1000
NA_ENCODING = -10

with pymc3.Model() as model:
    NA = pymc3.Constant.dist(c=NA_ENCODING)
    # On
    pOn = pymc3.Beta('pOn', alpha=on_count, beta=(schema_count - on_count))
    on = pymc3.Bernoulli('on', p=pOn)
    triangle = 0.
    triangle_name_mixture_weights = [on * triangle, (1. - on) * triangle,
                                     on * (1. - triangle) + (1. - on) * (1. - triangle)]
    tri_name_given_tri_and_on = pymc3.Categorical.dist(p=tri_name_giv_tri_on_dist)
    tri_name_given_tri_and_not_on = pymc3.Categorical.dist(p=tri_name_giv_tri_not_on_dist)
    triangle_name = pymc3.Mixture('triangle_name', w=triangle_name_mixture_weights,
                                  comp_dists=[tri_name_given_tri_and_on, tri_name_given_tri_and_not_on, NA],
                                  shape=1, testval=0., dtype="int64")
    res = pymc3.sample(n)
This code generates the following error:
Multiprocess sampling (2 chains in 2 jobs)
CompoundStep
>NUTS: [pOn]
>BinaryGibbsMetropolis: [on]
>Metropolis: [triangle_name]
Sampling 2 chains: 0%| | 0/3000 [00:00<?, ?draws/s]/usr/local/lib/python3.5/dist-packages/numpy/core/fromnumeric.py:3118: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
/usr/local/lib/python3.5/dist-packages/numpy/core/fromnumeric.py:3118: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
Bad initial energy, check any log probabilities that are inf or -inf, nan or very small:
triangle_name NaN
pymc3.parallel_sampling.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/pymc3/parallel_sampling.py", line 160, in _start_loop
point, stats = self._compute_point()
File "/usr/local/lib/python3.5/dist-packages/pymc3/parallel_sampling.py", line 191, in _compute_point
point, stats = self._step_method.step(self._point)
File "/usr/local/lib/python3.5/dist-packages/pymc3/step_methods/compound.py", line 27, in step
point, state = method.step(point)
File "/usr/local/lib/python3.5/dist-packages/pymc3/step_methods/arraystep.py", line 247, in step
apoint, stats = self.astep(array)
File "/usr/local/lib/python3.5/dist-packages/pymc3/step_methods/hmc/base_hmc.py", line 144, in astep
raise SamplingError("Bad initial energy")
pymc3.exceptions.SamplingError: Bad initial energy
"""
The above exception was the direct cause of the following exception:
pymc3.exceptions.SamplingError: Bad initial energy
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "test.py", line 109, in <module>
res = pymc3.sample(n)
File "/usr/local/lib/python3.5/dist-packages/pymc3/sampling.py", line 432, in sample
trace = _mp_sample(**sample_args)
File "/usr/local/lib/python3.5/dist-packages/pymc3/sampling.py", line 965, in _mp_sample
for draw in sampler:
File "/usr/local/lib/python3.5/dist-packages/pymc3/parallel_sampling.py", line 393, in __iter__
draw = ProcessAdapter.recv_draw(self._active)
File "/usr/local/lib/python3.5/dist-packages/pymc3/parallel_sampling.py", line 297, in recv_draw
raise error from old_error
pymc3.parallel_sampling.ParallelSamplingError: Bad initial energy
The reason why I ask is that I want to make triangle_name depend on the system's beliefs about on and triangle. When I make this model explicit:
# Triangle
triangle_mixture_weights = [on, (1. - on)]
tri_giv_on = pymc3.Bernoulli.dist(pTri_given_not_on + tri_delta_on)
tri_giv_not_on = pymc3.Bernoulli.dist(pTri_given_not_on)
triangle = pymc3.Mixture('triangle', w=triangle_mixture_weights,
                         comp_dists=[tri_giv_on, tri_giv_not_on],
                         shape=1, testval=0., dtype="int64")
triangle_name_mixture_weights = [on * triangle, (1. - on) * triangle,
                                 on * (1. - triangle) + (1. - on) * (1. - triangle)]
tri_name_given_tri_and_on = pymc3.Categorical.dist(p=tri_name_giv_tri_on_dist)
tri_name_given_tri_and_not_on = pymc3.Categorical.dist(p=tri_name_giv_tri_not_on_dist)
triangle_name = pymc3.Mixture('triangle_name', w=triangle_name_mixture_weights,
                              comp_dists=[tri_name_given_tri_and_on, tri_name_given_tri_and_not_on, NA],
                              shape=1, testval=0., dtype="int64")
I get the following error.
Traceback (most recent call last):
File "test.py", line 107, in <module>
res = pymc3.sample(n)
File "/usr/local/lib/python3.5/dist-packages/pymc3/sampling.py", line 401, in sample
step = assign_step_methods(model, step, step_kwargs=kwargs)
File "/usr/local/lib/python3.5/dist-packages/pymc3/sampling.py", line 150, in assign_step_methods
return instantiate_steppers(model, steps, selected_steps, step_kwargs)
File "/usr/local/lib/python3.5/dist-packages/pymc3/sampling.py", line 71, in instantiate_steppers
step = step_class(vars=vars, **args)
File "/usr/local/lib/python3.5/dist-packages/pymc3/step_methods/arraystep.py", line 65, in __new__
step.__init__([var], *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/pymc3/step_methods/metropolis.py", line 136, in __init__
self.delta_logp = delta_logp(model.logpt, vars, shared)
File "/usr/local/lib/python3.5/dist-packages/pymc3/step_methods/metropolis.py", line 624, in delta_logp
[logp0], inarray0 = pm.join_nonshared_inputs([logp], vars, shared)
File "/usr/local/lib/python3.5/dist-packages/pymc3/theanof.py", line 264, in join_nonshared_inputs
xs_special = [theano.clone(x, replace, strict=False) for x in xs]
File "/usr/local/lib/python3.5/dist-packages/pymc3/theanof.py", line 264, in <listcomp>
xs_special = [theano.clone(x, replace, strict=False) for x in xs]
File "/usr/local/lib/python3.5/dist-packages/theano/scan_module/scan_utils.py", line 247, in clone
share_inputs)
File "/usr/local/lib/python3.5/dist-packages/theano/compile/pfunc.py", line 232, in rebuild_collect_shared
cloned_v = clone_v_get_shared_updates(outputs, copy_inputs_over)
File "/usr/local/lib/python3.5/dist-packages/theano/compile/pfunc.py", line 93, in clone_v_get_shared_updates
clone_v_get_shared_updates(i, copy_inputs_over)
File "/usr/local/lib/python3.5/dist-packages/theano/compile/pfunc.py", line 93, in clone_v_get_shared_updates
clone_v_get_shared_updates(i, copy_inputs_over)
File "/usr/local/lib/python3.5/dist-packages/theano/compile/pfunc.py", line 93, in clone_v_get_shared_updates
clone_v_get_shared_updates(i, copy_inputs_over)
File "/usr/local/lib/python3.5/dist-packages/theano/compile/pfunc.py", line 93, in clone_v_get_shared_updates
clone_v_get_shared_updates(i, copy_inputs_over)
File "/usr/local/lib/python3.5/dist-packages/theano/compile/pfunc.py", line 93, in clone_v_get_shared_updates
clone_v_get_shared_updates(i, copy_inputs_over)
File "/usr/local/lib/python3.5/dist-packages/theano/compile/pfunc.py", line 93, in clone_v_get_shared_updates
clone_v_get_shared_updates(i, copy_inputs_over)
File "/usr/local/lib/python3.5/dist-packages/theano/compile/pfunc.py", line 93, in clone_v_get_shared_updates
clone_v_get_shared_updates(i, copy_inputs_over)
File "/usr/local/lib/python3.5/dist-packages/theano/compile/pfunc.py", line 93, in clone_v_get_shared_updates
clone_v_get_shared_updates(i, copy_inputs_over)
File "/usr/local/lib/python3.5/dist-packages/theano/compile/pfunc.py", line 93, in clone_v_get_shared_updates
clone_v_get_shared_updates(i, copy_inputs_over)
File "/usr/local/lib/python3.5/dist-packages/theano/compile/pfunc.py", line 93, in clone_v_get_shared_updates
clone_v_get_shared_updates(i, copy_inputs_over)
File "/usr/local/lib/python3.5/dist-packages/theano/compile/pfunc.py", line 93, in clone_v_get_shared_updates
clone_v_get_shared_updates(i, copy_inputs_over)
File "/usr/local/lib/python3.5/dist-packages/theano/compile/pfunc.py", line 93, in clone_v_get_shared_updates
clone_v_get_shared_updates(i, copy_inputs_over)
File "/usr/local/lib/python3.5/dist-packages/theano/compile/pfunc.py", line 93, in clone_v_get_shared_updates
clone_v_get_shared_updates(i, copy_inputs_over)
File "/usr/local/lib/python3.5/dist-packages/theano/compile/pfunc.py", line 93, in clone_v_get_shared_updates
clone_v_get_shared_updates(i, copy_inputs_over)
File "/usr/local/lib/python3.5/dist-packages/theano/compile/pfunc.py", line 96, in clone_v_get_shared_updates
[clone_d[i] for i in owner.inputs], strict=rebuild_strict)
File "/usr/local/lib/python3.5/dist-packages/theano/gof/graph.py", line 246, in clone_with_new_inputs
new_node = self.op.make_node(*new_inputs)
File "/usr/local/lib/python3.5/dist-packages/theano/tensor/elemwise.py", line 230, in make_node
% (self.input_broadcastable, ib)))
TypeError: The broadcastable pattern of the input is incorrect for this op. Expected (True,), got (False,).
I am assuming that this error is related to the first one, but I may be wrong.
For the last error, I think that you have to either use shape=None or pass an array with a single element as the testval instead of a scalar.
The first error has to do with your model being a poor fit for your problem. You can check whether there are obvious mistakes using model.check_test_point.
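If I'm reading your first snippet right, this is what check_test_point would flag: with triangle = 0., the weight list [on * triangle, (1. - on) * triangle, on * (1. - triangle) + (1. - on) * (1. - triangle)] evaluates to [0, 0, 1], so all the weight lands on the NA component, a point mass at -10, while the testval is 0. A rough plain-Python sketch (my own toy reconstruction, not PyMC3 internals) of why the log-probability at the test point blows up:

```python
import math

# Mixture log-density at a point x: log(sum_i w_i * p_i(x)).
def mixture_logp(x, weights, component_pmfs):
    total = sum(w * pmf(x) for w, pmf in zip(weights, component_pmfs))
    return math.log(total) if total > 0 else float("-inf")

cat_on = lambda x: [.4, .4, .2][x]       # Categorical given triangle and on
cat_not_on = lambda x: [.2, .3, .5][x]   # Categorical given triangle and not on
na = lambda x: 1.0 if x == -10 else 0.0  # Constant (point mass) at NA_ENCODING

# What the weight expressions evaluate to when triangle = 0.:
weights = [0.0, 0.0, 1.0]
print(mixture_logp(0, weights, [cat_on, cat_not_on, na]))  # -inf: testval 0 has zero mass under NA
```

With triangle = 1. the Categorical components carry all the weight and give the test point positive mass, which is why that case samples fine.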
I don't want to pry into why you chose the model you did, but I find it strange that you are using a Bernoulli random variable as the mixture's weight. Why not use the probabilities of on and off directly? I think that you are mixing up the definition of a mixture via latent indexes with the marginalized representation, which is what Mixture is for. This could be behind your bad initial energy.
So, you're not the first person to suggest this to me. The only thing is, I don't fully understand what you mean by "use the probabilities directly". The scenario I am trying to model is a simple block example, where a triangle is on top of a block. In this scenario, we have a triangle and a block predicate:
(triangle name=?triangle x=?x1 y=?y1 z=?z1)
(block name=?block x=?x2 y=?y2 z=?z2)
The ? denotes some kind of distribution for the predicate's argument. To represent the on relation we have another predicate:
(on arg1=?triangle1 arg2=?block1)
I wanted to represent these predicates as Bayesian networks (tree structures, more specifically), where the name of the predicate is the root of a tree and directed edges flow from there to the arguments of the predicate. The idea I want to capture is that whenever the system has observed on, it will change the distribution for the arguments of the triangle and block predicates. The on, triangle, and block predicates are either true or false in the state, so to capture this, creating a mixture model whose weights are a function of on, triangle, and block made sense to me. Hence why I made the triangle and triangle_name mixtures shown above.
In this context, could you clarify what you mean by using the probabilities for on and triangle directly?
I think that you are close to understanding how to use the probabilities as mixture weights. As a first step, you have to write down an explicit model that uses the on and off states. Something like this:
on_triangle ~ Bernoulli(p_on)
triangle ~ Bernoulli(p2[on_triangle])
where p_on is the constant that you chose (0.7), p2 is the array [1, 0.7], and p2[on_triangle] indexes into that array. This model explicitly samples the a priori hidden on_triangle state. I say hidden because I imagine it is unobserved. Now, you may notice that there is no mixture distribution in the model written above. To get the mixture, one assumes that the discrete on_triangle state is not observed; the only thing observed is the final triangle state, so you can sum over the possible values of on_triangle (this gives you the marginal probability distribution), which reads as follows:
p(triangle) = w[0] * Bernoulli(triangle | p2[0]) + w[1] * Bernoulli(triangle | p2[1])
where w = [(1 - p_on), p_on]. This second parametrization gives you the same probability distribution for sampling triangle, but you have removed the discrete on_triangle variable by marginalizing it out, leaving a mixture model. This is what I meant by using the probabilities directly. You should check out the examples of mixture models on our website. There is one in particular about the two parametrizations of a mixture model, one with the latent variable and one without, that talks about the differences.
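To make the equivalence concrete, here is a small plain-Python simulation (my own sketch, using only the numbers above) showing that the latent-variable parametrization and the marginalized mixture give the same distribution for triangle:

```python
import random

random.seed(0)
p_on = 0.7        # the constant mentioned above
p2 = [1.0, 0.7]   # P(triangle = 1 | on_triangle), indexed by on_triangle

# Parametrization 1: explicitly sample the latent on_triangle, then triangle.
def sample_with_latent():
    on_triangle = 1 if random.random() < p_on else 0
    return 1 if random.random() < p2[on_triangle] else 0

# Parametrization 2: marginalize on_triangle out. With w = [1 - p_on, p_on],
# P(triangle = 1) = w[0] * p2[0] + w[1] * p2[1].
w = [1 - p_on, p_on]
p_marginal = w[0] * p2[0] + w[1] * p2[1]  # about 0.79

n = 100_000
empirical = sum(sample_with_latent() for _ in range(n)) / n
print(p_marginal, empirical)  # the two agree to within Monte Carlo error
```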
One last thing: in all of the examples you posted, no distribution had observed set. That means that sample will not infer a posterior distribution; it will draw samples that should mimic the prior.
Ok, so I came up with this. Thanks to your help, I was able to debug my model and make it compile. Please let me know what you think.
schema_count = 10
on_count = 3
not_on_count = schema_count - on_count + .0001
on_alphas = [not_on_count, on_count]
tri_and_on_count = 3
not_tri_and_on_count = on_count - tri_and_on_count + .0001
tri_on_alphas = [not_tri_and_on_count, tri_and_on_count]
tri_and_not_on_count = 2
not_tri_and_not_on_count = not_on_count - tri_and_not_on_count + .0001
tri_not_on_alphas = [not_tri_and_not_on_count, tri_and_not_on_count]

with pymc3.Model() as model:
    # On
    pOns = pymc3.Dirichlet('pOns', numpy.array(on_alphas))
    # Triangle
    # You have on. What's the probability of (not) having triangle?
    pTri_ons = pymc3.Dirichlet('pTri_ons', numpy.array(tri_on_alphas))
    # You don't have on. What's the probability of (not) having triangle?
    pTri_not_ons = pymc3.Dirichlet('pTri_not_ons', numpy.array(tri_not_on_alphas))
    tri_giv_on = pymc3.Categorical.dist(pTri_ons)
    tri_giv_not_on = pymc3.Categorical.dist(pTri_not_ons)
    triangle = pymc3.Mixture('triangle', w=pOns,
                             comp_dists=[tri_giv_not_on, tri_giv_on],
                             testval=1, dtype="int64", observed=1)