How do we get mixture variables and components from traces?

I have a mixture RV that I have created as follows:

sampling_dist = pm.Mixture("Sampling distribution " + f, w=[0.5,0.5], comp_dists=[condition, rn],

But when I sample from the model containing this variable, I don’t see it’s name as one of the varnames in the resulting trace. So how can I look at its values?
In this case, an even more important question is how can I look at the values of the component distributions as they evolve over time?
Finally, am I just doing The Wrong Thing? For this sum of two random variables, should I be creating a Deterministic object instead?

Partial answer is “no, I should not be using a Deterministic variable,” because it’s not possible to observe the value of a deterministic variable.

However, as far as I can tell, I have to explicitly tell PyMC3 to track the value of my mixture variable, by passing a list of variables into the sample method. I’m not sure why this is necessary: previously sample was logging all of my random variables, but it only logged the mixture when I explicitly asked it to.

A big question remains, though, because I would like to know the values of the components of my mixture distribution. However, I cannot log them in a trace, because they are distribution objects, instead of random variable objects. I tried wrapping them in Deterministic variables, but that didn’t work, either. Attempts to do so got me this error:

TypeError: Outputs must be theano Variable or Out instances. Received <pymc3.distributions.continuous.Normal object at 0x1374ca910> of type <class 'pymc3.distributions.continuous.Normal'>

I suspect that there’s some operation I can do to log the components, but I haven’t been able to figure it out, especially since the PyMC3 online docs describe how to set up mixture random values, but don’t carry through to explaining how to collect information about them from traces.

thanks for any assistance.

An observed node does not appear on the trace, as there is no free variable to sample.

As for knowing the values of the components, I am not exactly sure what you meant - if you want to know the distribution of each component, you can reconstruct it from the trace, eg see last figure of

I looked at that example, but it only has NormalMixture examples, which seems to work differently from the more basic Mixture distribution. I haven’t found any example online that does what I want (plot the contributors to a general Mixture distribution).
Also, to be honest, I find that example extremely difficult to follow, and to apply to my problem. There’s some explanation of the mathematics, but the reader (me in this case) is really left to figure out how the math maps to python for him or herself.
I think you are referring to input [13] where the weighted mixture components are plotted. Is that correct? If so, isn’t the author there plotting the results of a forward simulation of a known distribution using numpy instead of pymc3?
There’s a normal mixture model down in input [2], but that is a normal mixture, parameterized by a set of means and precisions, right? That may well be plot-able, but I think it’s a different case, because the Mixture is parameterized by a set of distributions instead of a set of random variables, as in the NormalMixture.
So all of this leads me back to my original question: how do I plot the component distributions in a (non-Normal) mixture, since the components are not random variables?
Maybe the answer is hidden in that page, but I’m afraid I don’t know how to extract it.

In the doc above, the second part is a Poisson mixture and the last cell is about plotting the Mixture components each with a Poisson distribution. But you are right: the result is plot by forward simulation using numpy as we dont have an easy function to reconstruct mixture components yet. Contributions are always welcome if you think the docs is unclear or there is feature missing.

For now, the workaround to plot each component is a bit involved. Say you have a model like below:

sample = np.hstack([np.random.randn(100),np.random.rand(100)])
with pm.Model() as m:
    mu = pm.Normal('mu')
    sd = pm.HalfNormal('sd', 1)
    condition = pm.Normal.dist(mu, sd)
    rn = pm.Uniform.dist(-5., 5.)
    sampling_dist = pm.Mixture(
        "Sampling distribution ",
        w=[0.5, 0.5],
        comp_dists=[condition, rn],

Now the logp of sampling_dist is not really what you want, as it is conditioned on the observation - it takes the free parameters mu and sd as input and output the sum of logp of the observations conditioned on the input.
So you need the raw logp function to evaluate on a grid of points:

x = np.linspace(-10, 10, 100)
py_ = tt.sum(
        sampling_dist.distribution.w * tt.exp(

However, you cannot display py_ directly in most case, as it is a tensor; and evaluating it using .eval() will gives you an error, as you need to provide the input (in this case mu and sd). So you need to compile it into a theano function:

f = theano.function(m.free_RVs, py_)

to check what input it accepts, do:

# [mu, sd_log__]

So to evaluate the function, you provide the required input as a list:

py = f(0., np.log(1.)) # mu=0 and sd=1
plt.plot(x, py);

Hopefully this clear it up a bit for you.

1 Like

Thank you so much for taking the time to explain this. So if I was to be learning the values of mu and sd_log__ in this example, I would take the final values of those variables from the trace? Or the average values? And I would push those fixed arguments into f, as you did, in order to yield a function of x that I can plot when the system is done running. Is that right?

Am I right in thinking that you are here extracting one of two contributors to the sampling_dist (in this case, rn)? Is that what the axis=1 does? So that I should do this a second time with axis=0 to get the contribution from condition?

Thanks for all of the help. I am very grateful.

You should take the average of the trace, as that would be the posterior expectation, and then you plug them into f

Say you have 2 component, then when you evaluate each logp on the input x you will have 2 rows of value. Multiply each row by the respective weight and sum over the columns you then get the mixture probability density function evaluate at each of the x. I am not sure what you meant by contribution from condition - if you meant the mixture contribution, they are different at each x, so you should no sum them together. if you are interested in the overall contribution then it is just the weight you are multiplying with.

Thanks again. When I said “the contribution from condition,” I meant the variable called condition (not a great name in general, but it fits my application).
The weights here are constant, so what I was looking for (and not explaining well) is the values of the two contributing variables, condition and rn (random noise).
I guess since those variables get their parameters from other variables that are visible in the trace, I should just look at those other variables. I was simply interested in looking at the way that the components appear in sampling_dist as a way of debugging and making sure I understood what was going on.

I understand that the mixture is not sampled (only the parameter of the mixture are sampled in a observed mixture)

My problem : I need to add an extra constraint on the mean of the mixture.
(Reason : the mixture parameters are optimized for likelyhood, which leads to a biaised mean estimate when re-sample it, I hope that adding a constraint on the mean will help.)

So I’d like to add a penalty when the mean of the re-sampled mixture is going too far from the observed data mean.
How can I re-sample from the mixture ?
My goal is to compute it’s mean and create a Potential to force the optimization to produce a non mean-biaised model.

I know that I can use the “sample_ppc” function to sample from my mixture, but the problem is that it only works when the optimisation is finished. I want to do it while optimizing to be able to constraint the mean of the mixture.

Remark : I’m using ADVI optmization, like in this example.

Any help will be appreciated.

It might be better to discuss in a separate post, but what you are trying to do is correct, ie:

But I think in this case it is much better to encode the penalty into the prior, for example changing the prior of mu so that there are more weights around the observed data mean.

I’d say ok If the observables data were modeled as a single distribution, but with a mixture, I confess that I have no clue how to do that…

The distribution is a mixture, with unknown weights and unknown mixture component parameters. How can I set prior on these values in order that the mixture of the posterior will lead to a given mean ? (And to be exact, in my case, it’s a hierarchical mixture of mixtures!!)

I can’t believe that there is no way to sample from my mixture
during the optimization (since I have all the current parameters, and there even is a function that do it after the optimization.)

I tried to declare another mixture re-using the variables from my observed mixture, but I do not observe this mixture. The trace from this mixture then is available during optimization, but I noticed that it modified the model behavior (even if not observed!!) . The resulting mixture parameters are not the same, and the predictive posterior check is different (and wrong!).
I don’t get how/why sampling parameters without observed data can change the behavior of a model.

I still don’t understand what you are trying to do (i.e., this re-sampling you are talking about). Could you please open a new post with some more information (ideally a notebook with simulated data)?