How to access dimension of a multivariate 'dist' object?

I discovered that I can’t access dimensions of a ‘dist’ object (the kind of distributions you need to use when building a Mixture).

The following code :

import numpy as np
import pymc3 as pm
 
c = pm.Uniform.dist(lower=0,upper=5,shape=2)
print(c[0])

yields that error message :

I really don’t understand the problem: c is multi-dimensional (shape=2), so why can’t I use the [ ] operator on it?

Any help appreciated.

If you type print(type(c)), it shows:

<class 'pymc3.distributions.continuous.Uniform'>,

so c is a distribution object rather than an array of values. To list its attributes and methods, type print(dir(c)), and to draw some values from that distribution:

print(c.random()).


Thanks for this information, but that doesn’t solve my problem…

This is just a toy code showing the core of the problem.

In my real problem, I need to create a linear regression, which will be a component of a mixture… So I need to access the individual dimensions of a multidimensional ‘dist’ in order to build the linear regression.

Hi Hubert,
When you’re inside a model context, if you try c.tag.test_value.shape, does it work?

I don’t know what you expected from this line…

But I think this answer won’t help… ;(

Hi @hwassner

As pointed out by @rosgori, the variable c is a PyMC3 distribution. To my knowledge, every distribution in PyMC3 has a shape attribute set to np.atleast_1d(shape). Maybe this snippet solves the issue:

>>> import pymc3 as pm
>>> c = pm.Uniform.dist(lower=0,upper=5,shape=2)
>>> c.shape
array([2])
>>> type(c.shape)
<class 'numpy.ndarray'>
>>> c.shape[0]
2
>>>

Didn’t see you were using the dist attribute – my answer is for tensors, not distributions. I think Sayam’s and David’s answers will be helpful :ok_hand:

OK, this confirms that c is multi-dimensional, but the problem is not accessing its dimension…
The problem is accessing its individual components (in order to create a linear regression), but any call to the [ ] operator leads to an error!

@hwassner perhaps it is best to show a more expanded example where c[0] is failing.

Accessing individual elements of a dist isn’t a well-defined thing to do; it is only something you can do with a random variable (the things returned by e.g. pm.Normal('y')). The dist is just an object that allows you to compute the logp of a value, so it is more or less just a thing with a logp function that takes values as input.
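To make that concrete, here is a toy sketch in plain Python. The class ToyUniformDist is made up for illustration and is not how PyMC3 implements its distributions. The object stores parameters and scores values that are handed to it, but it never holds a sampled value of its own, so there is nothing for c[0] to return.

```python
import math


class ToyUniformDist:
    """Hypothetical stand-in for a 'dist' object; NOT PyMC3's implementation."""

    def __init__(self, lower, upper):
        self.lower = lower
        self.upper = upper

    def logp(self, value):
        # Log-density of Uniform(lower, upper) evaluated at a value given by the caller.
        if self.lower <= value <= self.upper:
            return -math.log(self.upper - self.lower)
        return -math.inf


c = ToyUniformDist(lower=0.0, upper=5.0)
inside = c.logp(2.5)   # log(1/5): the value lies inside the support
outside = c.logp(7.0)  # -inf: the value lies outside the support
```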
Can you maybe elaborate a bit more on what you are trying to do?


OK, this is slightly more complex code, closer to what I’m really trying to do:

I am trying to build a mixture where one component is a linear combination (simplified to a plain sum of 2 sub-components to keep the example short).

import numpy as np
import pymc3 as pm

data = np.random.uniform(low=0,high=5,size=100)

with pm.Model() as model:
    c = pm.Uniform.dist(lower=0,upper=1,shape=2)
    c_ = pm.Deterministic('c_',c[0]+c[1])
    n = pm.Normal.dist(mu=10,sd=1)
    w = pm.Dirichlet('w',a=np.array([1,1]))
    mix = pm.Mixture('mix',w=w,comp_dists=[c_,n],observed=data,shape=len(data))

    trace = pm.sample()

This code fails at the Deterministic line:

Other attempts, like using the sum function instead of writing it explicitly with the + and [ ] operators, also lead to errors linked to the limited nature of ‘dist’ objects…

In this particular case this is still pretty simple to work around, more generally however, this might be tricky.
The problem is that pymc never explicitly samples the values of c, but only asks: “given the observed data and weights w, what is the probability of observing that data”. It tries to compute P(mix=observed|w).
It uses the definition of a mixture: P(mix=observed|w) = \sum_i w_i \cdot P(\text{mix=observed | mix from comp_dists[i]}), so it needs to compute comp_dists[i].logp(observed) for each component. This is exactly what a dist object does for you.
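As a rough sketch of that computation in plain Python (the helper names uniform_logp, normal_logp and mixture_logp are invented here, and this is not PyMC3's actual code), the mixture log-density only ever calls each component's logp on the observed value and combines the results with the weights:

```python
import math


def uniform_logp(x, lower, upper):
    # Log-density of Uniform(lower, upper) at x.
    if lower <= x <= upper:
        return -math.log(upper - lower)
    return -math.inf


def normal_logp(x, mu, sd):
    # Log-density of Normal(mu, sd) at x.
    z = (x - mu) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def mixture_logp(x, weights, comp_logps):
    # log( sum_i w_i * exp(logp_i(x)) ), computed with log-sum-exp for stability.
    terms = [math.log(w) + lp(x) for w, lp in zip(weights, comp_logps)]
    m = max(terms)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(t - m) for t in terms))


# Illustrative components and weights (assumed values, not from the thread).
components = [
    lambda x: uniform_logp(x, 0.0, 5.0),
    lambda x: normal_logp(x, 10.0, 1.0),
]
weights = [0.3, 0.7]
logp_at_2 = mixture_logp(2.0, weights, components)
```

Nothing in this calculation needs a sampled value of a component, which is why a component must be something that can score a value rather than an expression built from other variables.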
But if comp_dists[i] is a sum of different things, then you have to work out the probability density of the sum and, if it doesn’t exist already, write your own dist. The sum of independent Uniform(0, 1) variables, for example, follows an Irwin–Hall distribution (a triangular distribution for n=2). This is hard to work out in general though, so pymc can’t do this for you.
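For the n=2 case, the density can be written down by hand. The sketch below (plain Python, with the made-up name sum_two_uniform_logp) encodes the triangular density of the sum of two independent Uniform(0, 1) variables and compares it against a quick simulation; a custom dist for the toy example would need a logp along these lines.

```python
import math
import random


def sum_two_uniform_logp(s):
    # Density of U1 + U2 with U1, U2 ~ Uniform(0, 1), independent:
    # f(s) = s on (0, 1], f(s) = 2 - s on (1, 2), and 0 elsewhere.
    if 0.0 < s <= 1.0:
        return math.log(s)
    if 1.0 < s < 2.0:
        return math.log(2.0 - s)
    return -math.inf


# Monte Carlo sanity check: P(S <= 0.5) should match the integral of f on [0, 0.5].
rng = random.Random(0)
n = 100_000
draws = [rng.random() + rng.random() for _ in range(n)]
empirical = sum(1 for s in draws if s <= 0.5) / n
exact = 0.5 ** 2 / 2.0  # integral of s ds from 0 to 0.5
```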
You could probably also use the sampler to help along, if the density is too complicated to work out. You might be able to sample all but one of the things you want to sum up, and then just compute the density of the sum given all the explicitly sampled values.
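A small sketch of that idea for the two-uniform example, in plain Python with invented helper names: if c0 is treated as known (as it would be if the sampler drew it explicitly), then c0 + c1 with c1 ~ Uniform(0, 1) is simply Uniform(c0, c0 + 1). Averaging that conditional density over c0 recovers the triangular density, which is the job the sampler would be doing. In PyMC3 this would presumably mean making c0 a named random variable and passing something like a Uniform dist with bounds c0 and c0 + 1 as the component, but that is an assumption and is not tested here.

```python
def sum_density_given_c0(s, c0):
    # With c0 fixed and c1 ~ Uniform(0, 1), s = c0 + c1 is Uniform(c0, c0 + 1).
    return 1.0 if c0 <= s <= c0 + 1.0 else 0.0


def marginal_density(s, grid_size=10_000):
    # Average the conditional density over c0 ~ Uniform(0, 1) using a midpoint rule.
    h = 1.0 / grid_size
    total = sum(sum_density_given_c0(s, (k + 0.5) * h) for k in range(grid_size))
    return total * h


approx_low = marginal_density(0.5)   # triangular density at 0.5 is 0.5
approx_high = marginal_density(1.5)  # triangular density at 1.5 is 0.5
```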
Maybe I can help a bit if I know the actual dist you would like to write as a dist.


Hi @aseyboldt this is very kind …

Basically, what I’m trying to do is a linear model where the variables are binary and the output is a rate (my first try used a sigmoid function to fit the output in [0;1], I hope it’s ok…). The mixture will have two components:

  • the linear distribution
  • a beta distribution

The mixture output is used in a binomial RV with observed data.

Any help will be appreciated.