How to access dimension of a multivariate 'dist' object?

hwassner · March 27, 2020, 5:16pm

I discovered that I can’t access dimensions of a ‘dist’ object (the kind of distributions you need to use when building a Mixture).

The following code:

import numpy as np
import pymc3 as pm
 
c = pm.Uniform.dist(lower=0,upper=5,shape=2)
print(c[0])

raises an error.

I really don’t understand the problem: c is multi-dimensional (shape=2), so why can’t I use the [ ] operator on it?!

Any help appreciated.

rosgori · March 29, 2020, 4:41am

If you run print(type(c)), it shows:

<class 'pymc3.distributions.continuous.Uniform'>

So c is a distribution object, not an array. To list its attributes and methods, run print(dir(c)), and to draw some values from the distribution:

print(c.random())

hwassner · March 30, 2020, 8:39am

Thanks for this information, but that doesn’t solve my problem…

This is just toy code showing the core of the problem.

In my real problem, I need to create a linear regression, which will be a component of a mixture… So I need to access the individual dimensions of a multidimensional ‘dist’ in order to build the linear regression.

AlexAndorra · March 30, 2020, 9:14am

Hi Hubert,
When you’re inside a model context, if you try c.tag.test_value.shape, does it work?

hwassner · March 30, 2020, 4:16pm

I don’t know what you expected from this line…

But I think this answer won’t help… ;(

Sayam753 · March 31, 2020, 4:35am

Hi @hwassner

As pointed out by @rosgori, the variable c is a PyMC3 distribution. To my knowledge, every distribution in PyMC3 has a shape attribute set to np.atleast_1d(shape). Maybe this snippet solves the issue:

>>> import pymc3 as pm
>>> c = pm.Uniform.dist(lower=0,upper=5,shape=2)
>>> c.shape
array([2])
>>> type(c.shape)
<class 'numpy.ndarray'>
>>> c.shape[0]
2
>>>
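For reference, here is a small NumPy-only sketch of what np.atleast_1d does to a shape argument. It only illustrates why c.shape comes back as an array rather than an int; it assumes nothing about PyMC3 internals beyond what is stated above, and the variable names are made up for the example.

```python
import numpy as np

# A scalar shape becomes a 1-element array, matching the array([2]) seen above.
shape_from_int = np.atleast_1d(2)

# A tuple shape becomes a 1-D array with one entry per dimension.
shape_from_tuple = np.atleast_1d((3, 4))

# The number of dimensions is the length of that array.
n_dims = len(shape_from_tuple)

print(shape_from_int, shape_from_tuple, n_dims)
```

Note that this only describes the shape metadata; it does not give access to the components of the distribution.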

AlexAndorra · March 31, 2020, 8:02am

Didn’t see you were using the dist attribute – my answer is for tensors, not distributions. I think Sayam’s and David’s answers will be helpful

hwassner · March 31, 2020, 8:12am

OK, this confirms that c is multi-dimensional, but the problem is not accessing its dimension…
The problem is accessing its individual components (in order to create a linear regression), but any call to the [ ] operator leads to an error!

nkaimcaudle · March 31, 2020, 10:07am

@hwassner perhaps it is best to show a more expanded example where c[0] is failing.

aseyboldt · March 31, 2020, 10:24am

Accessing individual elements of a dist isn’t a well-defined thing to do; it is only something you can do with a random variable (the things returned by e.g. pm.Normal('y')). The dist is just an object that lets you compute the logp of a value, so it is more or less just a thing with a logp function that takes values as input.
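To make that concrete, here is a toy sketch in plain NumPy. ToyUniformDist is a hypothetical stand-in written for this post, not PyMC3’s actual class: it stores parameters and exposes logp, and nothing else. Scoring values works, but there are no elements to index, so c[0] fails (the exact error PyMC3 raises may differ from this toy one).

```python
import numpy as np

class ToyUniformDist:
    """Hypothetical stand-in for a 'dist': parameters plus a logp method, no values."""

    def __init__(self, lower, upper, shape=1):
        self.lower = lower
        self.upper = upper
        self.shape = np.atleast_1d(shape)

    def logp(self, value):
        value = np.asarray(value, dtype=float)
        inside = (value >= self.lower) & (value <= self.upper)
        # Log density is -log(upper - lower) inside the support, -inf outside.
        return np.where(inside, -np.log(self.upper - self.lower), -np.inf)

c = ToyUniformDist(lower=0, upper=5, shape=2)

# Scoring given values is what the object is for.
logp_values = c.logp([1.0, 7.0])

# Indexing is not defined: the object holds no sampled values.
try:
    c[0]
    indexing_works = True
except TypeError:
    indexing_works = False

print(logp_values, indexing_works)
```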
Can you maybe elaborate a bit more on what you are trying to do?

hwassner · March 31, 2020, 12:56pm

OK, here is slightly more complex code, closer to what I’m really trying to do. I’m trying to build a mixture where one component is a linear combination (simplified here to a sum of two sub-components to keep the example short):

import numpy as np
import pymc3 as pm

data = np.random.uniform(low=0, high=5, size=100)

with pm.Model() as model:
    c = pm.Uniform.dist(lower=0, upper=1, shape=2)
    c_ = pm.Deterministic('c_', c[0] + c[1])
    n = pm.Normal.dist(mu=10, sd=1)
    w = pm.Dirichlet('w', a=np.array([1, 1]))
    mix = pm.Mixture('mix', w=w, comp_dists=[c_, n], observed=data, shape=len(data))

    trace = pm.sample()

This code fails at the Deterministic line.

Other attempts, like using the sum function instead of writing it explicitly with the + and [ ] operators, also lead to errors linked to the limited nature of ‘dist’ objects…

aseyboldt · April 1, 2020, 8:10am

In this particular case this is still pretty simple to work around, more generally however, this might be tricky.
The problem is that pymc never explicitly samples the values of c, but only asks: “given the observed data and weights w, what is the probability of observing that data”. It tries to compute P(mix=observed|w).
It uses the definition of a mixture: P(\text{mix=observed} \mid w) = \sum_i w_i \cdot P(\text{mix=observed} \mid \text{mix from comp_dists[i]}), so it needs to compute comp_dists[i].logp(observed). This is exactly what a dist object does for you.
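As a rough illustration of that formula, here is a NumPy-only sketch of a two-component mixture log-likelihood. The helper names (uniform_logp, normal_logp, mixture_logp), the weights, and the data points are made up for the example, and this is not how PyMC3 implements pm.Mixture internally; it only shows that each component is needed solely through its logp.

```python
import numpy as np

def uniform_logp(x, lower, upper):
    x = np.asarray(x, dtype=float)
    inside = (x >= lower) & (x <= upper)
    return np.where(inside, -np.log(upper - lower), -np.inf)

def normal_logp(x, mu, sd):
    x = np.asarray(x, dtype=float)
    return -0.5 * ((x - mu) / sd) ** 2 - np.log(sd) - 0.5 * np.log(2 * np.pi)

def mixture_logp(x, w, comp_logps):
    # Stack log(w_i) + log p_i(x): shape (n_components, n_points).
    terms = np.stack([np.log(w_i) + lp(x) for w_i, lp in zip(w, comp_logps)])
    # Log-sum-exp over the component axis gives log sum_i w_i * p_i(x).
    return np.logaddexp.reduce(terms, axis=0)

data = np.array([0.5, 2.0, 9.5, 10.0])   # illustrative observations
w = np.array([0.3, 0.7])                 # illustrative mixture weights
comps = [
    lambda x: uniform_logp(x, 0.0, 5.0),
    lambda x: normal_logp(x, 10.0, 1.0),
]

pointwise_logp = mixture_logp(data, w, comps)
print(pointwise_logp, pointwise_logp.sum())
```

Nothing here ever asks a component for a value or an element, only for the log density of the observed data.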
But if comp_dists[i] is a sum of different things, then you have to work out the probability density of the sum and, if it doesn’t exist already, write your own dist. For example, the sum of n independent Uniform(0, 1) variables follows an Irwin–Hall distribution (a triangular distribution for n=2). This is hard to work out in general though, so pymc can’t do this for you.
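For the n=2 case, the density can be written down by hand. The sketch below (plain NumPy, with a made-up function name) defines the triangular density of c0 + c1 for independent Uniform(0, 1) variables and compares it against a histogram of simulated sums. A custom dist for this component would need the log of this density as its logp; how to wire that into PyMC3 is not shown here.

```python
import numpy as np

def sum_of_two_uniforms_pdf(s):
    """Density of c0 + c1 for independent c0, c1 ~ Uniform(0, 1) (Irwin-Hall, n=2)."""
    s = np.asarray(s, dtype=float)
    # Equals s on [0, 1], 2 - s on [1, 2], and 0 elsewhere.
    return np.clip(1.0 - np.abs(s - 1.0), 0.0, None)

# Monte Carlo check: simulate the sum and compare a histogram with the formula.
rng = np.random.default_rng(0)
samples = rng.uniform(0, 1, size=(200_000, 2)).sum(axis=1)
hist, edges = np.histogram(samples, bins=20, range=(0.0, 2.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

max_abs_error = np.max(np.abs(hist - sum_of_two_uniforms_pdf(centers)))
print(max_abs_error)
```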
You could probably also use the sampler to help along if the density is too complicated to work out. You might be able to sample all but one of the things you want to sum up, and then just compute the density of the sum given all the explicitly sampled values.
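Here is a small NumPy sketch of that idea for the two-uniform example, under the assumption that c0 is drawn explicitly and c1 ~ Uniform(0, 1) is left implicit. Given c0, the sum s = c0 + c1 is simply Uniform(c0, c0 + 1), so its conditional density is easy to write. The random draws below are only a stand-in for what a sampler would produce, and the function name is made up.

```python
import numpy as np

def sum_logp_given_c0(s, c0):
    """log p(s | c0) when s = c0 + c1 and c1 ~ Uniform(0, 1), i.e. Uniform(c0, c0 + 1)."""
    s = np.asarray(s, dtype=float)
    c0 = np.asarray(c0, dtype=float)
    inside = (s >= c0) & (s <= c0 + 1.0)
    return np.where(inside, 0.0, -np.inf)

rng = np.random.default_rng(1)
c0_draws = rng.uniform(0, 1, size=100_000)   # stand-in for explicit draws of c0

# Averaging the conditional density over c0 recovers the marginal density of the sum.
marginal_at_half = np.exp(sum_logp_given_c0(0.5, c0_draws)).mean()
marginal_at_one = np.exp(sum_logp_given_c0(1.0, c0_draws)).mean()
print(marginal_at_half, marginal_at_one)
```

The two averages should land near 0.5 and 1.0, the values of the triangular density at those points, without ever deriving that density analytically.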
Maybe I can help a bit more if I know the actual distribution you would like to write as a dist.

hwassner · April 6, 2020, 8:40am

Hi @aseyboldt this is very kind …

Basically, what I’m trying to do is a linear model where the variables are binary and the output is a rate (my first try used a sigmoid function to fit the output in [0;1], I hope that’s OK…). The mixture will have two components:

the linear distribution
a beta distribution
The mixture output is used in a binomial RV with observed data.

Any help will be appreciated.
