Create Custom Joint Distribution

I want to sample a random variable X = (x_1, x_2, …, x_n) from a custom joint distribution such that x_i is an integer between 0 and i-1, i.e.,

P[x_1, x_2, …, x_n] = 1/n! if 0 <= x_i <= i-1 for all i, and 0 otherwise.

It can be seen that this is a product of n independent discrete uniform distributions (the i-th dimension is a uniform distribution on the set of integers from 0 to i-1). How do we define such a joint distribution with parameter n in PyMC3?

I tried the following:

import pymc3 as pm

with pm.Model() as model:
    X = []
    for i in range(5):
        # x_i is uniform on the integers {0, ..., i}
        X.append(pm.DiscreteUniform('x_{}'.format(i), lower=0, upper=i))
    trace = pm.sample()

pm.traceplot(trace)

Is this the right way to do this?


Use pm.sample_prior_predictive(). This uses each distribution's underlying random() method instead of trying to sample with Monte Carlo.

For whatever reason (maybe @colcarroll can explain in a blog post), Monte Carlo sampling purely from the priors (without an observed=) almost always results in divergences and low effective sample sizes. So use sample_prior_predictive() whenever possible.
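For instance, here is a minimal sketch of drawing from the prior of the model above with sample_prior_predictive (the samples count and variable names are just illustrative):

import pymc3 as pm

with pm.Model() as model:
    X = [pm.DiscreteUniform('x_{}'.format(i), lower=0, upper=i) for i in range(5)]
    # draws directly from each variable's random() method; no MCMC is run
    prior = pm.sample_prior_predictive(samples=1000)

# prior is a dict keyed by variable name, e.g. prior['x_4'] is an array of draws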

That’s because when you sample from the prior using pm.sample without observed=, the observed node also becomes a random variable. This effectively adds n_observed dimensions to your model (if you are doing a simple linear regression with a slope and an intercept on 100K observations, the number of model parameters goes from 3 to 3 + 100K). And the higher the dimension of the model, the trickier sampling becomes.
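As a hypothetical illustration of that point (a sketch, not from the attached model): a tiny regression written without observed= turns every would-be data point into an extra free parameter.

import numpy as np
import pymc3 as pm

x = np.linspace(0, 1, 100)

with pm.Model() as prior_only:
    intercept = pm.Normal('intercept', mu=0, sd=1)
    slope = pm.Normal('slope', mu=0, sd=1)
    noise = pm.HalfNormal('noise', sd=1)
    # without observed=, y is itself a 100-dimensional free random variable,
    # so the sampler must explore 3 + 100 dimensions instead of just 3
    y = pm.Normal('y', mu=intercept + slope * x, sd=noise, shape=100)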


Thanks @chartl and @junpenglao for the feedback!

We are trying to model time series data (of size n).
We are trying to learn the following parameters: A, W, nu, lambda0, and X (see the attached code).

For small n, e.g. 10, the attached code works. But when n is large, e.g. 100, we run into a strange Theano compilation error. The error traceback is too long to paste here.

If you can simply run it and give us any direction, we would really appreciate it.

custom_time_series.py (2.0 KB)

At first glance, you should try getting rid of the for loop, and also try rewriting the model as a marginalized mixture model instead of using a latent discrete index variable.
You can search here on this discourse; the topic of rewriting a model into a marginalized mixture has come up quite a lot.
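For the for-loop part, here is a minimal sketch of a vectorized version of the prior from the first post (assuming DiscreteUniform broadcasts array-valued bounds, which is worth double-checking on your PyMC3 version):

import numpy as np
import pymc3 as pm

n = 5

with pm.Model() as model:
    # one n-dimensional variable instead of n scalar ones;
    # element i is uniform on the integers {0, ..., i}
    X = pm.DiscreteUniform('X', lower=0, upper=np.arange(n), shape=n)
    prior = pm.sample_prior_predictive(samples=1000)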


Running into a similar problem. I am very new to pymc3. @junpenglao, I do not fully understand mixture models or how to construct marginalized models. Is there a tutorial that I should follow?

Thank you in advance.

You can start with the Frequently Asked Questions thread here on the discourse.

Thanks @junpenglao for the pointer. For us, the latent variable X is an n-dimensional variable whose i-th dimension can take i possible values. So according to this approach we will have n^2 components. Our n can be as large as 2000. Isn’t that a problem?

The latent variable is generated from a distribution, so usually you can just model it with one random variable with k components (2 in your case, I guess).
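As a generic illustration of what a marginalized mixture looks like (a hypothetical sketch with made-up data, not tailored to your attached model): instead of sampling a discrete component index, the index is summed out inside the mixture's log-probability.

import numpy as np
import pymc3 as pm

data = np.random.randn(200)  # placeholder data, for illustration only

with pm.Model() as mixture_model:
    w = pm.Dirichlet('w', a=np.ones(2))        # weights over k = 2 components
    mu = pm.Normal('mu', mu=0, sd=5, shape=2)  # component means
    sd = pm.HalfNormal('sd', sd=1, shape=2)    # component standard deviations
    # the discrete component index never appears as a sampled variable
    obs = pm.NormalMixture('obs', w=w, mu=mu, sd=sd, observed=data)
    trace = pm.sample(1000, tune=1000)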

I am sorry I don’t follow. Our distribution for the latent variable is as follows:

P[x_1, x_2, …, x_n] = 1/n! if 0 <= x_i <= i-1 for all i, and 0 otherwise.

In other words, the i-th dimension is a uniform distribution on the set of integers from 0 to i-1. Can you please elaborate on what you mean by k = 2 components?