Hi there, I am using an Ubuntu 16.04 machine with the latest Theano (1.0.2) and PyMC3 (3.5), and I can't fit this toy model:
import pymc3 as pm
import numpy as np
import theano
print(pm.__version__, np.__version__, theano.__version__)
data = [np.random.geometric(i) for i in np.random.beta(0.5, 0.5, size=2000)]
model = pm.Model()
with model:
    p = pm.Beta('p', pm.HalfFlat('alpha'), pm.HalfFlat('beta'), shape=2000)
    pm.Geometric('observed', p, observed=data)
    trace = pm.sample()
pm.traceplot(trace)
I get the standard NUTS “bad energy” error (from one of the chains).
Nevertheless, the same model runs fine on my Mac (2018 MBP, i9) with the same library versions.
Note that if I change the prior from Beta(0.5, 0.5) to Beta(0.8, 0.8), which generates much smaller values at the Geometric output node (the maximum value drops by a factor of about 1000), the model fits fine on both machines.
Errors like this might be due to the stochastic nature of MCMC sampling, which can occasionally get stuck (in which case it would not be system-specific, just coincidence; see the suggestion below). However, I guess you have tried more than once.
In your definition of p, you use the shape parameter.
Why do you set shape=2000? This creates an "array" (sorry if the term is inappropriate) of 2000 pm.Beta variables.
Instead, I guess you want a single p, right?
Best,
Falk
To compare across systems, you can try fixing the random state in NumPy and PyMC3/Theano, so that the sampling trace should be identical on both machines:
- choose a number (the seed)
- set np.random.seed(seed) just after the import of numpy
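As a minimal sketch of the suggestion above (variable names are mine), fixing the NumPy seed makes the simulated data identical across runs and machines; pm.sample also accepts a random_seed argument to fix the sampler itself:

```python
import numpy as np

seed = 42  # any fixed number

# Fix NumPy's global random state so the simulated data is reproducible
np.random.seed(seed)
data_a = [np.random.geometric(p) for p in np.random.beta(0.5, 0.5, size=2000)]

# Resetting to the same seed reproduces exactly the same data
np.random.seed(seed)
data_b = [np.random.geometric(p) for p in np.random.beta(0.5, 0.5, size=2000)]

print(np.array_equal(data_a, data_b))  # True: same seed, same data

# For the sampler itself, pm.sample(random_seed=seed) fixes the chains as well.
```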
Falk, it's a different model actually. I want to generate 2000 Geometrics, each with a different p generated from a Beta(0.5, 0.5).
Yes, fixing the seed doesn't really change things. I'm still quite puzzled, as I have run this analysis a ton of times and still obtain the same result.
So what do you intend to model?
The problem I see, though I might be wrong, is that you have observed data of shape (2000,) and you feed it into an array of distributions of shape (2000,). I guess the program interprets this as 2000 single observations for 2000 variables, i.e. one observation per Geometric. The p you recover is then just the hyperprior over all those individual pm.Geometric variables.
So maybe you intended to do this:
observations_per_variable = 1000
n_variables = 2000
data = [np.random.geometric(i)
        for i in np.random.beta(0.5, 0.5,
                                size=(observations_per_variable, n_variables))]
?
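The shape bookkeeping can be checked in plain NumPy before touching PyMC3 (a sketch with my own variable names): draw one p per variable, then broadcast it against a size=(observations, variables) request to get several observations per variable:

```python
import numpy as np

n_variables = 2000
observations_per_variable = 1000

# one success probability per variable
p = np.random.beta(0.5, 0.5, size=n_variables)

# broadcast p across the first axis: each column holds 1000 draws
# from the Geometric of the corresponding variable
data = np.random.geometric(p, size=(observations_per_variable, n_variables))

print(data.shape)  # (1000, 2000)
```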
When I did this, though, I got ValueError: Mass matrix contains zeros on the diagonal., so there must be something else going on.
I also tried adjusting your alpha and beta priors:
p = pm.Beta('p', pm.HalfNormal('alpha', 1.), pm.HalfNormal('beta', 1.),
            shape=(1, n_variables))
but still the error.
I’ve never used Geometric, so someone else might help. But maybe you can tell what it is that you actually want to model?
Best,
Falk
(sorry for my lack of deeper knowledge of the seed thing - thanks @junpenglao for putting it right)
I don't think the problem is what I am trying to model here. It's a toy example where I try to recover the parameters of a Beta distribution (0.5 and 0.5) that generates the p's for a geometric distribution. There might be no practical reason for this, but the shape of Beta(0.5, 0.5) places high probability on values near zero and values near one.
And given that I sample 2000 points from there, you can assume some will have very low values.
The geometric distribution will then produce very large numbers for very low values of p, and that is what I believe to be the issue here across the different OSes (though I don't know how or why it only fails on one). To confirm this: with larger parameters for the original Beta, the Geometric outputs much smaller numbers and I am able to fit the model on both OSes.
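A quick NumPy check supports this (a sketch; exact numbers vary with the seed): among 2000 draws, Beta(0.5, 0.5) produces some extremely small p's, and the corresponding Geometric draws are huge:

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only for reproducibility

p_05 = np.random.beta(0.5, 0.5, size=2000)
p_08 = np.random.beta(0.8, 0.8, size=2000)

# Beta(0.5, 0.5) piles mass at the endpoints, so some p's are extremely small
print(p_05.min())     # typically well below 1e-4

# Geometric draws scale like 1/p, so tiny p's produce enormous observations
data_05 = np.random.geometric(p_05)
print(data_05.max())  # typically in the thousands or far more
```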