Help with fitting a Gamma distribution with more than 10k samples

Hi,

I’m having a hard time fitting a Gamma distribution… The weird thing is that it works nicely when there are <= 10k observed samples; at 50k or more I get convergence warnings, which can be mitigated by increasing the number of tuning steps…

But at 200k samples it crashes every time, with this error message:

ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV alpha_log__.ravel()[0] is zero.

This is my toy dataset (my ‘observables’):

import numpy as np

data = np.random.gamma(shape=1.6, scale=1.0/0.00169, size=10000)

The histogram of this 10k dataset looks like this:

This is my model:

import pymc3 as pm  # assuming PyMC3, given the `alpha_log__` naming in the error

with pm.Model() as model:
    alpha = pm.Exponential('alpha', lam=1)
    beta = pm.Exponential('beta', lam=1)

    obs = pm.Gamma('obs', alpha=alpha, beta=beta, observed=data)

    trace = pm.sample(draws=1000, tune=1000, chains=1)

It samples fast (>500 it/s):

… and nicely:

The hidden parameters are correctly estimated. I’m happy :smiley:

But now, simply adding more observed data, going from 10k to 200k for instance, makes the model crash!

My toy dataset is then:
data = np.random.gamma(shape=1.6, scale=1.0/0.00169, size=200000)

The model starts sampling fine, but at around 30% it suddenly stalls and then crashes with this error message:

ValueError: Mass matrix contains zeros on the diagonal. 
The derivative of RV `alpha_log__`.ravel()[0] is zero.

I don’t understand this behavior: whether a model is good or bad should not depend on the number of observations, should it? Intuitively I would even think that the more observations I have, the better, but in this case that intuition seems wrong…

Any help will be appreciated.

Regards,
Hubert.

I would reason that the increase in observations makes the posterior distribution much more peaked (concentrated), which makes the sampler quickly adapt to a small step size (as the high-probability region of parameter space is much smaller now), which in turn makes it much easier to stumble onto a local maximum and return the zero-gradient error (the derivative at a maximum or minimum is 0).
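A minimal sketch of what this would suggest trying, assuming a PyMC3-style pm.sample API (in older PyMC3 versions target_accept may need to be passed as nuts_kwargs=dict(target_accept=0.95) instead): give the sampler a longer warmup and a higher target acceptance rate, so step-size adaptation has more room to settle before the very concentrated posterior bites.

with model:
    # Hedged sketch: longer tuning and a higher target acceptance rate
    # (exact keyword handling differs between PyMC3 versions).
    trace = pm.sample(
        draws=1000,
        tune=5000,            # more warmup so step-size adaptation can settle
        chains=2,             # several chains make convergence problems visible
        target_accept=0.95,   # adapt towards smaller, more careful steps
    )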


I am with hwassner here that this behavior seems counter-intuitive, as the data points are all perfectly sampled from the distribution in question, and more datapoints should (in some ways) ease the inferential burden on PyMC’s sampler.

Are the suggestions for efficiently dealing with this simply to adjust the tuning and warmup settings, or is this just an unfortunate downside of NUTS that has to be accepted?

OK, thanks @junpenglao, this interpretation makes total sense to me.

But how can I correct it?

Is there a way to forbid a too-small step size, or to add a small jitter to the step size?

And I’m looking for a scalable solution, one able to deal with any number of observations…
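For reference, a minimal sketch of the step-size-related knobs NUTS exposes, assuming the PyMC3 API (this is my reading of the library, not something confirmed in this thread): the step size itself cannot be pinned to a value, jittering the start point is, as far as I know, what the default init='jitter+adapt_diag' already does, and an explicit NUTS step object lets you steer the adaptation.

with model:
    # Hedged sketch (PyMC3 API assumed; not a guaranteed fix for the 200k case):
    # target_accept pushes adaptation towards smaller, safer steps, and
    # max_treedepth bounds how long each trajectory can grow.
    nuts = pm.NUTS(target_accept=0.9, max_treedepth=12)
    trace = pm.sample(draws=1000, tune=3000, chains=2, step=nuts)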