Recovering parameters of long tailed Beta-Geometric data

Very simple question. I am trying to model long tailed Beta-Geometric synthetic data which I create as:

from matplotlib import pylab as plt
import numpy as np
import pandas as pd
import pymc3 as pm
from pymc3.variational.callbacks import CheckParametersConvergence
%matplotlib inline 

population_size = 100000
a, b = 0.33, 0.85

be = np.random.beta(a, b, size=population_size)
geo = []
for i in be:
    geo.append(np.random.geometric(p=(1-i)))
geo = np.array(geo)

print("Beta mean:", np.mean(be))

print("Syntethic population data follows a Beta distribution with a=%s; b=%s and mean=%s" % (a, b, np.mean(be)))
plt.grid()

plt.title('Observed Geometric')
_ = plt.hist(geo, bins=20)

download-99

And then I try to recover my beta parameters using Metropolis:

import pymc3 as pm

model = pm.Model()

with model:    
    beta_alpha = pm.Uniform('beta_alpha', 0.0001, 1)
    beta_beta = pm.Uniform('beta_beta', 0.0001, 1)
    
    beta = pm.Beta('beta',
                   alpha=beta_alpha,
                   beta=beta_beta)
    
    renewals = pm.Geometric('renewals',
                            p=(1-beta), observed=geo)
    
    trace = pm.sample(
        100000,
        step=pm.Metropolis(),
        njobs=4
    )
    pm.traceplot(trace)

The issue here is that the parameters are recovered incorrectly and also the mean of the beta seems to be wrong

Probably the reason is that my distribution is very very long tailed and this messes up with the simulation. However maybe you can you point me in some direction to improve this?

Thanks in advance for the help!

Your data generation method differs from the model: In the data generation you have one beta variable per observation, in the model you only have one global beta distributed parameter. Just add shape=population_size to beta = pm.Beta(....):

with model:    
    beta_alpha = pm.Uniform('beta_alpha', 0.0001, 1)
    beta_beta = pm.Uniform('beta_beta', 0.0001, 1)
    
    beta = pm.Beta('beta',
                   alpha=beta_alpha,
                   beta=beta_beta,
                   shape=population_size)
    
    renewals = pm.Geometric('renewals',
                            p=(1-beta), observed=geo)
    
    trace = pm.sample()
    pm.traceplot(trace)

You might want to decrease population_size, you’ll have a hard time with that many parameters.

There is no reason to use metropolis, just use the defaults (nuts).

1 Like