Very simple question. I am trying to model long tailed Beta-Geometric synthetic data which I create as:
from matplotlib import pylab as plt
import numpy as np
import pandas as pd
import pymc3 as pm
from pymc3.variational.callbacks import CheckParametersConvergence
%matplotlib inline
population_size = 100000
a, b = 0.33, 0.85
be = np.random.beta(a, b, size=population_size)
geo = []
for i in be:
geo.append(np.random.geometric(p=(1-i)))
geo = np.array(geo)
print("Beta mean:", np.mean(be))
print("Syntethic population data follows a Beta distribution with a=%s; b=%s and mean=%s" % (a, b, np.mean(be)))
plt.grid()
plt.title('Observed Geometric')
_ = plt.hist(geo, bins=20)
And then I try to recover my beta parameters using Metropolis:
import pymc3 as pm
model = pm.Model()
with model:
beta_alpha = pm.Uniform('beta_alpha', 0.0001, 1)
beta_beta = pm.Uniform('beta_beta', 0.0001, 1)
beta = pm.Beta('beta',
alpha=beta_alpha,
beta=beta_beta)
renewals = pm.Geometric('renewals',
p=(1-beta), observed=geo)
trace = pm.sample(
100000,
step=pm.Metropolis(),
njobs=4
)
pm.traceplot(trace)
The issue here is that the parameters are recovered incorrectly and also the mean of the beta seems to be wrong
Probably the reason is that my distribution is very very long tailed and this messes up with the simulation. However maybe you can you point me in some direction to improve this?
Thanks in advance for the help!