 # Appropriate params for NegativeBinomial Likelihood and model evaluation

My discrete target variable (values between 0 and 20) has the following distribution. I'm trying to build a model using a NegativeBinomial likelihood.
My `beta` priors can't be negative due to a business constraint.
How should I specify `mu` in my case:

```python
mu = alpha + pm.math.dot(x, beta)
```

or:

```python
mu = pm.math.exp(alpha + pm.math.dot(x, beta))
```
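For what it's worth, a NegativeBinomial mean must be strictly positive, so the `exp` (log-link) version always yields a valid `mu`, while the identity-link version can go negative through `alpha` even when every `beta` is non-negative. A standalone NumPy sketch (the prior scales are my assumption, loosely mimicking the priors in the model below):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical prior draws: alpha ~ Normal(5, 10), beta ~ HalfNormal(10)
alpha = rng.normal(5.0, 10.0, size=10_000)
beta = np.abs(rng.normal(0.0, 10.0, size=10_000))
x = 1.0  # a single positive predictor value

mu_identity = alpha + x * beta     # identity link: negative whenever alpha is low enough
mu_log = np.exp(alpha + x * beta)  # log link: always positive, but can get huge

print("share of negative mu (identity link):", (mu_identity < 0).mean())
print("largest mu (log link): %.3g" % mu_log.max())
```

The huge `mu` values under the log link are also a plausible source of `lam value too large` during prior predictive sampling: with priors this wide, the exponential of the linear predictor blows up. Tighter priors on `alpha` and `beta` shrink both problems.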

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model_negative_binomial:
    # Intercept
    alpha = pm.Normal('alpha', mu=y.mean(), sd=10)
    # Slopes (non-negative by business constraint)
    beta = pm.HalfNormal('beta', sd=10, shape=len(data.columns[:-1]))
    # Dispersion parameter of the NegativeBinomial
    eps = pm.Gamma('eps', alpha=1, beta=0.5)

    # Expected value of outcome (multiple regression with vectors)
    mu = alpha + pm.math.dot(x, beta)
    # mu = pm.math.exp(alpha + pm.math.dot(x, beta))

    # Likelihood (sampling distribution) of observations
    conv = pm.NegativeBinomial('conv', mu=mu, alpha=eps, observed=y)
    # Penalize draws above the observed maximum
    pm.Potential('constrain', tt.switch(conv > y.max(), -np.inf, 0.))

    trace_negative_binomial = pm.sample(chains=4, target_accept=0.95)
```

I'm trying to run `pm.sample_prior_predictive()` and I get the following errors:

• with `mu = pm.math.exp(alpha + pm.math.dot(x, beta))`:
`ValueError: lam value too large`
• with `mu = alpha + pm.math.dot(x, beta)`:
`ValueError: Domain error in arguments.`
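The `Domain error in arguments` message is SciPy's generic complaint about invalid distribution parameters. PyMC3's `(mu, alpha)` parametrization corresponds to SciPy's `(n, p)` via `n = alpha`, `p = alpha / (mu + alpha)`, so a negative `mu` can push `p` outside `(0, 1]`. A small sketch reproducing the failure directly in SciPy (the helper name is mine):

```python
import numpy as np
from scipy import stats

def nb_draw(mu, alpha, rng):
    # PyMC3's (mu, alpha) parametrization mapped to scipy's (n, p)
    n = alpha
    p = alpha / (mu + alpha)
    return stats.nbinom.rvs(n, p, random_state=rng)

rng = np.random.default_rng(0)
print(nb_draw(5.0, 2.0, rng))   # valid: mu > 0

try:
    nb_draw(-3.0, 2.0, rng)     # mu < 0 gives p = -2, outside (0, 1]
except ValueError as err:
    print("ValueError:", err)   # SciPy's domain error for invalid parameters
```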

About evaluation: can R² be considered an appropriate evaluation metric when the likelihood is NegativeBinomial?
There is an example in the PyMC3 documentation of how to use a GLM with a NegativeBinomial likelihood, but no explanation of evaluation or model fit (GLM: Negative Binomial Regression — PyMC3 3.11.2 documentation).

my notebook:

or model comparison:

So I was assuming that `ValueError: Domain error in arguments` was raised because the `pm.NegativeBinomial` likelihood was receiving unacceptable (negative) `mu` values. I fixed it by setting
`mu = pm.math.invlogit(alpha + pm.math.dot(x, beta))` and was able to run a prior predictive check, which gave me what I was looking for:
```python
import matplotlib.pyplot as plt

with model_negative_binomial:
    prior_pred = pm.sample_prior_predictive()

plt.hist(prior_pred['conv'],
         color='cornflowerblue',
         width=0.9, bins=100)
plt.title(label='Target variable distribution')
plt.show()
```

But then the posterior predictive check looks like this:

With a very low R². I'm trying to find appropriate metrics to understand whether the model fits the data well.
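Classical R² is rarely the most informative criterion for a count model; LOO/WAIC via `arviz.loo` / `arviz.waic` on the trace, together with posterior predictive checks, are more standard. Another option is a Bayesian R² computed per posterior draw (Gelman et al. 2019); a self-contained sketch of one simple variant that uses predictive draws, with fake arrays standing in for `pm.sample_posterior_predictive` output:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins: y = observed counts, y_rep = posterior predictive draws
# with shape (n_draws, n_obs), as sample_posterior_predictive returns.
y = rng.poisson(5.0, size=200)
y_rep = rng.poisson(5.0, size=(500, 200))

# Bayesian R^2: explained variance over explained-plus-residual
# variance, computed separately for every posterior draw.
pred_var = y_rep.var(axis=1)
resid_var = (y[None, :] - y_rep).var(axis=1)
r2_draws = pred_var / (pred_var + resid_var)

print("Bayesian R2: %.2f (94%% interval %.2f-%.2f)"
      % (r2_draws.mean(),
         np.quantile(r2_draws, 0.03),
         np.quantile(r2_draws, 0.97)))
```

For count data, comparing histograms of `y_rep` draws against `y` (as in the plot above) plus LOO usually tells you more about fit than any single scalar.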