HalfStudentT distribution for the likelihood: "the derivative is zero" problem

This is my target distribution:

I have lots of features for this model and I'm using the GLM module:

Formula for Bayesian multiple regression:

formula = 'TOTAL_CONV ~ ' + ' + '.join(['%s' % variable for variable in data.columns[:-1]])

Constructing the priors:

d_coef = {}
for val in data.columns[:-1]:
    d_coef[val] = d_coef.get(val, pm.HalfStudentT.dist(sd=20, nu=1/30))

d_intercept = {'Intercept': pm.Normal.dist(mu=0, sd=20)}

Combining the priors:

d_priors = {**d_intercept,**d_coef}

Context for the model:

with pm.Model() as model_glm_halfstud:

    pm.GLM.from_formula(formula,
                        data=data,
                        priors=d_priors
                       )
    trace_glm_halfstud = pm.sample(chains=2, target_accept=0.9)

It breaks on chain 0 and gives me this:

ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV TV_SYFY_log__.ravel()[0] is zero.
The derivative of RV TV_SMITHSONIANNETWORK_log__.ravel()[0] is zero.
The derivative of RV TV_LIFETIME_log__.ravel()[0] is zero.
The derivative of RV TV_IFC_log__.ravel()[0] is zero.
The derivative of RV TV_BETHER_log__.ravel()[0] is zero.
The derivative of RV TV_ANE_log__.ravel()[0] is zero.

Do I need to change the constraints on the priors? And how? Thank you.

Often, you are going to want to use a HalfStudentT (or a HalfNormal or HalfCauchy) when you want to constrain a distribution away from a boundary. This boundary is often zero. So all of these “half” distributions are specified such that their “full” counterparts would have a location (e.g., mean) of zero.
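For example, here is a quick sketch of my own (not from the thread) confirming that draws from a HalfStudentT never fall below zero:

import pymc3 as pm

# All draws land in [0, inf); the boundary sits at the folded
# distribution's location of zero
samples = pm.HalfStudentT.dist(nu=3, sd=20).random(size=1000)
print(samples.min())  # always >= 0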

So I assume you are dealing with one of two situations. First, maybe you are using a Student’s t distribution to allow for observations far from the central “pile” of observations and you also want to constrain the likelihood to be positive (e.g., if negative observations are impossible). Second, maybe you are confident that your data is distributed according to a half-Student’s t, but aren’t sure where the boundary is. I assume the former, because the latter seems implausible to me (correct me if I am mistaken).

If it is indeed the former, then I would suggest that a HalfStudentT isn't what you are looking for. Maybe something more like Gamma? Or even ExGaussian? Those both get you positive support and some control over the location. There are others as well.
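To make that concrete, here is a minimal sketch (my illustration, not code from the thread), assuming y is a vector of strictly positive observations. PyMC3's Gamma accepts a (mu, sigma) parameterization, which gives positive support plus direct control over the location:

import pymc3 as pm

with pm.Model() as gamma_sketch:
    # Location and spread of the likelihood, both constrained positive
    mu = pm.HalfNormal('mu', sd=10)
    sigma = pm.HalfNormal('sigma', sd=10)
    # Gamma likelihood: support on (0, inf), mean controlled by mu
    pm.Gamma('obs', mu=mu, sigma=sigma, observed=y)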

Thank you cluhmann!
I updated my question just now while you were replying.
So I'm using the GLM module and specifying HalfStudentT priors.
Now it gives me this:

ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV TV_SYFY_log__.ravel()[0] is zero.
The derivative of RV TV_SMITHSONIANNETWORK_log__.ravel()[0] is zero.
The derivative of RV TV_BETHER_log__.ravel()[0] is zero.

Now it appears that you have HalfStudentT priors on your coefficients? Is that what you intended?

Yes, using HalfStudentT, but the derivatives of the feature RVs are zero:

The derivative of RV TV_WETV_log__.ravel()[0] is zero.
The derivative of RV TV_TRUTV_log__.ravel()[0] is zero.
The derivative of RV TV_SYFY_log__.ravel()[0] is zero.
The derivative of RV TV_SMITHSONIANNETWORK_log__.ravel()[0] is zero.
The derivative of RV TV_POP_log__.ravel()[0] is zero.
The derivative of RV TV_OPRAHWINFREYNETWORK_log__.ravel()[0] is zero.
The derivative of RV TV_LIFETIME_log__.ravel()[0] is zero.
The derivative of RV TV_IFC_log__.ravel()[0] is zero.
The derivative of RV TV_GAMESHOWNETWORK_log__.ravel()[0] is zero.
The derivative of RV TV_COMEDYCENTRAL_log__.ravel()[0] is zero.
The derivative of RV TV_BRAVO_log__.ravel()[0] is zero.
The derivative of RV TV_BETHER_log__.ravel()[0] is zero.
The derivative of RV TV_BBCAMERICA_log__.ravel()[0] is zero.
The derivative of RV TV_AMC_log__.ravel()[0] is zero.
The derivative of RV TV_ANE_log__.ravel()[0] is zero.
The derivative of RV Intercept.ravel()[0] is zero.

Ok, then I am confused. Now you expect that all of your predictors/features will have positive marginal associations? And your outcome variable can take on any value, including negative values?

I would sort out what you are trying to accomplish first and only then start building your model (and do it bit by bit). Dumping everything you have into a model that hasn’t been well-designed isn’t going to get you very far.

Back to this model.
My outcome can take any continuous value in [0, ∞).

My intercept can be normally distributed.
My slopes can only take positive values, i.e., in [0, ∞).

What should my likelihood be? And how should I code it to include mu?

with pm.Model() as model_mlr_log:

    # Intercept
    alpha = pm.Normal('alpha', mu=1, sd=10)

    # Slope
    beta = pm.HalfNormal('beta', sd=10, shape=len(data.columns[:-1]))

    # Error term
    eps = pm.HalfCauchy('eps', 5)

    # Expected value of outcome (multiple regression with vectors)
    mu = alpha + pm.math.dot(x, beta)

    # Likelihood
    ??????????????

    # Posterior
    trace_log = pm.sample(chains=4, target_accept=0.9)

If I make the likelihood a Gamma:

conversion = pm.Gamma('conversion', mu=mu, sigma=eps, observed=y)

It doesn't work, as my target variable contains values of 0.
I get this error:

Initial evaluation results:
alpha -3.22
beta_log__ 0.00
eps_log__ -1.14
conversion -inf
Name: Log-probability of test_point, dtype: float64

It works when I delete the rows where the target variable is 0.

The basic problem is that gamma only has support on x ∈ (0, ∞), so your likelihood isn't going to be able to accommodate observed values of zero. Your intercept could also easily be negative (the prior mean is 1, but the SD is 10), which could (depending on the values of your predictors and coefficients) yield negative (i.e., invalid) means for your likelihood.
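To see the zero-support problem directly (a quick sketch of my own, not from the thread):

import pymc3 as pm

# Gamma has support on (0, inf), so an observation of exactly 0
# receives a log-probability of -inf
print(pm.Gamma.dist(mu=1.0, sigma=1.0).logp(0.0).eval())  # -inf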

As I stated here, outcomes that are counts are commonly modeled as Poisson, ZIP, and negative binomial.
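For illustration, here is a minimal sketch (mine, not cluhmann's) of a negative binomial regression with a log link, assuming x is the predictor matrix and y the outcome counts from this thread; the log link keeps the mean positive no matter what signs the intercept and coefficients take:

import pymc3 as pm

with pm.Model() as nb_sketch:
    intercept = pm.Normal('intercept', mu=0, sd=10)
    beta = pm.HalfNormal('beta', sd=10, shape=x.shape[1])
    # Log link: exp(.) maps the linear predictor to (0, inf)
    mu = pm.math.exp(intercept + pm.math.dot(x, beta))
    # NB dispersion parameter; allows variance > mean, unlike Poisson
    phi = pm.HalfCauchy('phi', 5)
    pm.NegativeBinomial('conv', mu=mu, alpha=phi, observed=y)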

Thank you for helping.
I didn't think that I could use Poisson or ZIP, since my outcome values are continuous. So I ran the following:

with pm.Model() as model_poisson:

    # Intercept
    alpha = pm.Normal('alpha', mu=y.mean(), sd=10)

    # Slope
    beta = pm.HalfFlat('beta', shape=len(data.columns[:-1]))

    # Error term
    eps = pm.HalfCauchy('eps', 5)

    # Expected value of outcome (multiple regression with vectors)
    mu = alpha + pm.math.dot(x, beta)

    # Likelihood
    conversion = pm.Poisson('conversion', mu=mu, observed=y)

    # Posterior
    trace_poisson = pm.sample(chains=4)

Then I ran posterior predictive sampling:

ppc_poisson = pm.sample_posterior_predictive(trace_poisson, samples=200, model=model_poisson)

and plotted the PPC results:

[posterior predictive plot]

Then I looked at the R² score:
az.r2_score(data.TOTAL_CONV.values, ppc_poisson['conversion'])

r2 0.101319
r2_std 0.023821
dtype: float64

My R² is so low. Is that normal? What does it mean? Why did it happen, given that the PPC mean aligns with the observed values, as far as I can see from the graph?
Please advise if there are any other metrics for model evaluation. Thank you.

The topic of model checking can be a bit involved because it is often very dependent on your goals (which I am still a bit confused about here). I would check the Diagnostics and Model Criticism notebooks available here. You can also check out several talks from last year’s PyMCon, like this one and this one. For a deeper dive, you can check out Aki’s keynote.
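As a starting point, here is a sketch (my example, not from those notebooks) of a few standard ArviZ checks, assuming the trace_poisson and model_poisson from above:

import arviz as az
import pymc3 as pm

# Convert to InferenceData; this also stores pointwise log-likelihoods
idata = az.from_pymc3(trace=trace_poisson, model=model_poisson)

# Sampler health: r_hat near 1, adequate effective sample sizes
print(az.summary(idata, var_names=['alpha', 'beta']))

# Out-of-sample predictive accuracy via PSIS-LOO cross-validation
print(az.loo(idata))

# Graphical posterior predictive check
with model_poisson:
    ppc = pm.sample_posterior_predictive(trace_poisson)
az.plot_ppc(az.from_pymc3(trace=trace_poisson, model=model_poisson,
                          posterior_predictive=ppc))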

So I'm using a ZeroInflatedPoisson as the likelihood:

with pm.Model() as model_zero_poisson:

    # Intercept
    alpha = pm.Normal('alpha', mu=y.mean(), sd=10)

    # Slope
    beta = pm.HalfFlat('beta', shape=len(data.columns[:-1]))

    # Error term
    eps = pm.HalfCauchy('eps', 5)

    # Expected value of outcome (multiple regression with vectors)
    mu = alpha + pm.math.dot(x, beta)

    # Likelihood
    conv = pm.ZeroInflatedPoisson('conv',
                                  theta=mu,
                                  psi=0.7,
                                  observed=y)

    trace_zero_poisson = pm.sample(chains=4, target_accept=0.9)

The R² I'm getting for this model is 0.825.

My target variable ranges from 0 to 20, but the predictions from the posterior predictive check go from 0 to 60, as shown on the graph below:

[posterior predictive plot]

I found the following in the PyMC3 documentation:

  • Bounds cannot be given to variables that are observed. To model truncated data, use a Potential() in combination with a cumulative probability function. See this example notebook.

The link to the example doesn't work. How can I use Potential() to set a limit on my likelihood or predictions?
Thank you.
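For reference, the Potential() + CDF pattern the documentation alludes to looks roughly like the sketch below. This is my own illustration (not the missing notebook), using a Normal likelihood and a hypothetical upper bound U, since the trick needs a likelihood with a tractable CDF; whether it carries over cleanly to a ZeroInflatedPoisson is a separate question:

import pymc3 as pm

U = 20.0  # hypothetical upper truncation bound

with pm.Model() as truncated_sketch:
    mu = pm.Normal('mu', mu=0.0, sd=10.0)
    sigma = pm.HalfNormal('sigma', sd=10.0)

    # Ordinary (untruncated) likelihood over the observed data
    pm.Normal('obs', mu=mu, sd=sigma, observed=y)

    # A truncated density is f(y) / F(U). Dividing each of the len(y)
    # observations' densities by F(U) means subtracting
    # len(y) * logcdf(U) from the joint log-probability, which is
    # exactly what this Potential term does.
    dist = pm.Normal.dist(mu=mu, sd=sigma)
    pm.Potential('truncation', -len(y) * dist.logcdf(U))

Note that this renormalizes the likelihood for truncated observed data; draws from pm.sample_posterior_predictive still come from the untruncated distribution, so predictions would need to be clipped or rejected separately.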