Modeling zero-inflation on a continuous outcome

I’m trying to model an outcome that’s effectively log-normal. However, there’s also a large “zero-inflation” on top of this. I put that in quotes because I’m only familiar with zero-inflation models for count data, so I’m not sure how to implement the idea for a continuous distribution.

My thought was to model “is this value zero?” via logistic regression and “if it’s non-zero, what will the value be?” via a linear regression model. It’s not immediately clear to me how to tie the two together, though. (Or whether there’s a more standard way to approach this problem.)

The outcome: *(image)*

Log-transform for values > 0: *(image)*

Here’s what my attempt looked like:

with pm.Model() as lin_model:

    # Data 
    x_one = pm.Data("x_one", X_train["x_one"])
    x_two = pm.Data("x_two", X_train["x_two"])
    x_three = pm.Data("x_three", X_train["x_three"])

    # Linear Model
    α = pm.Normal("α", 0, 10)
    β_one = pm.Normal("β_one", 0, 5)
    β_two = pm.Normal("β_two", 0, 5)
    β_three = pm.Normal("β_three", 0, 5)
    
    μ = pm.Deterministic(
        "μ",
        α + 
        β_one * x_one +
        β_two * x_two +
        β_three * x_three 
    )
    
    # Logistic Regression
    θ = pm.Deterministic("θ", pm.math.sigmoid(μ))
    log_response = pm.Bernoulli('y_logistic', p=θ)    

    # Linear Regression
    σ = pm.HalfCauchy("σ", 20)            
    likelihood = log_response * pm.Normal('y', 
        μ,
        σ,
        observed=y_train
    )
    
    model_trace = pm.sample(return_inferencedata=True)

But this fails with a “Wrong number of dimensions” error on the pm.Bernoulli line, probably because I’m using it in this weird way. Any suggestions on the approach here would be appreciated.

This is an off-the-cuff answer, so don’t take it as true fact, but one trick may be to just add 1 to everything and shift the distribution so you can model it.
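
Here’s a tiny sketch of my reading of that shift (assuming y_train is the raw outcome array/Series from the question):

```python
import numpy as np

# Shift by 1 so the exact zeros land at log(1 + 0) = 0,
# then model on the log(1 + y) scale with an ordinary likelihood.
y_shifted = np.log1p(y_train)
```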

Are your values actually zero (hard boundary) or just close to zero (soft boundary)? Do the zeros have any informative value for your modeling goals, or are they just inconsequential “flukes”?

@ricardoV94 - It’s sort of both, but closer to a hard boundary: 70% of the dataset is exactly 0. The zeros do have a very informative value for the modeling goal: they’re cases in which an initial threshold was not passed, so no value was accumulated.

I needed a similar likelihood (actually two) for an insurance loss-cost frequency-severity model.

You sometimes see these referred to as “zero-augmented” likelihoods, since the non-zero marginal distribution isn’t defined at x = 0, though many people still call them zero-inflated.

In any case, I’ve found you can treat them as a hard mixture model, with a binary likelihood for zero vs non-zero (Bernoulli etc.) fitted using the full dataset, and your non-zero marginal distribution of choice (here a lognormal) fitted using only the rows with non-zero values in the target.
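
Here’s a minimal sketch of that two-part (“hurdle”) setup, written against the PyMC3-style API used in the question. It reuses X_train, y_train and the column names x_one/x_two/x_three from your post; the priors, the logit_p parameterisation and the dot-product form are just illustrative assumptions, not code from any particular reference:

```python
import pymc3 as pm

# Column names and data objects are the ones from the question.
predictors = ["x_one", "x_two", "x_three"]
is_nonzero = (y_train > 0).astype(int)   # 0/1 target for the Bernoulli part
nonzero = y_train > 0                    # row mask for the lognormal part

with pm.Model() as hurdle_model:
    # Part 1: P(non-zero), fitted on the full dataset
    α_z = pm.Normal("α_z", 0, 10)
    β_z = pm.Normal("β_z", 0, 5, shape=len(predictors))
    logit_p = α_z + pm.math.dot(X_train[predictors].values, β_z)
    pm.Bernoulli("y_is_nonzero", logit_p=logit_p, observed=is_nonzero)

    # Part 2: size of the outcome, fitted only on the non-zero rows
    α_s = pm.Normal("α_s", 0, 10)
    β_s = pm.Normal("β_s", 0, 5, shape=len(predictors))
    μ = α_s + pm.math.dot(X_train.loc[nonzero, predictors].values, β_s)
    σ = pm.HalfNormal("σ", 5)
    pm.Lognormal("y_pos", mu=μ, sigma=σ, observed=y_train[nonzero])

    trace = pm.sample(return_inferencedata=True)
```

Since the two parts share no parameters, you could just as well fit them as two separate models; keeping them in one model simply gives you a single trace. For predictions on the original scale you combine the parts: the expected outcome is P(non-zero) × exp(μ + σ²/2), i.e. the non-zero probability times the lognormal mean.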

McElreath has a nice paper that uses this principle: McElreath, R. & Koster, J. (2014). “Using Multilevel Models to Estimate Variation in Foraging Returns: Effects of Failure Rate, Harvest Size, Age, and Individual Heterogeneity.” Human Nature, 25, 100-120. Data and model-fitting scripts: https://github.com/rmcelreath/mcelreath-koster-human-nature-2014


That’s great info - thanks @jonsedar!
