Modeling zero-inflation on a continuous outcome

I’m trying to model an outcome that’s effectively log-normal. However, there’s also a large “zero-inflation” on top of this. I put that in quotes because I’m only familiar with zero-inflation models for count data, so I’m not sure how to implement this for a continuous distribution.

My thought was to model “is this value zero” via logistic regression and “if it’s non-zero, what will the value be” via a linear regression model. It’s not immediately clear to me how to tie the two together, though, or whether there’s a more standard way to approach this problem.
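For reference, the two pieces described above are usually tied together through a single mixed likelihood, often called a two-part or hurdle model: an observation is zero with probability 1 - p, and otherwise follows the log-normal density scaled by p. The sketch below writes that log-density for one observation in plain Python. The function and argument names are illustrative assumptions, not part of PyMC or of the attempt in this post.

```python
import math


def hurdle_lognormal_logpdf(y, p_nonzero, mu, sigma):
    """Log-density of one observation under a hurdle log-normal model.

    p_nonzero: probability that y > 0 (the logistic-regression part)
    mu, sigma: mean and sd of log(y) given y > 0 (the linear-regression part)
    All names here are hypothetical, chosen for illustration.
    """
    if y == 0:
        # Point mass at zero: log(1 - p_nonzero)
        return math.log1p(-p_nonzero)
    # Log-normal log-density for the positive part
    z = (math.log(y) - mu) / sigma
    lognormal_logpdf = -math.log(y * sigma * math.sqrt(2 * math.pi)) - 0.5 * z * z
    return math.log(p_nonzero) + lognormal_logpdf


print(hurdle_lognormal_logpdf(0.0, 0.3, 0.0, 1.0))
print(hurdle_lognormal_logpdf(1.0, 0.3, 0.0, 1.0))
```

Because the log-likelihood splits into a Bernoulli term (zero or not) and a log-normal term that only involves the positive observations, the two regressions can each have their own coefficients and can even be fit separately.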

The outcome:
image

Log-transform for values >0:
image

Here’s what my attempt looked like:

 with pm.Model() as lin_model:

    # Data 
    x_one = pm.Data("x_one", X_train["x_one"])
    x_two = pm.Data("x_two", X_train["x_two"])
    x_three = pm.Data("x_three", X_train["x_three"])

    # Linear Model
    α = pm.Normal("α", 0, 10)
    β_one = pm.Normal("β_one", 0, 5)
    β_two = pm.Normal("β_two", 0, 5)
    β_three = pm.Normal("β_three", 0, 5)
    
    μ = pm.Deterministic(
        "μ",
        α + 
        β_one * x_one +
        β_two * x_two +
        β_three * x_three 
    )
    
    # Logistic Regression
    θ = pm.Deterministic("θ", pm.math.sigmoid(μ))
    log_response = pm.Bernoulli('y_logistic', p=θ)    

    # Linear Regression
    σ = pm.HalfCauchy("σ", 20)            
    likelihood = log_response * pm.Normal('y', 
        μ,
        σ,
        observed=y_train
    )
    
    model_trace = pm.sample(return_inferencedata=True)

But this fails with a “Wrong number of dimensions” error on the pm.Bernoulli line, probably because I’m using it in this weird way. Any suggestions on approach here would be appreciated.

This is an off-the-cuff answer, so don’t take it as fact, but one trick may be to just add 1 to everything and shift the distribution so you can model it on the log scale.
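A minimal sketch of that shift, using only the standard library and made-up example values: `math.log1p` computes log(1 + y), and `math.expm1` undoes it.

```python
import math

# Made-up outcome values for illustration: two exact zeros, two positives
y = [0.0, 0.0, 3.5, 120.0]

# Shift by 1 and take the log in one numerically stable step
shifted = [math.log1p(v) for v in y]

# Invert the transform to get back to the original scale
recovered = [math.expm1(v) for v in shifted]

print(shifted)
print(recovered)
```

One caveat: every exact zero still maps to exactly 0 after the shift, so a large spike of zeros stays a spike. The transform makes the log defined, but it does not turn the point mass into something a single normal distribution describes well.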

Are your values actually zero (hard boundary) or just close to zero (soft boundary)? Do the zeros have any informative value for your modeling goals or are they just inconsequential “flukes”?

@ricardoV94 - It’s sort of both, but closer to a hard boundary: 70% of the data set is exactly 0. The zeros are very informative for the modeling goal. They’re cases in which an initial threshold was not passed, so no value was accumulated.
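With exact, informative zeros, one way to set this up is a two-part (hurdle) model: a Bernoulli likelihood on "is the value positive" over all rows, and a normal likelihood on log(y) over the positive rows only. The sketch below is a rough illustration under several assumptions, not a tested recipe. It assumes `import pymc as pm` (PyMC v4 or later; on PyMC3 the import would be `pymc3`), the priors are placeholders, the helper names are hypothetical, and each part gets its own coefficients rather than sharing one linear predictor.

```python
import math


def split_hurdle_targets(y):
    """Split an outcome into a 0/1 'is positive' flag and log(y) for positives."""
    is_positive = [1 if v > 0 else 0 for v in y]
    log_positive = [math.log(v) for v in y if v > 0]
    return is_positive, log_positive


def build_hurdle_model(X, y):
    """Build (but do not sample) a two-part model. Hypothetical helper.

    X: 2-D array-like of predictors, y: 1-D array-like of outcomes.
    Imports are local so the splitting helper works without PyMC installed.
    """
    import numpy as np
    import pymc as pm  # assumption: PyMC v4+

    X = np.asarray(X, dtype=float)
    is_positive, log_positive = split_hurdle_targets(y)
    is_positive = np.asarray(is_positive)
    pos = is_positive == 1  # boolean mask selecting the non-zero rows

    with pm.Model() as model:
        # Part 1: logistic regression for zero vs. non-zero (all rows)
        a_zero = pm.Normal("a_zero", 0, 2)
        b_zero = pm.Normal("b_zero", 0, 2, shape=X.shape[1])
        pm.Bernoulli(
            "is_positive",
            logit_p=a_zero + pm.math.dot(X, b_zero),
            observed=is_positive,
        )

        # Part 2: linear regression on log(y) (positive rows only)
        a_pos = pm.Normal("a_pos", 0, 10)
        b_pos = pm.Normal("b_pos", 0, 5, shape=X.shape[1])
        sigma = pm.HalfNormal("sigma", 5)
        pm.Normal(
            "log_y",
            mu=a_pos + pm.math.dot(X[pos], b_pos),
            sigma=sigma,
            observed=np.asarray(log_positive),
        )
    return model
```

Calling `pm.sample()` inside `with build_hurdle_model(X, y):` would then draw from both parts at once. A prediction on the original scale combines them: the probability of being positive times the expected value of the log-normal part.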