I’m trying to model an outcome that’s effectively log-normal. However, there’s also a large “zero-inflation” on top of this. I put that in quotes because I’m only familiar with zero-inflation models for count data, so I’m not sure how to implement this for a continuous distribution.
My thought was to model “is this value zero” via logistic regression and “if it’s non-zero, what will the value be” via a linear regression model. It’s not immediately clear to me how to tie the two together, though (or whether there’s a more standard way to approach this problem).
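For reference, this two-part idea is commonly called a hurdle (or two-part) model. A sketch of the density it implies, where the symbols π_i (probability of a zero), γ and β (separate coefficient vectors for the two parts), and σ are notation assumed here for illustration:

```latex
p(y_i \mid x_i) =
\begin{cases}
\pi_i, & y_i = 0 \\
(1 - \pi_i)\,\mathrm{LogNormal}(y_i \mid \mu_i, \sigma), & y_i > 0
\end{cases}
\qquad
\operatorname{logit}(\pi_i) = x_i^\top \gamma, \quad \mu_i = x_i^\top \beta
```

Under this form, the logistic part and the log-scale regression each get their own coefficients, and the zeros only ever enter through π_i.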
[Figure: log-transformed outcome for values > 0]
Here’s what my attempt looked like:
```python
with pm.Model() as lin_model:
    # Data
    x_one = pm.Data("x_one", X_train["x_one"])
    x_two = pm.Data("x_two ", X_train["x_two"])
    x_three = pm.Data("x_three", X_train["x_three"])

    # Linear Model
    α = pm.Normal("α", 0, 10)
    β_one = pm.Normal("β_one", 0, 5)
    β_two = pm.Normal("β_two", 0, 5)
    β_three = pm.Normal("β_three", 0, 5)
    μ = pm.Deterministic(
        "μ", α + β_one * x_one + β_two * x_two + β_three * x_three
    )

    # Logistic Regression
    θ = pm.Deterministic("θ", pm.math.sigmoid(μ))
    log_response = pm.Bernoulli('y_logistic', p=θ)

    # Linear Regression
    σ = pm.HalfCauchy("σ", 20)
    likelihood = log_response * pm.Normal('y', μ, σ, observed=y_train)

    model_trace = pm.sample(return_inferencedata=True)
```
But this fails with a “Wrong number of dimensions” error on the pm.Bernoulli line, probably because I’m using it in this weird way. Any suggestions on approach here would be appreciated.
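One way to see how the two parts could fit together is to write out the log-likelihood of a hurdle log-normal model by hand. The block below is only a minimal sketch in plain Python, not PyMC code: the helper names (`sigmoid`, `lognormal_logpdf`, `hurdle_loglik`), the single predictor, the parameter values, and the toy data are all made up for illustration, and it assumes the zero part and the positive part have separate coefficients.

```python
import math


def sigmoid(t):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def lognormal_logpdf(y, mu, sigma):
    """Log-density of a log-normal at y > 0, i.e. log(y) ~ Normal(mu, sigma)."""
    z = (math.log(y) - mu) / sigma
    return -math.log(y) - math.log(sigma) - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z


def hurdle_loglik(ys, xs, gamma, beta, sigma):
    """Return (zero_part, positive_part) of the hurdle log-normal log-likelihood.

    gamma = (intercept, slope) for the logistic "is this value zero" part.
    beta  = (intercept, slope) for the mean of log(y) when y > 0.
    """
    zero_part = 0.0      # Bernoulli terms, one per observation
    positive_part = 0.0  # log-normal terms, only for y > 0
    for y, x in zip(ys, xs):
        p_zero = sigmoid(gamma[0] + gamma[1] * x)
        if y == 0:
            zero_part += math.log(p_zero)
        else:
            zero_part += math.log1p(-p_zero)
            mu = beta[0] + beta[1] * x
            positive_part += lognormal_logpdf(y, mu, sigma)
    return zero_part, positive_part


# Toy data, invented purely for illustration
ys = [0.0, 0.0, 1.5, 4.2, 0.0, 9.7]
xs = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]

zero_part, positive_part = hurdle_loglik(
    ys, xs, gamma=(0.2, -0.8), beta=(0.5, 1.0), sigma=0.7
)
total = zero_part + positive_part
print(zero_part, positive_part, total)
```

Because the total is a sum of two pieces that share no parameters, the “is it zero” part only sees the zero/non-zero indicator and the regression part only sees the log of the non-zero values. In a PyMC model, that would presumably translate into two observed likelihoods, a Bernoulli on the indicator and a Normal on the logged positive values, rather than multiplying an unobserved Bernoulli by an observed Normal; treat that mapping as a suggestion to verify rather than a confirmed fix for the dimension error.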