I am having trouble figuring out how to combine two normal distributions into the NormalMixture distribution. In general, I have data that can belong to one of two processes:
- A linear regression of the observation onto multiple explanatory variables where each explanatory variable is categorical and pooled together. The observed data are standardized to be unit Gaussian.
- A “dirac delta” around some outlier in the neighborhood of -5. This can be estimated as a Normal distribution with mean -5 and sd = epsilon for some small value of epsilon, say 1e-5.
If I ignore the second process, the following code will produce a pm.Model object describing the generating process:
def assemble_coefficient_vector(feature_shapes, default_sd=0.25):
"""Using the dictionary of features and their dimensions (shapes), assemble
the coefficient vector to be used for regression. We assume that the design_matrix
does not contain a vector of 1's for the intercept.
"""
coefficient_vector = []
for feature in feature_shapes:
# Get number of features
n_k = feature_shapes[feature]
sigma = pm.HalfNormal(f'σ_{feature}', sd=default_sd, shape=(1, 1))
raw = pm.Normal(f'β_{feature}_raw', mu=0.0, sd=1.0, shape=(n_k, 1))
coefficients = pm.Deterministic(f'β_{feature}', raw * sigma)
coefficient_vector.append(coefficients)
coefficient_vector = theano.tensor.concatenate(coefficient_vector, axis=0)
return coefficient_vector
with pm.Model() as model:
coefficient_vector = assemble_coefficient_vector(feature_shapes, fit_intercept=fit_intercept)
mu = theano.sparse.dot(X, coefficient_vector)
kappa = pm.HalfNormal('κ', sd=3.0)
likelihood = pm.Normal('likelihood', mu=mu, sd=kappa, observed=success_rate)
where X is a matrix of shape M x N with M being the number of observations and the feature_shapes variable is a dictionary of key-value pairs that are features-number of covariates, e.g.
feature_shapes = {'feature_1': n_1, 'feature_2': n_2, ..., 'feature_k': n_k}
where \sum_{i=1}^k n_k = N, the number of columns in the design_matrix X. Similarly, success_rate is an M x 1 vector of response data.
It’s not clear how to extend my model, i.e. change the vectors mu and sd in pm.NormalMixture, to include the “choice of either belonging to the first process or the second process” when I have mu of the first process as an M x 1 vector and mu of the second process a 1 x 1 vector of a scalar value.
Note that this question is similar to Literature on implementing a Zero-Inflated Beta likelihood where @lucianopaz was able to provide an example of a mixture model of a “Dirac delta” at zero and a beta distribution.
Also, I tried following along to the following post to no avail: Error creating NormalMixture with standard deviation from list
Is there something obvious that is wrong?