NormalMixture regression

I am having trouble figuring out how to combine two normal distributions into the NormalMixture distribution. In general, I have data that can belong to one of two processes:

  1. A linear regression of the observations onto multiple explanatory variables, where each explanatory variable is categorical and pooled together. The observed data are standardized to be unit Gaussian.
  2. A “Dirac delta” around some outlier in the neighborhood of -5. This can be approximated as a Normal distribution with mean -5 and sd = epsilon for some small value of epsilon, say 1e-5 (a simulation sketch follows this list).
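For concreteness, here is a minimal simulation of that generative story (the sizes, coefficients, and the 0.9 mixture weight are made up for illustration):

import numpy as np

rng = np.random.RandomState(0)

M, N = 100, 6                         # assumed number of observations / covariates
beta = rng.normal(0.0, 0.25, N)       # assumed regression coefficients
X = np.eye(N)[rng.randint(0, N, M)]   # one-hot categorical design matrix

from_process_1 = rng.rand(M) < 0.9    # assumed 90% of points come from process 1
y = np.where(
    from_process_1,
    X @ beta + rng.normal(0.0, 1.0, M),  # process 1: linear regression
    rng.normal(-5.0, 1e-5, M),           # process 2: near-degenerate spike at -5
)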

If I ignore the second process, the following code will produce a pm.Model object describing the generating process:

import pymc3 as pm
import theano

def assemble_coefficient_vector(feature_shapes, default_sd=0.25):
    """Using the dictionary of features and their dimensions (shapes), assemble
    the coefficient vector to be used for regression. We assume that the design_matrix
    does not contain a column of 1's for the intercept.
    """
    coefficient_vector = []

    for feature in feature_shapes:
        # Get the number of covariates for this feature
        n_k = feature_shapes[feature]

        # Non-centered parameterization: scale a unit-normal raw vector
        # by a per-feature HalfNormal scale.
        sigma = pm.HalfNormal(f'σ_{feature}', sd=default_sd, shape=(1, 1))
        raw = pm.Normal(f'β_{feature}_raw', mu=0.0, sd=1.0, shape=(n_k, 1))
        coefficients = pm.Deterministic(f'β_{feature}', raw * sigma)

        coefficient_vector.append(coefficients)

    coefficient_vector = theano.tensor.concatenate(coefficient_vector, axis=0)

    return coefficient_vector

with pm.Model() as model:
    coefficient_vector = assemble_coefficient_vector(feature_shapes)

    mu = theano.sparse.dot(X, coefficient_vector)
    kappa = pm.HalfNormal('κ', sd=3.0)

    likelihood = pm.Normal('likelihood', mu=mu, sd=kappa, observed=success_rate)

where X is a sparse design matrix of shape M x N, with M the number of observations and N the number of covariates, and feature_shapes is a dictionary mapping each feature name to its number of covariates, e.g.

feature_shapes = {'feature_1': n_1, 'feature_2': n_2, ..., 'feature_k': n_k} 

where \sum_{i=1}^k n_i = N, the number of columns in the design matrix X. Similarly, success_rate is an M x 1 vector of response data.
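As a hypothetical illustration of how feature_shapes lines up with the columns of X (the feature names and level counts are invented):

import numpy as np
import scipy.sparse

# Two categorical features: one with 3 levels, one with 2 levels.
feature_shapes = {'feature_1': 3, 'feature_2': 2}
N = sum(feature_shapes.values())  # = 5 columns in the design matrix

# One-hot encode each feature for M = 4 observations, then concatenate column-wise.
f1 = np.eye(3)[[0, 2, 1, 0]]
f2 = np.eye(2)[[1, 0, 0, 1]]
X = scipy.sparse.csr_matrix(np.hstack([f1, f2]))  # shape (M, N) = (4, 5)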

It’s not clear how to extend my model, i.e. change the mu and sd arguments of pm.NormalMixture, to encode the choice of belonging to either the first or the second process, given that mu of the first process is an M x 1 vector while mu of the second process is a scalar.

Note that this question is similar to Literature on implementing a Zero-Inflated Beta likelihood where @lucianopaz was able to provide an example of a mixture model of a “Dirac delta” at zero and a beta distribution.

Also, I tried following along to the following post to no avail: Error creating NormalMixture with standard deviation from list

Is there something obvious that is wrong?

Alternatively, we could say that the first process is SkewNormal distributed and use pm.Mixture to define the mixture itself.
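For what it's worth, here is a rough sketch of how I imagine that variant would look (the alpha prior is a placeholder, and I haven't checked how pm.Mixture broadcasts comp_dists of different shapes):

with pm.Model() as skew_model:
    w = pm.Dirichlet('w', a=np.array([1, 1]))
    alpha = pm.Normal('α', mu=0.0, sd=1.0)  # skewness parameter (assumed prior)
    kappa = pm.HalfNormal('κ', sd=1.0)

    # mu is the M x 1 regression mean from above; the second component is the
    # near-degenerate spike at the outlier value.
    components = [
        pm.SkewNormal.dist(mu=mu, sd=kappa, alpha=alpha),
        pm.Normal.dist(mu=-5.0, sd=1e-5),
    ]
    pm.Mixture('likelihood', w=w, comp_dists=components, observed=success_rate)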

Maybe you could try to tile the second process’ mu M times to end up with an M x 1 vector with the same variable (or value) repeated M times. That way you would be able to get a nice stack of equally shaped mu tensors and use the NormalMixture.

I think the tile operation should look something like this:

import theano.tensor as tt
...
_mu2 = pm.Normal('mu2', 0, 1)  # Note that mu2 is a scalar
# First we convert mu2 to a (1, 1) shaped array
mu2 = tt.shape_padleft(_mu2, 2)
# Next we tile the (1, 1) array (M, 1) times to end with an (M, 1) shaped array
mu2 = tt.tile(mu2, reps=[M, 1])
...

I’m not entirely sure if this will work well because mu2's shape will be symbolic and would require an .eval() or theano.function() to compute, but try it out.
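For example, a quick shape check along those lines (assuming M is a plain Python int here):

import numpy as np
import theano.tensor as tt

M = 13
x = tt.as_tensor_variable(np.float64(-10.0))  # 0-d scalar tensor
x = tt.shape_padleft(x, 2)                    # shape (1, 1)
x = tt.tile(x, reps=[M, 1])                   # shape (M, 1)

print(x.shape.eval())  # -> array([13,  1])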


I’ll give that a shot. Thanks!

In a toy example, I’m able to use the tile function to produce a working example:

import numpy as np
import scipy.sparse
import theano
import pymc3 as pm

X = np.array([[1, 0], [1, 0], [0, 1], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [0, 1], [1, 0], [0, 1], [0, 1], [1, 0]])

X = scipy.sparse.csr_matrix(X)
Y = np.array([[5.28708186], [4.96509597], [0.96509597], [ 5.3526662], [6.19951688], [-0.96509597], [0.49597], [-0.56509597], [0.00053345], [4.19951688], [-10], [-10], [-10]])

X = theano.sparse.as_sparse_variable(X)
Y = theano.tensor.as_tensor_variable(Y)

with pm.Model() as model:
    w = pm.Dirichlet('w', a=np.array([1, 1]))

    # Regression component: one coefficient per one-hot column.
    sd = pm.HalfNormal('sd', sd=1.0, shape=(1, 1))
    mu = pm.Normal('mu', mu=0, sd=sd, shape=(2, 1))

    mu = theano.sparse.dot(X, mu)  # (13, 1) regression mean
    kappa = pm.HalfNormal('kappa', sd=1.0, shape=(1, 1))

    # Degenerate component: tile the scalar -10 into a (13, 1) tensor so both
    # components' means have the same shape.
    mu2 = theano.tensor.as_tensor_variable(-10)
    mu2 = theano.tensor.shape_padleft(mu2, 2)
    mu2 = theano.tensor.tile(mu2, reps=[13, 1])

    sd2 = theano.tensor.as_tensor_variable(1e-5)
    sd2 = theano.tensor.shape_padleft(sd2, 2)

    pm.NormalMixture('likelihood', w=w, mu=theano.tensor.stack([mu, mu2]),
                     sd=theano.tensor.stack([kappa, sd2]), observed=Y)
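and it samples in the usual way (the draw counts are arbitrary):

with model:
    trace = pm.sample(1000, tune=1000)

pm.summary(trace)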

However, when I use this methodology in my current model factory function, I get the following traceback.

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-149-56c6fce440e4> in <module>
      2 preprocessors['design_matrix'].fit(train_data)
      3 
----> 4 model = construct_model(train_data, preprocessors, fit_intercept=False)
      5 
      6 with model:

<ipython-input-148-0dcee7f04157> in construct_model(data, preprocessors, fit_intercept)
     47 
     48         pm.NormalMixture('likelihood',  w=w, 
---> 49                          mu=theano.tensor.stack([mu, degenerate_mu]),
     50                          sd=theano.tensor.stack([kappa, degenerate_sd]),
     51                          observed=success_rate)

/anaconda3/envs/audience-concentration-forecast-service-env/lib/python3.6/site-packages/theano/tensor/basic.py in stack(*tensors, **kwargs)
   4726         dtype = scal.upcast(*[i.dtype for i in tensors])
   4727         return theano.tensor.opt.MakeVector(dtype)(*tensors)
-> 4728     return join(axis, *[shape_padaxis(t, axis) for t in tensors])
   4729 
   4730 

/anaconda3/envs/audience-concentration-forecast-service-env/lib/python3.6/site-packages/theano/tensor/basic.py in join(axis, *tensors_list)
   4500         return tensors_list[0]
   4501     else:
-> 4502         return join_(axis, *tensors_list)
   4503 
   4504 

/anaconda3/envs/audience-concentration-forecast-service-env/lib/python3.6/site-packages/theano/gof/op.py in __call__(self, *inputs, **kwargs)
    613         """
    614         return_list = kwargs.pop('return_list', False)
--> 615         node = self.make_node(*inputs, **kwargs)
    616 
    617         if config.compute_test_value != 'off':

/anaconda3/envs/audience-concentration-forecast-service-env/lib/python3.6/site-packages/theano/tensor/basic.py in make_node(self, *axis_and_tensors)
   4233 
   4234         return self._make_node_internal(
-> 4235             axis, tensors, as_tensor_variable_args, output_maker)
   4236 
   4237     def _make_node_internal(self, axis, tensors,

/anaconda3/envs/audience-concentration-forecast-service-env/lib/python3.6/site-packages/theano/tensor/basic.py in _make_node_internal(self, axis, tensors, as_tensor_variable_args, output_maker)
   4285                             continue
   4286                         if bflag:
-> 4287                             bcastable[current_axis] = True
   4288                 try:
   4289                     bcastable[axis] = False

IndexError: list assignment index out of range

Here is the model factory code snippet.

with pm.Model() as model:
    coefficient_vector = dfn.assemble_coefficient_vector(feature_shapes, fit_intercept=fit_intercept)
    mu = theano.sparse.dot(X, coefficient_vector)
    kappa = pm.HalfNormal('κ', sd=1.0, shape=(1, 1))

    w = pm.Dirichlet('w', a=np.array([1, 1]))

    degenerate_mu = theano.tensor.as_tensor_variable(DEGENERATE_VALUE)
    degenerate_mu = theano.tensor.shape_padleft(degenerate_mu, 2)
    degenerate_mu = theano.tensor.tile(degenerate_mu, reps=[M, 1])

    degenerate_sd = theano.tensor.as_tensor_variable(1e-5)
    degenerate_sd = theano.tensor.shape_padleft(degenerate_sd, 2)

    pm.NormalMixture('likelihood',  w=w, 
                     mu=theano.tensor.stack([mu, degenerate_mu]), 
                     sd=theano.tensor.stack([kappa, degenerate_sd]), 
                     observed=success_rate)

Does this error make sense?

Ok, I was able to use the tile function to get the model to build and sample, see below:

with pm.Model() as model:
    coefficient_vector = dfn.assemble_coefficient_vector(feature_shapes, fit_intercept=fit_intercept)
    mu = theano.sparse.dot(X, coefficient_vector)
    kappa = pm.HalfNormal('κ', sd=1.0, shape=(1, 1))

    w = pm.Beta('w', alpha=1, beta=1)

    degenerate_mu = theano.tensor.as_tensor_variable(DEGENERATE_VALUE)
    degenerate_mu = theano.tensor.tile(degenerate_mu, reps=[M, 1])

    degenerate_sd = theano.tensor.as_tensor_variable(np.array([[1e-5]]))

    pm.NormalMixture('likelihood', w=[w, 1 - w],
                     mu=theano.tensor.stack([mu, degenerate_mu]),
                     sd=theano.tensor.stack([kappa, degenerate_sd]),
                     observed=success_rate)

Unfortunately, it’s quite slow, and I think part of the reason is that the stacked mu is a 2 x M array with M > 6000, but I’m not sure how to refactor this for the regression scenario.
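One possible refactor, sketched under the assumption that NormalMixture accepts component means along the last axis (as in the mixture-of-regressions examples), so the degenerate component is a broadcast constant rather than an explicit M-row tile; I haven't benchmarked whether it is actually faster:

import theano.tensor as tt

with pm.Model() as model:
    coefficient_vector = dfn.assemble_coefficient_vector(feature_shapes, fit_intercept=fit_intercept)
    mu1 = theano.sparse.dot(X, coefficient_vector)  # shape (M, 1)
    kappa = pm.HalfNormal('κ', sd=1.0)  # scalar, so it stacks cleanly below

    w = pm.Beta('w', alpha=1, beta=1)

    # Components along the last axis: column 0 is the regression mean,
    # column 1 is the constant outlier mean broadcast across all M rows.
    mu = tt.concatenate([mu1, tt.fill(mu1, DEGENERATE_VALUE)], axis=1)  # (M, 2)
    sd = tt.stack([kappa, tt.as_tensor_variable(1e-5)])                 # (2,)

    pm.NormalMixture('likelihood', w=tt.stack([w, 1 - w]),
                     mu=mu, sd=sd, observed=success_rate.flatten())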