Switching between class distributions

Hello,

I am new to probabilistic modeling/programming so please forgive me if I am using the wrong (or weird) terms.
For my research I am trying to model IR spectra by considering them as a summation of peaks, a baseline model and residual noise. The peak shapes and baseline model will evolve into a more complex model eventually, but I want to start simple and build up complexity as I go along. For now I have Gaussian peaks, a constant (y)-offset and normally distributed noise. Depending on the class to which the spectrum belongs the class y-offset is either from distribution A or B. Currently I am using generated data which looks like this:


The blue lines are from class A and red lines from class B.

The model looks something like:

npeaks = 3
nclasses = 2
with pm.Model() as model:
    # priors
    amp = pm.Uniform('amp', 0, x_s.max(), shape=(nclasses,npeaks))
    mu = pm.Normal('mu', mu=np.linspace(x_val.min(), x_val.max(), npeaks), sd=50, 
               shape=(nclasses,npeaks), transform=pm.distributions.transforms.ordered)
    sigma = pm.HalfNormal('sigma', sd=100, shape=(nclasses,npeaks))
    a_ = [pm.Normal('a_%d' % i, mu=0, sd=10, shape=(len(x_s),1)) for i in range(nclasses)]
    
    # spectral lines (per class)
    y_ = [pm.Deterministic('y_%d' % i, (amp[i] * np.exp(-(x_val - mu[i])**2/(2 * sigma[i]**2))).sum(axis=1) 
                       + a_[i]) for i in range(nclasses)]

    # labels (observed from the data, 0 or 1)
    theta = pm.Beta('theta', 1., 1.)
    label = pm.Bernoulli('label', p=theta, observed=Y)

    # switch to spectrum class A/B depending on label
    y_obs = pm.math.switch(pm.math.eq(label, 1), y_[0], y_[1])
    
    sigma_e = pm.Gamma('sigma_e', alpha=1., beta=1.)
    epsilon = pm.HalfNormal('epsilon', sd=sigma_e)

    y_pred = pm.Normal('y_pred', mu=y_obs, sd=epsilon, observed=x_s)

So, depending on the label in the training data (Y), the model switches to variable y_0 or y_1.

The problem I have with this basic model is this:

  • I am now switching between 2 classes but also would like to switch between 4 classes, or N-classes for that matter. Normally I would use a switch case statement, but that is not available. How can I achieve this in the current context?

Any help is greatly appreciated!

Hello. This is definitely possible, and I think you need to utilize a Dirichlet prior to make this work. Look through these two resources for a primer, link1 and link2.

1 Like

I have looked at the two links, but I think the problems solved are different from the one I am trying to solve.

  1. The number of classes is known and fixed (2 or 4 in my case)
  2. Per spectrum I know the label Y (to which class the spectrum belongs)

In the end, the goal is to introduce shared variables for Y and the spectral observations (x_s) so that (after inference) I can:

  1. set the variable Y=z and sample from the distribution z
  2. set a (unseen) spectrum x_s[a] and sample from the posterior distribution label, so I can calculate to which class the spectrum belongs.

An ideal solution would be something like this:

# labels (observed from the data, 0 or 1)
theta = pm.Beta('theta', 1., 1.)
label = pm.Bernoulli('label', p=theta, observed=Y)

sigma_e = pm.Gamma('sigma_e', alpha=1., beta=1.)
epsilon = pm.HalfNormal('epsilon', sd=sigma_e)

y_pred = pm.Normal('y_pred', mu=y_[label], sd=epsilon, observed=x_s) 

In this way, whatever the label code is ([0|1] in this case, but [0|1|2|3] for 4 classes), but that gives the error:

`TypeError: list indices must be integers or slices, not ObservedRV`
1 Like