Multivariate Switch Function

I’m trying to create a multinomial change point model but am running into problems due to my inexperience with Theano. My attempt looks like this:

import pymc3 as pm    
import theano.tensor as tt
from theano.compile.ops import as_op
import numpy as np
import scipy.stats as stats

X = []
X.extend(stats.multinomial.rvs(n=10, p=[0.1,0.2,0.3,0.4,0.0], size=100))
X.extend(stats.multinomial.rvs(n=10, p=[0.1,0.2,0.2,0.3,0.2], size=25))
X = np.array(X)

@as_op(itypes=[tt.lscalar, tt.dvector, tt.dvector], otypes=[tt.dvector])
def mswitch(tau, p1, p2):
    # rows before the change point get p1, rows from tau onwards get p2
    out = np.empty((len(X), 5))
    out[:tau] = p1.reshape(1, -1)
    out[tau:] = p2.reshape(1, -1)
    return out

with pm.Model() as model:
    p1 = pm.Dirichlet('p1', np.ones(5))
    p2 = pm.Dirichlet('p2', np.ones(5))
    tau = pm.DiscreteUniform('tau', 0, len(X))
    p = mswitch(tau, p1, p2)
    X_ = pm.Multinomial('X_', 10, p, observed=X)

    start = pm.find_MAP()
    trace = pm.sample(50, start=start)

This gives me an error when defining X_:

TypeError: For compute_test_value, one input test value does not have the requested type.
The error when converting the test value to that variable type:
Wrong number of dimensions: expected 1, got 2 with shape (125, 5).

Do I need to set the shape when defining p1/p2/X_ somehow?

The immediate error comes from the as_op signature: you declared otypes=[tt.dvector], but the function returns a 2-D array of shape (len(X), 5), so Theano expects one dimension and gets two. Declaring tt.dmatrix would fix that, but an as_op-wrapped function has no gradient, which rules out NUTS for the continuous parameters. It is better to write the switch directly in Theano. You can try something like:

def mswitch(X, tau, p1, p2):
    # build p symbolically: rows before tau get p1, rows from tau onwards get p2
    p = tt.zeros(X.shape)
    p = tt.inc_subtensor(p[tt.arange(X.shape[0]) < tau, ...], p1)
    p = tt.inc_subtensor(p[tt.arange(X.shape[0]) >= tau, ...], p2)
    return p

with pm.Model() as model:
    p1 = pm.Dirichlet('p1', np.ones(5))
    p2 = pm.Dirichlet('p2', np.ones(5))
    tau = pm.Uniform('tau', 0, X.shape[0])
    p = mswitch(X, tau, p1, p2)
    X_ = pm.Multinomial('X_', 10, p, observed=X)

    trace = pm.sample()

Rewriting mswitch as a pure Theano expression, and using a continuous random variable for tau, should make the model work better.
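
Once sampling finishes, a quick sanity check: the simulated data switches at index 100, so the posterior for tau should concentrate near there. Something like:

print(pm.summary(trace))    # posterior stats; tau's mean should sit near 100
print(trace['tau'].mean())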

Thanks, that works! It’s a little slow, but I guess it’s a fairly complex model.

Out of interest, what’s the trade-off between Uniform and DiscreteUniform? As in, when should I use one over the other?

Mixing continuous and discrete variables is generally suboptimal, as we don't have reliable inference for such models. DiscreteUniform is sampled with Metropolis, which can get stuck easily when mixed with NUTS or HMC. I used Uniform here so that everything can be sampled with NUTS. Note that in your model the switch introduces discontinuities in the logp, so NUTS might still have problems.
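
For comparison, a rough sketch of the fully discrete version, reusing X and the Theano mswitch from above (pm.sample() would assign NUTS to p1/p2 and Metropolis to tau, combined in a CompoundStep, and the Metropolis part is what tends to get stuck):

with pm.Model() as discrete_model:
    p1 = pm.Dirichlet('p1', np.ones(5))
    p2 = pm.Dirichlet('p2', np.ones(5))
    tau = pm.DiscreteUniform('tau', 0, X.shape[0])
    p = mswitch(X, tau, p1, p2)
    X_ = pm.Multinomial('X_', 10, p, observed=X)
    # NUTS handles the continuous p1/p2, Metropolis the discrete tau
    trace = pm.sample()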

That makes sense, thanks! Don’t suppose there’s any way to avoid introducing a discontinuity with the switch?

You can use a continuous sigmoid function instead of a hard switch; for example, there is some related discussion here: Switch point Metropolis tuning

Of course, in your case, since you need to keep each row summing to 1, it might be a bit more tricky… you might need to construct p in the unconstrained parameter space and transform it row-wise onto a simplex.
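
For the simple two-regime case, though, a sigmoid-weighted convex combination already keeps each row on the simplex, since the two weights sum to 1. A minimal sketch (the steepness constant k here is an arbitrary choice; larger values approximate a hard switch):

def soft_mswitch(X, tau, p1, p2, k=50.0):
    # smooth 0 -> 1 weight centred at tau
    w = tt.nnet.sigmoid(k * (tt.arange(X.shape[0]) - tau))
    w = w.dimshuffle(0, 'x')  # shape (n, 1) so it broadcasts over the 5 columns
    # a convex combination of two simplex vectors is still a simplex
    return (1.0 - w) * p1 + w * p2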