Why isn't NUTS sampling with tt.dot or pm.math.dot?

I am trying to implement parts of Facebook’s prophet with some help from this example.

This goes well :slight_smile:, but I am having some problems with the dot product that I don't understand. Note that I am only implementing the linear trend part.

import numpy as np
import pandas as pd
import pymc3 as pm
import theano

ds = pd.to_datetime(df['dagindex'], format='%d-%m-%y')

m = pm.Model()
changepoint_prior_scale = 0.05
n_changepoints = 25
changepoints = pd.date_range(
    ds.min(), ds.max(),  # date_range needs start/end; taken from the data here
    periods=n_changepoints + 2
)[1: -1]

with m:
    # priors
    sigma = pm.HalfCauchy('sigma', 10, testval=1)
    growth = pm.Normal('growth', 0, 10)

    prior_changepoints = pm.Laplace('changepoints', 0, changepoint_prior_scale, shape=len(changepoints))

    y = np.zeros(len(df))

    # indexes x_i for the changepoints. 
    s = [np.abs((ds - i).values).argmin() for i in changepoints]
    g = growth
    x = np.arange(len(ds))
    # delta
    d = prior_changepoints
    regression = x * g
    base_piecewise_regression = []
    for i in s:
        # shift x so the slope only kicks in after changepoint i
        local_x = x.copy()[:-i]
        local_x = np.concatenate([np.zeros(i), local_x])
        base_piecewise_regression.append(local_x)
    piecewise_regression = np.array(base_piecewise_regression)
#  this dot product doesn't work?
    piecewise_regression = pm.math.dot(theano.shared(piecewise_regression).T, d)

#  If I comment out the line above and use this one as the dot product, it works fine:
#     piecewise_regression = (piecewise_regression.T * d[None, :]).sum(axis=-1)
    regression += piecewise_regression
    y += regression
    obs = pm.Normal('y',
                    mu=(y - df.gebruikers.mean()) / df.gebruikers.std(),
                    sd=sigma,
                    observed=(df.gebruikers - df.gebruikers.mean()) / df.gebruikers.std())
    start = pm.find_MAP(maxeval=10000)
    trace = pm.sample(500, step=pm.NUTS(), start=start)

If I run the snippet above with

piecewise_regression = (piecewise_regression.T * d[None, :]).sum(axis=-1)

the model works as expected. However, I cannot get it to work with a dot product: the NUTS sampler doesn't sample at all.

piecewise_regression = pm.math.dot(theano.shared(piecewise_regression).T, d)

Do you see any error?

No, the output I see is the following:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [k, m, delta, sigma]
Sampling 4 chains:   0%|          | 0/4000 [00:00<?, ?draws/s]

And then nothing happens. Could it be because one input is a NumPy array? If I replace tt.dot with

def det_dot(a, b):
    """
    :param a: (np matrix)
    :param b: (theano vector)
    """
    return (a * b[None, :]).sum(axis=-1)

everything works fine.
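For what it's worth, the broadcast-multiply-and-sum in det_dot is numerically the same as a matrix-vector dot product. A quick check in plain NumPy (no theano; shapes chosen arbitrarily for illustration) confirms the equivalence:

```python
import numpy as np

np.random.seed(0)
a = np.random.normal(size=(6, 4))  # plays the role of the "np matrix"
b = np.random.normal(size=4)       # plays the role of the "theano vector"

# det_dot's trick: broadcast b over the rows of a, then sum out the columns
via_broadcast = (a * b[None, :]).sum(axis=-1)

# the ordinary dot product
via_dot = np.dot(a, b)

print(np.allclose(via_broadcast, via_dot))  # True
```

So the two formulations compute the same quantity; only the graph that theano builds for them differs.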

Yeah, it could be. Otherwise you can always cast the NumPy arrays within a PyMC3 model (i.e. inside the context manager with pm.Model() as model: ...) using theano.shared(numpy_array).

The problem still occurs with theano.shared. I’ve got a minimal working example:


import numpy as np
import pymc3 as pm
import theano.tensor as tt

n_changepoints = 10
t = np.arange(1000)
s = np.sort(np.random.choice(t, size=n_changepoints, replace=False))
a = (t[:, None] > s) * 1

real_delta = np.random.normal(size=n_changepoints)
y = np.dot(a, real_delta) * t

with pm.Model():
    sigma = pm.HalfCauchy('sigma', 10, testval=1)
    delta = pm.Laplace('delta', 0, 0.05, shape=n_changepoints)
    g = tt.dot(a, delta) * t
    obs = pm.Normal('obs',
                    mu=(g - y.mean()) / y.std(),
                    sd=sigma,
                    observed=(y - y.mean()) / y.std())
    trace = pm.sample(500)
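As a side note on the example: the line a = (t[:, None] > s) * 1 builds a changepoint indicator matrix, where entry (i, j) is 1 exactly when time t[i] lies past changepoint s[j]. A tiny standalone illustration (values chosen purely for demonstration):

```python
import numpy as np

t = np.arange(6)          # time index 0..5
s = np.array([2, 4])      # two changepoint locations

# broadcasting t against s gives a (len(t), len(s)) boolean matrix;
# multiplying by 1 casts it to integers
a = (t[:, None] > s) * 1

print(a)
# [[0 0]
#  [0 0]
#  [0 0]
#  [1 0]
#  [1 0]
#  [1 1]]
```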

It seems to have something to do with the size of matrix a. NUTS doesn't sample if I start with

t = np.arange(1000)

however the example above does sample when I reduce the size of t to:

t = np.arange(100)

If the size of the input makes a difference, this is likely a memory problem.