Constrain the sum of two categorical variables

I am trying to constrain the sum of several variables, but I am finding this unexpectedly hard in pymc3. Here is a minimal example:

import numpy as np
import pymc3 as pm

with pm.Model() as model:
    # categorical variables over 0..9, with (unnormalized) weights proportional to 0..9
    i = pm.Categorical('i', np.arange(10) + 0.)
    j = pm.Categorical('j', np.arange(10) + 0.)
    # the observed quantity is a noisy measurement of the sum i + j
    total = pm.Normal("total", mu=i + j, sigma=1., observed=3.)
    trace = pm.sample(1000, tune=1000, discard_tuned_samples=True, return_inferencedata=True)

Here is the sampler output:

Multiprocess sampling (4 chains in 4 jobs)
CategoricalGibbsMetropolis: [j, i]

100.00% [8000/8000 00:04<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 5 seconds.
The number of effective samples is smaller than 25% for some parameters.

The algorithm does not converge …

(Now, if I look at the trace, the results for i and j are as expected, integers between 0 and 9, but total is completely off the mark, with mostly negative values.)

total (chain, draw, total_dim_0) float64 -1.419 -0.9189 … -2.919 -2.919

It looks like a bug to me. Is there any other way of doing this?

The reason I am attempting such a task is a more complex model where I need to first tune sub-components of the model independently of each other, and in a final step resample from the posterior while constraining the sum. I could do that by hand quite easily, I think, but I cannot figure out how to do it in pymc3. Anyone? Thanks a lot!
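For concreteness, the kind of soft constraint I have in mind would look something like the sketch below, with the sum penalized through a Potential term; I am not sure this is the right pymc3 idiom, which is partly what I am asking:

import numpy as np
import pymc3 as pm

with pm.Model() as model:
    i = pm.Categorical('i', np.arange(10) + 0.)
    j = pm.Categorical('j', np.arange(10) + 0.)
    # soft constraint: add the log-density of a Normal centered on the target sum (3)
    pm.Potential('sum_constraint', pm.Normal.dist(mu=3., sigma=1.).logp(i + j))
    trace = pm.sample(return_inferencedata=True)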

Where are you getting those values for total? The sum should be ok. To pull the sum out of the observed total, you can just break it down like this:

with pm.Model() as model:
    i = pm.Categorical('i', np.arange(10) + 0.)
    j = pm.Categorical('j', np.arange(10) + 0.)
    # make the sum an explicit Deterministic so it is stored in the trace
    ij = pm.Deterministic('ij', i + j)
    total = pm.Normal("total", mu=ij, sigma=1., observed=3.)
    idata = pm.sample(return_inferencedata=True)

then, inspecting the first sample

idata.posterior.sel(chain=0,draw=0)

yields:

<xarray.Dataset>
Dimensions:  ()
Coordinates:
    chain    int64 0
    draw     int64 0
Data variables:
    i        int64 4
    j        int64 1
    ij       int64 5
Attributes:
    created_at:                 2022-01-22T23:34:29.387184
    arviz_version:              0.11.2
    inference_library:          pymc3
    inference_library_version:  3.11.2
    sampling_time:              1.3836240768432617
    tuning_steps:               1000

Hi @cluhmann,

Thanks for your prompt reply. Indeed, I could verify that ij from your code follows a reasonable distribution (centered on 4, probably because the prior mean of ij is much larger than the observed 3).
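For anyone reading later, this is roughly how I checked it (az.summary is just one of several ways to summarize the posterior):

import arviz as az

# posterior summaries (mean, HDI, ESS) for the latent variables and their sum
az.summary(idata, var_names=["i", "j", "ij"])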

I must conclude that I misinterpreted the meaning of the log_likelihood field, from which I had read out total. It must instead be the actual log-likelihood of the “total” variable, not the variable itself. It seems obvious now that I read your reply.
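To make the distinction concrete (a small sketch; the group and variable names follow the InferenceData layout shown above):

# per-draw log p(total_obs | i, j) — this is what I had been reading by mistake
idata.log_likelihood["total"].sel(chain=0, draw=0)

# the sampled value of the sum itself lives in the posterior group
idata.posterior["ij"].sel(chain=0, draw=0)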

I was also confused by the warning:

The number of effective samples is smaller than 25% for some parameters.

Thanks a lot for taking the time to reply, and sorry for the rather basic issue reported here. At least this gives me more confidence to tackle the actual, more complex problem I set out to solve.

That’s correct. Because total is an observed variable, sampling doesn’t propose new values for it. The sampler proposes values for the (unobserved) parameters (i.e., i and j) and then calculates the probability of the observed values. That’s why I separated the sum from the likelihood/observed variable.
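If you do want actual draws of total (for example, to compare the model against the observation), one option is posterior predictive sampling; a rough sketch, with var_names just picking out the observed node:

with model:
    # simulate the observed node from the posterior draws of i and j
    ppc = pm.sample_posterior_predictive(idata, var_names=["total"])

ppc["total"]  # posterior predictive draws of total, centered on the posterior of i + j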