# Shape Parameter for Multiple Treatment Groups

From “Bayesian Methods for Hackers” in Chapter 1 there’s an example where the expected rate of text messages across a single boundary, and the location of the boundary in time, are inferred. I’ve been trying to generalise this such that if we had count data for multiple people we could compute the parameters for each individual separately and simultaneously, without having to create separate likelihood functions (using “fancy indexing” as Eric J Ma calls it) in a style similar to this:

``````
with pm.Model() as model:
    n_persons = fake_count_data['person_id'].nunique()
    n_count_data = len(fake_count_data) // n_persons  # length of each person's series

    alpha = 1.0 / fake_count_data['n'].mean()

    lambda_1 = pm.Exponential("lambda_1", alpha, shape=n_persons)  # added shape=n_persons here
    lambda_2 = pm.Exponential("lambda_2", alpha, shape=n_persons)  # added shape=n_persons here
    tau = pm.DiscreteUniform("tau", lower=0, upper=n_count_data - 1, shape=n_persons)  # added shape=n_persons here

    idx = np.arange(n_count_data)
    lambda_ = pm.math.switch(tau > idx, lambda_1, lambda_2)  # this line fails

    # added the [fake_count_data['person_id']] indexing here
    observation = pm.Poisson("obs", mu=lambda_[fake_count_data['person_id']],
                             observed=fake_count_data['n'])
``````

However I get an error: `ValueError: Input dimension mis-match. (input.shape = 21, input.shape = 17)`

There seems to be a dimension incompatibility between idx (17 units of time long) and tau (21 people) in the `tau > idx` comparison; every person should have their own tau, while idx should simply be a sequence indexing the length of the period.
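Here's a quick plain-numpy sketch of what I think is going on, with `>` standing in for the comparison inside `pm.math.switch` (shapes chosen to match the error message):

```python
import numpy as np

n_count_data = 17   # 17 units of time
n_persons = 21      # 21 people

idx = np.arange(n_count_data)                             # shape (17,)
tau = np.random.randint(0, n_count_data, size=n_persons)  # shape (21,)

# numpy raises the analogous error for the (21,) vs (17,) operands:
try:
    tau > idx
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

print(broadcast_failed)  # True
```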

Moreover, I would have thought that adding `shape=1` to all three variables (`lambda_1`, `lambda_2` and `tau`), while leaving `mu=lambda_` as is in the likelihood `observation`, would have no effect on the original code. Why is my intuition wrong? This time I get an error on sampling: `TypeError: Cannot convert Type TensorType(int64, vector) (of Variable tau_shared__) into Type TensorType(int64, (True,)). You can try to manually convert tau_shared__ into a TensorType(int64, (True,)).`.

Would love some help with this!

You could change that to

``````
idx = np.arange(n_count_data)[:, None]
``````

This should allow the condition in the `switch` to broadcast properly, and the fancy indexing you do in the likelihood (`lambda_[fake_count_data['person_id']]`) should still work.
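In plain numpy (with `np.where` standing in for `pm.math.switch`, and made-up rate values) the broadcasting then works out like this:

```python
import numpy as np

n_count_data, n_persons = 17, 21
tau = np.random.randint(0, n_count_data, size=n_persons)  # one tau per person
lambda_1 = np.full(n_persons, 2.0)   # placeholder pre-switch rates
lambda_2 = np.full(n_persons, 5.0)   # placeholder post-switch rates

idx = np.arange(n_count_data)[:, None]   # shape (17, 1)
# (17, 1) broadcasts against (21,) to give one rate per (time, person):
lambda_ = np.where(tau > idx, lambda_1, lambda_2)
print(lambda_.shape)  # (17, 21)
```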

TIL that `np.arange(n_count_data)[:, None]` works! This looks to be equivalent to `np.expand_dims(np.arange(n_count_data), -1)` (both turn the index into an `n_count_data x 1` column vector instead of a 1-D array).
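A quick check of the equivalence:

```python
import numpy as np

n_count_data = 17
a = np.arange(n_count_data)[:, None]
b = np.expand_dims(np.arange(n_count_data), -1)

print(a.shape)               # (17, 1)
print(np.array_equal(a, b))  # True
```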

Thanks for the response! Almost there but not quite: the final line with the likelihood is now raising the error. I’m generating the fake data like so:

``````
fake_count_data = pd.DataFrame({
    "n": np.random.poisson(lam=15, size=74 * 3),
    "person_id": np.repeat([0, 1, 2], 74)
})
``````

and get a `ValueError: Input dimension mis-match. (input.shape = 222, input.shape = 3)`. As a reminder, I’m using 3 people, each with 74 items of data.

and as above, the likelihood function is

``````
observation = pm.Poisson("obs", mu=lambda_[fake_count_data['person_id']],
                         observed=fake_count_data['n'])
``````
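Tracing the shapes in plain numpy (with `np.where` standing in for `pm.math.switch` and placeholder rates) shows where the 222-vs-3 mismatch comes from:

```python
import numpy as np

n_count_data, n_persons = 74, 3
tau = np.random.randint(0, n_count_data, size=n_persons)
lambda_1 = np.full(n_persons, 2.0)
lambda_2 = np.full(n_persons, 5.0)

idx = np.arange(n_count_data)[:, None]
lambda_ = np.where(tau > idx, lambda_1, lambda_2)  # shape (74, 3): time x person

person_id = np.repeat([0, 1, 2], n_count_data)     # shape (222,)
# Indexing the first (time) axis with 222 labels keeps the trailing
# person axis, giving a (222, 3) array, not the (222,) vector of
# per-observation rates the likelihood needs:
print(lambda_[person_id].shape)  # (222, 3)
```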

My suggestion is to separate the person and iid observations into separate axes, like this:

``````
"n": np.random.poisson(lam=15, size=(74, 3)),
"person_id": np.repeat([[0, 1, 2]], 74, axis=0)
``````

and finally set the shape of the observations RV to `shape=n_persons`. The idea behind this is that you want the model to be well specified even in the absence of observed data. By well specified I mean that `sample_prior_predictive` should work, and its output for the observations RV should have the same shape as your fake data when you ask for 74 samples.
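In wide format the shapes line up end to end; a numpy sketch (placeholder rates again):

```python
import numpy as np

n_count_data, n_persons = 74, 3

# Wide-format fake data: one column per person
obs = np.random.poisson(lam=15, size=(n_count_data, n_persons))

tau = np.random.randint(0, n_count_data, size=n_persons)
lambda_1 = np.full(n_persons, 2.0)
lambda_2 = np.full(n_persons, 5.0)

idx = np.arange(n_count_data)[:, None]
lambda_ = np.where(tau > idx, lambda_1, lambda_2)

# mu and observed now share the same (74, 3) shape, so the Poisson
# likelihood broadcasts element-wise with no fancy indexing at all:
print(lambda_.shape == obs.shape)  # True
```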


Got it working, thanks for your help lucianopaz! So it seems that with `pm.math.switch` it’s better to pass data in a wide format, not a tall one (tall formats work fine with other distributions using numpy fancy indexing).

Curious to know the best practice if you have different numbers of observations per treatment group (NaNs give the warning: `Data in likelihood contains missing values and will be automatically imputed from the sampling distribution.`)?

Final code if it will help others:

``````
fake_count_data = pd.DataFrame(np.random.poisson(lam=15, size=(74, 3)),
                               columns=['p0', 'p1', 'p2'])

with pm.Model() as model:

    n_persons = fake_count_data.shape[1]  # 3 people (shape[1], not the whole shape tuple)
    n_count_data = len(fake_count_data)   # 74 observations per person

    alpha = 1.0 / fake_count_data.mean().mean()

    lambda_1 = pm.Exponential("lambda_1", alpha, shape=n_persons)
    lambda_2 = pm.Exponential("lambda_2", alpha, shape=n_persons)
    tau = pm.DiscreteUniform("tau", lower=0, upper=n_count_data - 1, shape=n_persons)

    idx = np.arange(n_count_data)[:, None]  # thanks lucianopaz!
    lambda_ = pm.math.switch(tau > idx, lambda_1, lambda_2)

    observation = pm.Poisson("obs", mu=lambda_, observed=fake_count_data)
``````