Shape of Trace: number of columns depends on observations

Overview. We have a group of persons (eg 2). Each person performs a different number of experiment n (eg 10, 20) and we record k, ie how many times the experiment was a success (eg 2, 3). For each person, we have have some information like age (eg 22, 34).

I’d like to estimate the success rate (theta) and how this is affected by the age. This is the model I’ve implemented:

model = pm.Model()

n = [10, 20]
k = [2, 3]
age = [22, 34]

with model:
    intercept = pm.Normal('intercept', mu=0, sigma=10)
    slope = pm.Normal('slope', mu=0, sigma=100)
    y = intercept + slope * age
    theta = pm.Deterministic('theta', pm.math.sigmoid(y))
    binominal = pm.Binomial('binominal', n=n, p=theta, observed=k)

    p_trace = pm.sample(50, init='adapt_diag')

It is modeled as a Binomial distribution for the experiments where the success rate is a logistic regression with the age as an input.

The trace of theta has a number of columns equal to the number of observations (2).

(100, 2)

Why? In this toy example, I set just 2 observations, but they are way more in reality. I cannot analyze the traceplot (too many plots). How can I model the above by then having just one theta trace? Should I change the model?

Hey there!
I think your model is behaving as expected: the deterministic variable is computed over the 2 given values of age, which is why you get two traces for theta – one for age = 22, the other for age = 34.

If you wanna see how the probability changes with new values of age, I think you have two options:

  • set age as a shared variable to resample the model later with new values of the predictor – then you’ll get as much theta traces as you have predictor values
  • reconstruct your GLM after the model has sampled – you just do logistic(p_trace["intercept"] + p_trace["slope"] * age as these are numpy arrays

These two options achieve the same goals, but one is done during sampling, the other afterwards – pick your favorite :wink: