Basic population and group level estimation example: suggestions for improvements?

I’m trying to build a basic model to test my understanding of estimating population and group level effects . I’m leaning on existing examples, such as Rugby Analytics and Multilevel Modelling but want to try an even simpler example which is a bit more relevant to my field of interest…

Say I have 3 thermometers sitting in a water bath, and I want to estimate the temperature of the water. I suspect that the thermometers might not all be entirely accurate, and may have different levels of precision, but I don’t know this a priori. By using 3 thermometers I assume that my estimate will be better than choosing one of the thermometers at random to take my measurements.

I assume that the true water temperature (y) is a function of the thermometer recorded temperature with some bias/error associated with it. I constructed some dummy data as follows:

import pymc as pm
import arviz as az
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

n=30  # Number of observations
obs_temps0 = true_temp_mean + np.random.normal(loc=0, scale=0.5, size=n)  # Good, accurate thermometer
obs_temps1 = true_temp_mean + np.random.normal(loc=0, scale=1, size=n)  # Good but less accurate thermometer
obs_temps2 = true_temp_mean + np.random.normal(loc=-2, scale=0.5, size=n)  # Bad thermometer with cold bias

all_obs = np.array([obs_temps0, obs_temps1, obs_temps2]).flatten()

# An array describing which of the obs belong to which thermometer
obs_idx = np.array([np.repeat(0, n), np.repeat(1, n), np.repeat(2, n)]).flatten()
obs_sample = np.tile(range(n), 3)  # Optional: sample number

# Make dataframe
df = pd.DataFrame(data=zip(obs_sample, obs_idx, all_obs), columns=["Sample", "Thermometer", "Temp"], )

I then construct a model that attempts to model a global temperature term (population level effect) and deviations from it for each of the 3 thermometers (random effects) assuming the true temperature is drawn from a truncated Normal distribution, with random effects for the mean and standard deviation (noting that I could have gone with an ordinary Normal distribution, but I’m imposing some prior understanding that the water is not frozen or boiling)

coords = {'Thermometer': ['T0', 'T1', 'T2'],
         'obs_id': df.index}

with pm.Model(coords=coords) as tmodel:
    thermometer_idx = pm.MutableData("thermometer_idx", obs_idx, dims="obs_id")
    # Hyperpriors
    mu_a = pm.Normal("a", mu=0.0, sigma=10.0)
    sigma_a = pm.Exponential("sigma_a", 2.0)

    # Varying intercepts
    a_thermometer = pm.Normal("a_thermometer", mu=mu_a, sigma=sigma_a, dims="Thermometer")
    global_theta = pm.TruncatedNormal('t_global', mu=30, sigma=10, lower=0, upper=100)
    # Expected value, global term plus random effect
    theta = global_theta + a_thermometer[thermometer_idx]
    # Uncertainty terms
    a_thermometer_sigma = pm.Exponential("sigma", 1, dims="Thermometer")
    global_sigma = pm.Exponential("sigma_global", 1)
    sigma = global_sigma*a_thermometer_sigma[thermometer_idx]

    y = pm.TruncatedNormal("y", mu=theta, sigma=sigma, observed=all_obs, dims="obs_id")


Sampling as follows:

with tmodel:
    trace = pm.sample(tune=2000, chains=4, cores=4, target_accept=0.99, draws=4000, return_inferencedata=True)

This take me c. 14 mins and raises some warnings that the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.

but seems to give reasonable answers
Screenshot from 2023-04-05 15-55-20

With a global mean (my proxy for the true water bath temperature)

Screenshot from 2023-04-05 15-56-13

What could I improve here? Is there a better way to formulate this model? The posterior estimate seems very wide, but maybe this is because I left my priors very wide? The energy trace doesn’t look great…
Screenshot from 2023-04-05 16-16-48

The model looks good, the “three thermometer” problem is great to illustrate a hierarchical situation. I will remember it for teaching :wink:

Now, generally, the reason for the wide posterior estimate could be the prior, or too little data. You could safely set sigma in mu_a to start with 2*NP.std(all_obs), and I would also suggest to set the mu to NP.mean(all_obs). But with more data, the priors matter less and close to nothing. Play around by increasing “n”!

Yet the long sampling points to another problem: your “global theta” and “a_thermometer” are, as I would phrase it, “competing” for which of them would hold the actual temperature. There is no “slope” here, as in most examples; you just add two values. The sampler could either put some share of a temperature of, say, 30 degrees, to the global_theta, and another share to a_thermometer. At one instance, the sampler might find theta = global + therm to be 30=30+0, at the next instance it might allocate 30=20+10. Even though you set the thermometer parts to start at zero, they are not constrained to stay there.

So you can remove the “global_…” parts of your model. The true temperature will be your mu_a!
Sampling took 26s for me.

with pm.Model(coords=coords) as tmodel:
    thermometer_idx = pm.MutableData("thermometer_idx", obs_idx, dims="obs_id")

    # Hyperpriors
    mu_a = pm.Normal("a", mu=0.0, sigma=10.0)
    sigma_a = pm.Exponential("sigma_a", 2.0)

    # Varying intercepts
    a_thermometer = pm.Normal("a_thermometer", mu=mu_a, sigma=sigma_a, dims="Thermometer")

    # Expected value, global term plus random effect
    theta = a_thermometer[thermometer_idx]

    # Uncertainty terms
    a_thermometer_sigma = pm.Exponential("sigma", 1, dims="Thermometer")
    sigma = a_thermometer_sigma[thermometer_idx]

    y = pm.TruncatedNormal("y", mu=theta, sigma=sigma, observed=all_obs, dims="obs_id")



1 Like

@falk Thanks! Those suggestions definitely make a good difference.

I’ve also been trying a bambi version of the similar model, which leads to some further questions, specifically to do with construction of the ‘global’ term.

In bambi I would use the formula Temp ~ 1 + (1|Thermometer) to get a global intercept term with random effects per thermometer:

partial_pooling_priors = {
    "Intercept": bmb.Prior("TruncatedNormal", mu=30, sigma=10, lower=0, upper=100),
    "1|Thermometer": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Exponential", lam=1)),
    "sigma": bmb.Prior("Exponential", lam=1),

partial_pooling_model = bmb.Model(
    formula="Temp ~ 1 + (1|Thermometer)",

The catch is that bambi can’t (yet) handle heteroscedasticity (see also the ‘Distributional Model’ discussion here), but this formulation :point_up_2: is handling the global term and then separate thermometer deviations (random effects), as I was originally trying to do…

Clearly my original attempt isn’t doing this, so what is the equivalent model specification in pymc?

@hsteptoe I never used bambi; I find the formula notation highly ambiguous and limited, also in R and patsy, and never found it worth reading up.

Now I think the heteroscedasticity is well solved above: if you take a_thermometer_sigma, index it (sigma = ...[idx]) and pass that to the posterior/observable distribution, you are fine! T1 has a higher variability in this example.
Yet note that there is a problem in this simple example, I think: sigma_a and sigma are a bit redundant, just as the global_... and mu_a were redundant as we found out above. I do not know how to solve this here because all the distributions force you to give a sigma. It might not matter much. But as I understand and interpret it, the data structure / “experimental desigh” does not allow to distinguish between noise in the measurement device (thermometers, sigma_a) and noise in the actual quantity of interest (physical temperature, sigma).
You see: there is heteroscedasticity, and there is heteroscedasticity :slight_smile:

Does that matter in real life applications? Most of the times it doesn’t: as soon as you have more than one variable, and given there is enough data, the different sigmas will be distinguished by the model and the appropriate variance moved to the right level.

Hope this is somehow understandable :slight_smile: Thanks again for this basic, useful example!

There is crucial piece of information I forgot above:

It is not only the n which matters. Your sample size per thermometer determines how wide or narrow the posterior distributions are for each thermometer, i.e. “on the thermometer level”.

If you want to narrow the global distribution, you need more thermometers! Your sample size for the hyper-prior is exactly three in the example.
So to get a grip on this, not only play around with n, but also with the number of thermometers!

This is a common practical issue! I have had simular situations with colleagues (Biology) designing behavioral experiments with few individuals (=repeats for sampling from a population). They then measure a quadrillion observations per individual (=repeats on individual level).
In that situation, statistical power is great for each individual, but you learn little about the population. More balance is usually advisible: a couple of repeats per individual, a couple of individuals. Same effort, more insight.

1 Like

Sounds a lot like the old generalization problem. You may know a lot about specific subjects but not be able to generalize to others.

Hierarchical models are great at showing this trade-off.

1 Like

Bambi can do distributional models now. See: Distributional models — Bambi documentation

1 Like