Apparently inconsistent posteriors in autoregressive model

Basically I’m doing something along the lines of (this is just a code sketch):

    mu = pm.Normal("mu", mu=0, sigma=0.5)
    sigma = pm.HalfNormal("sigma", sigma=0.2)
    traffic_0 = pm.LogNormal("traffic_0", mu=mu, sigma=sigma)

I expected the posterior samples of traffic_0 to look like samples from a lognormal parameterized by the posterior samples of mu and sigma. However, I actually get quite different distributions:

    import arviz as az
    import matplotlib.pyplot as plt
    import numpy as np
    import pymc as pm

    # Posterior samples of the hyperparameters
    mus = az.extract(idata.posterior, var_names=["mu"])
    sigmas = az.extract(idata.posterior, var_names=["sigma"])
    # Fresh lognormal draws conditioned on those posterior samples
    samples = pm.draw(pm.LogNormal.dist(mu=mus.values, sigma=sigmas.values))
    # Posterior samples of traffic_0 itself
    traffic_0 = az.extract(idata.posterior, var_names=["traffic_0"])

    bins = np.linspace(0, 5, 50)
    plt.hist(samples, density=True, alpha=0.5, bins=bins)
    plt.hist(traffic_0.squeeze(), density=True, alpha=0.5, bins=bins)
    plt.legend([r"Lognormal with $\mu$,$\sigma$ posteriors", "traffic_0 posterior"])

This feels like I must be missing something very basic, but I can't see what. What I'm actually doing is slightly more complex, but not by much.

Autoregressive traffic model

I’m setting up an AR model where the initial value and the innovations come from the same lognormal distribution. In the true data-generating process the initial values and innovations are actually different, so this posterior “inconsistency” actually fits the data better, but I wanted to try a simpler version first, and I just fail to see how the results I’m getting are possible with the model I’ve defined.

The following is a simplified version of the model I’m using. If the issue is not straightforward and requires debugging the model, let me know and I’ll work on an MRE.

    import pymc as pm
    import pytensor
    import pytensor.tensor as pt
    from pymc.pytensorf import collect_default_updates

    def ar_dist(ar_init, rho, mu, sigma, n_steps, size):
        def ar_step(traffic_t1, rho, mu, sigma):
            # New innovation, drawn from the same lognormal as traffic_0
            innov = pm.LogNormal.dist(mu=mu, sigma=sigma)
            traffic = traffic_t1 * rho[0] + innov * (1 - rho[0])
            # Register the innovation's RNG update so scan resamples it each step
            return traffic, collect_default_updates([innov])

        # lags is defined outside this function (ar_step consumes a single tap)
        ar_innov, _ = pytensor.scan(
            fn=ar_step,
            outputs_info=[{"initial": ar_init, "taps": range(-lags, 0)}],
            non_sequences=[rho, mu, sigma],
            n_steps=n_steps,
            strict=True,
        )
        return ar_innov


    with pm.Model(coords=coords, check_bounds=False) as ar_model:
        # AR parameters
        rho = pm.Beta("rho", 5, 1, dims=("lags",))
        mu = pm.Normal("mu", mu=0, sigma=0.5)
        sigma = pm.HalfNormal("sigma", sigma=0.2)
        # AR process: initial value and innovations share the same lognormal
        traffic_0 = pm.LogNormal("traffic_0", mu=mu, sigma=sigma, dims=("lags",))
        traffic_t = pm.CustomDist(
            "traffic_t",
            traffic_0,
            rho,
            mu,
            sigma,
            stops - lags,
            dist=ar_dist,
            dims=("steps",),
        )
        traffic = pm.Deterministic(
            name="traffic",
            var=pt.concatenate([traffic_0, traffic_t], axis=-1),
            dims="stops",
        )
        # Likelihood (a shifted geometric in the real model)
        pm.Exponential("delay", traffic, observed=obs, dims=("instances", "stops"))
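
The idata used in the comparison above then comes from a standard sampling call (a minimal sketch; the sampler settings here are just placeholders):

    with ar_model:
        idata = pm.sample(random_seed=42)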

I first realised the issue by plotting the first 20 steps (I’m using a shifted geometric rather than an exponential, hence the discrete HDI). In the plot below you can see that, in the posterior, the initial traffic and the innovations (the latter being the value towards which the traffic converges as steps increase) come from different distributions, despite being modelled with the same parameters. At first I thought it could be an issue with scan and the updating of $\mu$ and $\sigma$, but the innovations are actually consistent with these posteriors; it’s the initial value (i.e., the one outside scan) that is not.

Versions

  • pymc 5.17.0

In the simplified example, traffic_0 is still a parameter to be learned. It’s conditioned on mu and sigma, but not uniquely determined by them. Consider the “non-centered” formulation:

    mu = pm.Normal("mu", mu=0, sigma=0.5)
    sigma = pm.HalfNormal("sigma", sigma=0.2)
    # One standard-normal offset per element of traffic_0 (here n would be the number of lags)
    offset = pm.Normal("offset", mu=0, sigma=1, size=n)
    traffic_0 = pm.Deterministic("traffic_0", pm.math.exp(mu + sigma * offset))

Written this (equivalent) way, it’s clear you need to estimate n “offset” parameters for traffic_0. This is what is going on: your plot is comparing a posterior (in orange) with a mix of posterior and prior (in blue).
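
A quick way to see this (a minimal sketch, assuming the model is refit with the explicit offset variable above and sampled into idata): pushing the posterior offset samples through the transform reproduces the traffic_0 posterior exactly, while redrawing fresh offsets from their standard-normal prior reproduces the mismatched blue histogram.

    import arviz as az
    import numpy as np

    mus = az.extract(idata.posterior, var_names=["mu"]).values
    sigmas = az.extract(idata.posterior, var_names=["sigma"]).values
    offsets = az.extract(idata.posterior, var_names=["offset"]).values

    # Posterior offsets -> identical to the traffic_0 posterior (orange),
    # since traffic_0 is a Deterministic of (mu, sigma, offset)
    traffic_0_posterior = np.exp(mus + sigmas * offsets)

    # Fresh prior offsets -> the "lognormal with posterior mu, sigma"
    # histogram (blue): posterior hyperparameters mixed with a prior offset
    fresh = np.random.default_rng(0).standard_normal(offsets.shape)
    traffic_0_mixed = np.exp(mus + sigmas * fresh)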