Unexpected log likelihood results

Hello, I have been working on a Bayesian Poisson GLM using PyMC and am interested in analyzing the log likelihood at certain points. To do this I am using model.compile_logp(vars=model.observed_RVs)(point). However, I notice the values of the log likelihood are different from those produced by manually calculating the log likelihood. I have also confirmed that my manual implementation matches the log likelihood produced by fitting an equivalent frequentist model with statsmodels. Any help explaining this discrepancy would be much appreciated!

Can you share code that reproduces the mismatch you’re seeing?

import numpy as np
import pymc as pm
from scipy.special import gammaln

data = prepare_data()

coords = {
    "cat1_level": data["cat1_names"],
    "cat2_level": data["cat2_names"],
    "cat3_level": data["cat3_names"],
}

with pm.Model(coords=coords) as model:
    X1_data = pm.Data("X1_data", data["X1"])
    X2_data = pm.Data("X2_data", data["X2"])
    X3_data = pm.Data("X3_data", data["X3"])

    alpha = pm.Normal("alpha", mu=0.0, sigma=SIGMA_PRIOR)
    beta1 = pm.Normal("beta1", mu=0.0, sigma=SIGMA_PRIOR, dims="cat1_level")
    beta2 = pm.Normal("beta2", mu=0.0, sigma=SIGMA_PRIOR, dims="cat2_level")
    beta3 = pm.Normal("beta3", mu=0.0, sigma=SIGMA_PRIOR, dims="cat3_level")
    
    linear_term = (
        alpha
        + pm.math.dot(X1_data, beta1)
        + pm.math.dot(X2_data, beta2)
        + pm.math.dot(X3_data, beta3)
    )

    mu = pm.Deterministic("mu", pm.math.exp(linear_term))

    pm.Poisson("y_obs", mu=mu, observed=data["y"])

    map_estimate = pm.find_MAP()

    logp_fn = model.compile_logp(vars=model.observed_RVs)

map_point = {
    "alpha": map_estimate["alpha"],
    "beta1": np.asarray(map_estimate["beta1"]),
    "beta2": np.asarray(map_estimate["beta2"]),
    "beta3": np.asarray(map_estimate["beta3"]),
}

pymc_log_likelihood = float(logp_fn(map_point))

mu = np.exp(
    map_point["alpha"]
    + data["X1"].dot(map_point["beta1"])
    + data["X2"].dot(map_point["beta2"])
    + data["X3"].dot(map_point["beta3"])
)

manual_log_likelihood = np.sum(
    data["y"] * np.log(mu) - mu - gammaln(data["y"] + 1.0)
)

print("\n=== PyMC Log Likelihood ===")
print(f"Log likelihood at MAP: {pymc_log_likelihood:.6f}")

print("\n=== Manual Log Likelihood ===")
print(f"Manual log likelihood at MAP: {manual_log_likelihood:.6f}")

=== PyMC Log Likelihood ===
Log likelihood at MAP: -9115.067879

=== Manual Log Likelihood ===
Manual log likelihood at MAP: -11810.939595
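For what it’s worth, the manual formula itself can be cross-checked against scipy.stats.poisson.logpmf — the arrays below are toy stand-ins, not the thread’s actual data:

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import poisson

y = np.array([0.0, 1.0, 3.0, 7.0])   # toy counts (stand-in for data["y"])
mu = np.array([0.5, 1.2, 2.8, 6.9])  # toy rates (stand-in for the fitted mu)

# Same expression as the manual computation above
manual = np.sum(y * np.log(mu) - mu - gammaln(y + 1.0))

# SciPy's reference implementation of the Poisson log-pmf
scipy_ll = poisson.logpmf(y, mu).sum()

print(manual, scipy_ll)  # the two agree to floating-point precision
```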

Your manual computation looks like it’s only the log-likelihood without the contributions from the priors

I was under the impression that setting vars=model.observed_RVs in compile_logp() would calculate only the log likelihood rather than the full log posterior

You’re correct, I think @jessegrabowski missed that point

This is how the Poisson logp is implemented in PyMC: pymc/pymc/distributions/discrete.py at f69e159cb8344da8576cdb6b49fc81be766dec5e · pymc-devs/pymc · GitHub

Other than the special behavior of logpow, I could imagine some float precision differences? PyTensor won’t compute the naive log(exp(mu))
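For readers unfamiliar with it: logpow(x, m) computes log(x**m) with the x == 0 edge cases guarded, since the naive m * log(x) is NaN at x = m = 0 even though x**m = 1 there. A rough NumPy imitation of that behavior (not the actual PyTensor graph):

```python
import numpy as np

def logpow(x, m):
    """log(x ** m), guarding the x == 0 cases where m * log(x) misbehaves."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(
            np.equal(x, 0),
            # 0**0 = 1 -> log is 0; 0**m = 0 for m > 0 -> log is -inf
            np.where(np.equal(m, 0), 0.0, -np.inf),
            m * np.log(x),
        )

with np.errstate(divide="ignore", invalid="ignore"):
    print(0.0 * np.log(0.0))  # nan — the naive product
print(logpow(0.0, 0.0))       # 0.0
print(logpow(0.0, 3.0))       # -inf
```

In the model above mu = exp(linear_term) is strictly positive, so this edge case should not fire; it is mentioned only as a place where the graphs could differ in principle.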

Yes I missed that, sorry :slight_smile:

I found the issue. Many of my y values were not exact integers (e.g. 1.000001, 0.999999), and rounding to the nearest integer in the preprocessing seems to have solved this. Thanks for your help with this.
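A plausible mechanism for the size of the discrepancy (an assumption about PyMC’s internals, not verified here): observed values for a discrete RV such as Poisson end up cast to an integer dtype, and an integer cast truncates toward zero rather than rounding, so 0.999999 becomes 0 while the manual NumPy computation keeps it at ~1. A small NumPy/SciPy sketch of how much that shifts the Poisson log-pmf:

```python
import numpy as np
from scipy.stats import poisson

mu = 2.0
y_raw = np.array([1.000001, 0.999999])  # nearly-integer "counts"

y_trunc = y_raw.astype("int64")           # integer cast truncates toward zero
y_round = np.rint(y_raw).astype("int64")  # rounding keeps both at 1

print(y_trunc)                      # [1 0]
print(y_round)                      # [1 1]
print(poisson.logpmf(y_trunc, mu))  # log-pmf with truncated counts
print(poisson.logpmf(y_round, mu))  # log-pmf with rounded counts
```

Summed over thousands of observations, many such truncations would produce exactly the kind of large gap seen above, which is consistent with rounding the data fixing it.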
