Integrating untruncated sum to truncated model

ReverendBae · August 2, 2024, 2:17pm

I am using an insurance claims dataset which only shows claims over a certain size (it is truncated). However, it also includes the sum of all claims seen (but not the number). It strikes me that it would be super useful information to help reconstruct the whole distribution.

However, I’m not sure how to integrate this information when fitting my TruncatedNormal. Does anyone have any insight?

Data setup:

import numpy as np
import pymc as pm
import matplotlib.pyplot as plt
import arviz as az

print(f"pymc: {pm.__version__}")

rng = np.random.default_rng(20240802)

# insurance losses
draws = rng.lognormal(10, 3, size=100_000)

# total loss - this number is provided
total_loss = draws.sum()

# we only get individual losses reported above this number
min_reported_loss = 1_000_000

print(f"We see only {(draws > min_reported_loss).mean():%} of losses")

observed_losses = draws[draws > min_reported_loss]

print(f"Sampled mean {np.log(draws).mean():.3f}")
print(f"Sampled std {np.log(draws).std():.3f}")

plt.hist(np.log(draws), alpha=0.5)
plt.axvline(np.log(min_reported_loss), color="black", linestyle="--")

And the model:

print(f"Observed count: {len(observed_losses):,}\n")

with pm.Model() as model:
    mu = pm.Normal("mu", 0, 10)
    sigma = pm.HalfNormal("sigma", 5)
    y = pm.TruncatedNormal(
        "y",
        mu=mu,
        sigma=sigma,
        lower=np.log(min_reported_loss),
        observed=np.log(observed_losses),
    )

with model:
    trace = pm.sample()

pm.summary(trace)

ReverendBae · August 2, 2024, 2:37pm

In writing this post, I actually made what turns out was a great prompt, which resulted in the following solution which does seem to help a lot, especially when the number of draws reduced by a lot:

with pm.Model() as model:
    mu = pm.Normal("mu", 10, 10)
    sigma = pm.HalfNormal("sigma", 5)

    y_obs = pm.TruncatedNormal(
        "y_obs",
        mu=mu,
        sigma=sigma,
        lower=np.log(min_reported_loss),
        observed=np.log(observed_losses),
    )

    # Adding a custom likelihood for the total sum of claims
    total_loss_constraint = pm.Potential(
        "total_loss_constraint",
        pm.logp(pm.Normal.dist(mu=mu, sigma=sigma), np.log(draws)).sum() - total_loss,
    )

ReverendBae · August 5, 2024, 1:57pm

This sneakily does use the number of draws so isn’t actually a good solution.

Topic		Replies	Views
Truncated Normal Questions	8	1898	January 29, 2021
Truncated lognormal observations Questions	1	814	February 19, 2021
Truncated multivariate normal likelihood v5 modeling	10	72	May 30, 2025
Weighted Sum of RVs v5 modeling	7	577	March 28, 2023
Initial evaluation of model at starting point failed, when attempting to use truncated model version agnostic	7	71	November 4, 2024

Integrating untruncated sum to truncated model

Related topics