I am pretty new to Bayesian statistics and pymc3, so apologies if I am really bad at it… and thanks for your help anyway!
I am trying to use a hierarchical model to determine the distribution of losses incurred month after month by an insurance company. I have found some documentation with models in Stan (which I have never used, whereas I quite like Python!) and essentially the idea is:
- for each cohort, determine a cohort-specific distribution of what can be called an ‘ultimate loss’; by cohort, I mean a group of insurance policies with somewhat similar characteristics. Practically speaking, a cohort groups policies issued during the same month, quarter or year. In my case, a cohort is for instance 2019_1, which means ‘Jan 2019’. There are therefore as many Ultimate_Loss distributions as cohorts; let’s say the dimension is nb_cohorts.
- determine an omega and a theta distribution that will be applied to all cohorts, so that the losses we observe as time passes follow the equation Avg_Loss_Cohort_i_Period_k = Ultimate_Loss_Cohort_i * (1-exp(-(k/theta)**omega)),
- define a sigma distribution so that sigma_i_k = sigma * sqrt(Avg_Loss_Cohort_i_Period_k), to account for the lower variability of losses in early observations, where sigma follows a half-normal distribution
- calibrate losses so that losses_i_k follows a Gaussian distribution with mu = Avg_Loss_Cohort_i_Period_k and sigma = sigma_i_k (see the small numerical illustration right after this list)
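To make the growth curve and the sigma scaling concrete, here is a tiny numerical illustration in plain NumPy; the values of omega, theta, sigma and the ultimate loss are completely made up, just to show the shape of the calculation:

```python
import numpy as np

# Made-up values, purely to illustrate the calculation
omega, theta = 1.5, 6.0          # growth-curve parameters shared by all cohorts
ultimate_loss_cohort_i = 60.0    # ultimate loss of one cohort
sigma = 2.0                      # global noise scale

k = np.arange(1, 21)  # development periods 1..20
avg_loss_i_k = ultimate_loss_cohort_i * (1 - np.exp(-(k / theta) ** omega))
sigma_i_k = sigma * np.sqrt(avg_loss_i_k)  # noise grows with the expected loss
```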
In practice, the observation periods for ‘old cohorts’ range from 1 to 20, as we already have 20 months of observations, while for a cohort that is x months old we only have x months of observations. In other words, for the most recent cohort we only have one observation. So, whilst time ranges from 1 to 20 for an old cohort, it only takes one value for the most recent one: 1.
I am struggling to set up the data correctly to account for this triangular shape. I have the data in vectorized (long) form, with one row per cohort/period observation; the toy example below shows the kind of structure I mean.
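Just to make the shape explicit, a toy version of that long form could be built like this (the dates and the 20-month horizon are only an example):

```python
import pandas as pd

# Toy example only: 20 monthly cohorts observed through month 20,
# so the j-th cohort (1-based) has 21 - j development periods.
rows = []
for j, cohort in enumerate(pd.period_range("2019-01", periods=20, freq="M"), start=1):
    for k in range(1, 21 - j + 1):
        rows.append({"Cohort": f"{cohort.year}_{cohort.month}", "Period": k})
triangle_long = pd.DataFrame(rows)
print(triangle_long.groupby("Cohort").size())  # from 20 rows (oldest cohort) down to 1 (newest)
```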
Here is a bit of code, but obviously, even if it runs, it does not produce anything relevant.
```python
import numpy as np
import pandas as pd
import pymc3 as pm

cohorts = pd.DataFrame(inputs['Cohort'])
Periods = pd.DataFrame(inputs['Period']).to_numpy()
Losses = pd.DataFrame(inputs[L]).to_numpy()  # L (the loss column name) is defined elsewhere

uniquecohorts = np.unique(cohorts)
nbcohorts = len(uniquecohorts)
uniquePeriods = np.unique(Periods)
nbPeriods = len(uniquePeriods)
Periods = Periods[:, None]  # add a trailing axis to Periods

with pm.Model() as model:
    # Growth-curve parameters shared by all cohorts
    omega = pm.HalfNormal("omega", sigma=3)
    theta = pm.HalfNormal("theta", sigma=5)
    # One ultimate loss per cohort
    ultimate_loss = pm.TruncatedNormal("ultimate_loss", mu=60, sigma=10,
                                       lower=10, upper=110, shape=(nbcohorts, 1))
    sigma = pm.HalfNormal("sigma", sigma=20)

    # Expected average loss per cohort and period, shape (nbcohorts, nbPeriods)
    mu_i_t = pm.Deterministic(
        "mu_i_t", ultimate_loss * (1 - pm.math.exp(-((uniquePeriods / theta) ** omega)))
    )
    sigma_i_t = pm.Deterministic("sigma_i_t", sigma * mu_i_t ** 0.5)

    # This is where the rectangular shape does not line up with my vectorized observations
    Losses_i_t = pm.Normal("Losses_i_t", mu=mu_i_t, sigma=sigma_i_t,
                           observed=Losses, shape=(nbcohorts, nbPeriods))

    results = pm.sample(draws=50, tune=50, cores=8)
```
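For what it's worth, one idea I have been toying with is to keep everything in long form and index the per-cohort ultimate loss with an integer cohort index, so the triangle never needs to be padded into a rectangle. Here is a minimal self-contained sketch of that idea (the tiny dataset and names like cohort_idx and Losses_obs are just mine, for illustration):

```python
import numpy as np
import pandas as pd
import pymc3 as pm

# Made-up long-form data, one row per (cohort, period) observation
df = pd.DataFrame({
    "Cohort": ["2019_1", "2019_1", "2019_1", "2019_2", "2019_2", "2019_3"],
    "Period": [1, 2, 3, 1, 2, 1],
    "Loss":   [12.0, 25.0, 33.0, 10.0, 22.0, 11.0],
})

cohort_labels, cohort_idx = np.unique(df["Cohort"], return_inverse=True)
nb_cohorts = len(cohort_labels)
periods = df["Period"].to_numpy()
losses = df["Loss"].to_numpy()

with pm.Model() as model:
    omega = pm.HalfNormal("omega", sigma=3)
    theta = pm.HalfNormal("theta", sigma=5)
    ultimate_loss = pm.TruncatedNormal("ultimate_loss", mu=60, sigma=10,
                                       lower=10, upper=110, shape=nb_cohorts)
    sigma = pm.HalfNormal("sigma", sigma=20)

    # Each row picks its own cohort's ultimate loss via cohort_idx,
    # so the ragged triangle never has to be padded into a rectangle.
    growth = 1 - pm.math.exp(-((periods / theta) ** omega))
    mu = pm.Deterministic("mu", ultimate_loss[cohort_idx] * growth)
    obs_sigma = pm.Deterministic("obs_sigma", sigma * pm.math.sqrt(mu))

    pm.Normal("Losses_obs", mu=mu, sigma=obs_sigma, observed=losses)
    trace = pm.sample(draws=1000, tune=1000)
```

Is indexing like this the right way to deal with the triangular data, or is there a more idiomatic pymc3 pattern?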