Modeling with highly-skewed data

Many thanks, @cluhmann and @ricardoV94!

Here’s the corrected code (which I can’t copy-paste directly into this editor from my notebook - long story):

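import numpy as np
import pymc3 as pm

# `data` is my DataFrame with a 'late_days' column
lower_bound = 0
upper_bound = np.max(data['late_days'])

with pm.Model() as threshold_model:
    # flat prior on the location, bounded by the observed range
    late_days = pm.Uniform(
        'payment_behavior',
        lower=lower_bound,
        upper=upper_bound
    )

    # prior on the Student-t degrees of freedom (controls tail weight)
    degrees_freedom = pm.Exponential('degrees_freedom', lam=1)

    thresholds = pm.StudentT(
        'thresholds',
        nu=degrees_freedom,
        mu=late_days,
        sigma=1,  # `sd` is a deprecated alias of `sigma`
        observed=data['late_days'].values
    )

    trace = pm.sample(5_000, tune=5_000)  # based on your input re samples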

Regarding the other points:

  • I’m running the project on pymc3 3.11.4 (macOS Ventura)
  • the complete warning stated:
/opt/homebrew/anaconda3/envs/pymc_framework/lib/python3.9/site-packages/scipy/stats/_continuous_distns.py:624: RuntimeWarning: overflow encountered in _beta_ppf 
    return _boost._beta_ppf(q, a, b)
Sampling 4 chains for 1

Then the kernel broke and the sampling process was interrupted.

However, I have now rerun the code successfully, and I conclude that my mistake was requesting too many samples.

As a follow-up question: what criteria should I take into account when choosing values for the draws and tune parameters?
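
One criterion that is commonly used to judge whether tune and draws were large enough is the Gelman-Rubin R-hat statistic, which compares between-chain and within-chain variance (values close to 1 suggest the chains agree). Below is a minimal pure-Python sketch of the classic, non-split version, assuming the draws for one parameter are available as one list per chain. The function name `gelman_rubin` and the simulated chains are illustrative only and are not part of pymc3; ArviZ's `az.rhat` offers a more robust, rank-normalized variant.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Classic (non-split) R-hat for one parameter.

    `chains` is a list of equal-length lists of draws, one list per chain.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the sample variance of the chain means
    between = n * statistics.variance(chain_means)
    # pooled estimate of the posterior variance
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


# Simulated draws (hypothetical, for illustration only)
rng = random.Random(0)
# four chains sampling the same distribution
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# four chains stuck around different locations
stuck = [[rng.gauss(float(k), 1.0) for _ in range(1000)] for k in range(4)]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
print(rhat_mixed, rhat_stuck)
```

If R-hat stays well above 1 for some parameter, that may point to too few tuning steps or draws, or to a model that is hard to sample.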

Thank you again!
