Posterior predictive checks per dimension, with filters applied to the values

Hi everyone!

I wrote a model to describe the difference in ARPU (average revenue per user) between two countries.

I used data stored in a pandas DataFrame called data, where every row is one user, with these columns:

  • player_id – unique user id
  • country_code – US or JP
  • revenue_7 – cumulative revenue up to the 7th day of the user’s life (~95% of users are not payers; for them the value is 0)

Model:

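import numpy as np
import pandas as pd
import pymc as pm

country = np.array(['US', 'JP'])
# Integer code per user: 0 = US, 1 = JP
country_idx = pd.Categorical(data['country_code'], categories=country).codes
coords = {'country': country, 'country_flat': country[country_idx]}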

with pm.Model(coords=coords) as model:
    
    psi = pm.Beta('psi', alpha=1, beta=1, dims='country')
    mu = pm.HalfNormal('mu', sigma=10, dims='country')
    sigma = pm.HalfNormal('sigma', sigma=15, dims='country')
    y = pm.HurdleGamma('y', mu=mu[country_idx], sigma=sigma[country_idx], psi=psi[country_idx], observed=data['revenue_7'])

    revenue = pm.Deterministic('revenue', psi * mu, dims='country')

    diff = pm.Deterministic('diff', revenue[0] - revenue[1])
    
    idata = pm.sample()
    idata.extend(pm.sample_posterior_predictive(idata))

Could you please help me with the following questions?

  1. How can I plot posterior predictive checks separately for each value of a dimension (the 2 countries) using az.plot_ppc?
  2. How can I ignore zero values in the visualization? This is needed because psi is less than 5%, so validating the revenue spread is impossible: only the spike at x = 0 is visible and nothing else. I want to look at the KDE for payers only, i.e. the revenue tail (I hope that makes sense). Or can you suggest a better model setup that makes the inference data better structured for post-analysis?
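
To make question 2 concrete, here is a rough sketch of the kind of comparison I mean, done by hand with NumPy instead of az.plot_ppc. Everything in it is an assumption for illustration: the arrays are synthetic stand-ins for my real data and posterior predictive draws, and payer_quantiles and fake_revenue are hypothetical helper names, not ArviZ or PyMC functions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions, not real data):
#   country_idx : 0 = US, 1 = JP, one entry per user
#   observed    : revenue_7 per user, mostly zeros
#   pp_draws    : posterior predictive draws, shape (n_draws, n_users),
#                 i.e. what you would have after stacking chain and draw
n_users, n_draws = 2000, 50
country_idx = rng.integers(0, 2, size=n_users)


def fake_revenue(size):
    """Zero for ~95% of entries, gamma-distributed revenue for the rest."""
    is_payer = rng.random(size) < 0.05
    return np.where(is_payer, rng.gamma(shape=2.0, scale=5.0, size=size), 0.0)


observed = fake_revenue(n_users)
pp_draws = fake_revenue((n_draws, n_users))


def payer_quantiles(values, qs=(0.5, 0.9, 0.99)):
    """Quantiles of the strictly positive values only (zeros are dropped)."""
    positive = values[values > 0]
    return np.quantile(positive, qs)


summary = {}
for i, name in enumerate(["US", "JP"]):
    in_country = country_idx == i
    obs_q = payer_quantiles(observed[in_country])
    # Pool all predictive draws for this country, then drop the zeros.
    pp_q = payer_quantiles(pp_draws[:, in_country])
    summary[name] = (obs_q, pp_q)
    print(name, "observed:", obs_q.round(2), "predicted:", pp_q.round(2))
```

With the real model I assume the two arrays would come from idata.observed_data['y'] and idata.posterior_predictive['y'] (with chain and draw stacked into one axis), and the masked positive values could be passed to a KDE plot instead of np.quantile. What I am hoping for is a way to get the same per-country, payers-only view directly from az.plot_ppc.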