# Posterior predictive checks per dimension and with filters applied to values

Hi everyone!

I wrote a model to describe the difference in ARPU (average revenue per user) between two countries.

I used data stored in a pandas DataFrame called `data`, where every row is one user, with the following columns:

• player_id – unique user id
• country_code – US or JP
• revenue_7 – cumulative revenue up to the 7th day of the user’s life (~95% of users are non-payers; for them the value is 0)

Model:

```
import numpy as np
import pandas as pd
import pymc as pm

country = np.array(['US', 'JP'])
# Integer code per user: 0 = US, 1 = JP
country_idx = pd.Categorical(data['country_code'], categories=country).codes
coords = {'country': country, 'country_flat': country[country_idx]}

with pm.Model(coords=coords) as model:
    # psi: probability that a user is a payer (non-zero revenue), per country
    psi = pm.Beta('psi', alpha=1, beta=1, dims='country')
    # mu, sigma: mean and standard deviation of revenue among payers
    mu = pm.HalfNormal('mu', sigma=10, dims='country')
    sigma = pm.HalfNormal('sigma', sigma=15, dims='country')
    y = pm.HurdleGamma('y', mu=mu[country_idx], sigma=sigma[country_idx], psi=psi[country_idx], observed=data['revenue_7'])

    # Expected revenue per user (ARPU) = P(payer) * mean revenue of a payer
    revenue = pm.Deterministic('revenue', psi * mu, dims='country')

    # ARPU difference, US minus JP
    diff = pm.Deterministic('diff', revenue[0] - revenue[1])

    idata = pm.sample()
    idata.extend(pm.sample_posterior_predictive(idata))
```

Could you please help me with the following questions:

1. How can I plot posterior predictive checks separately for each level of a dimension (the 2 countries) using `az.plot_ppc`?
2. How can I exclude zero values from the visualization? This is needed because psi is below 5%, so the spread of revenue cannot be validated: the plot shows only a spike at x = 0 and nothing else is visible. I want to look at the KDE for payers only, i.e. the revenue tail (I hope that makes sense). Alternatively, can you suggest a better model setup that leaves the inference data better structured for post-analysis?
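
To make question 2 concrete, here is a rough sketch of the kind of filtering I have in mind, done by hand with NumPy on made-up arrays rather than on my real `idata`. The simulated numbers, the array shapes, and the names `obs`, `pp`, and `payer_values` are all assumptions for illustration only. I assume the posterior predictive draws can be pulled out as an array of shape (chain, draw, user).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the real data (assumed, not my actual dataset):
# one entry per user, country code 0 = US, 1 = JP, ~5% payers.
n_users = 2000
country_idx = rng.integers(0, 2, size=n_users)
is_payer = rng.random(n_users) < 0.05
obs = np.where(is_payer, rng.gamma(2.0, 5.0, size=n_users), 0.0)

# Hypothetical posterior predictive draws with shape (chain, draw, user).
n_chains, n_draws = 2, 50
shape = (n_chains, n_draws, n_users)
pp = np.where(rng.random(shape) < 0.05, rng.gamma(2.0, 5.0, size=shape), 0.0)


def payer_values(values, mask):
    """Return the strictly positive entries of `values` for users selected by `mask`.

    `mask` is a boolean array over the last (user) axis; the result is flattened.
    """
    selected = values[..., mask]
    return selected[selected > 0]


for code, name in enumerate(['US', 'JP']):
    mask = country_idx == code
    obs_pos = payer_values(obs, mask)  # observed revenue of payers only
    pp_pos = payer_values(pp, mask)    # predicted revenue of payers, pooled over draws

    # Shared bin edges so both histograms are directly comparable.
    bins = np.histogram_bin_edges(np.concatenate([obs_pos, pp_pos]), bins=30)
    obs_density, _ = np.histogram(obs_pos, bins=bins, density=True)
    pp_density, _ = np.histogram(pp_pos, bins=bins, density=True)

    print(name, 'observed payers:', obs_pos.size,
          'max density gap:', np.abs(obs_density - pp_density).max())
```

This pools all draws into one histogram per country, which loses the per-draw spread that `az.plot_ppc` shows, so I would still prefer a way to get the same view from ArviZ directly.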