Hi, I am currently building a hierarchical model of the following:
- The model describes the bank account of client 'A', which has 3 types of transactions. Each type has its own characteristics: the day of the month on which it is executed (day 1 to day 30) and the amount of money involved.
registrodemovimientos.csv (10.2 KB)
This is the small dataset I use in the model.
And this is the complete code I work with:
import pymc3 as pm
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import janitor
import arviz as az
from theano import shared

data = (
    pd.read_csv('../registrodemovimientos.csv')
    .label_encode('tipo_movimiento')
)
n_day_ofthe_month = len(data.dia_del_mes.unique())
transaction_amount = data['cantidad'].values
idx = pd.Categorical(data['tipo_movimiento'],
                     categories=['alquiler', 'nomina', 'supermercado']).codes
n_movements = len(np.unique(idx))

dummy_dict = {}
shared_vars = {}
for c in ['dia_del_mes', 'tipo_movimiento']:
    dummy_dict[c] = pd.get_dummies(data[c]).iloc[:, 1:].values
    # setting these as shared variables, will explain later
    shared_vars[c] = shared(dummy_dict[c])

# additional shared vars
shared_vars['day_ofthe_month'] = shared(data.dia_del_mes.values - 1)
shared_vars['type_movement_idx'] = shared(data.tipo_movimiento_enc.values)
And this is the PyMC3 model for my problem:
with pm.Model() as hierarchical_account:
    # hyperpriors
    mu_alpha = pm.Normal('mu_alpha', mu=0., sd=50.)
    sd_alpha = pm.HalfNormal('sd_alpha', 5.)
    mu_beta = pm.Normal('mu_beta', mu=0., sd=50.)
    sd_beta = pm.HalfNormal('sd_beta', 5.)
    # day-of-month intercepts
    dia_alpha = pm.Normal('dia_del_mes', mu=mu_alpha, sd=sd_alpha, shape=n_day_ofthe_month)
    # transaction-type intercepts
    movimiento_beta = pm.Normal('tipo_movimiento', mu=mu_beta, sd=sd_beta, shape=n_movements)
    # model error
    sigma = pm.HalfCauchy('sigma', beta=5)
    # important step: each row picks its day intercept and its type intercept
    mu = (dia_alpha[shared_vars['day_ofthe_month']]
          + movimiento_beta[shared_vars['type_movement_idx']])
    # likelihood
    like = pm.Normal('like', mu=mu, sigma=sigma, observed=transaction_amount)
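To show what I intend the mu expression to do, here is a small NumPy sketch of the same indexing pattern. All the numbers are made up for illustration (they are not from my data or from a fitted model), and the names day_idx and type_idx are hypothetical stand-ins for the two shared index variables.

```python
import numpy as np

# Hypothetical values, for illustration only (not fitted to the real data).
dia_alpha = np.array([10.0, 20.0, 30.0])             # one intercept per day (3 days here)
movimiento_beta = np.array([-500.0, 1500.0, -40.0])  # alquiler, nomina, supermercado

# One entry per observed transaction: zero-based day index and type code.
day_idx = np.array([0, 2, 2, 1])
type_idx = np.array([0, 1, 2, 2])

# Same pattern as in the model: pick the intercept for each row, then add them.
mu = dia_alpha[day_idx] + movimiento_beta[type_idx]
print(mu)
```

So each transaction gets an expected amount equal to its day intercept plus its type intercept, which is what I understand the model above to be doing.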
I have two questions:
- If I want to indicate in the code that each type of movement has its own intrinsic frequency (that is, 'supermercado' movements are much more frequent than 'nomina' and 'alquiler' movements), how can I implement that in the PyMC3 code?
- How can I calculate the distribution of the amount of money for a particular type of movement executed on a particular day of the month? For example, the probabilities of obtaining certain amounts of money in the case {'tipo_movimiento': 'supermercado', 'dia_del_mes': '8'}.
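To make the second question concrete, this is roughly the computation I have in mind, written as a sketch in plain NumPy. It assumes the posterior draws are available as arrays (for example taken from the trace by variable name); here they are replaced by random stand-in values, so none of the numbers mean anything. The function predictive_amounts is a hypothetical helper, and I am assuming type code 2 corresponds to 'supermercado' under the category order used above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 4000

# Stand-ins for posterior draws; with a real fit these would come from the
# trace, e.g. an array of shape (n_draws, n_days) for the day intercepts.
dia_alpha_draws = rng.normal(0.0, 5.0, size=(n_draws, 30))
movimiento_beta_draws = rng.normal([-600.0, 1500.0, -50.0], 10.0, size=(n_draws, 3))
sigma_draws = np.abs(rng.normal(20.0, 2.0, size=n_draws))


def predictive_amounts(day, type_code, rng):
    """Simulate one amount per posterior draw for a day (1 to 30) and a type code."""
    mu = dia_alpha_draws[:, day - 1] + movimiento_beta_draws[:, type_code]
    return rng.normal(mu, sigma_draws)


# Assumed coding: 0 = alquiler, 1 = nomina, 2 = supermercado; day 8 of the month.
amounts = predictive_amounts(day=8, type_code=2, rng=rng)

# Summaries of the simulated distribution of amounts.
low, high = np.percentile(amounts, [2.5, 97.5])
prob_below_minus_60 = np.mean(amounts < -60.0)
print(low, high, prob_below_minus_60)
```

If this is the right idea, the interval and the probability would describe the amounts I should expect for that type and day, but I am not sure whether this is the recommended way to do it in PyMC3.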
Thank you so much, I'd appreciate any help. Maybe the expression for mu in the model is written incorrectly.