I’m struggling to understand how observed data works in pymc3. From the information I’ve found so far, these two examples have been the most helpful for getting me as far as I have, but I can’t get my model to work.
As an example of what I’m trying to do, say I have records from customers at a restaurant, recording the temperature of the day on a categorical rating scale out of 5, and whether or not they ordered a main meal, a side, or a beverage. I’ve set up some mock data like so:
import numpy as np
import pymc3 as pm
from theano import shared
no_of_root_categories = 5
no_of_samples = 1000
hot_day = np.random.randint(no_of_root_categories, size=no_of_samples)
option_labels = ["Main", "Side", "Beverage"]
meal_options = np.random.randint(2, size=(len(option_labels), no_of_samples))
I want to model it as a simple Bayesian network, like so:
where the shaded nodes are observed.
This is what I’ve got:
with pm.Model() as crashing_model:
shape = (no_of_samples, no_of_root_categories)
alpha = (1 / no_of_root_categories) * np.ones(shape)
root_prior = pm.Dirichlet("Temperature Rating Prior", alpha, shape=shape)
root = pm.Categorical('Temperature Rating', p=root_prior, shape=no_of_samples, observed=hot_day)
for item, label in enumerate(option_labels):
node_data = meal_options[item, :]
node_prior = pm.Beta(f"{label} Prior",
mu=root,
sigma=root,
shape=no_of_samples,
testval=np.random.randint(1, size=no_of_samples))
pm.Binomial(label, p=node_prior, n=no_of_samples, observed=node_data)
which works, but when I try
with crashing_model:
trace = pm.sample(1000, random_seed=0)
Python exits with a ‘Bad initial energy’ error.
I can create a model which seems to work without the latent variables
with pm.Model() as working_model: # seems to work
root_values = [np.where(hot_day == i)[0].tolist() for i in range(no_of_root_categories)]
root_p = [len(i) / 1000 for i in root_values]
root = pm.Categorical('Temperature Rating', p=root_p)
shared_proportions = shared(np.array([len(hot_day[i]) for i in root_values]))
for item, label in enumerate(option_labels):
node_probs = [sum([meal_options[item, idx] for idx in category]) / len(category) for category in root_values]
theano_probs = shared(np.array(node_probs))
pm.Binomial(label, p=theano_probs[root], n=shared_proportions[root])
but I’m not sure how to translate what I’ve done there to work with the latent variables. Any help would be appreciated.
I posted this question on SO too before realising that this is a better place for it. Thinking further about it since then, I realised there’s more that I haven’t wrapped my head around. The documentation has been somewhat helpful, but there’s still a lot I’m unsure of, such as how the output from each node works - i.e. why it can be used in the probability of the next node, but also as an index for the theano shared object. I’m new to both pymc3 and theano, so if anyone has some good resources they can share that’d be great too.