Hi, thanks @AlexAndorra
I’ve had a crack at this:
import numpy as np
import copy
import pymc3 as pm
# Creating a synthetic dataset:
data = list()
probs = [0.9, 0.1]
for j in range(100):
    observations = list()
    # all globes must have at least one measurement:
    observations.append(np.random.choice([0, 1], p=probs))
    # now take several more measurements:
    while np.random.uniform(0, 1) > 0.3:
        observations.append(np.random.choice([0, 1], p=probs))
    data.append(observations)
## Put into long format:
ids = list()
observations = list()
for count, obs in enumerate(data):
    for d in obs:
        ids.append(count)
        observations.append(d)
total_observations = len(observations)
water_observations = copy.copy(observations)
with pm.Model() as planet_model:
    a = pm.Exponential("a", 1)
    b = pm.Exponential("b", 1)
    p_globes = pm.Beta("p_globes", a, b, shape=100)
    # each long-format row is a single 0/1 trial, so n=1 here
    # (n=total_observations wrongly treats each 0/1 outcome as the
    # result of that many trials, dragging the estimates toward 0):
    w = pm.Binomial("w", p=p_globes[np.array(ids)], n=1, observed=water_observations)

with planet_model:
    trace = pm.sample(target_accept=0.99)
This works as a first pass, though it didn't capture the 10% proportion I set up in the synthetic data until I set n=1 in the Binomial (each long-format row is a single trial, not total_observations of them). I gather sampling will get quite slow at a large number of observations (650,000). Is there a smarter way to include this much data, or should I just wait it out?
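One way to scale this is to aggregate before modelling: the likelihood only depends on the number of trials and the number of "water" successes per globe, so the long-format data can be collapsed to one row per globe. A minimal sketch with numpy — the tiny `ids`/`observations` arrays here are made-up stand-ins for the long-format lists above:

```python
import numpy as np

# Hypothetical long-format data: globe id and 0/1 outcome per measurement.
ids = np.array([0, 0, 0, 1, 1, 2])
observations = np.array([1, 0, 0, 1, 0, 0])

n_globes = ids.max() + 1
# trials per globe:
n_trials = np.bincount(ids, minlength=n_globes)
# "water" successes per globe:
n_water = np.bincount(ids, weights=observations, minlength=n_globes).astype(int)

print(n_trials)  # [3 2 1]
print(n_water)   # [1 1 0]
```

With the real data this would turn ~650,000 Bernoulli rows into 100 Binomial rows, and the likelihood would become something like `w = pm.Binomial("w", p=p_globes, n=n_trials, observed=n_water)` with no indexing needed. The posterior is unchanged, but the model evaluates far fewer likelihood terms per sample, so it should be much faster.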
cheers