Expanding the common globe tossing example to multiple globes

Hi, thanks @AlexAndorra

I’ve had a crack at this:

import numpy as np
import copy
import pymc3 as pm

#Creating a synthetic dataset:
data = list()
probs = [0.9, 0.1]
for j in range(100):
    observations = list()
    #all globes must have at least one measurement:
    observations.append(np.random.choice([0,1], p=probs))
    #now take several more measurements:
    while np.random.uniform(0,1)>0.3:
        observations.append(np.random.choice([0,1], p=[0.9,0.1]))
    data.append(observations)

##put into long format:
ids = list()
observations = list()

for count, obs in enumerate(data):
    for d in obs:
        ids.append(count)
        observations.append(d)
        
total_observations = len(observations)
water_observations = copy.copy(observations)

with pm.Model() as planet_model:    
    a = pm.Exponential("a", 1)
    b = pm.Exponential("b", 1)

    p_globes = pm.Beta("p_globes", a , b, shape=100)
    
    w = pm.Binomial("w", p=p_globes[ids], n=total_observations, observed=water_observations)

with planet_model:
    trace = pm.sample(target_accept=0.99)

This seems to work for a first pass however doesn’t seem to capture the 10% proportion I set up in the synthetic data. I gather it will get quite slow at large number of samples (650,000). Is there a smarter way to include this much data or should I just wait it out?
cheers

1 Like