Passing observations of varying size to a likelihood function

I’m looking to infer a single parameter, m, for several sets of observations, whereby the inferred parameter will be different for each set.

If I have 5 sets of 50 observations each, I am able to run my experiment nicely by setting shape=5 in my prior for m and then making sure my observations and any other inputs are a 50 x 5 array. I then get 5 posteriors output with expected distributions.

However I am completely stumped for how to do this if I have say, 44 input/observations pairs in one set, 35 in another, 51 in another and so on. I have read about theano.shared which seems like it could have use in this situation for neatly repeating an experiment with the same model but with different data, but also wondered if there are other datatypes e.g. a list of lists that could be used in ‘observed=…’ to run the experiment in a single loop?

My working 50x5 array version is below, where probs_array and outcomes array are both 50x5 arrays. Any help on how to adapt this to the variable size data sets would be much appreciated?

mcmc_model = pm.Model()
with mcmc_model:
    #prior
    m = pm.Uniform('m',lower=-2,upper=2, shape=5)
    
    prob = probs_array + m
    
    #liklihood function
    outcome = pm.Bernoulli('outcomes',p=prob,observed=outcomes_array)    
    
#-----NUTS sampler------------
with mcmc_model:
    trace = pm.sample(5000, cores=1, chains=2, tune=1000)

Just reshape everything to be 1d arrays and use indexing to identify which “observation set” the observation applies to.

However I am completely stumped for how to do this if I have say, 44 input/observations pairs in one set, 35 in another, 51 in another and so on

Create an 1D integer array that has 44 0s, 35 1s, 51 2s, and so on:

Something like:
ix = np.concatenate([np.zeros(44), np.ones(35), np.repeat(2, 51)])

with mcmc_model:
    #prior
    m = pm.Uniform('m',lower=-2,upper=2, shape=5)
    
    prob = probs_array + m[ix]  # probs_array should be 1-D, same shape as ix
    
    # likelihood function
    # outcomes_array should be 1-D, same shape as ix and probs_array
    outcome = pm.Bernoulli('outcomes',p=prob,observed=outcomes_array)
3 Likes

Thanks very much - exactly what I’m after.