Issue with observed data in joint distribution

Hi,

I am trying to estimate the parameters of a joint distribution. I have a vector of values for each feature. A simplified version of the problem, giving the same error is:

num_shifts=24
num_lenghts=24
num_dow=7
data = {‘P_SD’:shift, ‘P_ED’:end_hour, ‘P_length’:lengths, ‘P_dow’:dow, ‘P_hr’:hrs}
#Each value in the previous dictionary is a 1D np.array

with pm.Model() as model:

        sd1 = pm.Uniform('sd_SD', 0.25, 5, shape=num_shifts)
        lambdas1 = pm.Uniform('lambdas_SD', 0, 1.0, shape=num_shifts)
        p_sd = pm.NormalMixture('P_SD', mu=np.arange(num_shifts), w=lambdas1, sd=sd1)

        sd2 = pm.Uniform('sd_ED', 0.25, 5, shape=num_shifts)
        lambdas2 = pm.Uniform('lambdas_ED', 0, 1.0, shape=num_shifts)
        p_ed = pm.NormalMixture('P_ED', mu=np.arange(num_shifts), w=lambdas2, sd=sd2)

        sd3 = pm.Uniform('sd_L', 0.25, 5, shape=num_lenghts)
        lambdas3 = pm.Uniform('lambdas_L', 0, 1.0, shape=num_lenghts)
        p_length = pm.NormalMixture('P_length', mu=np.arange(num_lenghts), w=lambdas3, sd=sd3)

        prior_dow = pm.Dirichlet('prior_dow', a=pm.floatX((1.0 / num_dow) * np.ones(num_dow)))
        p_dow = pm.Multinomial('P_dow',num_dow, prior_dow)

        mean_hr = pm.Uniform("mh", 50, 110)
        std_hr = pm.Uniform("sh", 1, 30)
        p_hr = pm.Normal('P_hr',mean_hr, std_hr)

        pm.Potential('mylike', p_sd * p_ed * p_length * p_dow * p_hr, observed=data)

        step = pm.NUTS()
        trace = pm.sample(20000, step=step)
        burned_trace = trace[10000:]

The error I’m getting is:

TypeError: For compute_test_value, one input test value does not have the requested type.

The error when converting the test value to that variable type:
Wrong number of dimensions: expected 0, got 1 with shape (7,).

The error happens in p_dow = pm.Multinomial(‘P_dow’,num_dow, prior_dow). Any ideas?
Thanks

Doing p_dow = pm.Multinomial('P_dow',num_dow, prior_dow, shape=num_dow) should suppress the error, however this is a bit odd as I remember you dont need to explicitly write down the shape.

Also, there are two problem of

pm.Potential('mylike', p_sd * p_ed * p_length * p_dow * p_hr, observed=data)
  1. You cannot assign observed to Potential. If your potential function evaluate on some input data it is better to write it as a DensityDist
  2. p_sd, p_ed, p_length, p_dow, p_hr all has different shape - it will get broadcasted in a weird way. are you sure you want to do this?

Also, you should do

trace = pm.sample(1000, tune=1000)

instead of

    step = pm.NUTS()
    trace = pm.sample(20000, step=step)

as passing step directly to sample is not recommanded, and 20000 is likely too many samples.

Thanks a lot for the quick response. I have changed what you have told me, but I am not sure how to pass the data using DensityDist. Since it requires a function, I have done the following:

data = [shift,end_hour,lengths,dows,hrs]
with pm.Model() as total_model:
sd1 = pm.Uniform(‘sd_SD’, 0.25, 5, shape=num_shifts)
lambdas1 = pm.Uniform(‘lambdas_SD’, 0, 1.0, shape=num_shifts)
p_sd = lambda d: pm.NormalMixture(‘P_SD’, mu=np.arange(num_shifts), w=lambdas1, sd=sd1, observed=d)

        sd2 = pm.Uniform('sd_ED', 0.25, 5, shape=num_shifts)
        lambdas2 = pm.Uniform('lambdas_ED', 0, 1.0, shape=num_shifts)
        p_ed = lambda d: pm.NormalMixture('P_ED', mu=np.arange(num_shifts), w=lambdas2, sd=sd2, observed=d)

        sd3 = pm.Uniform('sd_L', 0.25, 5, shape=num_lenghts)
        lambdas3 = pm.Uniform('lambdas_L', 0, 1.0, shape=num_lenghts)
        p_length = lambda d: pm.NormalMixture('P_length', mu=np.arange(num_lenghts), w=lambdas3, sd=sd3, observed=d)

        prior_dow = pm.Dirichlet('prior_dow', a=pm.floatX((1.0 / num_dow) * np.ones(num_dow)))
        p_dow = lambda d: pm.Multinomial('P_dow',num_dow, prior_dow, shape=num_dow, observed=d)

        mean_hr = pm.Uniform("mh", 50, 110)
        std_hr = pm.Uniform("sh", 1, 30)
        p_hr = lambda d: pm.Normal('P_hr',mean_hr, std_hr, observed=d) 

        pm.DensityDist('mylike', lambda d: p_sd(d[0]) * p_ed(d[1]) * p_length(d[2]) * p_dow(d[3]) * p_hr(d[4]), observed=data)
        trace = pm.sample(1000, tune=1000)

The error is:

ValueError: Input dimension mis-match. (input[0].shape[0] = 5, input[1].shape[0] = 7)

Respect to multiplying all the variables, I know the way they are described at this moment they are independent and having the distributions separately would be more efficient. In fact, I have them in this way before, but I want to add other factors that will change the independencies and thus I wanted to make it work in this way before doing other modifications.

Thanks

Skimming through your posts, I think you are trying to pull the NormalMixture log likelihood into your DensityDist logp. In that case, you shouldn’t be using pm.NormalMixture, but instead should use pm.NormalMixture.dist and use its logp method. That lets you make a pymc3 distribution object and use it to get the log likelihood of that distribution. Something like this:

or