Encountering Error: "Dimensions ('ys',) must have the same length as the number of data dimensions, ndim=2"

Hi All,

I’m encountering the following error on what I think is a rather simple model on simulated data:

ValueError: dimensions ('ys',) must have the same length as the number of data dimensions, ndim=2

Strangely (to me), the error occurs just as sampling finishes.

I uploaded a complete notebook example replicating the error.

I am modelling the support of multiple political parties (\pi) over time as a Multivariate Gaussian Random Walk. This latent support \pi is used to draw opinion poll results y from a multinomial distribution. \pi is first softmaxed (\pi^*) to ensure that we can use it as the p parameter of the multinomial.

An excerpt of the code that defines the model:

coords = {
    "ys": ys
}

with pm.Model(coords=coords) as model:

    pi = pm.MvGaussianRandomWalk(
        name="pi",
        mu=np.zeros(D),
        chol=L,  # We use the actual ground-truth covariance here to simplify things
        init_dist=pm.MvNormal.dist(mu=[0.3191,0.3947,0.2862], chol=L),
        shape=(NUM_TS, D)
    )

    pi_star = pm.Deterministic("pi_star", pm.math.exp(pi)/pm.math.exp(pi).sum(1,keepdims=True))

    y = pm.Multinomial('y', p=pi_star[ts], n=ns, dims="ys", observed=ys)
    
pm.model_to_graphviz(model)

In this example, ys has shape (20, 3).

Thanks for any help you can provide, I’m new to PyMC and so these errors aren’t always obvious to me.

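The dims you’re passing to y do not match the data: y is two-dimensional, but dims="ys" names only one dimension. I would suggest omitting the coords you are currently using, as they do not seem to encode any dimensions or coordinates. If you did want to use dims with your current y, you would pass one name per axis, e.g. dims=("dim1", "dim2"), with coords something like this: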

coords = {"dim1": np.arange(20), "dim2": ["thing1", "thing2", "thing3"]}

The reason it fails at the end rather than during sampling is that the end is when the results get packed into an ArviZ InferenceData object, which is where all the dims and coords are associated with the draws/groups.

Brilliant!

The error was actually pretty embarrassing, y had shape (20,) but ys had shape (20,3).

Thanks so much for your patience.

Just to be clear, y and ys both have shape (20, 3). The reason using ys as the coordinates of y doesn’t work is that the coords are intended to name each dimension (of which y has two) and each value or coordinate along each dimension (there are 20 values along the first dimension and 3 values along the second dimension).
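
To make that rule concrete, here is a minimal pure-Python sketch of the kind of check that fails. It is not the PyMC or ArviZ implementation; check_dims is a hypothetical helper, and the range(20) values under "ys" are placeholders I made up for illustration. It only assumes what is stated above: the data has shape (20, 3), so it needs two dimension names, each with as many coordinate values as that axis is long.

```python
def check_dims(shape, dims, coords):
    """Hypothetical helper: raise ValueError unless dims/coords fit the shape."""
    # Rule 1: one dimension name per axis of the data.
    if len(dims) != len(shape):
        raise ValueError(
            f"dimensions {dims} must have the same length as the "
            f"number of data dimensions, ndim={len(shape)}"
        )
    # Rule 2: each dimension needs one coordinate value per entry along its axis.
    for name, size in zip(dims, shape):
        if len(coords[name]) != size:
            raise ValueError(
                f"coordinate {name!r} has {len(coords[name])} values, "
                f"but that axis has length {size}"
            )


shape = (20, 3)  # shape of y and ys in this thread

# Original attempt: a single dimension name for two-dimensional data.
try:
    check_dims(shape, ("ys",), {"ys": list(range(20))})  # placeholder coords
    original_error = None
except ValueError as err:
    original_error = str(err)

# Suggested fix: one name per axis, with 20 and 3 coordinate values.
coords = {"dim1": list(range(20)), "dim2": ["thing1", "thing2", "thing3"]}
check_dims(shape, ("dim1", "dim2"), coords)  # passes without raising
```

With the single name ("ys",) the first rule is violated, which mirrors the message in the original post; with two names whose coordinate lists have lengths 20 and 3, both rules are satisfied.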
