How to find a shape mismatch on out-of-sample (OOS) data?

Hello,

When I run OOS data through my model, I get a dimension mismatch error.

Here is the error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/aesara/compile/function/types.py in __call__(self, *args, **kwargs)
    975                 self.vm()
--> 976                 if output_subset is None
    977                 else self.vm(output_subset=output_subset)

ValueError: Input dimension mismatch. One other input has shape[0] = 7690, but input[2].shape[0] = 27085.

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_77223/2168918377.py in <module>
     60                     'month': test_month_idx})
     61         print("sampling test ppc...")
---> 62         test_ppc = pm.sample_posterior_predictive(idata)
     63 
     64         #Adding columns to the test dataset dataframe

/opt/conda/lib/python3.7/site-packages/pymc/sampling.py in sample_posterior_predictive(trace, samples, model, var_names, keep_size, random_seed, progressbar, return_inferencedata, extend_inferencedata, predictions, idata_kwargs, compile_kwargs)
   1955                 param = _trace[idx % len_trace]
   1956 
-> 1957             values = sampler_fn(**param)
   1958 
   1959             for k, v in zip(vars_, values):

/opt/conda/lib/python3.7/site-packages/pymc/util.py in wrapped(**kwargs)
    364     def wrapped(**kwargs):
    365         input_point = {k: v for k, v in kwargs.items() if k in ins}
--> 366         return core_function(**input_point)
    367 
    368     return wrapped

/opt/conda/lib/python3.7/site-packages/aesara/compile/function/types.py in __call__(self, *args, **kwargs)
    990                     node=self.vm.nodes[self.vm.position_of_error],
    991                     thunk=thunk,
--> 992                     storage_map=getattr(self.vm, "storage_map", None),
    993                 )
    994             else:

/opt/conda/lib/python3.7/site-packages/aesara/link/utils.py in raise_with_op(fgraph, node, thunk, exc_info, storage_map)
    532         # Some exception need extra parameter in inputs. So forget the
    533         # extra long error message in that case.
--> 534     raise exc_value.with_traceback(exc_trace)
    535 
    536 

/opt/conda/lib/python3.7/site-packages/aesara/compile/function/types.py in __call__(self, *args, **kwargs)
    974             outputs = (
    975                 self.vm()
--> 976                 if output_subset is None
    977                 else self.vm(output_subset=output_subset)
    978             )

ValueError: Input dimension mismatch. One other input has shape[0] = 7690, but input[2].shape[0] = 27085.
Apply node that caused the error: Elemwise{Composite{(i0 + i1 + (i2 * i3) + (i4 * i5) + (i6 * i7) + (i8 * i9) + (i10 * i11) + (i12 * i13) + (i14 * i15))}}[(0, 0)](AdvancedSubtensor.0, AdvancedSubtensor.0, AdvancedSubtensor1.0, promotion, AdvancedSubtensor1.0, cannibalization, AdvancedSubtensor1.0, dc_discount, AdvancedSubtensor1.0, free_fin, AdvancedSubtensor1.0, pvbv, AdvancedSubtensor1.0, giftset, InplaceDimShuffle{x}.0, month)
Toposort index: 9
Inputs types: [TensorType(float64, (None,)), TensorType(float64, (None,)), TensorType(float64, (None,)), TensorType(int32, (None,)), TensorType(float64, (None,)), TensorType(int32, (None,)), TensorType(float64, (None,)), TensorType(int32, (None,)), TensorType(float64, (None,)), TensorType(int32, (None,)), TensorType(float64, (None,)), TensorType(int32, (None,)), TensorType(float64, (None,)), TensorType(int32, (None,)), TensorType(float64, (1,)), TensorType(int32, (None,))]
Inputs shapes: [(7690,), (7690,), (27085,), (7690,), (7690,), (7690,), (7690,), (7690,), (7690,), (7690,), (7690,), (7690,), (7690,), (7690,), (1,), (7690,)]
Inputs strides: [(8,), (8,), (8,), (4,), (8,), (4,), (8,), (4,), (8,), (4,), (8,), (4,), (8,), (4,), (8,), (4,)]
Inputs values: ['not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', array([-0.55082063]), 'not shown']
Outputs clients: [[normal_rv{0, (0, 0), floatX, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F63207A97D0>), TensorConstant{[]}, TensorConstant{11}, Elemwise{Composite{(i0 + i1 + (i2 * i3) + (i4 * i5) + (i6 * i7) + (i8 * i9) + (i10 * i11) + (i12 * i13) + (i14 * i15))}}[(0, 0)].0, sigma)]]

HINT: Re-running with most Aesara optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the Aesara flag 'optimizer=fast_compile'. If that does not work, Aesara optimizations can be disabled with 'optimizer=None'.
HINT: Use the Aesara flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

However, when I print the shapes of all the new OOS data, I get the following:

print(test_promo_pvbv_idx.shape,
      test_giftset_idx.shape,
      test_free_fin_idx.shape,
      test_dc_idx.shape,
      test_cann_idx.shape,
      test_promo_idx.shape,
      test_location_idx.shape,
      test_item_idx.shape,
      test_month_idx.shape,
      test_time_idx.shape,
      df_test['residual'].shape)
Output:
(7690,) (7690,) (7690,) (7690,) (7690,) (7690,) (7690,) (7690,) (7690,) (7690,) (7690,)

So I’m not sure where the error is getting the shape 27085.

Code used:

    test_time_idx, test_times = pd.factorize(df_test.index.get_level_values(0))
    test_month_idx, test_month = pd.factorize(df_test['month'])
    test_item_idx = np.array(list(map(item_to_idx_dict.get, df_test.index.get_level_values(1))))
    test_location_idx, test_locations = pd.factorize(df_test.index.get_level_values(2))
    test_promo_idx, test_promo = pd.factorize(df_test['promo_status_metric_measure'])
    test_cann_idx, test_cannibalization = pd.factorize(df_test['cannibalized'])
    test_dc_idx, test_dc_discount = pd.factorize(df_test['promo_desc_dcdiscount'])
    test_free_fin_idx, test_free_fin = pd.factorize(df_test['promo_desc_freefinancing'])
    test_giftset_idx, test_giftset = pd.factorize(df_test['promo_desc_giftset'])
    test_promo_pvbv_idx, test_promo_pvbv = pd.factorize(df_test['promo_desc_pvbv'])

    # bring in new data
    with constant_model:
        pm.set_data({'loc_idx': test_location_idx,
                     'item_idx': test_item_idx,
                     'time_idx': test_time_idx,
                     'observed_eaches': df_test['residual'],
                     't': t_test,
                     'promotion': test_promo_idx,
                     'cannibalization': test_cann_idx,
                     'dc_discount': test_dc_idx,
                     'free_fin': test_free_fin_idx,
                     'pvbv': test_promo_pvbv_idx,
                     'giftset': test_giftset_idx,
                     'month': test_month_idx})
        print("sampling test ppc...")
        test_ppc = pm.sample_posterior_predictive(idata)

Does anyone have an idea of how I can hunt this error down?


What is the shape of your “in sample” data? Is it 27085 by any chance? If so, it might suggest that you are mixing some “in sample” elements with some “out of sample” elements.
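One quick way to check is to print the length of every array you are about to pass to pm.set_data and flag anything that still matches the in-sample length. A rough sketch, reusing the names from your post (27085 is just the suspected training length taken from the error message):

train_len = 27085              # suspected in-sample length, taken from the error message
expected_len = len(df_test)    # should be 7690

new_data = {'loc_idx': test_location_idx,
            'item_idx': test_item_idx,
            'time_idx': test_time_idx,
            'observed_eaches': df_test['residual'],
            't': t_test,
            'promotion': test_promo_idx,
            'cannibalization': test_cann_idx,
            'dc_discount': test_dc_idx,
            'free_fin': test_free_fin_idx,
            'pvbv': test_promo_pvbv_idx,
            'giftset': test_giftset_idx,
            'month': test_month_idx}

for name, value in new_data.items():
    n = len(value)
    status = "<-- still training-sized!" if n == train_len else ("ok" if n == expected_len else "<-- unexpected length")
    print(name, n, status)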


You are on it today! Thank you. I was mixing in-sample and out-of-sample data on one line of code.


If I seem wise, it’s only because I recognize the many, many mistakes I have made before. :grimacing:

I thought that was it, but I still don't see where the training data is getting through. Question:

In this part of the error trace:
Inputs shapes: [(7690,), (7690,), (27085,), (7690,), (7690,), (7690,), (7690,), (7690,), (7690,), (7690,), (7690,), (7690,), (7690,), (7690,), (1,), (7690,)],
it shows that the third shape is what is throwing off the OOS sampling. Is there a way to figure out what object is driving that shape?

There are 16 objects there, but I only have 10 objects that are set to mutable=True. How can I figure out which one is driving that shape?

Making some data constant isn't going to help with the shape problems. For example, if I do this, it will throw the same kind of error:

import numpy as np
import pymc as pm

rng = np.random.default_rng()

d1 = rng.random(size=100)
d2 = rng.random(size=100)

with pm.Model() as model:
    x = pm.MutableData("x", value=d1)
    y = pm.ConstantData("obs", value=d2)
    a = pm.Normal("a")
    b = pm.Normal("b", mu=a * x, sigma=1, observed=y)
    idata = pm.sample()

with model:
    pm.set_data({"x": rng.random(size=10)})
    # 10 x values, but still 100 y values
    test_ppc = pm.sample_posterior_predictive(idata)
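As an aside, and just a minimal sketch rather than part of the point above: I believe the mismatch in this toy case goes away if the observed data also lives in a mutable container and gets resized along with x, since posterior predictive sampling only needs the shape of the observations, not their values (the new names are only there to avoid clobbering the model above):

with pm.Model() as model_mutable_obs:
    x = pm.MutableData("x", value=d1)
    y = pm.MutableData("obs", value=d2)   # observed data in a mutable container too
    a = pm.Normal("a")
    b = pm.Normal("b", mu=a * x, sigma=1, observed=y)
    idata_mut = pm.sample()

with model_mutable_obs:
    # resize both containers; the new "obs" values are placeholders, only their shape matters here
    pm.set_data({"x": rng.random(size=10), "obs": rng.random(size=10)})
    test_ppc = pm.sample_posterior_predictive(idata_mut)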

There may be a way to inspect the shapes of things more directly (someone else would have to chime in), but not necessarily. In the example above, the shapes of x and y are both "known" to the model (because we wrapped them in pm.Data objects). But we didn't have to do that to get the same error:

d1 = rng.random(size=100)
d2 = rng.random(size=100)

with pm.Model() as model:
    x = pm.MutableData("x", value=d1)
    # y = pm.ConstantData("obs", value=d2)
    a = pm.Normal("a")
    b = pm.Normal("b", mu=a * x, sigma=1, observed=d2)
    idata = pm.sample()

with model:
    pm.set_data({"x": rng.random(size=10)})
    # 10 x values, but still 100 y values
    test_ppc = pm.sample_posterior_predictive(idata)

To check any registered variables (e.g., pm.Data containers, RVs like pm.Normal, etc.), you can inspect the plate notation generated by pm.model_to_graphviz(model). But if you have other model components whose shapes the model doesn't know ahead of time (e.g., my second example above), it won't help much.
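For the earlier question about which object is driving the 27085: note that the 16 inputs listed in the traceback are the inputs to that single Elemwise node, and several of them (the AdvancedSubtensor outputs) are intermediate results rather than your data containers, which is why the count doesn't match your 10 mutable variables. As a rough sketch (it only covers things the model has registered), you can also dump the current shape of every named variable; data containers are shared variables, so .get_value() shows the array they hold right now:

for name, var in constant_model.named_vars.items():
    if hasattr(var, "get_value"):
        # pm.MutableData containers are shared variables holding an actual array
        print(name, var.get_value().shape)
    else:
        # other registered variables only expose their static type shape
        # (None means the size is not known until runtime)
        print(name, var.type.shape)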