Hello,
When I run out-of-sample (OOS) data through my model, I get a dimension mismatch error.
Here is the error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/aesara/compile/function/types.py in __call__(self, *args, **kwargs)
975 self.vm()
--> 976 if output_subset is None
977 else self.vm(output_subset=output_subset)
ValueError: Input dimension mismatch. One other input has shape[0] = 7690, but input[2].shape[0] = 27085.
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
/tmp/ipykernel_77223/2168918377.py in <module>
60 'month': test_month_idx})
61 print("sampling test ppc...")
---> 62 test_ppc = pm.sample_posterior_predictive(idata)
63
64 #Adding columns to the test dataset dataframe
/opt/conda/lib/python3.7/site-packages/pymc/sampling.py in sample_posterior_predictive(trace, samples, model, var_names, keep_size, random_seed, progressbar, return_inferencedata, extend_inferencedata, predictions, idata_kwargs, compile_kwargs)
1955 param = _trace[idx % len_trace]
1956
-> 1957 values = sampler_fn(**param)
1958
1959 for k, v in zip(vars_, values):
/opt/conda/lib/python3.7/site-packages/pymc/util.py in wrapped(**kwargs)
364 def wrapped(**kwargs):
365 input_point = {k: v for k, v in kwargs.items() if k in ins}
--> 366 return core_function(**input_point)
367
368 return wrapped
/opt/conda/lib/python3.7/site-packages/aesara/compile/function/types.py in __call__(self, *args, **kwargs)
990 node=self.vm.nodes[self.vm.position_of_error],
991 thunk=thunk,
--> 992 storage_map=getattr(self.vm, "storage_map", None),
993 )
994 else:
/opt/conda/lib/python3.7/site-packages/aesara/link/utils.py in raise_with_op(fgraph, node, thunk, exc_info, storage_map)
532 # Some exception need extra parameter in inputs. So forget the
533 # extra long error message in that case.
--> 534 raise exc_value.with_traceback(exc_trace)
535
536
/opt/conda/lib/python3.7/site-packages/aesara/compile/function/types.py in __call__(self, *args, **kwargs)
974 outputs = (
975 self.vm()
--> 976 if output_subset is None
977 else self.vm(output_subset=output_subset)
978 )
ValueError: Input dimension mismatch. One other input has shape[0] = 7690, but input[2].shape[0] = 27085.
Apply node that caused the error: Elemwise{Composite{(i0 + i1 + (i2 * i3) + (i4 * i5) + (i6 * i7) + (i8 * i9) + (i10 * i11) + (i12 * i13) + (i14 * i15))}}[(0, 0)](AdvancedSubtensor.0, AdvancedSubtensor.0, AdvancedSubtensor1.0, promotion, AdvancedSubtensor1.0, cannibalization, AdvancedSubtensor1.0, dc_discount, AdvancedSubtensor1.0, free_fin, AdvancedSubtensor1.0, pvbv, AdvancedSubtensor1.0, giftset, InplaceDimShuffle{x}.0, month)
Toposort index: 9
Inputs types: [TensorType(float64, (None,)), TensorType(float64, (None,)), TensorType(float64, (None,)), TensorType(int32, (None,)), TensorType(float64, (None,)), TensorType(int32, (None,)), TensorType(float64, (None,)), TensorType(int32, (None,)), TensorType(float64, (None,)), TensorType(int32, (None,)), TensorType(float64, (None,)), TensorType(int32, (None,)), TensorType(float64, (None,)), TensorType(int32, (None,)), TensorType(float64, (1,)), TensorType(int32, (None,))]
Inputs shapes: [(7690,), (7690,), (27085,), (7690,), (7690,), (7690,), (7690,), (7690,), (7690,), (7690,), (7690,), (7690,), (7690,), (7690,), (1,), (7690,)]
Inputs strides: [(8,), (8,), (8,), (4,), (8,), (4,), (8,), (4,), (8,), (4,), (8,), (4,), (8,), (4,), (8,), (4,)]
Inputs values: ['not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', array([-0.55082063]), 'not shown']
Outputs clients: [[normal_rv{0, (0, 0), floatX, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F63207A97D0>), TensorConstant{[]}, TensorConstant{11}, Elemwise{Composite{(i0 + i1 + (i2 * i3) + (i4 * i5) + (i6 * i7) + (i8 * i9) + (i10 * i11) + (i12 * i13) + (i14 * i15))}}[(0, 0)].0, sigma)]]
HINT: Re-running with most Aesara optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the Aesara flag 'optimizer=fast_compile'. If that does not work, Aesara optimizations can be disabled with 'optimizer=None'.
HINT: Use the Aesara flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.
However, when I print the shapes of all the new OOS data, I get the following:
print(test_promo_pvbv_idx.shape,
test_giftset_idx.shape,
test_free_fin_idx.shape,
test_dc_idx.shape,
test_cann_idx.shape,
test_promo_idx.shape,
test_location_idx.shape,
test_item_idx.shape,
test_month_idx.shape,
test_time_idx.shape,
df_test['residual'].shape)
Output:
(7690,) (7690,) (7690,) (7690,) (7690,) (7690,) (7690,) (7690,) (7690,) (7690,) (7690,)
So I'm not sure where the shape 27085 in the error is coming from.
Code used:
test_time_idx, test_times = pd.factorize(df_test.index.get_level_values(0))
test_month_idx, test_month = pd.factorize(df_test['month'])
test_item_idx = np.array(list(map(item_to_idx_dict.get, df_test.index.get_level_values(1))))
test_location_idx, test_locations = pd.factorize(df_test.index.get_level_values(2))
test_promo_idx, test_promo = pd.factorize(df_test['promo_status_metric_measure'])
test_cann_idx, test_cannibalization = pd.factorize(df_test['cannibalized'])
test_dc_idx, test_dc_discount = pd.factorize(df_test['promo_desc_dcdiscount'])
test_free_fin_idx, test_free_fin = pd.factorize(df_test['promo_desc_freefinancing'])
test_giftset_idx, test_giftset = pd.factorize(df_test['promo_desc_giftset'])
test_promo_pvbv_idx, test_promo_pvbv = pd.factorize(df_test['promo_desc_pvbv'])
# Bring in the new data
with constant_model:
    pm.set_data({'loc_idx': test_location_idx,
                 'item_idx': test_item_idx,
                 'time_idx': test_time_idx,
                 'observed_eaches': df_test['residual'],
                 't': t_test,
                 'promotion': test_promo_idx,
                 'cannibalization': test_cann_idx,
                 'dc_discount': test_dc_idx,
                 'free_fin': test_free_fin_idx,
                 'pvbv': test_promo_pvbv_idx,
                 'giftset': test_giftset_idx,
                 'month': test_month_idx})
    print("sampling test ppc...")
    test_ppc = pm.sample_posterior_predictive(idata)
Does anyone have an idea of how I can hunt this error down?
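One thing that can be read off the Apply node above: input[2], the one with shape (27085,), is an `AdvancedSubtensor1` output of type float64 that gets multiplied by `promotion`. It is an indexed parameter rather than one of the named data containers. One possible explanation, which is only a guess since the model definition is not shown, is that some index used to subset a coefficient was fixed at the training length and is not updated by `pm.set_data`. Below is a minimal sketch of that hypothesis in plain NumPy, not the actual model. The names `beta`, `train_group_idx`, `test_promotion` and `find_length_mismatches` are hypothetical, and the two sizes are simply the numbers from the error message.

```python
import numpy as np

# Sizes taken from the error message; treating 27085 as the training length is an assumption.
N_TRAIN, N_TEST = 27085, 7690

rng = np.random.default_rng(0)
beta = rng.normal(size=3)  # hypothetical per-group coefficient

# Hypothetical index that was baked in at model-build time (training length)...
train_group_idx = rng.integers(0, 3, size=N_TRAIN)
# ...and a predictor that was correctly swapped for the test data.
test_promotion = rng.integers(0, 2, size=N_TEST)


def find_length_mismatches(arrays, expected_len):
    """Return {name: length} for each array whose first dimension differs from expected_len."""
    return {
        name: np.shape(a)[0]
        for name, a in arrays.items()
        if np.shape(a)[0] != expected_len
    }


# Check every term that enters the elementwise expression, including
# indexed parameters, not only the raw index arrays.
terms = {
    "beta[train_group_idx]": beta[train_group_idx],
    "test_promotion": test_promotion,
}
mismatches = find_length_mismatches(terms, N_TEST)
print(mismatches)

# Multiplying the two terms fails in the same way as the Elemwise node.
try:
    beta[train_group_idx] * test_promotion
except ValueError as err:
    print("ValueError:", err)
```

If this guess is right, the thing to look for in the model definition would be an index that is passed as a raw array instead of a data container, or a container that is missing from the `pm.set_data` call.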