Time-series regression - shape error / Input dimension mis-match

Hi all,

I’ve started using PyMC3 a couple of days ago but am getting a strange shape error with the model below.

Error is:

ValueError: Input dimension mis-match. (input[0].shape[1] = 19, input[1].shape[1] = 18)

19 is the number of predictors. My data does not have NA values is is of shape(780k, 19)

I have read number of guides on how shapes are handled in PyMC3 and responses to similar questions on this forum, but none of the suggestions managed to fix this.

Does anyone have an idea what I’m doing wrong here?



def model(data, sample_shape=None):
    with pm.Model(coords={"time": data["timedelta"]}) as model:
        
        # how often do we update alpha? 
        # set this to every timedelta/row in our data
        interval = len(data["timedelta"])
        
        # instantiate RV for global parameters                               
        beta = pm.Normal('beta',mu=0,sigma=np.sqrt(2))
        ha = pm.Normal('Ha',mu=0,sigma=np.sqrt(2))
                                       
        #vector of time varying parameters as random walk
        alpha_zero = pm.Normal("alpha_zero",mu=0,sigma=np.sqrt(2), shape=sample_shape)  
        
        step_size_ = tt.tile(np.sqrt(2), (interval,1))
        
        alpha= pm.GaussianRandomWalk("alpha",mu=alpha_zero,sigma=step_size_, shape=(interval,sample_shape))   
        
        # define theta_home as deterministic variable
        theta  = pm.Deterministic("theta",pm.invlogit(pm.math.dot(data[pred_cols], alpha)+beta+ha))
        
        
        # finally we define our likelihood, i.e. our sampling distribution of values we are trying to fit
        # Attempt likelihood as a halfnormal - as our thetas are between 0 and 1 but tilted towards 0
        Y_obs =pm.HalfNormal("Y_obs", mu=theta, obs= data["outcome"])
                
    return model

model_theta = model(df, sample_shape=len(pred_cols))

A couple things that might help:

  • if alpha_zero is the starting point for the time varying params, you don’t want to pass it as mu to the Random Walk. Mu in the random walk is for drift - the mean of the errors.
  • you don’t need to tile step_size_ if it’s the same for all observations

If that doesn’t fix, I’d triple check that len(pred_cols) == sample_shape == 19

1 Like

Thank you!

Setting the alpha_zero’s as starting values via init or initval is throwing this error:

TypeError: can’t turn [Subtensor{int64}.0] and {} into a dict. Length of Subtensor{int64}.0 cannot be determined

How can I set the starting point of the time-varying process for the RandomGaussianWalk?
Also I am assuming I don’t need to specify the expected value as the previous alpha, given this is the whole point of the RGW?

I am trying to model the following time-varying process:

\theta_{t} = invlogit(\vec{\alpha_t} * x_{t}+ \beta + Ha)

with values:

\alpha_0 \sim N(0,2), \alpha_t \sim N(\alpha_{t-1},2)

\beta \sim N(0,2), Ha \sim N(0,2)

where \alpha_t are time-varying coefficients, \beta and Ha are constants.

This is my latest code.

def home_intensity_model(data, sample_shape=None):
    with pm.Model(coords={"time": data["timedelta"]}) as model:
        
        
        # how often do we update alpha? 
        # set this to every timedelta/row
        interval = len(data["timedelta"])

        
        # instantiate stochastic RV for global parameters                               
        beta = pm.Normal('beta',mu=0,sigma=np.sqrt(2))
        ha = pm.Normal('Ha',mu=0,sigma=np.sqrt(2))
                                       
        #vector of time varying parameters as random walk
        alpha_zero = pm.Normal("alpha_zero",mu=0,sigma=np.sqrt(2), shape=(sample_shape,1))  

        
        alpha= pm.GaussianRandomWalk("alpha",init=alpha_zero,sigma=np.sqrt(2),shape=(sample_shape,interval))   
        

        
        
        # Finally, define our deterministic prior variable theta
        #To ensure that deterministic variables' values are accumulated during sampling, they should be instantiated 
        #using the named deterministic interface; this uses the Deterministic function to create the variable. 
        #Two things happen when a variable is created this way:
        # 1- The variable is given a name (passed as the first argument)
        # 2- The variable is appended to the model's list of random variables, which ensures that its values are tallied.
       
        theta = pm.Deterministic("theta",pm.invlogit(pm.math.dot(data[pred_cols], alpha)+beta+ha))
        
        
        # finally we define our likelihood, i.e. our sampling distribution of values we are trying to fit
        # our observed thetas are between 0 and 1 but tilted towards 0 
        # try to fit Gamma distribution instead. Note that expected value of Gamma is alpha/beta, so set alpha as our theta and beta=1
        
        
        #now define our likelihood
        Y_obs =pm.Gamma("Y_obs", alpha=theta, beta =1 , observed= data[y])
        
        
        # Can sample and look at distribution of our priors and thetas
        trace = pm.sample(2, chains=1, cores=4)
        pm.traceplot(trace)
        plt.show()
                
    return model

Which version of PyMC are you using? I suggest using v4 if you are not yet.

init, which is now called init_dist requires an unnamed distribution as input so

alpha_zero = pm.Normal.dist(mu=0,sigma=np.sqrt(2), shape=(sample_shape,1))

The new version would have given you a more informative error to explain this.

PS: I haven’t checked the rest of you code.

Thank you. Following your advice I have installed pymc v4 in a separate conda environment but am stuck with a strange theano error when I try to run my model - see below.

I have tried the solutions suggested in this dicussion but nothing has worked out so far.

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\aesara\link\c\lazylinker_c.py:79, in <module>
     78         if version != actual_version:
---> 79             raise ImportError(
     80                 "Version check of the existing lazylinker compiled file."
     81                 f" Looking for version {version}, but found {actual_version}. "
     82                 f"Extra debug information: force_compile={force_compile}, _need_reload={_need_reload}"
     83             )
     84 except ImportError:

ImportError: Version check of the existing lazylinker compiled file. Looking for version 0.212, but found 0.211. Extra debug information: force_compile=False, _need_reload=True

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\aesara\link\c\lazylinker_c.py:100, in <module>
     99     if version != actual_version:
--> 100         raise ImportError(
    101             "Version check of the existing lazylinker compiled file."
    102             f" Looking for version {version}, but found {actual_version}. "
    103             f"Extra debug information: force_compile={force_compile}, _need_reload={_need_reload}"
    104         )
    105 except ImportError:
    106     # It is useless to try to compile if there isn't any
    107     # compiler!  But we still want to try to load it, in case
    108     # the cache was copied from another computer.

ImportError: Version check of the existing lazylinker compiled file. Looking for version 0.212, but found 0.211. Extra debug information: force_compile=False, _need_reload=True

During handling of the above exception, another exception occurred:

AssertionError                            Traceback (most recent call last)
Input In [40], in <cell line: 8>()
     51 Y_obs =pm.Gamma("Y_obs", alpha=theta_home, beta =1 , observed= df["homegoals_minsremaining"])
     53 #Y_obs =pm.Poisson("Y_obs", mu=f1(theta_home,df["timedelta"]) ,observed= df["homegoals_remaining"], dims="observations")
     54     
     55     
     56 # Can sample and look at distribution of our priors and thetas
---> 57 trace = pm.sample(2, chains=1, cores=4)
     58 pm.traceplot(trace)
     59 plt.show()

File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\pymc\sampling.py:524, in sample(draws, step, init, n_init, initvals, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, callback, jitter_max_retries, return_inferencedata, idata_kwargs, mp_ctx, **kwargs)
    521         auto_nuts_init = False
    523 initial_points = None
--> 524 step = assign_step_methods(model, step, methods=pm.STEP_METHODS, step_kwargs=kwargs)
    526 if isinstance(step, list):
    527     step = CompoundStep(step)

File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\pymc\sampling.py:229, in assign_step_methods(model, step, methods, step_kwargs)
    221         selected = max(
    222             methods,
    223             key=lambda method, var=rv_var, has_gradient=has_gradient: method._competence(
    224                 var, has_gradient
    225             ),
    226         )
    227         selected_steps[selected].append(var)
--> 229 return instantiate_steppers(model, steps, selected_steps, step_kwargs)

File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\pymc\sampling.py:147, in instantiate_steppers(model, steps, selected_steps, step_kwargs)
    145         args = step_kwargs.get(step_class.name, {})
    146         used_keys.add(step_class.name)
--> 147         step = step_class(vars=vars, model=model, **args)
    148         steps.append(step)
    150 unused_args = set(step_kwargs).difference(used_keys)

File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\pymc\step_methods\hmc\nuts.py:178, in NUTS.__init__(self, vars, max_treedepth, early_max_treedepth, **kwargs)
    120 def __init__(self, vars=None, max_treedepth=10, early_max_treedepth=8, **kwargs):
    121     r"""Set up the No-U-Turn sampler.
    122 
    123     Parameters
   (...)
    176     `pm.sample` to the desired number of tuning steps.
    177     """
--> 178     super().__init__(vars, **kwargs)
    180     self.max_treedepth = max_treedepth
    181     self.early_max_treedepth = early_max_treedepth

File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\pymc\step_methods\hmc\base_hmc.py:95, in BaseHMC.__init__(self, vars, scaling, step_scale, is_cov, model, blocked, potential, dtype, Emax, target_accept, gamma, k, t0, adapt_step_size, step_rand, **aesara_kwargs)
     92 else:
     93     vars = [self._model.rvs_to_values.get(var, var) for var in vars]
---> 95 super().__init__(vars, blocked=blocked, model=self._model, dtype=dtype, **aesara_kwargs)
     97 self.adapt_step_size = adapt_step_size
     98 self.Emax = Emax

File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\pymc\step_methods\arraystep.py:276, in GradientSharedStep.__init__(self, vars, model, blocked, dtype, logp_dlogp_func, **aesara_kwargs)
    273 model = modelcontext(model)
    275 if logp_dlogp_func is None:
--> 276     func = model.logp_dlogp_function(vars, dtype=dtype, **aesara_kwargs)
    277 else:
    278     func = logp_dlogp_func

File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\pymc\model.py:637, in Model.logp_dlogp_function(self, grad_vars, tempered, **kwargs)
    635 input_vars = {i for i in graph_inputs(costs) if not isinstance(i, Constant)}
    636 extra_vars = [self.rvs_to_values.get(var, var) for var in self.free_RVs]
--> 637 ip = self.initial_point(0)
    638 extra_vars_and_values = {
    639     var: ip[var.name] for var in extra_vars if var in input_vars and var not in grad_vars
    640 }
    641 return ValueGradFunction(costs, grad_vars, extra_vars_and_values, **kwargs)

File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\pymc\model.py:1067, in Model.initial_point(self, seed)
   1059 def initial_point(self, seed=None) -> Dict[str, np.ndarray]:
   1060     """Computes the initial point of the model.
   1061 
   1062     Returns
   (...)
   1065         Maps names of transformed variables to numeric initial values in the transformed space.
   1066     """
-> 1067     fn = make_initial_point_fn(model=self, return_transformed=True)
   1068     return Point(fn(seed), model=self)

File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\pymc\initial_point.py:181, in make_initial_point_fn(model, overrides, jitter_rvs, default_strategy, return_transformed)
    179     new_rng_nodes.append(aesara.shared(rng_cls(np.random.PCG64())))
    180 graph.replace_all(zip(rng_nodes, new_rng_nodes), import_missing=True)
--> 181 func = compile_pymc(inputs=[], outputs=graph.outputs, mode=aesara.compile.mode.FAST_COMPILE)
    183 varnames = []
    184 for var in model.free_RVs:

File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\pymc\aesaraf.py:1034, in compile_pymc(inputs, outputs, random_seed, mode, **kwargs)
   1032 opt_qry = mode.provided_optimizer.including("random_make_inplace", check_parameter_opt)
   1033 mode = Mode(linker=mode.linker, optimizer=opt_qry)
-> 1034 aesara_function = aesara.function(
   1035     inputs,
   1036     outputs,
   1037     updates={**rng_updates, **kwargs.pop("updates", {})},
   1038     mode=mode,
   1039     **kwargs,
   1040 )
   1041 return aesara_function

File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\aesara\compile\function\__init__.py:317, in function(inputs, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input)
    311     fn = orig_function(
    312         inputs, outputs, mode=mode, accept_inplace=accept_inplace, name=name
    313     )
    314 else:
    315     # note: pfunc will also call orig_function -- orig_function is
    316     #      a choke point that all compilation must pass through
--> 317     fn = pfunc(
    318         params=inputs,
    319         outputs=outputs,
    320         mode=mode,
    321         updates=updates,
    322         givens=givens,
    323         no_default_updates=no_default_updates,
    324         accept_inplace=accept_inplace,
    325         name=name,
    326         rebuild_strict=rebuild_strict,
    327         allow_input_downcast=allow_input_downcast,
    328         on_unused_input=on_unused_input,
    329         profile=profile,
    330         output_keys=output_keys,
    331     )
    332 return fn

File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\aesara\compile\function\pfunc.py:374, in pfunc(params, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input, output_keys, fgraph)
    360     profile = ProfileStats(message=profile)
    362 inputs, cloned_outputs = construct_pfunc_ins_and_outs(
    363     params,
    364     outputs,
   (...)
    371     fgraph=fgraph,
    372 )
--> 374 return orig_function(
    375     inputs,
    376     cloned_outputs,
    377     mode,
    378     accept_inplace=accept_inplace,
    379     name=name,
    380     profile=profile,
    381     on_unused_input=on_unused_input,
    382     output_keys=output_keys,
    383     fgraph=fgraph,
    384 )

File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\aesara\compile\function\types.py:1763, in orig_function(inputs, outputs, mode, accept_inplace, name, profile, on_unused_input, output_keys, fgraph)
   1751     m = Maker(
   1752         inputs,
   1753         outputs,
   (...)
   1760         fgraph=fgraph,
   1761     )
   1762     with config.change_flags(compute_test_value="off"):
-> 1763         fn = m.create(defaults)
   1764 finally:
   1765     t2 = time.time()

File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\aesara\compile\function\types.py:1656, in FunctionMaker.create(self, input_storage, trustme, storage_map)
   1653 start_import_time = aesara.link.c.cmodule.import_time
   1655 with config.change_flags(traceback__limit=config.traceback__compile_limit):
-> 1656     _fn, _i, _o = self.linker.make_thunk(
   1657         input_storage=input_storage_lists, storage_map=storage_map
   1658     )
   1660 end_linker = time.time()
   1662 linker_time = end_linker - start_linker

File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\aesara\link\basic.py:254, in LocalLinker.make_thunk(self, input_storage, output_storage, storage_map, **kwargs)
    247 def make_thunk(
    248     self,
    249     input_storage: Optional["InputStorageType"] = None,
   (...)
    252     **kwargs,
    253 ) -> Tuple["BasicThunkType", "InputStorageType", "OutputStorageType"]:
--> 254     return self.make_all(
    255         input_storage=input_storage,
    256         output_storage=output_storage,
    257         storage_map=storage_map,
    258     )[:3]

File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\aesara\link\vm.py:1299, in VMLinker.make_all(self, profiler, input_storage, output_storage, storage_map)
   1296 else:
   1297     post_thunk_clear = None
-> 1299 vm = self.make_vm(
   1300     order,
   1301     thunks,
   1302     input_storage,
   1303     output_storage,
   1304     storage_map,
   1305     post_thunk_clear,
   1306     computed,
   1307     compute_map,
   1308     self.updated_vars,
   1309 )
   1311 vm.storage_map = storage_map
   1312 vm.compute_map = compute_map

File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\aesara\link\vm.py:1020, in VMLinker.make_vm(self, nodes, thunks, input_storage, output_storage, storage_map, post_thunk_clear, computed, compute_map, updated_vars)
   1017 pre_call_clear = [storage_map[v] for v in self.no_recycling]
   1019 try:
-> 1020     from aesara.link.c.cvm import CVM
   1021 except (MissingGXX, ImportError):
   1022     CVM = None

File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\aesara\link\c\cvm.py:13, in <module>
      9 if not config.cxx:
     10     raise MissingGXX(
     11         "lazylinker will not be imported if aesara.config.cxx is not set."
     12     )
---> 13 from aesara.link.c.lazylinker_c import CLazyLinker
     15 class CVM(CLazyLinker, VM):
     16     def __init__(self, fgraph, *args, **kwargs):

File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\aesara\link\c\lazylinker_c.py:143, in <module>
    140         assert os.path.exists(loc)
    142 args = GCC_compiler.compile_args()
--> 143 GCC_compiler.compile_str(dirname, code, location=loc, preargs=args)
    144 # Save version into the __init__.py file.
    145 init_py = os.path.join(loc, "__init__.py")

File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\aesara\link\c\cmodule.py:2648, in GCC_compiler.compile_str(module_name, src_code, location, include_dirs, lib_dirs, libs, preargs, py_module, hide_symbols)
   2646     pass
   2647 assert os.path.isfile(lib_filename)
-> 2648 return dlimport(lib_filename)

File C:\ProgramData\Anaconda3\envs\pymc_env\lib\site-packages\aesara\link\c\cmodule.py:331, in dlimport(fullpath, suffix)
    328 finally:
    329     del sys.path[0]
--> 331 assert fullpath.startswith(rval.__file__)
    332 return rval

AssertionError: 

What platform are you on and how did you install pymc/aesara? Those messages suggest aesara doesn’t know where your C compiler is.

I managed to get pymc and aesara to work by removing the pymc_env environment and reinstalling aesara and pymc using pip in a new environment. Not entirely sure what was at fault here.

I’m still trying to get the model to run though.

Recommended installation instructions can be found here.

Thank you.

PYMC and aesara run fine now, but I am still trying to get the model to work - I think the issue is related to the hierarchical nature of my data. I’ll start a separate post on this.