Type Error on Regression Problem

jordan.howell2 · January 13, 2019, 10:01am

Hello,

I’m getting a bunch of errors that I have not been able to fix on the below regression model. I’ve commented out most of the model to figure out what is wrong. There are 99 unique departments and holidays happen 7% of the time.

I was getting chain failure errors, then I changed the department and holiday to the correct distributions. Now I’m getting this:

TypeError: For compute_test_value, one input test value does not have the requested type.

The error when converting the test value to that variable type:
Wrong number of dimensions: expected 0, got 1 with shape (2,).

Model code below:

p1 = np.array([0.0101, 0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101
,0.0101, 0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101
,0.0101, 0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101
,0.0101, 0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101
,0.0101, 0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101
,0.0101, 0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101
,0.0101, 0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101
,0.0101, 0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101
,0.0101, 0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101
,0.0101, 0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.0101,0.01021])

p2 = np.array([.93, .07])

with pm.Model() as sales_model:
    #define the priors
    alpha = pm.Normal('intercept', mu=0, sd = 20)
    beta_1 = pm.Categorical('dept', p = p1)
    beta_2 = pm.Bernoulli('IsHoliday_T',  p= p2)
    #beta_3 = pm.Normal('Week', mu=0, sd = 10)
    #beta_4 = pm.Normal('Fuel_Prices', mu=0, sd = 10)
    #beta_5 = pm.Normal('Temperature', mu=0, sd = 10)
    #beta_6 = pm.Normal('Markdown1', mu=0, sd = 10)
    #beta_7 = pm.Normal('Markdown2', mu=0, sd = 10)
    #beta_8 = pm.Normal('Markdown4', mu=0, sd = 10)
    #beta_9 = pm.Normal('Markdown5', mu=0, sd = 10)
    #beta_10 = pm.Normal('CPI', mu=0, sd = 10)
    #beta_11 = pm.Normal('Unemployment', mu=0, sd = 10)

    s = pm.Uniform('sd', lower = 1, upper = 20)

    #define the likelihood
    mu = alpha + beta_1*X_train['Dept'] + beta_2*X_train['IsHoliday_True'] #+ beta_3*X_train['Week'] + beta_4*X_train['Fuel_Price_s'] + beta_5*X_train['Temperature_s'] + beta_6*X_train['MarkDown1_s'] + beta_7*X_train['MarkDown2_s'] + beta_8*X_train['MarkDown4_s'] +beta_9*X_train['MarkDown5_s'] + beta_10*X_train['CPI_s'] + beta_11*X_train['Unemployment_s']

    y = pm.Normal('sales', mu = mu, sd = s, observed = Y_train)

    step = pm.NUTS()
    trace = pm.sample(draws=10000, step = step ,progressbar=True)

junpenglao · January 13, 2019, 10:55am

Try specifying the shape in beta_1 and beta_2. Also, you should avoid using discrete latent variables: changing beta_1 and beta_2 to continuous variables would help model inference.
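
To make the shape point concrete, here is a NumPy-only sketch (no PyMC3 involved). It illustrates the most likely reason the first error mentions `shape (2,)`: a Bernoulli variable takes a single success probability, so the two-element array `p2` produces a length-2 test value for a scalar variable. It also shows one possible layout for a continuous department coefficient, with one value per department looked up by an integer code. The names `dept_codes`, `is_holiday`, `beta_dept` and `beta_holiday` are made up for illustration, and the per-department layout is an assumption, not something stated in this thread.

```python
import numpy as np

# Hypothetical stand-ins for the thread's data (not the real X_train).
dept_codes = np.array([0, 3, 3, 98, 1])   # integer department index, 0..98
is_holiday = np.array([0, 0, 1, 0, 1])    # 0/1 holiday indicator

# A Bernoulli takes ONE success probability, not a [P(0), P(1)] pair.
p2 = np.array([0.93, 0.07])
p_holiday = p2[1]
print(np.ndim(p2), np.ndim(p_holiday))    # 1-d array vs. 0-d scalar

# Continuous coefficients: one per department (assumed layout), one for holiday.
rng = np.random.default_rng(0)
beta_dept = rng.normal(0.0, 10.0, size=99)
beta_holiday = rng.normal(0.0, 10.0)
alpha = 0.0

# Indexing by department code yields one mean per row of data.
mu = alpha + beta_dept[dept_codes] + beta_holiday * is_holiday
print(beta_dept.shape, mu.shape)
```

In a PyMC3 model the same idea would mean giving the department coefficient a `shape` equal to the number of departments and indexing it with the integer codes, so that `mu` ends up with one entry per observation.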

jordan.howell2 · January 13, 2019, 11:01am

Thank you @junpenglao.

I changed it to this based on your recommendation:

with pm.Model() as sales_model:

    #define the priors
    alpha = pm.Normal('intercept', mu=train['Weekly_Sales'].mean(), sd = train['Weekly_Sales'].std())
    beta_1 = pm.Normal('dept', mu = 0, sd = 10, shape = X_train['Dept'].shape)
    beta_2 = pm.Normal('IsHoliday_T',  mu = 0, sd = 10, shape = X_train['IsHoliday_True'].shape)
    #beta_3 = pm.Normal('Week', mu=0, sd = 10)
    #beta_4 = pm.Normal('Fuel_Prices', mu=0, sd = 10)
    #beta_5 = pm.Normal('Temperature', mu=0, sd = 10)
    #beta_6 = pm.Normal('Markdown1', mu=0, sd = 10)
    #beta_7 = pm.Normal('Markdown2', mu=0, sd = 10)
    #beta_8 = pm.Normal('Markdown4', mu=0, sd = 10)
    #beta_9 = pm.Normal('Markdown5', mu=0, sd = 10)
    #beta_10 = pm.Normal('CPI', mu=0, sd = 10)
    #beta_11 = pm.Normal('Unemployment', mu=0, sd = 10)

    s = pm.Uniform('sd', lower = 1, upper = 20)

    #define the likelihood
    mu = alpha + beta_1*X_train['Dept'].values + beta_2*X_train['IsHoliday_True'].values
    #+ beta_3*X_train['Week'] + beta_4*X_train['Fuel_Price_s'] + beta_5*X_train['Temperature_s'] + beta_6*X_train['MarkDown1_s'] + beta_7*X_train['MarkDown2_s'] + beta_8*X_train['MarkDown4_s'] +beta_9*X_train['MarkDown5_s'] + beta_10*X_train['CPI_s'] + beta_11*X_train['Unemployment_s']

    y = pm.Normal('sales', mu = mu, sd = s, observed = Y_train, shape = Y_train.shape)

    trace = pm.sample(draws=10000 ,progressbar=True)

Now the following error shows up.

RuntimeError: Chain 0 failed.

junpenglao · January 13, 2019, 11:02am

Could you please paste the full error stack trace?

jordan.howell2 · January 13, 2019, 11:04am

RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\pymc3\parallel_sampling.py", line 73, in run
    self._start_loop()
  File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\pymc3\parallel_sampling.py", line 113, in _start_loop
    point, stats = self._compute_point()
  File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\pymc3\parallel_sampling.py", line 139, in _compute_point
    point, stats = self._step_method.step(self._point)
  File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\pymc3\step_methods\arraystep.py", line 247, in step
    apoint, stats = self.astep(array)
  File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\pymc3\step_methods\hmc\base_hmc.py", line 117, in astep
    'might be misspecified.' % start.energy)
ValueError: Bad initial energy: inf. The model might be misspecified.
"""

The above exception was the direct cause of the following exception:

ValueError Traceback (most recent call last)
ValueError: Bad initial energy: inf. The model might be misspecified.

The above exception was the direct cause of the following exception:

RuntimeError Traceback (most recent call last)
in ()
36 y = pm.Normal('sales', mu = mu, sd = s, observed = Y_train, shape = Y_train.shape)
37
---> 38 trace = pm.sample(draws=10000 ,progressbar=True)

~\AppData\Local\conda\conda\envs\theano\lib\site-packages\pymc3\sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, nuts_kwargs, step_kwargs, progressbar, model, random_seed, live_plot, discard_tuned_samples, live_plot_kwargs, compute_convergence_checks, use_mmap, **kwargs)
447 _print_step_hierarchy(step)
448 try:
--> 449 trace = _mp_sample(**sample_args)
450 except pickle.PickleError:
451 _log.warning("Could not pickle model, sampling singlethreaded.")

~\AppData\Local\conda\conda\envs\theano\lib\site-packages\pymc3\sampling.py in _mp_sample(draws, tune, step, chains, cores, chain, random_seed, start, progressbar, trace, model, use_mmap, **kwargs)
997 try:
998 with sampler:
--> 999 for draw in sampler:
1000 trace = traces[draw.chain - chain]
1001 if trace.supports_sampler_stats and draw.stats is not None:

~\AppData\Local\conda\conda\envs\theano\lib\site-packages\pymc3\parallel_sampling.py in __iter__(self)
303
304 while self._active:
--> 305 draw = ProcessAdapter.recv_draw(self._active)
306 proc, is_last, draw, tuning, stats, warns = draw
307 if self._progress is not None:

~\AppData\Local\conda\conda\envs\theano\lib\site-packages\pymc3\parallel_sampling.py in recv_draw(processes, timeout)
221 if msg[0] == 'error':
222 old = msg[1]
--> 223 six.raise_from(RuntimeError('Chain %s failed.' % proc.chain), old)
224 elif msg[0] == 'writing_done':
225 proc._readable = True
225 proc._readable = True

~\AppData\Local\conda\conda\envs\theano\lib\site-packages\six.py in raise_from(value, from_value)

RuntimeError: Chain 0 failed.

junpenglao · January 13, 2019, 11:09am

So the real error is this:

ValueError: Bad initial energy: inf. The model might be misspecified.

Try checking the model logp:

sales_model.check_test_point()

and see if there are any non-finite values. Usually this means there is a NaN somewhere in your input matrix.

jordan.howell2 · January 13, 2019, 11:12am

Do I put that line of code after the trace?

Never mind. I got:

intercept -1.115000e+01
dept -2.640039e+04
IsHoliday_T -2.640039e+04
sd_interval__ -1.390000e+00
sales -inf
Name: Log-probability of test_point, dtype: float64

sales is coming from my observed target variable. When I check for inf values I don’t get any.

X_train.isna().sum()
Dept 0
IsHoliday_True 0
Week 0
Fuel_Price_s 0
Temperature_s 0
MarkDown1_s 0
MarkDown2_s 0
MarkDown4_s 0
MarkDown5_s 0
CPI_s 0
Unemployment_s 0
dtype: int64

Y_train.isna().sum()
0

junpenglao · January 13, 2019, 11:26am

You should check all your inputs:

sum(~np.isfinite(Y_train))
sum(~np.isfinite(mu.tag.test_value))
sum(~np.isfinite(X_train['Dept'].values))
sum(~np.isfinite(X_train['IsHoliday_True'].values))
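
A small NumPy sketch of why these checks can find something a NaN-only check misses. With pandas' default options `isna()` flags NaN but not infinities, while `np.isfinite` rejects NaN, +inf and -inf together. The array `values` is invented for illustration, and `np.isnan` stands in here for the NaN-only check; the thread does not say which kind of value was actually found.

```python
import numpy as np

# Invented example column: no NaN, but one infinite entry
# (for instance from a division by zero during scaling).
values = np.array([1.5, -0.2, np.inf, 3.0])

n_nan = int(np.isnan(values).sum())               # what a NaN-only check sees
n_nonfinite = int((~np.isfinite(values)).sum())   # NaN, +inf and -inf together

print("NaN count:", n_nan)
print("non-finite count:", n_nonfinite)
```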

jordan.howell2 · January 13, 2019, 11:36am

ahhh! Found it. Replaced it. I’m getting another error but will open another question if I can’t work through it. Thank you!

jordan.howell2 · January 15, 2019, 1:57am

One question @junpenglao. Is the -inf reported by sales_model.check_test_point() in the observed data or in the samples? I don’t see it when I check the observed data beforehand. If it is in the prior samples, how does one avoid that?

junpenglao · January 15, 2019, 5:49am

Neither - it’s in the computation of logp. In this case it is generated by one of the deterministic transformations (mu). In practice, you need to check all the inputs to the problematic RV (the parameters and the observed data in this case).
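
To see how `-inf` can appear only in the logp, here is a hand-written Normal log-density in NumPy. It is a sketch under assumed values, not PyMC3's internal code: the observed `y` is finite, but one infinite entry in a predictor makes `mu` infinite, and the summed log-probability becomes `-inf`. All names and numbers are illustrative.

```python
import numpy as np

def normal_logp(y, mu, sd):
    """Elementwise log-density of Normal(mu, sd) evaluated at y."""
    return -0.5 * ((y - mu) / sd) ** 2 - np.log(sd) - 0.5 * np.log(2.0 * np.pi)

y = np.array([10.0, 12.0, 11.0])   # observed values, all finite
x = np.array([1.0, np.inf, 2.0])   # one bad predictor entry
alpha, beta, sd = 0.0, 2.0, 5.0

mu = alpha + beta * x              # deterministic transformation of the inputs
logp = normal_logp(y, mu, sd)

print(np.isfinite(y).all())        # the observed data look fine on their own
print(logp.sum())                  # the total log-probability is still -inf
```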

jordan.howell2 · January 15, 2019, 1:11pm

Ah. Ok. In that case, is there something I can do ahead of time to avoid this error?
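
One possible precaution, offered as a sketch and not as an answer given in the thread: validate every array that feeds the model before building it, so a bad value fails fast with a readable message instead of surfacing later as `Bad initial energy`. The helper name `check_finite` and the sample arrays are hypothetical.

```python
import numpy as np

def check_finite(**arrays):
    """Raise ValueError naming each input that contains NaN or +/-inf."""
    problems = []
    for name, arr in arrays.items():
        arr = np.asarray(arr, dtype=float)
        bad = np.flatnonzero(~np.isfinite(arr))
        if bad.size:
            problems.append(
                f"{name}: {bad.size} non-finite value(s), first at index {bad[0]}"
            )
    if problems:
        raise ValueError("; ".join(problems))

# Hypothetical inputs standing in for Y_train and one X_train column.
sales = np.array([100.0, 250.0, 80.0])
dept = np.array([1.0, 2.0, np.nan])

try:
    check_finite(sales=sales, dept=dept)
except ValueError as err:
    message = str(err)
    print(message)
```

Calling `check_test_point()` on the model before `pm.sample` remains a useful second check, since it evaluates the logp of every variable at the starting point.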
