Hello,
I’m trying to predict values using pymc3.sample_ppc, for which, according to the notebook, I need shared variables.
But if I try to run something like
DISTANCE = theano.shared(BR['mil_km'])
where BR is a pandas DataFrame, I get the following error during model specification:
Traceback (most recent call last):
  File "rail_0.py", line 25, in <module>
    theta = (A + BD * DISTANCE + BY * YEAR)
TypeError: unsupported operand type(s) for *: 'FreeRV' and 'SharedVariable'
If I print(DISTANCE), the output is simply <Generic> instead of a column of data, so I assume that is the problem. What is the correct way to convert a pandas column to a theano.shared variable? Most of the examples I’ve seen simply generate the initial data that they use as shared predictors.
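One likely explanation, offered as an assumption rather than a confirmed diagnosis: theano.shared has no special handling for a pandas Series, so it falls back to a generic container (hence <Generic>), which does not support arithmetic with model variables. Passing the underlying NumPy array should avoid this. A minimal sketch, using a made-up stand-in for BR and hypothetical helper names (column_as_array, make_shared):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the BR dataframe; the numbers are invented.
BR = pd.DataFrame({
    "mil_km": [410.0, 420.5, 431.2],
    "year": [1970, 1971, 1972],
})


def column_as_array(frame, name, dtype="float64"):
    """Return one dataframe column as a plain NumPy array of the given dtype."""
    return np.asarray(frame[name].values, dtype=dtype)


def make_shared(values):
    """Wrap a NumPy array in a Theano shared variable (needs Theano installed)."""
    import theano  # imported lazily so the conversion above runs without Theano
    return theano.shared(values)


# Plain arrays, ready to be passed to theano.shared via make_shared(...)
distance_values = column_as_array(BR, "mil_km")
year_values = column_as_array(BR, "year")
```

With Theano available, DISTANCE = make_shared(distance_values) should print as a tensor-typed shared variable rather than <Generic>. Note also that the method for swapping in new data later is set_value (singular), for example DISTANCE.set_value(new_array), and that the new array should have the same dtype as the original.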
Here’s the problem I’m trying to solve, in case it helps:
I have a dataset covering two periods. It contains year, distance and result. I want to fit the model on the year and distance values from the first period and then predict values for the second period.
Here’s the way I’m trying to implement that:
import pymc3 as pm
import pandas as pd
import numpy as np
import theano
RAILS = pd.read_csv('./brit_rail_acc.csv')
BR = RAILS[21:47] # Data prior to privatisation, but after the steam engine
ABR = RAILS[48:] # Data after privatisation
DISTANCE = theano.shared(BR['mil_km'])
YEAR = theano.shared(BR['year'])
with pm.Model() as MODEL_BR_0:
    A = pm.Normal('alpha', mu=0, sd=100)
    BD = pm.Normal('distance', mu=0, sd=10)
    BY = pm.Normal('year', mu=0, sd=10)
    theta = (A + BD * DISTANCE + BY * YEAR)
    y = pm.Poisson('accidents', mu=np.exp(theta),
                   observed=BR['cdo_acc'].values)
    trace = pm.sample(5000, n_init=5000)
DISTANCE.set_values(ABR['mil_km'])
YEAR.set_values(ABR['year'])
ppc = pm.sample_ppc(trace, model=MODEL_BR_0)
On a semi-related note: when I run the basic model without the shared variables, I get the following error in about 50% of runs. Should I be worried?
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
0%| | 0/5500 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "rail_0_nt.py", line 24, in <module>
    trace = pm.sample(5000, n_init=5000)
  File "/home/eichhorn/.local/lib/python3.6/site-packages/pymc3/sampling.py", line 285, in sample
    return sample_func(**sample_args)[discard:]
  File "/home/eichhorn/.local/lib/python3.6/site-packages/pymc3/sampling.py", line 332, in _sample
    for it, strace in enumerate(sampling):
  File "/home/eichhorn/.local/lib/python3.6/site-packages/tqdm/_tqdm.py", line 955, in __iter__
    for obj in iterable:
  File "/home/eichhorn/.local/lib/python3.6/site-packages/pymc3/sampling.py", line 430, in _iter_sample
    point, states = step.step(point)
  File "/home/eichhorn/.local/lib/python3.6/site-packages/pymc3/step_methods/arraystep.py", line 175, in step
    apoint, stats = self.astep(array)
  File "/home/eichhorn/.local/lib/python3.6/site-packages/pymc3/step_methods/hmc/nuts.py", line 182, in astep
    'might be misspecified.' % start.energy)
ValueError: Bad initial energy: nan. The model might be misspecified.
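One possible cause, which the traceback alone does not confirm: the year predictor is on the order of thousands, so even a modest starting value for its coefficient can push exp(theta) past what a float can hold, and the Poisson log-likelihood is then no longer finite. Since the jittered starting point is random, this would fail only some of the time. The NumPy-only sketch below illustrates the overflow and how standardising the predictor avoids it; the year range and the slope value are invented for illustration.

```python
import numpy as np

# Invented year range and coefficient value, for illustration only.
year = np.arange(1967, 1993, dtype="float64")
slope = 0.5

# Raw predictor: slope * year is close to 1000, so exp() overflows to inf.
with np.errstate(over="ignore"):
    raw_rate = np.exp(slope * year)

# Standardised predictor: zero mean and unit standard deviation.
year_std = (year - year.mean()) / year.std()
std_rate = np.exp(slope * year_std)

print("raw rate finite:         ", np.isfinite(raw_rate).all())
print("standardised rate finite:", np.isfinite(std_rate).all())
```

If this is what is happening, fitting on standardised year and distance values (and applying the same mean and standard deviation to the second period before predicting) would be one thing to try.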

