Hello,
I’m trying to predict values using pymc3.sample_ppc, for which, according to the notebook, I need shared variables.
But if I try to run something like DISTANCE = theano.shared(BR['mil_km']), where BR is a pandas dataframe, I get the following error during model specification:
Traceback (most recent call last):
  File "rail_0.py", line 25, in <module>
    theta = (A + BD * DISTANCE + BY * YEAR)
TypeError: unsupported operand type(s) for *: 'FreeRV' and 'SharedVariable'
If I print(DISTANCE), the output is simply <Generic> instead of a column of data, so I assume that is the problem. What is the correct way to convert a pandas column into a theano.shared variable? Most of the examples I’ve seen simply generate initial data to use as shared predictors.
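For reference, here is a minimal sketch of the pandas side of that conversion. It assumes the likely cause is that theano.shared only builds a tensor-valued shared variable when it is handed a NumPy array, and otherwise falls back to a generic container (which would explain the <Generic> output). The two small dataframes and the helper as_float_array are hypothetical stand-ins, not the real brit_rail_acc.csv data, and the Theano calls are left as comments so the snippet runs without Theano installed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two periods; the numbers are made up.
BR = pd.DataFrame({'year': [1970, 1971, 1972],
                   'mil_km': [400.0, 410.0, 405.0]})
ABR = pd.DataFrame({'year': [1995, 1996],
                    'mil_km': [430.0, 445.0]})


def as_float_array(column):
    """Return a pandas column as a plain float64 NumPy array."""
    return np.asarray(column, dtype='float64')


distance_train = as_float_array(BR['mil_km'])
distance_test = as_float_array(ABR['mil_km'])

print(type(BR['mil_km']).__name__)    # the pandas object
print(type(distance_train).__name__)  # the NumPy array extracted from it

# With Theano available, the array (not the Series) would be wrapped:
#   DISTANCE = theano.shared(distance_train)
# and swapped out before prediction with set_value (singular):
#   DISTANCE.set_value(distance_test)
```

Passing BR['mil_km'].values directly should be equivalent; the explicit float64 cast is only a precaution for integer columns such as the year.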
Here’s the problem I’m trying to solve, in case it helps:
I have a dataset with 2 periods. It contains year, distance and result. I want to train the model on the years and distance from the first period and then predict values for the second period.
Here’s the way I’m trying to implement that:
import pymc3 as pm
import pandas as pd
import numpy as np
import theano

RAILS = pd.read_csv('./brit_rail_acc.csv')
BR = RAILS[21:47]  # Data prior to privatisation, but after the steam engine
ABR = RAILS[48:]  # Data after privatisation

DISTANCE = theano.shared(BR['mil_km'])
YEAR = theano.shared(BR['year'])

with pm.Model() as MODEL_BR_0:
    A = pm.Normal('alpha', mu=0, sd=100)
    BD = pm.Normal('distance', mu=0, sd=10)
    BY = pm.Normal('year', mu=0, sd=10)
    theta = (A + BD * DISTANCE + BY * YEAR)
    y = pm.Poisson('accidents', mu=np.exp(theta),
                   observed=BR['cdo_acc'].values)
    trace = pm.sample(5000, n_init=5000)

DISTANCE.set_values(ABR['mil_km'])
YEAR.set_values(ABR['year'])
ppc = pm.sample_ppc(trace, model=MODEL_BR_0)
On a semi-related note, should I worry that in about 50% of runs I get the following error when running the basic model without the shared variables?
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
0%| | 0/5500 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "rail_0_nt.py", line 24, in <module>
    trace = pm.sample(5000, n_init=5000)
  File "/home/eichhorn/.local/lib/python3.6/site-packages/pymc3/sampling.py", line 285, in sample
    return sample_func(**sample_args)[discard:]
  File "/home/eichhorn/.local/lib/python3.6/site-packages/pymc3/sampling.py", line 332, in _sample
    for it, strace in enumerate(sampling):
  File "/home/eichhorn/.local/lib/python3.6/site-packages/tqdm/_tqdm.py", line 955, in __iter__
    for obj in iterable:
  File "/home/eichhorn/.local/lib/python3.6/site-packages/pymc3/sampling.py", line 430, in _iter_sample
    point, states = step.step(point)
  File "/home/eichhorn/.local/lib/python3.6/site-packages/pymc3/step_methods/arraystep.py", line 175, in step
    apoint, stats = self.astep(array)
  File "/home/eichhorn/.local/lib/python3.6/site-packages/pymc3/step_methods/hmc/nuts.py", line 182, in astep
    'might be misspecified.' % start.energy)
ValueError: Bad initial energy: nan. The model might be misspecified.
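One possible cause of the nan energy, offered only as a guess: if the year column holds raw four-digit years, even a modest slope makes exp(theta) overflow, and the Poisson log-probability then evaluates to inf - inf = nan. The NumPy-only sketch below illustrates that arithmetic outside of PyMC3. The year values, the slope of 1.0, the count of 3 and the helper poisson_logp are all assumptions for illustration, not taken from the dataset or from PyMC3 internals.

```python
import math
import numpy as np


def poisson_logp(k, mu):
    """Poisson log-probability of a count k under rate(s) mu."""
    return k * np.log(mu) - mu - math.lgamma(k + 1)


# Assumed raw predictor values (four-digit years) and a standardized copy.
year_raw = np.array([1970.0, 1980.0, 1990.0])
year_std = (year_raw - year_raw.mean()) / year_raw.std()

slope = 1.0  # assumed value, unremarkable under a Normal(0, sd=10) prior

# Raw years: exp(1970) overflows to inf, so the log-probability is nan.
with np.errstate(over='ignore', invalid='ignore'):
    mu_raw = np.exp(slope * year_raw)
    logp_raw = poisson_logp(3, mu_raw)

# Standardized years: the rate and the log-probability stay finite.
mu_std = np.exp(slope * year_std)
logp_std = poisson_logp(3, mu_std)

print(logp_raw)
print(logp_std)
```

If this is what is happening, centering and scaling the predictors (or tightening the priors on the slopes) before building the model might make the initialization more stable.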