Data frames vs numpy arrays in deterministic nodes


#1

So I’ve got columns of data in a pandas DataFrame, and something like this does not work

VA = pm.Deterministic('VA', data.A * (1.0 / (1.0+pm.math.exp(logk)*data.DA)**s))

If I convert those columns into numpy arrays, then it works

VA = pm.Deterministic('VA', np.array(data.A) * (1.0 / (1.0+pm.math.exp(logk)*np.array(data.DA))**s))

And if I convert the data frame to a dict of numpy arrays beforehand (a one-liner, sketched just after the snippet below), then this will work

VA = pm.Deterministic('VA', data['A'] * (1.0 / (1.0+pm.math.exp(logk)*data['DA'])**s))
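
(For reference, the conversion I mean is just something along these lines; the column names are whatever happens to be in the data frame.)

data = {name: np.array(data[name]) for name in data.columns}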

Providing the data as a data frame (top example) seems like the most natural thing to do, so I’m just wondering why it doesn’t work.

Note: this only seems to be a problem for deterministic nodes… having data from data frames in stochastic nodes has not been a problem.
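
For example, something like the following (with a hypothetical observed response column data.R and noise scale sigma, purely for illustration) has worked fine for me with a pandas column passed directly as observed:

R_obs = pm.Normal('R_obs', mu=VA, sd=sigma, observed=data.R)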


#2

In a stochastic node, a pandas DataFrame or Series is cast to a numpy array internally, but in a Deterministic node the expression is evaluated first. In this case it is really a Theano error, since Theano does not allow multiplying a tensor by a pandas Series.

You can do data.A.values instead of np.array(data.A) - it does the same thing but you type less :wink:
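
So, if I read your example correctly, the first expression would become something like:

VA = pm.Deterministic('VA', data.A.values * (1.0 / (1.0 + pm.math.exp(logk) * data.DA.values)**s))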


#3

Thanks, this is good to know. It might be worth a quick note in the docs, maybe here http://docs.pymc.io/notebooks/api_quickstart.html#Deterministic-transforms? Otherwise, I guess this thread will serve as a useful reminder when I make the same mistake in a few weeks :wink: