Hi,
I have some data in the form of a matrix, with missing values coded as -999 in the first column. I have one function that calculates the log-likelihood of all rows without missing data, and another that calculates the log-likelihood of all rows with missing data. After the calculation, I want to combine the results into one vector as the output of a likelihood function. It seems that you can’t use something like data[data[:, 0] == -999, :], which is what you would usually do in numpy, to subset the data in PyTensor. What is the recommended way to do this in PyTensor?
Thanks!
Can you provide a bit more context for your indexing operation? PyTensor has all the standard numpy operations (e.g., where(), etc.). Something should work.
Thank you @cluhmann and @ricardoV94!
So I thought about using pt.where(), but this will perform the computation on the full dataset in both cases, which doesn’t seem to be the most efficient way to do this.
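For concreteness, a minimal sketch of that where()-based approach (func1 and func2 stand in for the per-row log-likelihood functions used in the snippet below; data is assumed to be a symbolic matrix):

import pytensor.tensor as pt

# Both func1 and func2 are evaluated over every row of data;
# pt.where only selects between the two results afterwards.
missing = pt.eq(data[:, 0], -999)  # pt.eq rather than == on symbolic tensors
result = pt.where(missing, func1(data), func2(data))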
So what I want to do, if I had numpy, is something like this:
def logp(data, ...):
    # Split the rows with and without missing values (coded as -999)
    split1 = data[data[:, 0] == -999, :]
    split2 = data[data[:, 0] != -999, :]
    result1 = func1(split1, ...)
    result2 = func2(split2, ...)
    # Reassemble the per-row log-likelihoods in the original row order
    result = np.zeros(data.shape[0])
    result[data[:, 0] == -999] = result1
    result[data[:, 0] != -999] = result2
    return result
It seems that index assignment is not supported, so I will have to use pt.set_subtensor(). However, I can’t use boolean indexing in set_subtensor. How do I get around this?
You have to use pt.eq and pt.neq instead of == and !=. This is one of the annoying things about working with PyTensor variables; it has to do with Python’s constraints on equality/inequality and hashing.
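As a quick illustration (a sketch, assuming data is a symbolic matrix):

import pytensor.tensor as pt

data = pt.matrix("data")
# == on TensorVariables falls back to Python object comparison
# (variables must remain hashable), so it does not build an
# elementwise comparison graph. Use the explicit operators instead:
missing_mask = pt.eq(data[:, 0], -999)
observed_mask = pt.neq(data[:, 0], -999)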
So this:
result = np.zeros(data.shape[0])
result[data[:, 0] == -999] = result1
result[data[:, 0] != -999] = result2
can be rewritten as:
result = pt.zeros(data.shape[0])
result = pt.set_subtensor(result[pt.eq(data[:, 0], -999)], result1, inplace=True)
result = pt.set_subtensor(result[pt.neq(data[:, 0], -999)], result2, inplace=True)
Correct? I know that the dimensions of result1 and result2 will always be correct, but does PyTensor know about this when compiling the op?
You don’t need to use the inplace flag; PyTensor will add inplace Ops itself. You can initialize the tensor with pt.empty or pt.empty_like instead.
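Putting that together, a minimal sketch of the full pattern (assuming func1 and func2 each return a vector of per-row log-likelihoods, and eliding their extra arguments):

import pytensor.tensor as pt

def logp(data):
    missing = pt.eq(data[:, 0], -999)
    observed = pt.neq(data[:, 0], -999)
    # pt.empty is fine here because every entry is written below;
    # no inplace flag needed, PyTensor inserts inplace Ops on its own
    result = pt.empty((data.shape[0],))
    result = pt.set_subtensor(result[missing], func1(data[missing]))
    result = pt.set_subtensor(result[observed], func2(data[observed]))
    return result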
Do note that such optimizations may not result in a faster graph. Sometimes indexing is actually slower, as it breaks loop fusion and memory layouts (and indexing itself can be slow). If the graphs of func1/func2 are Elemwise, the compiler (after PyTensor) may even avoid the useless branch without you knowing it.
This worked really well. Thank you so much, @ricardoV94 and @cluhmann!