Hi,

I have some data in the form of a matrix with missing values coded as -999 in the first column. I have one function that calculates the log-likelihood of all rows without missing data, and another that calculates the log-likelihood of all rows with missing data. After the calculation, I want to combine the results into one vector as the output of a likelihood function. It seems that you can't use something like `data[data[:, 0] == -999, :]`, which you would usually do in `numpy`, to subset the data in PyTensor. What is the recommended way to do this in PyTensor?

Thanks!

Can you provide a bit more context for your indexing operation? PyTensor has all the standard numpy operations (e.g., `where()`, etc.). Something should work.
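For instance, the select-per-row pattern can be expressed with `where()`. Here is a minimal sketch in plain `numpy` for illustration (the `ll_missing`/`ll_observed` arrays are hypothetical stand-ins for the two log-likelihood functions; `pt.where` mirrors `np.where`):

```python
import numpy as np

# toy data: -999 in the first column marks a missing value
data = np.array([[-999.0, 1.0], [2.0, 3.0], [-999.0, 4.0]])
missing = data[:, 0] == -999

# hypothetical per-row log-likelihoods (stand-ins for the two functions)
ll_missing = np.full(data.shape[0], -1.0)
ll_observed = np.full(data.shape[0], -2.0)

# pick one value per row; note both branches are evaluated on the full dataset
result = np.where(missing, ll_missing, ll_observed)
print(result)  # [-1. -2. -1.]
```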

Thank you @cluhmann and @ricardoV94!

So I thought about using `pt.where()`, but that performs the computation on the full dataset in both branches, which doesn't seem to be the most efficient way to do this.

So what I want to do, if I had `numpy`, is something like this:

```
def logp(data, ...):
    split1 = data[data[:, 0] == -999, :]
    split2 = data[data[:, 0] != -999, :]
    result1 = func1(split1, ...)
    result2 = func2(split2, ...)
    result = np.zeros(data.shape[0])
    result[data[:, 0] == -999] = result1
    result[data[:, 0] != -999] = result2
    return result
```
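As a runnable sanity check of this split-and-recombine pattern, here is a self-contained version with trivial hypothetical stand-ins for `func1`/`func2` (placeholders, not the actual likelihood functions):

```python
import numpy as np

def func1(rows):
    # hypothetical stand-in: log-likelihood of rows with a missing first column
    return np.full(rows.shape[0], -1.0)

def func2(rows):
    # hypothetical stand-in: log-likelihood of fully observed rows
    return np.full(rows.shape[0], -2.0)

def logp(data):
    missing = data[:, 0] == -999
    result = np.zeros(data.shape[0])
    # each subset is computed once and written back into its original positions
    result[missing] = func1(data[missing, :])
    result[~missing] = func2(data[~missing, :])
    return result

data = np.array([[-999.0, 1.0], [2.0, 3.0], [-999.0, 4.0]])
print(logp(data))  # [-1. -2. -1.]
```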

It seems that index assignment is not supported, so I will have to use `pt.set_subtensor()`. However, I can't use boolean indexing in `set_subtensor`. How do I get around this?

You have to use `pt.eq` and `pt.neq` instead of `==` and `!=`. That's one of the annoying things about working with PyTensor variables; it comes from Python's constraints on equality/inequality and hashing.

So this

```
result = np.zeros(data.shape[0])
result[data[:, 0] == -999] = result1
result[data[:, 0] != -999] = result2
```

Can be re-written as

```
result = pt.zeros(data.shape[0])
result = pt.set_subtensor(result[pt.eq(data[:, 0], -999)], result1, inplace=True)
result = pt.set_subtensor(result[pt.neq(data[:, 0], -999)], result2, inplace=True)
```

correct? I know that the dimensions of `result1` and `result2` will always be correct, but does PyTensor know about this when compiling the op?


You don't need to use the inplace flag; PyTensor will add inplace Ops itself. You can initialize the tensor with `pt.empty` or `pt.empty_like` instead.

Do note that such optimizations may not result in a faster graph. Sometimes indexing is actually slower, since it breaks loop fusion and memory layouts (and indexing itself can be slow). If the graphs of `func1`/`func2` are Elemwise, the compiler (after PyTensor) may even avoid the useless branch without you knowing it.

This worked really well. Thank you so much, @ricardoV94 and @cluhmann!
