Yes - for some reason, this operation breaks the gradient! Similarly, if you check the notebook, operations like tt.switch also break the gradient. The question then becomes how to re-express that part as something that does not break the gradient, and subtensor assignment (basically constructing the tensor piece by piece, as sketched below) seems to work.
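As a minimal sketch of what "constructing the tensor piece by piece" could look like: assuming the piecewise pieces and slice boundaries below (which are placeholders, not the notebook's actual model), you start from a zero tensor and fill in each piece with tt.set_subtensor, through which Theano can differentiate:

```python
import numpy as np
import theano
import theano.tensor as tt

x = tt.vector("x")

# Instead of tt.switch, build the output piece by piece:
y = tt.zeros_like(x)
y = tt.set_subtensor(y[:2], x[:2] ** 2)     # first piece (placeholder expression)
y = tt.set_subtensor(y[2:], tt.exp(x[2:]))  # second piece (placeholder expression)

cost = y.sum()
g = theano.grad(cost, x)  # gradient flows through the set_subtensor ops
f = theano.function([x], g)

print(f(np.array([1.0, 2.0, 3.0, 4.0], dtype=theano.config.floatX)))
```

Whether this is a drop-in replacement depends on whether the condition in your tt.switch can be expressed as fixed index sets like the slices above.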