Memory problems with nested theano.scan

Update: I am now pretty sure that the issue is caused by the nested theano.scan loops. When I remove them, the model runs fine for large datasets.

How can I get rid of the nested loops? I use them to transform Q-values into action probabilities, i.e., to turn each triple of action values into a triple of action probabilities. The Q-values lie roughly between 0 and 10; I exponentiate them (scaled by beta) and normalize so that each triple sums to 1, i.e., a softmax. I loop twice to get the normalization right: the normalizing term has to be computed per individual triple (the 3 Q-values of 1 trial from 1 subject), not over the whole nested array of Q-values from all trials and all subjects. Is there a more elegant way to do this using matrix operations? (A sketch of the kind of vectorized version I have in mind is below the code.)

Again, here is the code in question:

    import theano
    import theano.tensor as T

    # Softmax for a single subject on a single trial: turn the 3 Q-values
    # of the current season/alien into 3 action probabilities
    def softmax(Q_low, season, alien, beta):
        Q_low_stim = Q_low[season, alien]
        Q_low_exp = T.exp(beta * Q_low_stim)
        p = Q_low_exp / T.sum(Q_low_exp)
        return p

    # Inner loop: iterate over subjects within one trial
    def softmax_trial_wrapper(Q_low, season, alien, beta):
        p, _ = theano.scan(fn=softmax,
                           sequences=[Q_low, season, alien, beta])
        return p

    # Outer loop: iterate over trials, transforming Q-values into action
    # probabilities for all subjects and all trials
    p, _ = theano.scan(fn=softmax_trial_wrapper,
                       sequences=[Q_low, seasons, aliens],
                       non_sequences=[beta])