Help understanding pm.categorical

Hello all!

I’m trying to model windspeed as a Markov process - where the state of the windspeed can be one of 20 states, each of which has a average windspeed and std dev.
This is then fed into a normal distribution to generate a sample windspeed.
At each point in time, the state has a probability of changing into a new state given by the transition probability.

What I don’t understand is that in the trace, I end up with 20 traces for windspeed, looks like one per starting windspeed (trace[“wind_speeds”].T).
How do I set the initial state, and only run a trace using that initial state?

Thanks for being patient if this is a simple question!

import numpy as np
import pymc3 as pm
from scipy import stats

# Set the random seed for reproducibility
np.random.seed(42)

# Generate transition probabilities using a Normal distribution
def gen_prob(index):
    x = [stats.norm.pdf(i - index) for i in range(20)]
    x /= sum(x)
    return x

# Set the transition probabilities for the Markov chain
transition_probabilities = [gen_prob(i) for i in range(20)]

# Set the average wind speeds for each state
mean_speeds = np.arange(20)

# Set the standard deviations for each state
std_speeds = np.logspace(-1,0.5,20)

# Create a Markov chain with 3 states
model = pm.Model()
with model:
    mean_speeds = pm.Data('mean_speeds', mean_speeds)
    std_speeds = pm.Data('std_speeds', std_speeds)
    
    # Create a categorical variable with 3 possible states
    state = pm.Categorical('state', p=transition_probabilities, shape=(20,))

#     # Create a normal distribution for the wind speeds
    wind_speeds = pm.Normal('wind_speeds', mu=mean_speeds[state], tau=std_speeds[state], shape=(1,20))

# Generate a sequence of 1000 random wind speeds using the Markov chain
with model:
    trace = pm.sample(1000)

Welcome!

I think the problem might be that you’re not sampling sequences, but rather independent draws. To implement a markov chain, you need to use scan, which allows for recursive computation. It’s also a bit messy to pass RandomVariables though the scan, you have to do some tricks to register them to the model. I followed this post as a guide to re-write your code as follows:

import numpy as np
import pymc as pm
from scipy import stats


# Set the random seed for reproducibility
np.random.seed(42)

# Generate transition probabilities using a Normal distribution
def gen_prob(index):
    x = [stats.norm.pdf(i - index) for i in range(20)]
    x /= sum(x)
    return x

# Set the transition probabilities for the Markov chain
transition_probabilities = at.stack([gen_prob(i) for i in range(20)])

# Set the average wind speeds for each state
mean_speeds = np.arange(20)

# Set the standard deviations for each state
std_speeds = np.logspace(-1,0.5,20)

# Create a Markov chain with 3 states
model = pm.Model()
with model:
    mean_speeds = pm.Data('mean_speeds', mean_speeds, mutable=False)
    std_speeds = pm.Data('std_speeds', std_speeds, mutable=False)
    
    initial_state = pm.Categorical('x0', p=np.full(20, 1/20))
    rng = aesara.shared(np.random.default_rng())
    
    def transition(previous_state, transition_probs, old_rng):
        p = transition_probs[previous_state]
        next_rng, next_state = pm.Categorical.dist(p = p, rng=old_rng).owner.outputs
        return next_state, {old_rng: next_rng}
    
    markov_chain, updates = aesara.scan(transition,
                                        non_sequences=[transition_probabilities, rng],
                                        outputs_info=[initial_state],
                                        n_steps=30)
    
    model.register_rv(markov_chain, name='markov_chain')
    
    # Create a normal distribution for the wind speeds
    wind_speeds = pm.Normal('wind_speeds', mu=mean_speeds[markov_chain], tau=std_speeds[markov_chain])

# Generate 1000 random draws of wind speed sequences of length 30 using the Markov chain
with model:
    idata = pm.sample_prior_predictive(compile_kwargs={'updates':updates})

The important changes are:

  1. stacking your transition probabilities into a matrix variable recognized by the PyMC backend (I’m using Aesara but this has just recently been forked to PyTensor, and I noticed you’re on PyMC3, so you’d be using theano. Both of us need to update our PyMC installs)
  2. Drawing a single initial_state from a single distribution with 20 possible outcomes. I believe this gets to your initial question – size gives the size of the output, not the dimensionality of the support. Your categorical distribution has support over 20 numbers, but we only want to draw 1, so size=1.
  3. Make an rng object. The back end needs to pass this along through the recursive computation, so we need to provide it.
  4. Made a function to step the markov chain. Inside that function we make a new Categorical distribution, but we have to use some tricks because we 1. want to update the state of the rng and 2. don’t want to keep all these intermediate distributions on the computational graph.
  5. Run a scan to recursively set the next state based on the previous state
  6. Manually add the whole markov chain as a single variable to the model with model.register.
  7. Use pm.sample_prior_predictive (since you didn’t show the model any ground truth data, it doesn’t make sense to use pm.sample), and provide the updates in compile_kwargs.

Note that the number of steps in the markov chain is dictated by n_steps, NOT by the number of draws in pm.sample_prior_predictive!

3 Likes