I have been trying, and failing, to implement a very simple trial-and-error-learning model in PyMC. The model predicts a distribution N(mu_t,sigma_t) at each time point ‘t’. It then samples from the distribution on that trial. If the sample is close to a pre-defined value ‘r’, that trial is rewarded, else no reward. If a trial is rewarded then mu_t+1 is moved towards the sampled value by some step-size parameter ‘alpha’, such that the predicted distribution in trial t+1 is centred closer to the sample on trial t. Sigma is also a function of the outcome on the previous trial, to encourage exploration after an unrewarded trial.
The error I get :
“ValueError: Random variables detected in the logp graph: {normal_rv{0, (0, 0), floatX, False}.out}.
This can happen when DensityDist logp or Interval transform functions reference nonlocal variables,
or when not all rvs have a corresponding value variable.”
My current code:
def lik(self,rot,nSteps,obs):
with pm.Model() as model:
nSteps = pm.intX(pm.ConstantData("nSteps", nSteps))
rot = T.as_tensor_variable(rot)
executionNoise = pm.Uniform(name='executionNoise',lower=0.0,upper=100.0)
exploratoryNoise = pm.Uniform(name='exploratoryNoise',lower=0.0,upper=10000.0)
lr = pm.Uniform(name='lr',lower=0.0,upper=1.0)
observations = pm.Data("observations", obs)
#outputs_info = np.array([0.,0.,0.]).astype('float64')#desiredCurrMu,sigma,usedAim
desiredCurr = np.array([0.]).astype('float64')[0]
sigma = np.array([0.]).astype('float64')[0]
usedAim = np.array([0.]).astype('float64')[0]
outcome = np.array([0]).astype('int8')[0]
rng = RandomStream(0)
#rng = np.random.default_rng(0)
def stepFunc(desiredCurr,sigma,usedAim,outcome,executionNoise,exploratoryNoise,lr):
sig = pm.math.switch(
T.eq(outcome,1),
executionNoise,
executionNoise + exploratoryNoise
)
des = pm.math.switch(
T.eq(outcome,1),
desiredCurr + ((usedAim - desiredCurr) * lr) % 360,
desiredCurr
)
aim = pm.Normal.dist(mu=des,sigma=sig)
#aim = rng.normal(des,sig)
aim = pm.math.switch(
T.ge(aim, 180),
aim - 360,
aim
)
rew = pm.math.switch(
T.le(pm.math.abs(aim - rot), 3),
1,
0
)
return des,sig,aim,rew
outputs_info,_ = pt.scan(fn=stepFunc,
outputs_info=[desiredCurr,sigma,usedAim,outcome],
non_sequences=[executionNoise,exploratoryNoise,lr],
n_steps=nSteps)
desiredCurrMu,sigma,usedAim,outcome = outputs_info
states = pm.Deterministic("states", usedAim)
sigmas = pm.Deterministic("sigmas", sigma)
likelihood = pm.Normal("likelihood", mu=states, sigma=sigmas, observed=observations)
return model
If it’s useful, here is the original model, which is hopefully fairle self-explanatory:
How do I implement this model correctly in PyMC?
Many thanks in advance.