Background information and work so far
I am trying to fit the parameters of a simple vehicle dynamics model using fake data generated from a high-fidelity vehicle dynamics model, to which I add Gaussian noise. The simple model takes as input 10 parameters that I need to fit, the steering input (a vector) that comes from the "experiment" (the fake data), and the initial conditions of the vehicle, which also come from the "experiment". For example, these are the parameters -
a=theta[0] # distance of c.g. from front axle (m)
b=theta[1] # distance of c.g. from rear axle (m)
Cf=theta[2] # front axle cornering stiffness (N/rad)
Cr=theta[3] # rear axle cornering stiffness (N/rad)
Cxf=theta[4] # front axle longitudinal stiffness (N)
Cxr=theta[5] # rear axle longitudinal stiffness (N)
m=theta[6] # the mass of the vehicle (kg)
Iz=theta[7] # yaw moment of inertia (kg.m^2)
Rr=theta[8] # wheel radius
Jw=theta[9] # wheel roll inertia
As of now, the high-fidelity vehicle model provides data for 2 quantities that I can compare with the simple model: the Lateral Velocity and the Lateral Acceleration. Each quantity is a vector whose elements are the values at particular times T. I currently have about 470 points at 0.01 s intervals, totalling about 4.7 seconds of data.
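To make the data layout concrete, here is a minimal sketch of what I am working with (the array names and the stand-in signals are mine, not the actual high-fidelity output; only the shapes matter):

import numpy as np

dt = 0.01                                   # sampling interval (s)
time_o = np.arange(0.0, 4.7, dt)            # ~470 time stamps
rng = np.random.default_rng(0)

# stand-ins for the two high-fidelity outputs with added Gaussian noise
lat_vel = np.sin(0.5 * time_o) + rng.normal(0.0, 0.05, time_o.size)
lat_acc = 2.0 * np.cos(0.5 * time_o) + rng.normal(0.0, 0.05, time_o.size)

# each row of data is one measured quantity, which is how loglike below indexes it
data = np.vstack((lat_vel, lat_acc))        # shape (2, 470)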
I am currently using my simple vehicle dynamics model as a black box, similar to the tutorial here, with a Gaussian log-likelihood. The log-likelihood is implemented in the following way -
def loglike(theta, time_o, st_inp, init_cond, data):
    # sigma_vec takes the last 2 elements of theta - these last 2 parameters are the
    # noise parameters that are also sampled
    sigma_vec = theta[-(data.shape[0]):]
    mod_data = vehicle_bi(theta, time_o, st_inp, init_cond)
    # Calculate the difference
    res = (mod_data - data)
    # Calculate the ssq for each vector and divide it by the norm of the data vector,
    # so that the individual ssqs can be added without scaling problems
    norm_ss = [None] * res.shape[0]
    for i in range(0, res.shape[0]):
        ss = np.sum(res[i, :]**2 / (2. * sigma_vec[i]**2.))
        ss = ss / np.linalg.norm(data[i, :])
        norm_ss[i] = ss
    # Use the sum of all these values as our sum of squares - this gives equal weight
    ssq = np.sum(norm_ss)
    logp = (-1) * ssq
    return logp
Here vehicle_bi is the simple vehicle dynamics black-box model. Since the 2 quantities are not of the same order of magnitude, I apply a normalisation while calculating my sum of squares. Along with the 10 parameters that I need to fit, I also sample the sigma for each of my data vectors, because in a real experiment I will not know the noise distribution and was curious to see how sampling these would work. sigma_vec extracts those parameters from the theta vector and applies them where appropriate in the sum of squares.
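Written out, the quantity that loglike returns is (the constant and -log(sigma_i) terms of the full Gaussian log-likelihood are not included):

$$\log p = -\sum_{i=1}^{2} \frac{1}{\lVert d_i \rVert_2} \sum_{t} \frac{\big(m_i(t;\theta) - d_i(t)\big)^2}{2\sigma_i^2}$$

where d_i is the i-th data vector (Lateral Velocity or Lateral Acceleration), m_i the corresponding simple-model output, and sigma_i the sampled noise scale for that quantity.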
Since my parameter space is high dimensional, I use NUTS as my sampler, for which I define the gradient of the likelihood as follows -
def grad_loglike(theta, time_o, st_inp, init_cond, data):
    def ll(theta, time_o, st_inp, init_cond, data):
        return loglike(theta, time_o, st_inp, init_cond, data)
    # precision
    delx = np.finfo(float).eps
    # We are approximating the partial derivatives of the log likelihood with finite differences
    return sp.optimize.approx_fprime(theta, ll, delx * np.ones(len(theta)), time_o, st_inp, init_cond, data)
I use the scipy finite difference approximation to evaluate the gradient (not ideal, but I don't know how else I could evaluate it).
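For completeness, my LogLike op (used below) is essentially the LogLikeWithGrad class from the black-box likelihood tutorial, with grad_loglike wired in so that NUTS can get gradients. Roughly (error checking omitted):

import numpy as np
import theano.tensor as tt

class LogLikeGrad(tt.Op):
    # takes the parameter vector, returns the gradient vector
    itypes = [tt.dvector]
    otypes = [tt.dvector]

    def __init__(self, time_o, st_inp, init_cond, data):
        self.time_o = time_o
        self.st_inp = st_inp
        self.init_cond = init_cond
        self.data = data

    def perform(self, node, inputs, outputs):
        (theta,) = inputs
        outputs[0][0] = grad_loglike(theta, self.time_o, self.st_inp, self.init_cond, self.data)

class LogLike(tt.Op):
    # takes the parameter vector, returns the scalar log-likelihood
    itypes = [tt.dvector]
    otypes = [tt.dscalar]

    def __init__(self, loglike, time_o, st_inp, init_cond, data):
        self.likelihood = loglike
        self.time_o = time_o
        self.st_inp = st_inp
        self.init_cond = init_cond
        self.data = data
        self.logpgrad = LogLikeGrad(time_o, st_inp, init_cond, data)

    def perform(self, node, inputs, outputs):
        (theta,) = inputs
        outputs[0][0] = np.array(self.likelihood(theta, self.time_o, self.st_inp, self.init_cond, self.data))

    def grad(self, inputs, g):
        (theta,) = inputs
        return [g[0] * self.logpgrad(theta)]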
Now that these functions are defined, I can define my model and sample. As a first step, I only sample two parameters (Cf and Cr) that I am confident largely determine the Lateral Velocity and Lateral Acceleration of the vehicle (the 2 data vectors of interest). This prevents any parameter non-identifiability issues. The model defines uniform priors over Cf and Cr, and then I use pm.Potential() to provide my likelihood. This can be seen here -
# Initiating the log-likelihood object, which is a theano operation (op)
like = LogLike(loglike, time_o, st_inp_o, init_cond, data)
with pm.Model() as model:
    a = 1.14
    b = 1.4
    Cf = pm.Uniform('Cf', lower=-120000, upper=-50000, testval=-80000)  # front axle cornering stiffness (N/rad)
    Cr = pm.Uniform('Cr', lower=-120000, upper=-50000, testval=-65000)  # rear axle cornering stiffness (N/rad)
    Cxf = 10000
    Cxr = 10000
    m = 1720
    Iz = 2420
    Rr = 0.285
    Jw = 2
    # We also sample our observation noise - a Half Normal seems to be the standard choice for this
    sigmaVy = pm.HalfNormal("sigmaVy", sigma=0.06, testval=0.05)  # Sigma for Lateral Velocity
    sigmaLat_acc = pm.HalfNormal("sigmaLat_acc", sigma=0.06, testval=0.05)  # Sigma for Lateral Acceleration
    # Convert our theta into a theano tensor
    theta_ = [a, b, Cf, Cr, Cxf, Cxr, m, Iz, Rr, Jw, sigmaLat_acc, sigmaVy]
    theta = tt.as_tensor_variable(theta_)
    # According to this thread, we should use pm.Potential
    # https://stackoverflow.com/questions/64267546/blackbox-likelihood-example
    pm.Potential("like", like(theta))
    # Now sample!
    # The stepping method (NUTS or Metropolis) is chosen through a command-line argument
    if sys.argv[2] == "nuts":
        # step = pm.NUTS()
        pm.sampling.init_nuts()
        idata = pm.sample(ndraws, tune=nburn, discard_tuned_samples=True, return_inferencedata=True, target_accept=0.9, cores=2)
    elif sys.argv[2] == "met":
        step = pm.Metropolis()
        idata = pm.sample(ndraws, step=step, tune=nburn, discard_tuned_samples=True, return_inferencedata=True, cores=2)
    else:
        print("Please provide nuts or met as the stepping method")
This actually works wonderfully. See the results below for 1500 tuning draws and 3000 draws, with 4 chains run in parallel.
I see no divergences and good ESS values. The Rhat is 1.0, as can be seen from the similarity of the chains in the Rhat plot. The mean obtained is close to the true mean I was expecting, and the model fits perfectly.
However, I have the following questions for my next steps:
- I now want to use this posterior as a prior when fitting other parameters that are excited by some other type of data. For example, parameters a and b are closely related to the longitudinal velocity and will probably be identifiable from that data. How do I go about setting this up? I think this is similar to hierarchical modelling, but I am not sure (see the first sketch after this list for the kind of thing I have in mind).
- I also want to draw from this posterior. For example, I understand that using the mean of the distribution minimises the mean squared error; however, I would also like to explore minimising other losses and would thus need different point estimates from the posterior. Is this possible given how my model is set up? Namely, I do not have a CDF, and I don't see how I can draw from this posterior without one (see the second sketch after this list for what I am assuming).
- The current model takes around 0.05 seconds (wall time) to run for 4.7 seconds of data. However, I will eventually be working with models that take around 3.5 seconds for 4.7 seconds of data. The results above took about 45 minutes on a decent machine, and I presume a model that is 100 times slower will make the sampling take much longer (especially since I will also be sampling many more parameters). Is there any way I can speed this up by changing my model setup?
- From what I am reading on the Discourse, PyMC v4 will no longer use Theano as its backend. So I wanted to know whether such a black-box model will still be possible to construct with PyMC v4?
- I also tried to sample 2 parameters (a and b, with uniform priors) that are not very identifiable from the Lateral Acceleration and the Lateral Velocity, and got pretty bad results (as expected).
There were 77 divergences, as can be seen in the rug plot below the trace plot, and the pairwise plots show crazy funnel-like distributions. I have read that reparametrisation can help explore such a complicated typical set, but I don't see how it applies to my problem. Is there any other way of handling this? My current idea is related to my first question, where I fit a few parameters with one data set and other parameters with other data. Is this possible?
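For the first question, is something along the lines of the Interpolated-prior approach from the PyMC3 "Updating priors" example what I should be doing? A rough sketch of what I mean (the helper name is mine, and I assume idata is the InferenceData from the run above):

import numpy as np
import pymc3 as pm
from scipy import stats

def from_posterior(name, samples, n_points=100):
    # build an empirical prior by interpolating a KDE of the posterior samples
    smin, smax = samples.min(), samples.max()
    width = smax - smin
    x = np.linspace(smin, smax, n_points)
    y = stats.gaussian_kde(samples)(x)
    # pad the support a little beyond the observed samples
    x = np.concatenate([[x[0] - 0.1 * width], x, [x[-1] + 0.1 * width]])
    y = np.concatenate([[0.0], y, [0.0]])
    return pm.Interpolated(name, x, y)

with pm.Model() as model_stage2:
    # posteriors from the first fit become priors here
    Cf = from_posterior("Cf", idata.posterior["Cf"].values.flatten())
    Cr = from_posterior("Cr", idata.posterior["Cr"].values.flatten())
    # new parameters (e.g. a and b) would get their own priors and a new
    # likelihood built from the longitudinal-velocity data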
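For the second question, what I am assuming is that the MCMC trace itself already gives me draws from the posterior (so no CDF is needed), and that point estimates for other losses can be computed directly from those draws - is that right? For example:

import numpy as np
from scipy.stats import gaussian_kde

# the trace already contains draws from the posterior
cf_samples = idata.posterior["Cf"].values.flatten()

cf_mean = cf_samples.mean()        # minimises squared-error loss
cf_median = np.median(cf_samples)  # minimises absolute-error loss

# a crude posterior-mode estimate (0-1 loss) from a KDE of the samples
kde = gaussian_kde(cf_samples)
cf_mode = cf_samples[np.argmax(kde(cf_samples))]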
This turned out to be a longer post than expected. I am not sure if I have provided enough background information, so please let me know if something is unclear. Also, please feel free to suggest ways to model things more efficiently; I am very new to this field and only heard about Bayesian Inference and pymc3 about 3 months ago.