Fitting a Vehicle Dynamics model as a black box - results and questions

Background information and work till now
I am trying to fit the parameters for a simple vehicle dynamics model using fake data generated from a high-fidelity vehicle dynamics model to which I add a gaussian normal noise. This simple vehicle dynamics model takes as input 10 parameters that I need to fit, the steering input (a vector) which comes from the “experiment” (fake data) and the initial conditions of the vehicle which also come from the “experiment”. For example, these are the parameters -

	a=theta[0]  # distance of c.g. from front axle (m)
	b=theta[1]  # distance of c.g. from rear axle  (m)
	Cf=theta[2] # front axle cornering stiffness (N/rad)
	Cr=theta[3] # rear axle cornering stiffness (N/rad)
	Cxf=theta[4] # front axle longitudinal stiffness (N)
	Cxr=theta[5] # rear axle longitudinal stiffness (N)
	m=theta[6]  # the mass of the vehicle (kg)
	Iz=theta[7] # yaw moment of inertia (kg.m^2)
	Rr=theta[8] # wheel radius
	Jw=theta[9] # wheel roll inertia

The high-fidelity vehicle model provides data as of now for 2 quantities that I can compare with the simple model. These are the Lateral Velocity and the Lateral Acceleration. These quantities are vectors with each element of the vector the value at a particular time T. I currently have about 470 points at 0.01 seconds each totalling about 4.7 seconds of data.

I am currently using my simple vehicle dynamics as a black box similar to the tutorial here using a gaussian log-likelihood. This log-likelihood is implemented in the following way -

def loglike(theta,time_o,st_inp,init_cond,data):
# sigma_vec takes the last 2 elements of theta - these last 2 parameters are the noise parameters
# that are also sampled
    sigma_vec = theta[-(data.shape[0]):]

    mod_data = vehicle_bi(theta,time_o,st_inp,init_cond)
    # Calculate the difference
    res = (mod_data - data)
 
    # We will calculate the ssq for each vector and divide it with the norm of the data vector. This way we can
    # then add all the individual ssqs without scaling problems
    norm_ss = [None]*res.shape[0]
    for i in range(0,res.shape[0]):
        ss = np.sum(res[i,:]**2/(2.*sigma_vec[i]**2.))
        ss = ss / np.linalg.norm(data[i,:])
        norm_ss[i] = ss
    # We will just use the sum of all these values as our sum of square - this gives equal weight
    ssq = np.sum(norm_ss)
    logp = (-1)*ssq
    return logp

Here vehicle_bi is the simple Vehicle Dynamics black box model. Since the 2 quantities are not of the same order of magnitude, I use a normalisation while calculating my sum of squares.Along with the 10 parameters that I need to fit, I also sample the sigma for each of my data vectors. This is because in a real experiment, I will not know this noise distribution and was curious to see how sampling these would work. sigma_vec basically extracts those parameters from the theta vector and applies where appropriate to calculate the sum of squares.
Since my parameter space is high dimensional, I use NUTS as my sampler for which I define the gradient of the likelihood as follows -

def grad_loglike(theta,time_o,st_inp,init_cond,data):
	def ll(theta,time_o,st_inp,init_cond,data):
		return loglike(theta,time_o,st_inp,init_cond,data)
	
	#precision
	delx = np.finfo(float).eps
	#We are approximating the partial derivative of the log likelihood with finite difference
	return sp.optimize.approx_fprime(theta,ll,delx*np.ones(len(theta)),time_o,st_inp,init_cond,data)

I use the scipy finite difference approximate to evaluate the gradient (not ideal but I dont know how else I could evaluate this)

Now that these functions are defined, I can define my model and sample. As a first step I only sample two paremeters (Cf and Cr) that I am confident greatly define the Lateral Velocity and Acceleration of the vehicle (the 2 data vectors of our interest). This prevents any parameter non-identifiability issues. The model defines uniform priors over Cf and Cr and then I use the pm.Potential() to provide my likelihood. This can be seen here -

#Intiatting the loglikelihood object which is a theano operation (op)
	like = LogLike(loglike,time_o,st_inp_o,init_cond,data)
	with pm.Model() as model:
		a = 1.14
		b = 1.4
		Cf = pm.Uniform('Cf',lower = -120000, upper = -50000,testval = -80000) # front axle cornering stiffness (N/rad)
		Cr = pm.Uniform('Cr',lower = -120000, upper = -50000,testval = -65000) # rear axle cornering stiffness (N/rad)
		Cxf = 10000
		Cxr = 10000
		m = 1720
		Iz = 2420
		Rr = 0.285
		Jw = 2
		#We are also sampling our observation noise - Seems like a standard to use Half normal for this 
		sigmaVy = pm.HalfNormal("sigmaVy",sigma = 0.06,testval=0.05) #Sigma for Lateral Velocity
		sigmaLat_acc = pm.HalfNormal("sigmaLat_acc",sigma = 0.06,testval=0.05) #Sigma for Lateral Acceleration

		## Convert our theta into a theano tensor
		theta_ = [a,b,Cf,Cr,Cxf,Cxr,m,Iz,Rr,Jw,sigmaLat_acc,sigmaVy]
		theta = tt.as_tensor_variable(theta_)

		#According to this thread, should use pm.Potential
		#https://stackoverflow.com/questions/64267546/blackbox-likelihood-example
		pm.Potential("like",like(theta))

		#Now sample!
		#We use metropolis as the algorithm with parameters to be sampled supplied through vars
		if(sys.argv[2] == "nuts"):
			# step = pm.NUTS()
			pm.sampling.init_nuts()
			idata = pm.sample(ndraws ,tune=nburn,discard_tuned_samples=True,return_inferencedata=True,target_accept = 0.9, cores=2)
		elif(sys.argv[2] == "met"):
			step = pm.Metropolis()
			idata = pm.sample(ndraws,step=step, tune=nburn,discard_tuned_samples=True,return_inferencedata=True,cores=2)
		else:
			print("Please provide nuts or met as the stepping method")

This actually works wonderfully. See below results for Tuning draws = 1500 and Draws = 3000 for 4 chains run parallel.




I see no divergences and nice ESS. The Rhat is 1.0 as can be seen from the similarity of the chains in the Rhat plot. The mean obtained is close to the true mean I was expecting and the model fits perfectly.
However I have the following questions for my next steps

  1. I now want to use this posterior as a prior for say fitting other parameters which are excited by some other type of data. For example, parameters a and b are very related to the longitudinal velocity and will probably be identifiable from that data. How do I go about setting this up? I think this is similar to hierarchical modelling but I am not sure.
  2. I also want to draw from this posterior. For example, I understand that using the mean of the distribution minimises the mean squared error, however, I would also like to explore minimising different losses and would thus need different point wise estimates from the posterior. Is this possible considering how my model is set up? Namely, I do not have a CDF and I dont see how I will be able to draw from this posterior without that.
  3. The current model currently takes around 0.05 seconds (wall time) to run for 4.7 seconds of data. However, I eventually I will be working with models which take around 3.5 seconds for 4.7 seconds of data. The above results took about 45 min to run on a decent machine. I presume that a model that is 100 times slower will cause the sampling to take a lot more time (considering I will also be sampling for many more parameters). Is there any way I can speed this up by changing my model setup?
  4. According to what I am reading on the discourse, PyMC v4 uses tensorflow probability as backend and not Theano. So I wanted to know if such a blackbox model will be possible to construct using PyMC v4?
  5. I actually also tried to sample 2 parameters (a and b uniform prior) that are not very identifiable with the Lateral Acceleration and the Lateral Velocity and got pretty bad results (as expected)

    Screenshot 2022-03-24 at 5.21.53 PM
    Screenshot 2022-03-24 at 5.21.58 PM
    There were 77 divergences as can be seen on the rug plot below the trace plot. The pairwise plots have crazy funnel like distributions. I read a bit and it seems like reparametrisation can help exploring such a complicated typical set, but I don’t see how it can be applied to my problem. Is there any other way of doing this? My current idea is related to my first question where I fit a few parameters for some data and fit other parameters with other data. Is this possible?

This turned out to be a longer post than expected. I am not sure if I have provided enough background information so please let me know if something is unclear. Also please feel free to provide suggestions to model things more efficiently, I am very new to this field and only heard about Bayesian Inference and pymc3 about 3 months back.

Taking these backwards:

(5)

It’s hard to proffer advice on prior selection or alternative parameterizations without seeing the form of the likelihood function. Given that they’re not particularly identifiable, it means that either the likelihood is largely invariant to them (in which case, use a strong prior for sampling), or varies (largely) with some scalar function g(a,b) - in which case you might consider sampling from g and the preimage {(a,b) : g(a,b)=g} (apologies for using g as a value and a function here).

(4) Is PyMC4 dead or just resting? - #5 by bridgeland

(3) 45 minutes seems like a long time for 4 parameters. Can the likelihood function implementation be optimized?

(2) The values you have plotted are samples from the posterior. You can use the model to generate additional samples from the posterior.

(1) This is a perennial problem that has been asked many times before, and it’s deeply related to your observation that you don’t have a CDF. You have samples from P_{\mathrm{post}}[\theta] = P[\mathcal{D}|\theta]P[\theta]; but if that distribution isn’t closed-form, the only way to sample from P_\mathrm{post}[\alpha] = P[\mathcal{D}'|\alpha, \theta]P_{\mathrm{post}}[\theta]P[\alpha] is via

P_\mathrm{post}[\alpha] = P[\mathcal{D}'|\alpha, \theta]P[\mathcal{D}|\theta]P[\theta]P[\alpha]

i.e., by including both datasets. Solutions posed here, so far, have been to approximate the posterior CDF with some (typically non-parametric) distribution Q_{\mathrm{post}}[\theta] \approx P[\mathcal{D}|\theta]P[\theta].

One possibility is taking a close look at your likelihood function and seeing if it will factor into something of the form

P[\mathcal{D} | \theta] = \lambda(\mathcal{D})f(\mu(\mathcal{D}), \theta)

where \lambda and f are arbitrary functions, but \mu computes sufficient statistics for \theta. Then using P[\theta] \propto f(\mu_0, \theta) is conjugate. This would give you a closed-form posterior an also a mechanism of speeding up your sampling via a custom step.

3 Likes

@chartl Thanks this helps alot!
(5) I think I get what you are saying here. Just to confirm if I understand this correctly, the existence of a scalar function g(a,b) implies that a and b are not uniquely identifiable but still identifiable as a function correct? This means that I have to create a new parameter which is a function g(a,b) and sample from that?
(4) Okay, so I have to probably just change all my Theano dependencies to Aesara. Thats nice!
(3) By optimisation of likelihood I assume you mean optimising my black box vehicle dynamics model. Although I could potentially make that run even faster, my end goal is to use a more complicated (and extremely well optimised) model as a black box. This model will still be slower than the current model I am using. Is there any other place I can optimise to make my sampling faster?
(2) I see. That makes sense.
(1) I think I understand what you are getting at. One question I have is why cant I use KDE to approximate Q_{post}[\theta]?

You can; but a standard KDE will necessarily retain all of the data (as centers of Gaussian balls – by data here I mean sampled points from the posterior) and compute the full likelihood of the KDE for sampling, so don’t expect much of a speedup over the full posterior. Typically choices of Q are more concise than KDE for efficiency reasons.

1 Like