I don’t have an end-to-end pipeline yet, but the idea is to be able to do something like this (this is actual working code; it’s just missing some parts to be end-to-end):
alias Ulam.Nutex.Math
alias Ulam.Nutex.Math.Compiler
test "compile unnormalized normal logp" do
use Ulam.Nutex.Math.Operators
i = Math.v("i")
n = Math.data("n")
y = Math.data("y")
x = Math.data("x")
alpha = Math.param("alpha")
beta = Math.param("beta")
epsilon = Math.param("epsilon")
result = - (y[i] - (beta * x[i] + alpha)) * (y[i] - (beta * x[i] + alpha)) / (2.0 * epsilon * epsilon) - Math.ln(epsilon)
{_cache, expression} = Math.optimize_summation(i, n, result)
Compiler.to_rust(expression)
end
The code above defines the unnormalized log-likelihood for a linear regression model with Gaussian error. The coefficients of the model are alpha and beta, and the error is epsilon. The line use Ulam.Nutex.Math.Operators overrides the math operators so that they generate unevaluated math expressions, which actually compile into rational functions of variables, constants and elementary functions (such as the Math.ln() function).
The expression above should be summed over all i (the index for the x[i] and y[i] vector access terms) to get the full likelihood for the data. The code then attempts to separate the parts that depend on i and turn the expression into a linear combination of terms that depend on i and on the x and y data parameters.
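To make the “linear combination” claim concrete, here is the expansion worked out by hand (plain algebra, not the actual output of Math.optimize_summation): expanding the square turns the per-observation term into a linear combination of the data terms x[i], y[i], x[i]*x[i], x[i]*y[i] and y[i]*y[i], with coefficients that depend only on the parameters:

$$
-\frac{\bigl(y_i - (\beta x_i + \alpha)\bigr)^2}{2\epsilon^2} - \ln\epsilon
= \frac{-y_i^2 + 2\beta x_i y_i + 2\alpha y_i - \beta^2 x_i^2 - 2\alpha\beta x_i - \alpha^2}{2\epsilon^2} - \ln\epsilon
$$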
Because the normal distribution (and many others) belongs to the exponential family, we know that separating these terms will be worth it: after separating the x[i] and y[i] terms (and possibly their product), we can use the associative property of addition to move the parameters out of the sums and cache the sums before even running the model (e.g. (0..n).into_iter().map(|i| x[i]).sum() or (0..n).into_iter().map(|i| x[i] * y[i]).sum()), converting the expression into something that only uses the sums. This dramatically decreases the number of nodes in the autodiff tape and the number of operations required to evaluate the gradient.
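Summing the expanded form over i and pulling the parameters out of the sums gives (again worked out by hand, as a sanity check on what Math.optimize_summation should produce):

$$
\sum_{i=0}^{n-1}\left[-\frac{(y_i - \beta x_i - \alpha)^2}{2\epsilon^2} - \ln\epsilon\right]
= \frac{-2 n \epsilon^2 \ln\epsilon - n\alpha^2 - 2\alpha\beta\sum_i x_i + 2\alpha\sum_i y_i - \beta^2\sum_i x_i^2 + 2\beta\sum_i x_i y_i - \sum_i y_i^2}{2\epsilon^2}
$$

Only five data sums survive, and they can all be computed once up front; this is exactly the shape of the generated code below.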
The code above then generates the following Rust code:
let target = target + (
    - 2.0 * (self.data.n as f64) * epsilon.ln() * epsilon * epsilon
    - (self.data.n as f64) * alpha * alpha
    - 2.0 * alpha * beta * self.cache.sum_of_x_i__0
    + 2.0 * alpha * self.cache.sum_of_y_i__3
    - beta * beta * self.cache.sum_of_x_i_x_i__1
    + 2.0 * beta * self.cache.sum_of_x_i_y_i__2
    - self.cache.sum_of_y_i_y_i__4
) / ((2.0 * epsilon * epsilon));
This can be used as part of the following Logp definition (which I still can’t build automatically, but that’s just a matter of bookkeeping):
#[derive(Debug, Clone)]
pub struct CacheContainer {
    sum_of_x_i__0: f64,
    sum_of_y_i__3: f64,
    sum_of_x_i_x_i__1: f64,
    sum_of_x_i_y_i__2: f64,
    sum_of_y_i_y_i__4: f64,
}

#[derive(Debug, Clone)]
pub struct DataContainer {
    y: Vec<f64>,
    x: Vec<f64>,
    n: u32,
}

#[derive(Debug)]
pub struct PosteriorDensityWithCache {
    cache: CacheContainer,
    data: DataContainer,
}

impl CpuLogpFunc for PosteriorDensityWithCache {
    type LogpError = PosteriorLogpError;

    // Three parameters: alpha, beta and epsilon.
    fn dim(&self) -> usize { 3 }

    fn logp(&mut self, position: &[f64], grad: &mut [f64]) -> Result<f64, Self::LogpError> {
        let tape = autograd::Tape::new();

        // One tape variable per parameter, read from the sampler's position.
        let target = tape.add_var(0.0);
        let alpha = tape.add_var(position[0]);
        let beta = tape.add_var(position[1]);
        let epsilon = tape.add_var(position[2]);

        // Generated log-density: only the parameters and the cached sums appear here.
        let target = target + (
            - 2.0 * (self.data.n as f64) * epsilon.ln() * epsilon * epsilon
            - (self.data.n as f64) * alpha * alpha
            - 2.0 * alpha * beta * self.cache.sum_of_x_i__0
            + 2.0 * alpha * self.cache.sum_of_y_i__3
            - beta * beta * self.cache.sum_of_x_i_x_i__1
            + 2.0 * beta * self.cache.sum_of_x_i_y_i__2
            - self.cache.sum_of_y_i_y_i__4
        ) / ((2.0 * epsilon * epsilon));

        // Reverse-mode gradient with respect to the three parameters.
        let target_grad = target.grad().wrt(&[alpha, beta, epsilon]);

        // This is stupid, it can be done better but my rust doesn't allow
        // for anything more advanced yet...
        grad[0] = target_grad[0];
        grad[1] = target_grad[1];
        grad[2] = target_grad[2];

        return Ok(target.val)
    }
}
The DataContainer and the CacheContainer can be initialized before even running the Logp and don’t need to be rebuilt every time. The autograd module is a slight modification of the reverse package, which implements reverse-mode autodiff for Rust.
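For illustration, here is a hypothetical setup helper (the name build_posterior is made up, and it assumes it lives in the same module as the structs above) showing how the cache could be filled once from the data before sampling starts:

// Hypothetical helper, not generated code: compute the cached sums once,
// before any call to logp().
fn build_posterior(x: Vec<f64>, y: Vec<f64>) -> PosteriorDensityWithCache {
    let cache = CacheContainer {
        sum_of_x_i__0: x.iter().sum(),
        sum_of_y_i__3: y.iter().sum(),
        sum_of_x_i_x_i__1: x.iter().map(|xi| xi * xi).sum(),
        sum_of_x_i_y_i__2: x.iter().zip(y.iter()).map(|(xi, yi)| xi * yi).sum(),
        sum_of_y_i_y_i__4: y.iter().map(|yi| yi * yi).sum(),
    };
    let n = x.len() as u32;

    PosteriorDensityWithCache {
        cache,
        data: DataContainer { y, x, n },
    }
}

After this, every call to logp() touches only the three parameters and the five precomputed sums, no matter how long x and y are.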
One could implement part of the above purely on the Rust side by exploiting the properties of the exponential family of distributions, but by expanding all the multiplications on the Elixir side I can simplify things even further compared to what one could get from the exponential-family properties alone.
So, in summary, most of the above Rust code was written by hand, but I have a roadmap for generating it from Elixir. Ideally I would do most of the code generation on the Rust side, because that would be more portable, but implementing the equivalent of a subset of a computer algebra system in Rust to reduce the complex expressions seems like hell (and I can’t write much Rust, to be honest…)