Have there been any attempts to bring symbolic regression techniques to PyMC? I was looking into Bayesian symbolic regression and was only able to find this one paper; if I understand correctly, the authors had to use an unusual type of MCMC sampler. I looked for implementations in any of the big PPLs, but wasn’t able to find anything. If they exist, they are not well publicized.
I strongly suspect that my best option is to use a non-parametric regression technique in a model during inference and, outside the model, use a symbolic regression toolkit like PySR on the conditioned non-parametric regressor. Has anyone attempted something like this before?
Thank you all for your time
You should check out the work of MilesCranmer. He has a package for symbolic regression in Julia, SymbolicRegression.jl. He’s also got a package called PySR, which seems at first glance to be implemented in Python.
Thank you for the reply! I am familiar with PySR and I agree that it could be useful to me. The reason for my question is that PySR only works when you have clearly defined input and output data. However, I am more interested in doing symbolic regression with latent variables as a function of input data in a PyMC model and I’m not sure what the best way to accomplish that is. It seems like there’s no good way to include a symbolic regressor in a PyMC model. Therefore, my first thought is to use a nonparametric regressor like a Gaussian process and, after sampling, use PySR to get an approximate functional form of the GP outside of the PyMC model. I was wondering if anyone had attempted something like this and if they had any experiences they wanted to share.
If your latent variables represent something that you can reason about, then you can just build a model that defines the functional relationship between each latent variable and its parents. That is not symbolic regression, in that you are not using PyMC to discover what this relationship is; you are defining it yourself. But if you want to explore various functional forms of the relationships between latent variables and their parents, you could build a function factory, evaluate a whole set of candidate models, and then do model comparison or ensemble prediction, for example. Just some thoughts.