Pretty new to using this code, and am struggling to work out what type of model is approprate for my case.
I am attempting to take data from an atomistic modelling program to work out values for interatomic potentials. I have rough ideas from literature of what these values should be, so I set a script up which randomly sampled from given ranges and allowed the system to relax. I then parse the output to find the systems which have desirable properties (low change in volume/lattice parameters, low sum square of displacements of atoms compared to experiment).
My dataset is a dataframe which contains the parameter values selected for a given run and their corresponding metrics (0 being perfect, anything positive indicating error).
Ideally, I’d be able to feed in a dataset from a number of runs (say, 1000), and get a posterior distribution out which corresponds to runs which performed well.
Could anyone point me in the right direction?
What you describe sounds like a SMC-ABC type of problem, but in your case you already have the simulated parameters and a column indicating whether it is a perfect match? In that case, you can plot the histogram of the parameter when the corresponding metrics is equal 0 (or smaller than eps a small positive value), and that would be your posterior distribution of free parameters.
If I did happen to get perfect values then I would be able to check. The issue is I’m trying to simultaneously optimise 15 variables, so the search space is huge. I was hoping that by randomly sampling the space, I could infer a narrower search space for the next round of simulations.
Yeah that’s exactly under the scope of SMC-ABC. We are working on check that into the code base @aloctavodia
I guess we should have SMC-ABC in PyMC3 in no more than a week.
@mjclarke94 in SMC-ABC-speak your “atomistic modelling program” is call a simulator SMC-ABC starts by passing a set of parameters from a prior distribution (your “rough ideas from literature”) to the simulator and then comparing the results of the simulator to the observed/experimental values. Then the set of proposed parameters is refined iteratively in order to make the simulated data and the observed data come closer.
To use SMC-ABC you will need to call your “atomistic modelling program” from Python. If you can share the data and “atomistic modelling program” then I could try to see how well SMC-ABC works for your problem, otherwise we will let you know when SMC-ABC becomes available on PyMC3 and you will be able to try it by yourself.
@aloctavodia Thanks for that!
The program itself is GULP, which takes an input file containing options, cell parameters, a list of atom positions, and then a list of potentials which defines the energy associated with given atom-atom interactions as a function of their distance.
My approach has been to take the input file and modify it by string formatting then use subprocess to run the system. The output file is then parsed and the key data extracted.
I’m attaching (over several posts because of new user limits) the template which generates input files, an example input/output pair and a csv containing the parameters and their corresponding metrics for several runs. SSD is the sum square of displacements of atoms from their original position and SDMax is the maximum squared displacement (i.e. the displacement of the worst atom). These values are extracted from the "Comparison of initial and final structures’ part of the output file.
displacements.csv (27.5 KB)
Input output file pair:
exampleIn.txt (9.9 KB)
exampleOut.txt (79.2 KB)
template.txt (10.1 KB)
Do you have a python function that takes care of the parsing and calling GULP via subprocess??
Basically we need something like
def simulator(a, b, c):
create input for gulp
call gulp with the created input
where a, b, c are all the necessary parameters (not necessary those names or number) for GULP and
results is a numpy array. Also you will need a
observed array with the same shape as
If attaching files is giving you trouble you can create a gist or send me the files by e-mail.
I’ve been using Sacred to launch and track experiments so as to preserve the input and output files. I can throw something together that does the job in terms of running and pulling out the experiments tonight so will send it on.
Could you clarify for me what is meant by an
observed array and
results array in this context? In your pseudocode, I assume
a, b, c are randomly selected variables for the given case? Where I am struggling is in differentiating between results (the value(s) to be minimised?) and observed array (…?).
Apologies for my ignorance!
a, b and c represents the parameters of GULP that you want to estimate, and yeah PyMC3 will sample those parameters. Let suppose you were interesting in computing the Young’s modulus and all that you have is a set of experimental measurement of the Young’s modulus and a parameterized model that allows you to compute the Young’s modulus, SMC-ABC will let you find the parameters that make the theoretical values of the Young’s modulus come closer to the experimental ones. In the previous example
results is the theoretical Young’s modulus and
observed array is the experimental values. Notice that the values in
observed array should not necessary be from an real experiment they can be from a simulation or some other source, by they are basically the data you want mimic with your simulator.
Ah, okay. So in this instance the coordinates I feed in are that of the experimental run I am trying to match. I then allow the system to relax and can get the atom displacements out. Given I am trying to match the structure, my
results for a given run would be the displacement of the atoms from their starting position, and my
observed would be a vector of zeros?
Here’s a Gist which provides
simulator(a, b, c, ...)
Sorry @mjclarke94 I did not see your message. I will try to take a look at you gist ASAP.
Update on this: SMC-ABC worked really well for my system. One thing I am curious about now is model validation. Once the model has converged, is there a way of sampling from the posterior distribution to get a feel for the confidence intervals of the outputs?
Thanks again, this approach did more in a few days idle coding than 4 months of banging my head against a wall has achieved!
I’m glad that this worked! It’d be nice to have an example for the community of using simulator SMC-ABC; would you mind sharing the model that ran for you?