Swarm of Trajectories

Introduction

Like all string methods in general, the String Method with Swarms of Trajectories (often abbreviated to “Swarm of Trajectories” or even more simply “SoT”) is a method to identify a transition pathway in an arbitrarily high-dimensional collective variable space between two metastable states of a system. This pathway (the string) is a parametrized curve discretized into a set of images, each of which is itself a molecular system. The classical String Method in Collective Variables [15] evolves each image by estimating a mean force and metric tensor at each image with restrained molecular dynamics simulations. In the SoT method, the string is instead evolved by launching a large number (a swarm) of unrestrained trajectories from each image and estimating the average drift of the collective variables over the swarm.

The mathematical background of the method can be expressed in a few relatively straightforward equations, with further detail available in the original work of Benoît Roux and collaborators [17]. First, consider a path \(z(\alpha)\) constructed between two metastable states, such that \(\alpha=0\) represents the starting state and \(\alpha=1\) is the final state. The “Most Probable Transition Pathway” (MPTP) is defined such that a molecular system started from anywhere on the path will most probably evolve while staying on the path. It is shown in the original work that a mathematical definition for such a path is given when the collective variables evolve according to:

\[z_{i}(\alpha) = z_{i}(\alpha') + \sum\limits_{j}\left( \beta D_{ij}\left[ z(0) \right] F_{j}\left[z(0)\right] + \frac{\partial}{\partial z_{j}}\left( D_{ij}\left[z(0)\right]\right) \right)\delta\tau\]

Where the following notation is used: \(z_{i}\) represents the collective variables belonging to the string, \(\alpha\) represents the parameter identifying that point on the string, \(\beta\) represents the temperature, \(D_{ij}\) represents the diffusion tensor, \(F_{j}\) represents the mean force, \(z\) represents the collective variables constructed from the molecular system at a given moment in time, and \(\delta\tau\) represents the time step of the evolution of the dynamics. The SoT method approximates this equation using the average drift evaluated from a large number of unbiased trajectories, each of length \(\delta\tau\), launched from each image:

\[\overline{\Delta z_{i}(\delta\tau)} = \overline{z_{i}(\delta\tau) - z_{i}(0)} \equiv \sum\limits_{j} \left( \beta D_{ij}\left[z(0)\right] F_{j}\left[z(0)]\right] + \frac{\partial}{\partial z_{j}}\left( D_{ij}\left[ z(0)\right]\right)\right)\delta\tau\]

Like all string methods, there is an additional step beyond evolving the collective variables - after one iteration of evolution, the images along the path must be reparameterized such that they lie (for example) an equal arc length apart. This step is necessary to ensure that all images do not fall into one metastable basin or the other.

Algorithmically, the SoT method is implemented as follows:

  1. An initial string is defined between the two states of interest. This can be defined however one wishes; often it is simply a linear interpolation through the space of the collective variables. In fact, the ends of the string need not necessarily be in the basins of interest; the dynamic nature of the method should allow the ends to naturally fall into nearby metastable basins.

  2. For each image of the string, a molecular system with atomic coordinates that roughly correspond to the collective variables of that image is constructed.

  3. A set of equilibrium trajectories are generated from that system by performing restrained sampling around the image’s collective variables.

  4. That set of equilibrium trajectories is used as the starting point of a large number of short unbiased trajectories; the resulting average displacement of each collective variable is used to update the positions of the images.

  5. A reparameterization scheme is enforced to ensure that, for example, the string images are equally distant in collective variable space.

Steps 2–5 are iterated upon, leading to convergence of the method and the MPTP.

Options & Parameters

To construct a Swarm input file, the following options are available. A complete Swarm JSON file will inherit some of its inputs from the String schema (for parameters common to all string methods). The options unique to Swarm are:

initial_steps

For each iteration of the method, this is the number of steps to spend doing restrained sampling and not harvesting trajectories. This time is important to ensure the underlying molecular system’s CV values are close to the string CV values.

harvest_length

After the initial restraining is finished, a trajectory is harvested for later use in launching an unrestrained trajectory every so often - harvest length specifies how often this will be done. Harvest length multiplied by number of trajectories (see below) will determine overall how many more steps will be taken under restrained sampling.

number_of_trajectories

The total number of unrestrained trajectories to be included in each swarm.

swarm_length

The length of each unrestrained trajectory in the swarm. Swarm length multiplied by number of trajectories specifies how many total steps will be spent doing unrestrained sampling.

From the String schema, the options are:

type

This parameter identifies that a String-type method is being used, and thus should be set to “String”.

flavor

This parameter identifies the specific kind of string-type method being used; for swarm, it should be set to “Swarm”.

centers

The initial values of each CV for each image on the string are specified under “centers”, which is an array of size equal to the total number of images, with each entry consisting of an array with size equal to the number of CVs used for the string method. In this way, the initial string is defined.

tolerance

This is a tolerance threshold that can be set to trigger the end of the method; it is a percentage by which, if no node CV changes by this percentage, the method will end. It must be specified as an array with one entry for each CV desired.

max_iterations

A complementary stopping criterion can be specified; the method will stop if it undergoes this many iterations of the string method.

ksprings

A unique spring constant must be defined for each CV; its purpose is described above.

frequency

The frequency of each integration step. This should almost always be set to 1.

Tutorial

This tutorial will walk you step by step through the user example provided with the SSAGES source code that runs the SoT method on the alanine dipeptide using LAMMPS. First, be sure you have compiled SSAGES with LAMMPS. Then, navigate to the SSAGES/Examples/User/Swarm/ADP subdirectory. Now, take a moment to observe the in.ADP_Test and data.input files. In general, these should be the same as what you would use for any other method, but for the SoT method, it is important to define a larger skin distance than one normally would in the neighbor command in LAMMPS. This is because, under the hood, each unrestrained trajectory in the swarm is started by manually resetting the positions of each atom in the LAMMPS simulation to the start of a new trajectory. From the perspective of LAMMPS, this is a huge amount of distance to move in a single time step; this move triggers neighbor list rebuilding, but LAMMPS considers it a “dangerous build” which threatens to crash the simulation. Thus, we increase the skin distance, which forces LAMMPS to keep track of more pairs in the neighbor lists, and thus reduces the number of dangerous builds. Keep this in mind for future runs of the SoT method.

The next two files of interest are the Template_Input.json input file and the Input_Generator.py script. Both of these files can be modified in your text editor of choice to customize the inputs, but for this tutorial, simply observe them and leave them be. Template_Input.json contains all the information necessary to fully specify one driver; Input_Generator.py copies this information a number of times specified within the script (for this tutorial, 22 times) while also linearly interpolating through the start and end states defined in the script and substituting the correct values into the “centers” portion of the method definition. Execute this script as follows:

python Input_Generator.py

You will produce a file called Swarm.json. You can also open this file to verify for yourself that the script did what it was supposed to do. Now, with your JSON input and your SSAGES binary, you have everything you need to perform a simulation. Simply run:

mpiexec -np 22 ./ssages Swarm.json

Soon, the simulation will produce a node-X.log file for each driver, where X is the number specifying the driver (in this case, 0-21 for our 22 drivers). Each one will report the following information, in order: the node number, the iteration number, and for each CV, the current value of the string CV as well as the current value of the CV calculated from the molecular system.

Allow your system to run for the desired number of MD steps, but keep an eye on it - the system should exit once one driver reaches the maximum number of MD steps, but it is possible that instead one driver will exit and the rest will get stuck. Check in on your node files and see if they’ve been updated recently - if not, the simulation has likely finished. Once this is done, you can execute the included plotter.py function in a directory containing the node files with the command line argument of how many images your string had. The script also accepts an argument to plot a free energy surface alongside the string (generated with another method), but that goes beyond the scope of this tutorial. Thus, simply execute:

python plotter.py 22 none

And in a moment you should have a graph of your converged string. Thus concludes this tutorial.

Developer

  • Cody Bezik