Neuroevolution potential

The neuroevolution potential (NEP) approach was proposed in [Fan2021] (NEP1) and later improved in [Fan2022a] (NEP2) and [Fan2022b] (NEP3). Currently, GPUMD supports NEP2, NEP3, and NEP4 (to be published). All versions have comparable accuracy for single-component systems. For multi-component systems, NEP2 has the lowest accuracy and NEP4 the highest, all other hyperparameters being equal.

GPUMD not only allows one to carry out simulations using NEP models via the gpumd executable but also to construct such models via the nep executable.

The neural network model

NEP uses a simple feedforward neural network (NN) to represent the site energy of atom \(i\) as a function of a descriptor vector with \(N_\mathrm{des}\) components,

\[U_i(\mathbf{q}) = U_i \left(\{q^i_{\nu}\}_{\nu =1}^{N_\mathrm{des}}\right).\]

There is a single hidden layer with \(N_\mathrm{neu}\) neurons in the NN and we have

\[U_i = \sum_{\mu=1}^{N_\mathrm{neu}}w^{(1)}_{\mu}\tanh\left(\sum_{\nu=1}^{N_\mathrm{des}} w^{(0)}_{\mu\nu} q^i_{\nu} - b^{(0)}_{\mu}\right) - b^{(1)},\]

where \(\tanh(x)\) is the activation function in the hidden layer, \(\mathbf{w}^{(0)}\) is the connection weight matrix from the input layer (descriptor vector) to the hidden layer, \(\mathbf{w}^{(1)}\) is the connection weight vector from the hidden layer to the output node, which is the energy \(U_i\), \(\mathbf{b}^{(0)}\) is the bias vector in the hidden layer, and \(b^{(1)}\) is the bias for node \(U_i\).
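The forward pass described above can be sketched in a few lines of Python. This is a minimal illustration of the formula for \(U_i\), not GPUMD's implementation; the function name and argument layout are chosen here for clarity.

```python
import math

def site_energy(q, w0, b0, w1, b1):
    """Single-hidden-layer NN for the site energy U_i.

    q  : descriptor vector, length N_des
    w0 : input-to-hidden weight matrix w^(0), shape (N_neu, N_des)
    b0 : hidden-layer bias vector b^(0), length N_neu
    w1 : hidden-to-output weight vector w^(1), length N_neu
    b1 : output bias b^(1) (scalar)
    """
    U = 0.0
    for mu in range(len(w1)):
        # Pre-activation of hidden neuron mu: sum_nu w0[mu][nu] q[nu] - b0[mu]
        x = sum(w0[mu][nu] * q[nu] for nu in range(len(q))) - b0[mu]
        U += w1[mu] * math.tanh(x)
    return U - b1
```

With all weights zero, the energy reduces to \(-b^{(1)}\), which is a quick sanity check on the sign conventions of the biases.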

The descriptor

The descriptor for atom \(i\) consists of a number of radial and angular components as described below.

The radial descriptor components are defined as

\[q^i_{n} = \sum_{j\neq i} g_{n}(r_{ij})\]

with

\[0\leq n\leq n_\mathrm{max}^\mathrm{R},\]

where the summation runs over all the neighbors of atom \(i\) within a certain cutoff distance. There are thus \(n_\mathrm{max}^\mathrm{R}+1\) radial descriptor components.

For the angular descriptor components, we consider 3-body to 5-body ones. The formulation is similar but not identical to the atomic cluster expansion (ACE) approach [Drautz2019]. For 3-body ones, we define (\(0\leq n\leq n_\mathrm{max}^\mathrm{A}\), \(1\leq l \leq l_\mathrm{max}^\mathrm{3b}\))

\[q^i_{nl} = \sum_{m=-l}^l (-1)^m A^i_{nlm} A^i_{nl(-m)},\]

where

\[A^i_{nlm} = \sum_{j\neq i} g_n(r_{ij}) Y_{lm}(\theta_{ij},\phi_{ij}),\]

and \(Y_{lm}(\theta_{ij},\phi_{ij})\) are the spherical harmonics as a function of the polar angle \(\theta_{ij}\) and the azimuthal angle \(\phi_{ij}\). For expressions for the 4-body and 5-body descriptor components, we refer to [Fan2022b].
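The contraction over \(m\) need not be evaluated with explicit spherical harmonics: by the addition theorem, \(\sum_{m=-l}^{l}(-1)^m A^i_{nlm}A^i_{nl(-m)}\) collapses to a double sum over neighbor pairs weighted by a Legendre polynomial of the angle \(\theta_{jik}\) they subtend at atom \(i\). The sketch below illustrates this for a single radial channel; the function names and the overall \((2l+1)/4\pi\) normalization follow the standard addition theorem and are illustrative, not a statement about GPUMD's internal conventions.

```python
import math

def legendre(l, x):
    """Legendre polynomial P_l(x) via the Bonnet recurrence."""
    if l == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, l):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def q_3body(positions, i, g, l):
    """3-body descriptor for atom i, one radial channel g, via
    sum_m (-1)^m A_lm A_l(-m) = (2l+1)/(4 pi)
        * sum_{j,k} g(r_ij) g(r_ik) P_l(cos theta_jik)."""
    ri = positions[i]
    total = 0.0
    neighbors = [p for idx, p in enumerate(positions) if idx != i]
    for rj in neighbors:
        for rk in neighbors:
            dj = [a - b for a, b in zip(rj, ri)]
            dk = [a - b for a, b in zip(rk, ri)]
            rij = math.sqrt(sum(a * a for a in dj))
            rik = math.sqrt(sum(a * a for a in dk))
            cos_t = sum(a * b for a, b in zip(dj, dk)) / (rij * rik)
            total += g(rij) * g(rik) * legendre(l, cos_t)
    return (2 * l + 1) / (4 * math.pi) * total
```

Because the double sum factorizes through the \(A^i_{nlm}\), a production code evaluates it in linear rather than quadratic time in the number of neighbors; the quadratic form above is only the most transparent way to write it.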

The radial functions \(g_n(r_{ij})\) appear in both the radial and the angular descriptor components. For NEP3 [Fan2022b], in the radial descriptor components,

\[g_n(r_{ij}) = \sum_{k=0}^{N_\mathrm{bas}^\mathrm{R}} c^{ij}_{nk} f_k(r_{ij}),\]

with

\[f_k(r_{ij}) = \frac{1}{2} \left[T_k\left(2\left(r_{ij}/r_\mathrm{c}^\mathrm{R}-1\right)^2-1\right)+1\right] f_\mathrm{c}(r_{ij}),\]

and

\[\begin{split}f_\mathrm{c}(r_{ij}) = \begin{cases} \frac{1}{2}\left[ 1 + \cos\left( \pi \frac{r_{ij}}{r_\mathrm{c}^\mathrm{R}} \right) \right],& r_{ij}\leq r_\mathrm{c}^\mathrm{R}; \\ 0, & r_{ij} > r_\mathrm{c}^\mathrm{R}. \end{cases}\end{split}\]

For NEP2, the trainable parameters \(c^{ij}_{nk}\) reduce to \(c^{ij}_{n}\delta_{nk}\). In the angular descriptor components, \(g_n(r_{ij})\) have similar forms but with \(N_\mathrm{bas}^\mathrm{R}\) changed to \(N_\mathrm{bas}^\mathrm{A}\) and with \(r_\mathrm{c}^\mathrm{R}\) changed to \(r_\mathrm{c}^\mathrm{A}\).
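The radial basis defined by the last three equations is straightforward to evaluate with the Chebyshev recurrence. The following sketch (function names are illustrative) fixes one species pair, so the list of expansion coefficients plays the role of \(c^{ij}_{nk}\) for one \(n\):

```python
import math

def cheb(k, x):
    """Chebyshev polynomial of the first kind T_k(x) via the recurrence
    T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x)."""
    if k == 0:
        return 1.0
    t_prev, t = 1.0, x
    for _ in range(1, k):
        t_prev, t = t, 2.0 * x * t - t_prev
    return t

def f_cut(r, rc):
    """Cosine cutoff f_c(r): 1 at r = 0, smoothly 0 at r = rc and beyond."""
    if r > rc:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * r / rc))

def f_k(k, r, rc):
    """Basis function f_k(r) = (1/2)[T_k(2(r/rc - 1)^2 - 1) + 1] f_c(r)."""
    x = 2.0 * (r / rc - 1.0) ** 2 - 1.0
    return 0.5 * (cheb(k, x) + 1.0) * f_cut(r, rc)

def g_n(r, c_n, rc):
    """Radial function g_n(r) = sum_k c_nk f_k(r) for one n and one
    species pair (c_n stands in for c^{ij}_{nk})."""
    return sum(c * f_k(k, r, rc) for k, c in enumerate(c_n))
```

Note that every \(f_k\) equals 1 at \(r=0\) (since \(T_k(1)=1\)) and vanishes at the cutoff, so each \(g_n\) inherits a smooth decay to zero at \(r_\mathrm{c}\).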

Model dimensions

The numbers of the various model components are as follows:

- atom types: \(N_\mathrm{typ}\)
- radial descriptor components: \(n_\mathrm{max}^\mathrm{R}+1\)
- 3-body angular descriptor components: \((n_\mathrm{max}^\mathrm{A}+1) l_\mathrm{max}^\mathrm{3b}\)
- 4-body angular descriptor components: \(n_\mathrm{max}^\mathrm{A}+1\), or zero if not used
- 5-body angular descriptor components: \(n_\mathrm{max}^\mathrm{A}+1\), or zero if not used
- descriptor components: \(N_\mathrm{des}\), the sum of the above numbers of descriptor components
- trainable parameters \(c_{nk}^{ij}\) in the descriptor: \(N_\mathrm{typ}^2 [(n_\mathrm{max}^\mathrm{R}+1)+(n_\mathrm{max}^\mathrm{A}+1)]\) (NEP2) or \(N_\mathrm{typ}^2 [(n_\mathrm{max}^\mathrm{R}+1)(N_\mathrm{bas}^\mathrm{R}+1)+(n_\mathrm{max}^\mathrm{A}+1)(N_\mathrm{bas}^\mathrm{A}+1)]\) (NEP3, NEP4)
- trainable NN parameters: \(N_\mathrm{nn} = (N_\mathrm{des} +2) N_\mathrm{neu}+1\) (NEP2, NEP3) or \(N_\mathrm{nn} = (N_\mathrm{des} +2) N_\mathrm{neu} N_\mathrm{typ}+1\) (NEP4)

The total number of trainable parameters is the sum of the number of trainable descriptor parameters and the number of NN parameters \(N_\mathrm{nn}\).
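These counting rules can be collected into a small helper, which is handy for estimating model size before training. This is a sketch that restates the formulas above; the function name and argument names are chosen here, not part of GPUMD.

```python
def num_parameters(version, n_typ, n_max_r, n_max_a, n_bas_r, n_bas_a,
                   l_max_3b, n_neu, use_4b=True, use_5b=True):
    """Total number of trainable parameters for NEP2, NEP3, or NEP4."""
    # Descriptor length N_des: radial + 3-body + optional 4-/5-body parts.
    n_des = (n_max_r + 1) + (n_max_a + 1) * l_max_3b
    if use_4b:
        n_des += n_max_a + 1
    if use_5b:
        n_des += n_max_a + 1
    # Trainable descriptor parameters c_nk^{ij}.
    if version == 2:
        n_c = n_typ ** 2 * ((n_max_r + 1) + (n_max_a + 1))
    else:  # NEP3 and NEP4 share the same descriptor-parameter count
        n_c = n_typ ** 2 * ((n_max_r + 1) * (n_bas_r + 1)
                            + (n_max_a + 1) * (n_bas_a + 1))
    # Trainable NN parameters N_nn (NEP4 has one NN per atom type).
    if version == 4:
        n_nn = (n_des + 2) * n_neu * n_typ + 1
    else:
        n_nn = (n_des + 2) * n_neu + 1
    return n_c + n_nn
```

For example, a single-species NEP3 model with \(n_\mathrm{max}^\mathrm{R}=n_\mathrm{max}^\mathrm{A}=4\), \(N_\mathrm{bas}^\mathrm{R}=N_\mathrm{bas}^\mathrm{A}=8\), \(l_\mathrm{max}^\mathrm{3b}=4\), and 30 neurons has \(N_\mathrm{des}=35\) and 1201 trainable parameters in total.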

Optimization procedure

The NEP approach owes its name to the separable natural evolution strategy (SNES) used to optimize the parameters [Schaul2011]. The interested reader is referred to [Schaul2011] and [Fan2021] for details.

The key quantity in the optimization procedure is the loss (or objective) function, which is minimized. It is defined as a weighted sum over the loss terms associated with energies, forces, and virials as well as the \(\mathcal{L}_1\) and \(\mathcal{L}_2\) norms of the parameter vector:

\[\begin{split}L(\boldsymbol{z}) &= \lambda_\mathrm{e} \left( \frac{1}{N_\mathrm{str}}\sum_{n=1}^{N_\mathrm{str}} \left( U^\mathrm{NEP}(n,\boldsymbol{z}) - U^\mathrm{tar}(n)\right)^2 \right)^{1/2} \nonumber \\ &+ \lambda_\mathrm{f} \left( \frac{1}{3N} \sum_{i=1}^{N} \left( \boldsymbol{F}_i^\mathrm{NEP}(\boldsymbol{z}) - \boldsymbol{F}_i^\mathrm{tar}\right)^2 \right)^{1/2} \nonumber \\ &+ \lambda_\mathrm{v} \left( \frac{1}{6N_\mathrm{str}} \sum_{n=1}^{N_\mathrm{str}} \sum_{\mu\nu} \left( W_{\mu\nu}^\mathrm{NEP}(n,\boldsymbol{z}) - W_{\mu\nu}^\mathrm{tar}(n)\right)^2 \right)^{1/2} \nonumber \\ &+ \lambda_1 \frac{1}{N_\mathrm{par}} \sum_{n=1}^{N_\mathrm{par}} |z_n| \nonumber \\ &+ \lambda_2 \left(\frac{1}{N_\mathrm{par}} \sum_{n=1}^{N_\mathrm{par}} z_n^2\right)^{1/2}.\end{split}\]

Here, \(N_\mathrm{str}\) is the number of structures in the training data set (if using a full batch) or the number of structures in a mini-batch (see the batch keyword in the nep.in input file) and \(N\) is the total number of atoms in these structures. \(U^\mathrm{NEP}(n,\boldsymbol{z})\) and \(W_{\mu\nu}^\mathrm{NEP}(n,\boldsymbol{z})\) are the per-atom energy and virial tensor predicted by the NEP model with parameters \(\boldsymbol{z}\) for the \(n^\mathrm{th}\) structure, and \(\boldsymbol{F}_i^\mathrm{NEP}(\boldsymbol{z})\) is the predicted force for the \(i^\mathrm{th}\) atom. \(U^\mathrm{tar}(n)\), \(W_{\mu\nu}^\mathrm{tar}(n)\), and \(\boldsymbol{F}_i^\mathrm{tar}\) are the corresponding target values. That is, the loss terms for energies, forces, and virials are defined as the respective RMSE values between the NEP predictions and the target values. The last two terms represent \(\mathcal{L}_1\) and \(\mathcal{L}_2\) regularization terms of the parameter vector. The weights \(\lambda_\mathrm{e}\), \(\lambda_\mathrm{f}\), \(\lambda_\mathrm{v}\), \(\lambda_1\), and \(\lambda_2\) are tunable hyper-parameters (see the eponymous keywords in the nep.in input file). When calculating the loss function, we use eV/atom for energies and virials and eV/Å for force components.
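The loss function above can be sketched directly in Python. Here predictions and targets are passed as flat lists (per-structure energies and virial components in eV/atom, force components in eV/Å), so each RMSE term reduces to an ordinary root-mean-square error over its list; the function and argument names are illustrative, not GPUMD's API.

```python
import math

def rmse(pred, tar):
    """Root-mean-square error between two equally long lists."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, tar)) / len(pred))

def loss(z, e_pred, e_tar, f_pred, f_tar, v_pred, v_tar,
         lam_e, lam_f, lam_v, lam_1, lam_2):
    """L(z) = lam_e * RMSE(energies) + lam_f * RMSE(forces)
            + lam_v * RMSE(virials) + lam_1 * L1(z) + lam_2 * L2(z)."""
    n_par = len(z)
    l1 = sum(abs(zn) for zn in z) / n_par            # mean absolute value
    l2 = math.sqrt(sum(zn ** 2 for zn in z) / n_par)  # root mean square
    return (lam_e * rmse(e_pred, e_tar)
            + lam_f * rmse(f_pred, f_tar)
            + lam_v * rmse(v_pred, v_tar)
            + lam_1 * l1 + lam_2 * l2)
```

When all predictions match their targets, the loss reduces to the two regularization terms alone, which is what drives the parameter vector toward small magnitudes during the later stages of training.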