Neuroevolution potential

The neuroevolution potential (NEP) approach was proposed in [Fan2021] (NEP1) and later improved in [Fan2022a] (NEP2) and [Fan2022b] (NEP3). Currently, GPUMD supports NEP2, NEP3, and NEP4 (to be published). All versions have comparable accuracy for single-component systems. For multi-component systems, NEP2 has the lowest accuracy and NEP4 the highest, all other hyperparameters being equal.

GPUMD not only allows one to carry out simulations using NEP models via the gpumd executable but also to construct such models via the nep executable.

The neural network model

NEP uses a simple feedforward neural network (NN) to represent the site energy of atom \(i\) as a function of a descriptor vector with \(N_\mathrm{des}\) components,

\[U_i(\mathbf{q}) = U_i \left(\{q^i_{\nu}\}_{\nu =1}^{N_\mathrm{des}}\right).\]

There is a single hidden layer with \(N_\mathrm{neu}\) neurons in the NN and we have

\[U_i = \sum_{\mu=1}^{N_\mathrm{neu}}w^{(1)}_{\mu}\tanh\left(\sum_{\nu=1}^{N_\mathrm{des}} w^{(0)}_{\mu\nu} q^i_{\nu} - b^{(0)}_{\mu}\right) - b^{(1)},\]

where \(\tanh(x)\) is the activation function in the hidden layer, \(\mathbf{w}^{(0)}\) is the connection weight matrix from the input layer (descriptor vector) to the hidden layer, \(\mathbf{w}^{(1)}\) is the connection weight vector from the hidden layer to the output node, which is the energy \(U_i\), \(\mathbf{b}^{(0)}\) is the bias vector in the hidden layer, and \(b^{(1)}\) is the bias for node \(U_i\).
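The forward pass described above can be sketched in a few lines of Python. This is a minimal illustration of the formula for \(U_i\), not GPUMD's implementation; the function name and argument layout are chosen here for clarity.

```python
import math

def site_energy(q, w0, b0, w1, b1):
    """Single-hidden-layer NN for the site energy U_i.

    q  : descriptor vector, length N_des
    w0 : input-to-hidden weight matrix w^(0), shape (N_neu, N_des)
    b0 : hidden-layer bias vector b^(0), length N_neu
    w1 : hidden-to-output weight vector w^(1), length N_neu
    b1 : output bias b^(1) (scalar)
    """
    U = 0.0
    for mu in range(len(w1)):
        # Pre-activation of hidden neuron mu: sum_nu w0[mu][nu] q[nu] - b0[mu]
        x = sum(w0[mu][nu] * q[nu] for nu in range(len(q))) - b0[mu]
        U += w1[mu] * math.tanh(x)
    return U - b1
```

With all weights zero, the energy reduces to \(-b^{(1)}\), which is a quick sanity check on the sign conventions of the biases.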

The descriptor

The descriptor for atom \(i\) consists of a number of radial and angular components as described below.

The radial descriptor components are defined as

\[q^i_{n} = \sum_{j\neq i} g_{n}(r_{ij})\]

with

\[0\leq n\leq n_\mathrm{max}^\mathrm{R},\]

where the summation runs over all the neighbors of atom \(i\) within a certain cutoff distance. There are thus \(n_\mathrm{max}^\mathrm{R}+1\) radial descriptor components.

For the angular descriptor components, we consider 3-body to 5-body ones. The formulation is similar but not identical to the atomic cluster expansion (ACE) approach [Drautz2019]. For 3-body ones, we define (\(0\leq n\leq n_\mathrm{max}^\mathrm{A}\), \(1\leq l \leq l_\mathrm{max}^\mathrm{3b}\))

\[q^i_{nl} = \sum_{m=-l}^l (-1)^m A^i_{nlm} A^i_{nl(-m)},\]

where

\[A^i_{nlm} = \sum_{j\neq i} g_n(r_{ij}) Y_{lm}(\theta_{ij},\phi_{ij}),\]

and \(Y_{lm}(\theta_{ij},\phi_{ij})\) are the spherical harmonics as a function of the polar angle \(\theta_{ij}\) and the azimuthal angle \(\phi_{ij}\). For expressions for the 4-body and 5-body descriptor components, we refer to [Fan2022b].
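The contraction over \(m\) need not be evaluated with explicit spherical harmonics: by the addition theorem, \(\sum_{m=-l}^{l}(-1)^m A^i_{nlm}A^i_{nl(-m)}\) collapses to a double sum over neighbor pairs weighted by a Legendre polynomial of the angle \(\theta_{jik}\) they subtend at atom \(i\). The sketch below illustrates this for a single radial channel; the function names and the overall \((2l+1)/4\pi\) normalization follow the standard addition theorem and are illustrative, not a statement about GPUMD's internal conventions.

```python
import math

def legendre(l, x):
    """Legendre polynomial P_l(x) via the Bonnet recurrence."""
    if l == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, l):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def q_3body(positions, i, g, l):
    """3-body descriptor for atom i, one radial channel g, via
    sum_m (-1)^m A_lm A_l(-m) = (2l+1)/(4 pi)
        * sum_{j,k} g(r_ij) g(r_ik) P_l(cos theta_jik)."""
    ri = positions[i]
    total = 0.0
    neighbors = [p for idx, p in enumerate(positions) if idx != i]
    for rj in neighbors:
        for rk in neighbors:
            dj = [a - b for a, b in zip(rj, ri)]
            dk = [a - b for a, b in zip(rk, ri)]
            rij = math.sqrt(sum(a * a for a in dj))
            rik = math.sqrt(sum(a * a for a in dk))
            cos_t = sum(a * b for a, b in zip(dj, dk)) / (rij * rik)
            total += g(rij) * g(rik) * legendre(l, cos_t)
    return (2 * l + 1) / (4 * math.pi) * total
```

Because the double sum factorizes through the \(A^i_{nlm}\), a production code evaluates it in linear rather than quadratic time in the number of neighbors; the quadratic form above is only the most transparent way to write it.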

The radial functions \(g_n(r_{ij})\) appear in both the radial and the angular descriptor components. For NEP3 [Fan2022b], in the radial descriptor components,

\[g_n(r_{ij}) = \sum_{k=0}^{N_\mathrm{bas}^\mathrm{R}} c^{ij}_{nk} f_k(r_{ij}),\]

with

\[f_k(r_{ij}) = \frac{1}{2} \left[T_k\left(2\left(r_{ij}/r_\mathrm{c}^\mathrm{R}-1\right)^2-1\right)+1\right] f_\mathrm{c}(r_{ij}),\]

and

\[\begin{split}f_\mathrm{c}(r_{ij}) = \begin{cases} \frac{1}{2}\left[ 1 + \cos\left( \pi \frac{r_{ij}}{r_\mathrm{c}^\mathrm{R}} \right) \right],& r_{ij}\leq r_\mathrm{c}^\mathrm{R}; \\ 0, & r_{ij} > r_\mathrm{c}^\mathrm{R}. \end{cases}\end{split}\]

For NEP2, the trainable parameters \(c^{ij}_{nk}\) reduce to \(c^{ij}_{n}\delta_{nk}\). In the angular descriptor components, \(g_n(r_{ij})\) have similar forms but with \(N_\mathrm{bas}^\mathrm{R}\) changed to \(N_\mathrm{bas}^\mathrm{A}\) and with \(r_\mathrm{c}^\mathrm{R}\) changed to \(r_\mathrm{c}^\mathrm{A}\).
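The radial basis defined by the last three equations is straightforward to evaluate with the Chebyshev recurrence. The following sketch (function names are illustrative) fixes one species pair, so the list of expansion coefficients plays the role of \(c^{ij}_{nk}\) for one \(n\):

```python
import math

def cheb(k, x):
    """Chebyshev polynomial of the first kind T_k(x) via the recurrence
    T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x)."""
    if k == 0:
        return 1.0
    t_prev, t = 1.0, x
    for _ in range(1, k):
        t_prev, t = t, 2.0 * x * t - t_prev
    return t

def f_cut(r, rc):
    """Cosine cutoff f_c(r): 1 at r = 0, smoothly 0 at r = rc and beyond."""
    if r > rc:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * r / rc))

def f_k(k, r, rc):
    """Basis function f_k(r) = (1/2)[T_k(2(r/rc - 1)^2 - 1) + 1] f_c(r)."""
    x = 2.0 * (r / rc - 1.0) ** 2 - 1.0
    return 0.5 * (cheb(k, x) + 1.0) * f_cut(r, rc)

def g_n(r, c_n, rc):
    """Radial function g_n(r) = sum_k c_nk f_k(r) for one n and one
    species pair (c_n stands in for c^{ij}_{nk})."""
    return sum(c * f_k(k, r, rc) for k, c in enumerate(c_n))
```

Note that every \(f_k\) equals 1 at \(r=0\) (since \(T_k(1)=1\)) and vanishes at the cutoff, so each \(g_n\) inherits a smooth decay to zero at \(r_\mathrm{c}\).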

Model dimensions

The numbers of the various model components are as follows:

- atom types: \(N_\mathrm{typ}\)
- radial descriptor components: \(n_\mathrm{max}^\mathrm{R}+1\)
- 3-body angular descriptor components: \((n_\mathrm{max}^\mathrm{A}+1) l_\mathrm{max}^\mathrm{3b}\)
- 4-body angular descriptor components: \(n_\mathrm{max}^\mathrm{A}+1\), or zero if not used
- 5-body angular descriptor components: \(n_\mathrm{max}^\mathrm{A}+1\), or zero if not used
- descriptor components: \(N_\mathrm{des}\), the sum of the above numbers of descriptor components
- trainable parameters \(c_{nk}^{ij}\) in the descriptor: \(N_\mathrm{typ}^2 [(n_\mathrm{max}^\mathrm{R}+1)+(n_\mathrm{max}^\mathrm{A}+1)]\) (NEP2) or \(N_\mathrm{typ}^2 [(n_\mathrm{max}^\mathrm{R}+1)(N_\mathrm{bas}^\mathrm{R}+1)+(n_\mathrm{max}^\mathrm{A}+1)(N_\mathrm{bas}^\mathrm{A}+1)]\) (NEP3, NEP4)
- trainable NN parameters: \(N_\mathrm{nn} = (N_\mathrm{des} +2) N_\mathrm{neu}+1\) (NEP2, NEP3) or \(N_\mathrm{nn} = (N_\mathrm{des} +2) N_\mathrm{neu} N_\mathrm{typ}+1\) (NEP4)

The total number of trainable parameters is the sum of the number of trainable descriptor parameters and the number of NN parameters \(N_\mathrm{nn}\).
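These counting rules can be collected into a small helper, which is handy for estimating model size before training. This is a sketch that restates the formulas above; the function name and argument names are chosen here, not part of GPUMD.

```python
def num_parameters(version, n_typ, n_max_r, n_max_a, n_bas_r, n_bas_a,
                   l_max_3b, n_neu, use_4b=True, use_5b=True):
    """Total number of trainable parameters for NEP2, NEP3, or NEP4."""
    # Descriptor length N_des: radial + 3-body + optional 4-/5-body parts.
    n_des = (n_max_r + 1) + (n_max_a + 1) * l_max_3b
    if use_4b:
        n_des += n_max_a + 1
    if use_5b:
        n_des += n_max_a + 1
    # Trainable descriptor parameters c_nk^{ij}.
    if version == 2:
        n_c = n_typ ** 2 * ((n_max_r + 1) + (n_max_a + 1))
    else:  # NEP3 and NEP4 share the same descriptor-parameter count
        n_c = n_typ ** 2 * ((n_max_r + 1) * (n_bas_r + 1)
                            + (n_max_a + 1) * (n_bas_a + 1))
    # Trainable NN parameters N_nn (NEP4 has one NN per atom type).
    if version == 4:
        n_nn = (n_des + 2) * n_neu * n_typ + 1
    else:
        n_nn = (n_des + 2) * n_neu + 1
    return n_c + n_nn
```

For example, a single-species NEP3 model with \(n_\mathrm{max}^\mathrm{R}=n_\mathrm{max}^\mathrm{A}=4\), \(N_\mathrm{bas}^\mathrm{R}=N_\mathrm{bas}^\mathrm{A}=8\), \(l_\mathrm{max}^\mathrm{3b}=4\), and 30 neurons has \(N_\mathrm{des}=35\) and 1201 trainable parameters in total.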

Optimization procedure

The NEP approach owes its name to the separable natural evolution strategy (SNES) used to optimize the parameters [Schaul2011]. The interested reader is referred to [Schaul2011] and [Fan2021] for details.

The key quantity in the optimization procedure is the loss (or objective) function, which is minimized. It is defined as a weighted sum over the loss terms associated with energies, forces, and virials as well as the \(\mathcal{L}_1\) and \(\mathcal{L}_2\) norms of the parameter vector:

\[\begin{split}L(\boldsymbol{z}) &= \lambda_\mathrm{e} \left( \frac{1}{N_\mathrm{str}}\sum_{n=1}^{N_\mathrm{str}} \left( U^\mathrm{NEP}(n,\boldsymbol{z}) - U^\mathrm{tar}(n)\right)^2 \right)^{1/2} \nonumber \\ &+ \lambda_\mathrm{f} \left( \frac{1}{3N} \sum_{i=1}^{N} \left( \boldsymbol{F}_i^\mathrm{NEP}(\boldsymbol{z}) - \boldsymbol{F}_i^\mathrm{tar}\right)^2 \right)^{1/2} \nonumber \\ &+ \lambda_\mathrm{v} \left( \frac{1}{6N_\mathrm{str}} \sum_{n=1}^{N_\mathrm{str}} \sum_{\mu\nu} \left( W_{\mu\nu}^\mathrm{NEP}(n,\boldsymbol{z}) - W_{\mu\nu}^\mathrm{tar}(n)\right)^2 \right)^{1/2} \nonumber \\ &+ \lambda_1 \frac{1}{N_\mathrm{par}} \sum_{n=1}^{N_\mathrm{par}} |z_n| \nonumber \\ &+ \lambda_2 \left(\frac{1}{N_\mathrm{par}} \sum_{n=1}^{N_\mathrm{par}} z_n^2\right)^{1/2}.\end{split}\]

Here, \(N_\mathrm{str}\) is the number of structures in the training data set (if using a full batch) or the number of structures in a mini-batch (see the batch keyword in the nep.in input file) and \(N\) is the total number of atoms in these structures. \(U^\mathrm{NEP}(n,\boldsymbol{z})\) and \(W_{\mu\nu}^\mathrm{NEP}(n,\boldsymbol{z})\) are the per-atom energy and virial tensor predicted by the NEP model with parameters \(\boldsymbol{z}\) for the \(n^\mathrm{th}\) structure, and \(\boldsymbol{F}_i^\mathrm{NEP}(\boldsymbol{z})\) is the predicted force for the \(i^\mathrm{th}\) atom. \(U^\mathrm{tar}(n)\), \(W_{\mu\nu}^\mathrm{tar}(n)\), and \(\boldsymbol{F}_i^\mathrm{tar}\) are the corresponding target values. That is, the loss terms for energies, forces, and virials are defined as the respective RMSE values between the NEP predictions and the target values. The last two terms represent \(\mathcal{L}_1\) and \(\mathcal{L}_2\) regularization terms of the parameter vector. The weights \(\lambda_\mathrm{e}\), \(\lambda_\mathrm{f}\), \(\lambda_\mathrm{v}\), \(\lambda_1\), and \(\lambda_2\) are tunable hyper-parameters (see the eponymous keywords in the nep.in input file). When calculating the loss function, we use eV/atom for energies and virials and eV/Å for force components.
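The loss function above can be sketched directly in Python. Here predictions and targets are passed as flat lists (per-structure energies and virial components in eV/atom, force components in eV/Å), so each RMSE term reduces to an ordinary root-mean-square error over its list; the function and argument names are illustrative, not GPUMD's API.

```python
import math

def rmse(pred, tar):
    """Root-mean-square error between two equally long lists."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, tar)) / len(pred))

def loss(z, e_pred, e_tar, f_pred, f_tar, v_pred, v_tar,
         lam_e, lam_f, lam_v, lam_1, lam_2):
    """L(z) = lam_e * RMSE(energies) + lam_f * RMSE(forces)
            + lam_v * RMSE(virials) + lam_1 * L1(z) + lam_2 * L2(z)."""
    n_par = len(z)
    l1 = sum(abs(zn) for zn in z) / n_par            # mean absolute value
    l2 = math.sqrt(sum(zn ** 2 for zn in z) / n_par)  # root mean square
    return (lam_e * rmse(e_pred, e_tar)
            + lam_f * rmse(f_pred, f_tar)
            + lam_v * rmse(v_pred, v_tar)
            + lam_1 * l1 + lam_2 * l2)
```

When all predictions match their targets, the loss reduces to the two regularization terms alone, which is what drives the parameter vector toward small magnitudes during the later stages of training.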