TY - JOUR

T1 - Efficient Bayesian inference with latent Hamiltonian neural networks in No-U-Turn Sampling

AU - Dhulipala, Somayajulu L.N.

AU - Che, Yifeng

AU - Shields, Michael D.

N1 - Funding Information:
This research is supported through the INL Laboratory Directed Research & Development (LDRD) Program under DOE Idaho Operations Office Contract DE-AC07-05ID14517. This research made use of the resources of the High-Performance Computing Center at INL, which is supported by the Office of Nuclear Energy of the U.S. DOE and the Nuclear Science User Facilities under Contract No. DE-AC07-05ID14517.
Publisher Copyright:
© 2023

PY - 2023/11/1

Y1 - 2023/11/1

N2 - When sampling for Bayesian inference, one popular approach in the computational field is to use Hamiltonian Monte Carlo (HMC) and specifically the No-U-Turn Sampler (NUTS), which automatically decides the end time of the Hamiltonian trajectory. However, HMC and NUTS can require numerous numerical gradients of the target density and can prove slow in practice when relying on computationally expensive forward models. We propose latent Hamiltonian neural networks (L-HNNs) with HMC and NUTS for solving Bayesian inference problems. Once trained, L-HNNs do not require numerical gradients of the target density during sampling, and hence avoid numerous evaluations of the forward computational model. Moreover, L-HNNs satisfy important properties such as perfect time reversibility and Hamiltonian conservation, making them well-suited for use within HMC and NUTS because stationarity can be shown. We also propose the integration of L-HNNs into an online error monitoring scheme, in which numerical gradients of the target density are used for a few samples whenever the L-HNNs' prediction errors are large. This online error monitoring scheme prevents sample degeneracy in regions of low probability density and ensures robust uncertainty quantification. We demonstrate L-HNNs in NUTS with online error monitoring on several analytical examples involving complex, heavy-tailed, and high-local-curvature probability densities. We then demonstrate the applicability of L-HNNs in NUTS to two computational case studies, namely the Allen-Cahn stochastic partial differential equation and an elliptic partial differential equation with 25 and 50 inference parameters, respectively. Overall, the L-HNNs in NUTS with online error monitoring satisfactorily inferred these probability densities. Compared to traditional NUTS, L-HNNs in NUTS with online error monitoring required 1–2 orders of magnitude fewer numerical gradients of the target density and improved the effective sample size (ESS) per gradient (a measure of both sampling quality and computational expense) by an order of magnitude.

KW - Bayesian inference

KW - Deep neural networks

KW - Hamiltonian Monte Carlo

KW - Physics-based learning

KW - Symplectic integration

KW - Uncertainty quantification

UR - http://www.scopus.com/inward/record.url?scp=85168414981&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/e4f64643-ee71-37af-bc52-75b4308cdbaf/

U2 - 10.1016/j.jcp.2023.112425

DO - 10.1016/j.jcp.2023.112425

M3 - Article

AN - SCOPUS:85168414981

SN - 0021-9991

VL - 492

SP - 112425

JO - Journal of Computational Physics

JF - Journal of Computational Physics

M1 - 112425

ER -