TY - GEN

T1 - An Interpretation of the Bellman Equation for Risk-Informed Decision Making

AU - Warns, Kyle

AU - Amin Shah, Asad Ullah

AU - Kim, Junyung

AU - Kang, Hyun Gook

N1 - Publisher Copyright:
© 2023 American Nuclear Society, Incorporated.

PY - 2023/7/20

Y1 - 2023/7/20

N2 - Traditionally, the Bellman equation is solved iteratively to find the optimal policy of a Markov decision process (MDP). Once determined, an optimal policy can be followed from any system state to maximize the reward obtained by the system; the implementation of MDP policies thus constitutes optimal autonomous control of a system or plant. However, the reward function being optimized is designed by hand, typically with a large degree of arbitrary tuning to generate the desired decisions. In this work, it is demonstrated that, rather than maximizing an arbitrary reward, the dynamic evolution of the core damage frequency (CDF) risk measure used in probabilistic risk assessment by the nuclear industry can be calculated using a value iteration scheme within an MDP. An optimal policy of operational actions that minimizes system risk can then be determined. As such, this work presents a first step towards using a measure accepted within the nuclear community as the objective of a machine learning (ML) approach, moving one step closer to resolving the issues of transparency and interpretability that keep risk-informed ML decision-making methods from large-scale implementation in nuclear applications.

AB - Traditionally, the Bellman equation is solved iteratively to find the optimal policy of a Markov decision process (MDP). Once determined, an optimal policy can be followed from any system state to maximize the reward obtained by the system; the implementation of MDP policies thus constitutes optimal autonomous control of a system or plant. However, the reward function being optimized is designed by hand, typically with a large degree of arbitrary tuning to generate the desired decisions. In this work, it is demonstrated that, rather than maximizing an arbitrary reward, the dynamic evolution of the core damage frequency (CDF) risk measure used in probabilistic risk assessment by the nuclear industry can be calculated using a value iteration scheme within an MDP. An optimal policy of operational actions that minimizes system risk can then be determined. As such, this work presents a first step towards using a measure accepted within the nuclear community as the objective of a machine learning (ML) approach, moving one step closer to resolving the issues of transparency and interpretability that keep risk-informed ML decision-making methods from large-scale implementation in nuclear applications.

KW - core damage frequency

KW - dynamic conditional risk measure

KW - Markov decision process

KW - probabilistic risk assessment

KW - risk-informed decision making

UR - http://www.scopus.com/inward/record.url?scp=85183315056&partnerID=8YFLogxK

U2 - 10.13182/NPICHMIT23-41143

DO - 10.13182/NPICHMIT23-41143

M3 - Conference contribution

AN - SCOPUS:85183315056

T3 - Proceedings of 13th Nuclear Plant Instrumentation, Control and Human-Machine Interface Technologies, NPIC and HMIT 2023

SP - 478

EP - 484

BT - Proceedings of 13th Nuclear Plant Instrumentation, Control and Human-Machine Interface Technologies, NPIC and HMIT 2023

PB - American Nuclear Society

T2 - 13th Nuclear Plant Instrumentation, Control and Human-Machine Interface Technologies, NPIC and HMIT 2023

Y2 - 15 July 2023 through 20 July 2023

ER -