ICLR 2025

A Black Swan Hypothesis: The Role of Human Irrationality in AI Safety

Hyunin Lee, Chanwoo Park, David Abel, Ming Jin

Abstract

Black swan events are statistically rare occurrences that carry extremely high risks. The standard view assumes that black swans originate from an unpredictable and changing environment; however, the community lacks a comprehensive definition of black swan events. This paper argues that the standard view is incomplete and claims that high-risk, statistically rare events can also occur in unchanging environments due to human misperception of events' values and likelihoods; we refer to such events as S-BLACK SWAN. We first carefully categorize black swan events, focusing on S-BLACK SWAN, and mathematically formalize the definition of black swan events. We hope these definitions can pave the way for the development of algorithms that prevent such events by rationally correcting limitations in perception.