NeurIPS2022

On Computing Probabilistic Explanations for Decision Trees

Marcelo Arenas, Pablo Barceló, Miguel A. Romero Orth, Bernardo Subercaseaux

被引用 52 次

摘要

Formal XAI (explainable AI) is a growing area that focuses on computing explanations with mathematical guarantees for the decisions made by ML models. Inside formal XAI, one of the most studied cases is that of explaining the choices taken by decision trees, as they are traditionally deemed as one of the most interpretable classes of models. Recent work has focused on studying the computation of sufficient reasons, a kind of explanation in which given a decision tree and an instance , one explains the decision ( ) by providing a subset of the features of such that for any other instance compatible with , it holds that ( ) = ( ), intuitively meaning that the features in are already enough to fully justify the classification of by . It has been argued, however, that sufficient reasons constitute a restrictive notion of explanation. For such a reason, the community has started to study their probabilistic counterpart, in which one requires that the probability of ( ) = ( ) must be at least some value ∈ (0, 1], where is a random instance that is compatible with . Our paper settles the computational complexity of -sufficient-reasons over decision trees, showing that both (1) finding -sufficient-reasons that are minimal in size, and (2) finding -sufficient-reasons that are minimal inclusion-wise, do not admit polynomial-time algorithms (unless PTIME = NP). This is in stark contrast with the deterministic case ( = 1) where inclusion-wise minimal sufficient-reasons are easy to compute. By doing this, we answer two open problems originally raised by Izza et al., and extend the hardness of explanations for Boolean circuits presented by Wäldchen et al. to the more restricted case of decision trees. On the positive side, we identify structural restrictions of decision trees that make the problem tractable, and show how SAT solvers might be able to tackle these problems in practical settings. Problem. In the last few years, the XAI community has studied for which Boolean ML models the problem of computing (minimum or minimal) sufficient reasons is computationally tractable and for which it is computationally hard (see, e.g., (Barceló et al., 2020; Marques-Silva et al., 2020 , 2021)). It has been argued, however, that for practical applications sufficient reasons might be too rigid, as they are specified under worst-case conditions. That is, is a sufficient reason for under ℳ if every "completion" of is classified by ℳ in the same way as . As several authors have noted already, there is a natural way in which this notion can be relaxed in order to become more suitable for real-world explainability tasks: Instead of asking for each completion of to yield the same result as on ℳ, we could allow for a small fragment of the completions of to be classified differently than (Izza et al., 2021; Wäldchen et al., 2021; Wang et al., 2021) . More precisely, we would like to ensure that a random completion of is classified as with probability at least ∈ (0, 1], a threshold that the recipient of the explanation controls. In such case, we call a -sufficient reason for e under ℳ. The study of the cost of computing minimum -sufficient reasons for expressive Boolean ML models based on propositional formulas was started by Wäldchen et al. (2021) . They show, in particular, that the decision problem of checking if admits a -sufficient reason of a certain size under a model ℳ, where ℳ is specified as a CNF formula, is NP PP -complete. This result shows that the problem is very difficult for complex models, at least in theoretical terms. Nonetheless, it leaves the door open for obtaining tractability results over simpler Boolean models, starting from those which are often deemed to be "easy to interpret", e.g., decision trees (Gilpin et al., 2018; Izza et al., 2020a; Lipton, 2018) . In particular, the study of the cost of computing both minimum and minimal -sufficient reasons for decision trees was initiated by Izza et al. (2021 Izza et al. ( , 2022)) , but nothing beyond the fact that the problem lies in NP has been obtained. Work by Blanc et al. (2021) has shown that it is possible to obtain efficient algorithms that succeed with a certain probability, and that instead of finding a smallest (either cardinality or inclusion-wise) -sufficient reason, find -sufficient reasons that are small compared to the average size of -sufficient reasons for the considered model. Our results. In this paper we provide an in-depth study of the complexity of the problem of minimum and minimal -sufficient reasons for decision trees.