
A Unified Approach to Interpreting Model Predictions

Lundberg, Scott; Lee, Su-In
Abstract

Understanding why a model makes a certain prediction can be as crucial as the prediction's accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, various methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related and when one method is preferable over another. To address this problem, we present a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent methods in the class lack the proposed desirable properties. Based on insights from this unification, we present new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
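The unique solution in the class of additive feature importance measures is the classical Shapley value from cooperative game theory. As a minimal sketch of the underlying idea (not the paper's optimized estimation algorithms), the code below computes exact Shapley values for a hypothetical three-feature model by enumerating coalitions, approximating "feature absent" by substituting a baseline value; the toy model, input, and baseline are illustrative assumptions.

```python
import itertools
import math

def model(x):
    # Hypothetical toy model: linear term plus a feature interaction.
    return 2.0 * x[0] + x[1] * x[2]

def shapley_values(f, x, baseline):
    """Exact Shapley values by coalition enumeration.

    Features outside the coalition S are set to their baseline value,
    an interventional approximation of the conditional expectation.
    Cost is exponential in the number of features, so this is only
    feasible for small toy examples.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for S in itertools.combinations(others, r):
                # Shapley weight |S|! (n - |S| - 1)! / n!
                w = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
                x_with = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
                x_without = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += w * (f(x_with) - f(x_without))
    return phi

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)

# Local accuracy: attributions sum to f(x) - f(baseline).
assert abs(sum(phi) - (model(x) - model(baseline))) < 1e-9
```

Here the linear term is attributed entirely to the first feature, while the interaction term is split evenly between the two interacting features, which is the symmetry property the paper proves unique to this solution.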