"Why Should I Trust You?": Explaining the Predictions of Any Classifier

Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.
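The local-surrogate idea summarized above can be sketched in a few lines: perturb the instance to be explained, query the black-box model on the perturbations, weight them by proximity to the instance, and fit a simple (here, linear) model whose coefficients serve as the explanation. The sketch below is illustrative only, not the paper's reference implementation; the function name `explain_instance`, the Gaussian perturbations, the exponential proximity kernel, and the Ridge surrogate are all assumptions made for a tabular setting.

```python
# A minimal, hypothetical sketch of fitting an interpretable model locally
# around one prediction (tabular analogue of the idea in the abstract).
import numpy as np
from sklearn.linear_model import Ridge

def explain_instance(predict_proba, x, num_samples=5000, kernel_width=0.75, seed=0):
    """Return per-feature weights of a linear surrogate fit locally around
    instance `x` (a 1-D numpy array), given a black-box `predict_proba`."""
    rng = np.random.default_rng(seed)
    # Perturb the instance with Gaussian noise (assumed perturbation scheme).
    perturbed = x + rng.normal(scale=1.0, size=(num_samples, x.shape[0]))
    # Query the black-box classifier on the perturbed points.
    labels = predict_proba(perturbed)[:, 1]  # probability of the positive class
    # Weight samples by an exponential kernel on distance to x (locality).
    distances = np.linalg.norm(perturbed - x, axis=1)
    weights = np.exp(-(distances ** 2) / (kernel_width ** 2))
    # Fit the interpretable surrogate; its coefficients are the explanation.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(perturbed, labels, sample_weight=weights)
    return surrogate.coef_
```

For example, passing a trained scikit-learn classifier's `predict_proba` together with a single row of the feature matrix yields a weight per feature, indicating how each feature locally pushes the prediction toward or away from the positive class.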