Explainable AI

Explainable AI (XAI), also called interpretable AI, is a branch of AI concerned with making model results understandable to users. XAI explains the predictions a model makes, with the aim of building user trust in, and accountability for, AI predictions. It comprises a collection of methods and tools for deciphering model predictions and for building human-interpretable models, and it is a sub-field of responsible AI, an umbrella term for AI systems that provide the following properties:
- Fairness: The system gives unbiased results.
- Privacy: Sensitive user information in the data is protected.
- Reliability or robustness: Small changes in the input do not lead to large changes in the prediction.
- Causality: Only causal and meaningful relationships are picked up by the model.
- Trust: The system explains the decisions behind its results.
XAI belongs to the trust component. It ensures that systems make the right decisions for the right reasons, and it can highlight why a model failed, shedding light on the problem, the input data, and the likely causes of the failure. XAI helps data scientists debug models to improve accuracy and performance, and it helps end users understand a model's predictions, for example in medical applications or in the recommendations shown on e-commerce sites. Public stakeholders also benefit from XAI: explanations help them judge whether a system is safe and fit for use by people and organizations, and inform the policies they design around it.
XAI deals both with simple models and with black box models, for which no explanation of the results is otherwise available. Black box models are complex models, such as ensembles or neural networks, that take input data and produce results without exposing the steps taken to arrive at them; their behavior cannot be comprehended even when the weights and internal computations of the model are known, so they can only be interpreted after the results have been obtained. Simple models such as linear regression and decision trees, on the other hand, are intuitive: a prediction can be reproduced by tracing the steps the model takes. A model's prediction is ultimately the output of a function fit by mathematical optimization; XAI does not lay out the exact mathematics by which the model arrives at its output, but instead explains the result in terms a human can comprehend, such as features and weights. A minimal sketch of this contrast appears below.
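As a minimal sketch of that contrast (using scikit-learn on tiny hypothetical housing data; the feature names and values are made up for illustration), a linear regression's prediction can be reproduced by hand from its learned weights, which is exactly what a black box model does not offer:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical housing data: columns are [area_sqft, bedrooms, age_years]
X = np.array([[1200, 2, 30], [1500, 3, 10], [900, 1, 45], [2000, 4, 5]])
y = np.array([150_000, 220_000, 95_000, 310_000])  # hypothetical sale prices

model = LinearRegression().fit(X, y)

# Transparent: the prediction is the intercept plus a weighted sum of features
x_new = np.array([1400, 3, 20])
traced = model.intercept_ + np.dot(model.coef_, x_new)
print("weights:", model.coef_, "intercept:", model.intercept_)
print("traced prediction:", traced)
print("model prediction: ", model.predict(x_new.reshape(1, -1))[0])
```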
Machine learning interpretability methods can be divided into different categories based on several criteria (Molnar, 2022):
Criterion 1: The stage at which interpretation is done on a model
Intrinsic methods: Interpretation is achieved by restricting the complexity of the ML model. These are models with a simple structure that are interpretable by design.
Post hoc methods: Interpretation is done by methods that analyze a model after it has been trained. Post hoc methods are typically, though not necessarily, model agnostic. A sketch contrasting the two stages follows this criterion.
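Here is a rough sketch of the two stages on synthetic data (scikit-learn; the models and settings are illustrative choices, not a prescribed workflow): a depth-limited decision tree is interpretable by construction, while permutation importance is applied to an already trained model:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=4, random_state=0)

# Intrinsic: a depth-limited tree is readable as a short list of if/else rules
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=[f"f{i}" for i in range(4)]))

# Post hoc: permutation importance analyzes any fitted model after training
forest = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(forest, X, y, n_repeats=10, random_state=0)
print("permutation importances:", result.importances_mean)
```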
Criterion 2: The different types of methods to show the interpretation results
Feature summary statistic: Interpretation methods that report a summary number for each feature, such as feature importance, or for feature pairs, such as pairwise interaction strength.
Feature summary visualization: Interpretation methods that present the summary statistics visually through graphs and plots; a partial dependence plot is an example.
Model internals (e.g. learned weights): Interpretation of models that have a simple structure or are intrinsically interpretable belongs to this category. Examples are the weights in linear models or the learned tree structure (the features and thresholds used for the splits) of decision trees.
Data point: Methods that return data points to make a model interpretable. For example, a counterfactual explanation accounts for a prediction by finding a similar data point for which the predicted outcome changes significantly when some feature values are changed.
Intrinsically interpretable model: Interpretation methods that approximate a black box model (either globally or locally) with an interpretable model, which is then interpreted by looking at its internal parameters or feature summary statistics. A global surrogate, sketched below, is a common example.
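The sketch below (scikit-learn on synthetic data; using R² as the fidelity measure is one reasonable choice among several) approximates a random forest globally with a shallow decision tree trained on the forest's own predictions:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = make_regression(n_samples=1000, n_features=3, random_state=0)

# The black box whose behavior we want to explain
black_box = RandomForestRegressor(random_state=0).fit(X, y)
y_bb = black_box.predict(X)  # the surrogate learns the black box's outputs

# The surrogate: a shallow tree that approximates the black box globally
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y_bb)
print(export_text(surrogate, feature_names=["f0", "f1", "f2"]))

# Fidelity check: how closely the surrogate tracks the black box's predictions
print("R^2 vs black box:", r2_score(y_bb, surrogate.predict(X)))
```

Note the design choice: the surrogate is fit to the black box's predictions rather than to the true labels, because the goal is to explain the model, not the data.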
Criterion 3: The various types of interpretation method
Model specific: Interpretation methods that are tied to a specific class of model, for example the regression weights in a linear model.
Model agnostic: Interpretation methods that do not depend on any specific model and are usually applied after a model has been trained, typically by analyzing pairs of feature inputs and outputs; a sketch follows this criterion.
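As a sketch of a model-agnostic method (scikit-learn's PartialDependenceDisplay on a synthetic regression task; matplotlib is assumed to be installed), a partial dependence plot only needs the fitted model's predict function and the input data, so the same call works unchanged for any model family:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = make_regression(n_samples=500, n_features=4, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Model agnostic: the plot only queries model.predict on perturbed inputs,
# so swapping in a different fitted regressor requires no other changes
PartialDependenceDisplay.from_estimator(model, X, features=[0, 1])
plt.show()
```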
Criterion 4: The scope or extent of interpretation
Global interpretation: The interpretation method explains the behavior of the entire model, for example the overall trend of house rental costs in a given area.
Local interpretation: The interpretation method explains a specific prediction of the model, for example why a particular house in a given area has a higher predicted cost. A sketch contrasting the two scopes follows.
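To make the distinction concrete, here is a minimal hand-rolled sketch (synthetic data; shifting a feature by one standard deviation is an arbitrary illustrative probe): the model's feature importances summarize global behavior, while perturbing a single instance probes one local prediction:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=3, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

# Global: impurity-based importances summarize the model over all the data
print("global importances:", model.feature_importances_)

# Local: probe how one instance's prediction responds to changing feature 0
x = X[0].copy()
base = model.predict(x.reshape(1, -1))[0]
x[0] += X[:, 0].std()  # shift feature 0 by one standard deviation
shifted = model.predict(x.reshape(1, -1))[0]
print(f"local effect of feature 0: {shifted - base:+.2f} (baseline {base:.2f})")
```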
Reference
Molnar, Christoph (2022). Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. 2nd edition. https://christophm.github.io/interpretable-ml-book/