Causal Attention
Causal Attention (CATT) is an attention mechanism that incorporates causal inference to improve the interpretability and performance of models, particularly on vision-language tasks. It was proposed in 2021 by researchers from Nanyang Technological University and Monash University (Australia) in the paper "Causal Attention for Vision-Language Tasks".
The core idea of causal attention is to use the front-door adjustment from causal inference to remove spurious correlations learned from biased training data. In a conventional self-attention mechanism, the attention weights are learned without supervision and can therefore absorb dataset bias, misleading the model at inference time. For example, in image captioning, if scenes of "people riding horses" are far more frequent in the training data than "people driving carriages", the model may bind the action "riding" to "person" and "horse" and overlook the carriage.
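Concretely, the front-door adjustment replaces the biased conditional likelihood $P(Y \mid X)$ with an interventional one. A standard statement of the adjustment, using the roles from the paper's setup ($X$ the input features, $Z$ the attended selection of features, $Y$ the prediction), is:

$$
P(Y \mid \mathrm{do}(X)) \;=\; \sum_{z} P(Z = z \mid X) \sum_{x} P(X = x)\, P\big(Y \mid Z = z,\, X = x\big)
$$

The outer sum samples $Z$ from the current input, while the inner sum also requires sampling over other possible inputs $x$; this is what motivates the two attention components below.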
To solve this problem, the researchers proposed the causal attention mechanism, which estimates the causal effect via the front-door adjustment by decomposing attention into two components (a code sketch follows the list):
- In-Sample Attention (IS-ATT): computes attention within a single sample, so the selected features are not contaminated by other samples.
- Cross-Sample Attention (CS-ATT): brings information from other samples (in practice, a global dictionary built from training-set features) into the current sample's attention computation, imitating the effect of a causal intervention.
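The sketch below illustrates the two components in PyTorch under simplifying assumptions: a single attention head, and a learnable tensor standing in for the global dictionary (the paper initializes such a dictionary from clustered training features). The class and parameter names are ours, for illustration only, not the authors' reference implementation.

```python
import torch
from torch import nn


class CausalAttention(nn.Module):
    """Minimal sketch of IS-ATT + CS-ATT (single head, batch-first)."""

    def __init__(self, dim: int, num_global: int = 64):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # Global dictionary standing in for "other samples"; in the paper
        # this is initialized from clustered features of the training set.
        self.global_dict = nn.Parameter(torch.randn(num_global, dim))
        self.scale = dim ** -0.5

    def attend(self, q, k, v):
        # Standard scaled dot-product attention.
        w = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return w @ v

    def forward(self, x):  # x: (batch, seq, dim)
        q = self.q_proj(x)
        # IS-ATT: keys/values come from the sample itself.
        is_out = self.attend(q, self.k_proj(x), self.v_proj(x))
        # CS-ATT: keys/values come from the global dictionary,
        # mimicking the intervention of sampling other inputs.
        g = self.global_dict.unsqueeze(0).expand(x.size(0), -1, -1)
        cs_out = self.attend(q, self.k_proj(g), self.v_proj(g))
        # Combine the two estimators; concatenation is one option,
        # averaging or a learned projection are others.
        return torch.cat([is_out, cs_out], dim=-1)
```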
This mechanism can be used as a pluggable module that replaces existing self-attention layers, such as the attention blocks in a Transformer. The authors report that causal attention significantly improves model performance on tasks such as image captioning and visual question answering.
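As a hypothetical usage sketch of the module above (the shapes and the handling of the widened output are our assumptions, not the paper's exact recipe):

```python
import torch

attn = CausalAttention(dim=512)      # module sketched above
feats = torch.randn(8, 36, 512)      # e.g., 36 region features per image
out = attn(feats)                    # (8, 36, 1024): [IS-ATT ; CS-ATT]
# A real integration would project `out` back to 512 dims and add the
# usual residual connection and LayerNorm of a Transformer block.
```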