FlexAttention
FlexAttention is a new API released by the PyTorch team in July 2024. It provides a flexible interface that lets many attention variants be implemented in a few lines of idiomatic PyTorch code, and it works with torch.compile.
Under torch.compile, it is lowered to a fused FlashAttention kernel, providing flexibility without sacrificing performance. Separately, a paper of the same name, "FlexAttention for Efficient High-Resolution Vision-Language Models", has been accepted at ECCV 2024.
FlexAttention is a flexible attention mechanism designed to improve the efficiency of high-resolution vision-language models. It significantly reduces computational cost by encoding the image at both high and low resolution, then computing the attention map using only the low-resolution tokens plus a small set of selected high-resolution tokens. The selection is performed by a high-resolution selection module that retrieves tokens from relevant image regions based on the input attention map. The selected high-resolution tokens are then fed, together with the low-resolution tokens and text tokens, into a hierarchical self-attention layer, and the attention map this layer produces drives the next round of high-resolution token selection. This process repeats at every attention layer. Experiments show that FlexAttention outperforms existing high-resolution vision-language models on multimodal benchmarks while reducing computational cost by nearly 40%.
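The selection step described above can be sketched in a few lines. This is a conceptual NumPy illustration under stated assumptions, not the paper's implementation: the function name, the per-region attention mass, and the assumption that each low-resolution token maps to a fixed number of high-resolution tokens covering the same region are all illustrative.

```python
import numpy as np

def select_high_res_tokens(attn_map, hi_res_tokens, k=4, ratio=4):
    """Pick high-res tokens for the k most-attended low-res regions.

    attn_map:       (num_low,) attention mass per low-res image token
                    (illustrative; derived from the previous layer's map).
    hi_res_tokens:  (num_low * ratio, dim) -- assumes each low-res token
                    corresponds to `ratio` high-res tokens of that region.
    """
    top_regions = np.argsort(attn_map)[-k:]  # indices of most-attended regions
    # Expand each selected region index to its block of high-res token indices.
    idx = (top_regions[:, None] * ratio + np.arange(ratio)).ravel()
    return hi_res_tokens[idx]  # (k * ratio, dim)

num_low, ratio, dim = 16, 4, 8
attn_map = np.random.rand(num_low)
hi_res = np.random.randn(num_low * ratio, dim)
selected = select_high_res_tokens(attn_map, hi_res)
print(selected.shape)  # (16, 8): only 4 of 16 regions contribute tokens
```

The efficiency gain comes from this pruning: the hierarchical self-attention layer attends over k * ratio selected high-resolution tokens instead of all num_low * ratio of them, and the choice of regions is refreshed layer by layer as the attention map evolves.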