FlexAttention
FlexAttention is a new API released by the PyTorch team in July 2024. It provides a flexible interface that lets many attention variants be implemented in a few lines of idiomatic PyTorch code, and it works with torch.compile.
Under torch.compile, it is lowered to a fused FlashAttention kernel, providing flexibility without sacrificing performance. Separately, a paper of the same name, "FlexAttention for Efficient High-Resolution Vision-Language Models", has been accepted at ECCV 2024.
FlexAttention is a flexible attention mechanism designed to improve the efficiency of high-resolution vision-language models. It significantly reduces computational cost by encoding the image at both high and low resolution, then computing the attention map using only the low-resolution tokens plus a small set of selected high-resolution tokens. The selection is performed by a high-resolution selection module that retrieves tokens from relevant image regions based on the input attention map. The selected high-resolution tokens are then fed, together with the low-resolution tokens and text tokens, into a hierarchical self-attention layer, and the attention map this layer produces drives the next round of high-resolution token selection. This process repeats at every attention layer. Experiments show that FlexAttention outperforms existing high-resolution vision-language models on multimodal benchmarks while reducing computational cost by nearly 40%.
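The selection step described above can be sketched in a few lines. This is a conceptual NumPy illustration under stated assumptions, not the paper's implementation: the function name, the per-region attention mass, and the assumption that each low-resolution token maps to a fixed number of high-resolution tokens covering the same region are all illustrative.

```python
import numpy as np

def select_high_res_tokens(attn_map, hi_res_tokens, k=4, ratio=4):
    """Pick high-res tokens for the k most-attended low-res regions.

    attn_map:       (num_low,) attention mass per low-res image token
                    (illustrative; derived from the previous layer's map).
    hi_res_tokens:  (num_low * ratio, dim) -- assumes each low-res token
                    corresponds to `ratio` high-res tokens of that region.
    """
    top_regions = np.argsort(attn_map)[-k:]  # indices of most-attended regions
    # Expand each selected region index to its block of high-res token indices.
    idx = (top_regions[:, None] * ratio + np.arange(ratio)).ravel()
    return hi_res_tokens[idx]  # (k * ratio, dim)

num_low, ratio, dim = 16, 4, 8
attn_map = np.random.rand(num_low)
hi_res = np.random.randn(num_low * ratio, dim)
selected = select_high_res_tokens(attn_map, hi_res)
print(selected.shape)  # (16, 8): only 4 of 16 regions contribute tokens
```

The efficiency gain comes from this pruning: the hierarchical self-attention layer attends over k * ratio selected high-resolution tokens instead of all num_low * ratio of them, and the choice of regions is refreshed layer by layer as the attention map evolves.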