
Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator

Ziwei He, Meng Yang, Minwei Feng, Jingcheng Yin, Xinbing Wang, Jingwen Leng, Zhouhan Lin
Abstract

The transformer model is known to be computationally demanding, and prohibitively costly for long sequences, as the self-attention module has quadratic time and space complexity with respect to sequence length. Many researchers have focused on designing new forms of self-attention or introducing new parameters to overcome this limitation; however, a large portion of these approaches prevents the model from inheriting weights from large pretrained models. In this work, we address the transformer's inefficiency from another perspective. We propose the Fourier Transformer, a simple yet effective approach that progressively removes redundancies in the hidden sequence using the ready-made Fast Fourier Transform (FFT) operator to perform the Discrete Cosine Transform (DCT). The Fourier Transformer significantly reduces computational costs while retaining the ability to inherit weights from various large pretrained models. Experiments show that our model achieves state-of-the-art performance among all transformer-based models on the long-range modeling benchmark LRA, with significant improvements in both speed and memory. For generative seq-to-seq tasks, including CNN/DailyMail and ELI5, our model inherits the BART weights and outperforms standard BART and other efficient models. Our code is publicly available at https://github.com/LUMIA-Group/FourierTransformer
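To make the core idea concrete, the following is a minimal sketch (not the authors' exact implementation) of DCT-based sequence downsampling as the abstract describes it: transform the hidden sequence along its length with a DCT, truncate the high-frequency coefficients, and invert the transform to recover a shorter sequence. The function name `dct_downsample` and the `keep_ratio` parameter are illustrative assumptions; SciPy's DCT is itself computed via the FFT, matching the paper's use of a ready-made FFT operator.

```python
# Illustrative sketch of DCT-based sequence downsampling (hypothetical
# helper, not the paper's code). A hidden sequence of shape
# (seq_len, d_model) is transformed along the sequence axis, its
# high-frequency coefficients are dropped, and an inverse DCT yields
# a shorter sequence.
import numpy as np
from scipy.fft import dct, idct  # SciPy computes the DCT via an FFT


def dct_downsample(hidden: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Shorten a (seq_len, d_model) sequence by keeping only the
    leading (low-frequency) DCT coefficients along the sequence axis."""
    seq_len = hidden.shape[0]
    k = max(1, int(seq_len * keep_ratio))            # coefficients to keep
    coeffs = dct(hidden, type=2, axis=0, norm="ortho")
    truncated = coeffs[:k]                           # low frequencies only
    return idct(truncated, type=2, axis=0, norm="ortho")


x = np.random.randn(512, 64)        # toy hidden states: 512 tokens, dim 64
y = dct_downsample(x, keep_ratio=0.25)
print(y.shape)                      # (128, 64): a 4x shorter sequence
```

Because subsequent self-attention layers then operate on the shortened sequence, the quadratic attention cost shrinks with the square of the kept ratio, while the layer shapes stay compatible with pretrained transformer weights.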
