Graph Convolutions Enrich the Self-Attention in Transformers!

Jeongwhan Choi Hyowon Wi Jayoung Kim Yehjin Shin Kookjin Lee Nathaniel Trask Noseong Park

Abstract

Transformers, renowned for their self-attention mechanism, have achieved state-of-the-art performance across various tasks in natural language processing, computer vision, time-series modeling, etc. However, one of the challenges with deep Transformer models is the oversmoothing problem, where representations across layers converge to indistinguishable values, leading to significant performance degradation. We interpret the original self-attention as a simple graph filter and redesign it from a graph signal processing (GSP) perspective. We propose a graph-filter-based self-attention (GFSA) to learn a general yet effective one, whose complexity, however, is slightly larger than that of the original self-attention mechanism. We demonstrate that GFSA improves the performance of Transformers in various fields, including computer vision, natural language processing, graph-level tasks, speech recognition, and code classification.
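
To make the "self-attention as a graph filter" view concrete, the sketch below contrasts standard self-attention (the attention matrix A applied once to the value signal, i.e., a first-order filter) with a hedged higher-order polynomial filter in the spirit of GFSA. The function names, the filter order K, and the weights w0, w1, wK are illustrative assumptions for this sketch, not the authors' exact formulation, which may use learnable weights and an approximation of the higher-order term.

```python
# Minimal sketch (PyTorch): self-attention viewed as a graph filter,
# plus a hypothetical polynomial-filter variant. Not the paper's exact design.
import torch
import torch.nn.functional as F


def vanilla_self_attention(q, k, v):
    """Standard self-attention: the row-stochastic matrix A acts as a
    simple one-hop graph filter applied to the value signal v."""
    d = q.size(-1)
    attn = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)  # A: (n, n)
    return attn @ v  # A v: one hop of graph filtering


def graph_filter_self_attention(q, k, v, w0=0.2, w1=1.0, wK=0.1, K=3):
    """Hypothetical higher-order filter h(A) = w0*I + w1*A + wK*A^K.
    A^K is formed by exact matrix powers here purely for clarity; the
    paper's construction may instead rely on an approximation."""
    d = q.size(-1)
    attn = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    attn_K = torch.linalg.matrix_power(attn, K)
    identity = torch.eye(attn.size(-1), device=attn.device)
    filt = w0 * identity + w1 * attn + wK * attn_K
    return filt @ v


# Tiny usage example with random tensors.
n, d = 8, 16
q, k, v = torch.randn(n, d), torch.randn(n, d), torch.randn(n, d)
print(vanilla_self_attention(q, k, v).shape)       # torch.Size([8, 16])
print(graph_filter_self_attention(q, k, v).shape)  # torch.Size([8, 16])
```

Mixing the identity term with higher powers of A is one way such a filter can pass both low- and high-frequency components of the token signal, which is the intuition behind mitigating oversmoothing; the extra cost over vanilla self-attention comes from the additional matrix terms.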

