HyperAI

For the First Time! GPT-2 Empowers the Physical Layer of Wireless Communications, and the Peking University Team Proposes a Channel Prediction Solution Based on Pre-trained LLM


In wireless communications, signals transmitted over wireless channels are usually affected by energy attenuation and noise interference, so the signal received by the user differs to some degree from the signal originally sent by the base station, much as a traveler's actual arrival time deviates from the plan because of real road conditions. Just as a traveler needs accurate road information to match expectations with reality, a wireless system needs accurate channel state information (CSI) to guarantee accurate and effective signal transmission and to recover the original transmitted signal at the receiver.

Channel prediction is a core technology for efficient CSI acquisition. It forecasts future CSI from the historical CSI sequence, which can greatly reduce channel estimation and feedback overhead. This is especially important for 5G/6G MIMO wireless communication systems. However, existing channel prediction methods based on parameterized models or deep learning still suffer from low prediction accuracy and poor generalization, making them difficult to apply in real, complex channel environments.

In recent years, with the great success of large language models (LLMs) in natural language processing and other fields, more and more research teams have turned their attention to them. However, current applications of large language models in communication tasks are still limited to language-based tasks such as protocol understanding, and doubts remain about whether they can empower non-language tasks at the wireless physical layer. Two difficulties stand out.

First, channel state information is high-dimensional structured data with complex relationships across the space, time, and frequency dimensions, which increases the complexity of processing. Second, there is a domain gap between channel knowledge and natural language knowledge, which further increases the difficulty of knowledge transfer.

To overcome these challenges, Cheng Xiang's team at the School of Electronics, Peking University, proposed LLM4CP, a MIMO-OFDM channel prediction scheme based on a pre-trained large language model. It can be applied to both TDD (time division duplex) and FDD (frequency division duplex) communication systems.

The related results were published in the Journal of Communications and Information Networks under the title "LLM4CP: Adapting Large Language Models for Channel Prediction".

Specifically, the research team built a channel prediction neural network based on pre-trained GPT-2, which includes a preprocessing module, an embedding module, a pre-trained LLM module, and an output module, thereby improving the predictive and generalization capabilities of the large language model in channel prediction, creating more possibilities for deployment in actual application scenarios.
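To make the four-stage pipeline concrete, here is a minimal sketch of how the modules described above could be composed. The class and argument names, tensor shapes, and wiring are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the four-stage LLM4CP pipeline: preprocessing -> embedding
# -> pre-trained LLM backbone -> output head. All names are illustrative.
import torch
import torch.nn as nn

class LLM4CPSketch(nn.Module):
    def __init__(self, preprocessor, embedding, llm_backbone, output_head):
        super().__init__()
        self.preprocessor = preprocessor   # CSI formatting / delay-domain transform
        self.embedding = embedding         # aligns CSI features with the LLM feature space
        self.llm_backbone = llm_backbone   # pre-trained GPT-2 blocks (mostly frozen)
        self.output_head = output_head     # maps LLM features back to predicted CSI

    def forward(self, historical_csi):
        # historical_csi: CSI sequence observed at past pilot instants
        x = self.preprocessor(historical_csi)
        x = self.embedding(x)
        x = self.llm_backbone(x)
        return self.output_head(x)         # predicted future CSI sequence

# Wiring with placeholder modules just to check the tensor flow:
model = LLM4CPSketch(nn.Identity(), nn.Identity(), nn.Identity(), nn.Identity())
```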

Research highlights:

* For the first time, a pre-trained large language model is applied to the channel prediction task, demonstrating that pre-trained LLMs can go beyond natural language and empower physical-layer design in wireless communications

* The designed channel feature enhancement neural network aligns the channel space with the feature space of the large model, achieving effective transfer of the pre-trained model's general knowledge to the channel prediction task

* Simulation results show that the proposed scheme achieves state-of-the-art full-sample and few-shot prediction performance on TDD and FDD channel prediction tasks, with frequency generalization performance significantly ahead of existing schemes, at a training and inference cost comparable to that of a small deep learning model

Paper address:
https://ieeexplore.ieee.org/document/10582829


Dataset download:

https://go.hyper.ai/G0plJ

The open source project "awesome-ai4s" brings together more than 100 AI4S paper interpretations and provides massive data sets and tools:

https://github.com/hyperai/awesome-ai4s

Dataset: Fully compliant with the 3GPP standard

During the experimental phase of the study, the team used the QuaDRiGa simulator to generate a 3GPP-compliant time-varying channel dataset for performance verification.

The team set up a MISO-OFDM system with a dual-polarized UPA (uniform planar array) at the base station and a single omnidirectional antenna at the user side, with the antenna spacing set to half the wavelength at the center frequency. Both the uplink and downlink channels have a bandwidth of 8.64 MHz, and the pilot spacing in frequency is 180 kHz. In both TDD and FDD modes, the center frequencies of the uplink and downlink channels are set to 2.4 GHz; in FDD mode, the uplink and downlink bands are adjacent. In the prediction experiments, the time interval between pilots is set to 0.5 ms.
* TDD: a duplex mode in which the uplink and downlink share the same frequency band and are separated in time; it is used to separate the transmit and receive channels in mobile communication systems.
* FDD: a duplex mode in which the uplink (mobile station to base station) and downlink (base station to mobile station) operate on two separate frequencies, with a required frequency separation between them.

The study considers the 3GPP urban macro channel model and non-line-of-sight scenarios. The number of clusters is 21, and the number of paths in each cluster is 20. The initial position of the user is randomized, and the movement trajectory is set to linear.

The training and validation datasets contain 8,000 and 1,000 samples, respectively. User speeds are uniformly distributed between 10 and 100 km/h. The test dataset covers 10 speeds from 10 km/h to 100 km/h, with 1,000 samples per speed.
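For reference, the simulation settings reported above can be collected into a single configuration dictionary. The key names below are our own; the values are taken directly from the text.

```python
# Hedged summary of the reported QuaDRiGa simulation settings.
simulation_config = {
    "channel_model": "3GPP UMa, NLOS (QuaDRiGa)",
    "bandwidth_hz": 8.64e6,
    "pilot_spacing_hz": 180e3,            # 8.64 MHz / 180 kHz = 48 pilot subcarriers
    "pilot_time_interval_s": 0.5e-3,
    "center_frequency_hz": 2.4e9,         # uplink and downlink; FDD bands adjacent
    "num_clusters": 21,
    "paths_per_cluster": 20,
    "train_samples": 8000,
    "val_samples": 1000,
    "test_samples_per_speed": 1000,
    "user_speeds_kmh": list(range(10, 101, 10)),  # 10 test speeds, 10-100 km/h
}
```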

Model architecture: Channel prediction based on large language model

Existing downlink CSI acquisition methods have two major drawbacks: first, the CSI estimation and feedback process incurs additional computation and transmission delay, leading to "channel aging" in highly dynamic scenarios; second, the additional downlink pilots occupy part of the time-frequency resources, which particularly reduces the spectral efficiency of FDD systems.

The LLM4CP scheme proposed in this paper is an LLM-based MISO-OFDM channel prediction method. It predicts the future downlink CSI sequence from the historical uplink CSI sequence, which effectively avoids downlink pilot overhead and feedback delay and offers a more practical way to address the two drawbacks above.

In order to adapt the text-based pre-trained LLM to the complex matrix format of CSI data, the research team designed dedicated modules in LLM4CP for format conversion and feature extraction: a preprocessing module (Preprocessor), an embedding module (Embedding), a pre-trained LLM module (Pre-trained LLM), and an output module (Output), as shown in the figure below:

LLM4CP network architecture diagram

The preprocessing module mainly handles the complex "space-time-frequency" structure of the high-dimensional CSI data. To deal with the high dimensionality in the spatial domain, the team parallelizes the antenna dimension, predicting the CSI of each transmit-receive antenna pair separately, which reduces network overhead while improving the scalability of the task. To fully capture frequency-domain characteristics, the team exploits the structured nature of the channel and introduces the delay domain to directly characterize multipath delays. To extract time-domain features effectively, the team adopts block (patch) processing, which captures local temporal variations and reduces computational complexity.
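A rough sketch of these three preprocessing ideas (antenna-wise parallelization, delay-domain transform, time patching) is given below, assuming illustrative tensor shapes and a hypothetical patch length; the authors' exact preprocessing may differ.

```python
# Illustrative preprocessing sketch: per-antenna parallelization, IFFT to the
# delay domain, and grouping of consecutive time instants into patches.
import torch

def preprocess_csi(csi, patch_len=4):
    """csi: complex tensor [batch, antennas, time, freq] of historical uplink CSI."""
    b, a, t, f = csi.shape
    # 1) Antenna parallelization: treat each antenna as an independent sample.
    csi = csi.reshape(b * a, t, f)
    # 2) Delay domain: IFFT across the frequency axis exposes multipath delays.
    csi_delay = torch.fft.ifft(csi, dim=-1)
    # 3) Split real/imag so downstream layers see real-valued features.
    feats = torch.cat([csi_delay.real, csi_delay.imag], dim=-1)  # [b*a, t, 2f]
    # 4) Time patching: group consecutive instants to capture local dynamics.
    assert t % patch_len == 0
    return feats.reshape(b * a, t // patch_len, patch_len * 2 * f)

x = torch.randn(2, 4, 16, 48, dtype=torch.cfloat)  # toy batch
print(preprocess_csi(x).shape)                     # torch.Size([8, 4, 384])
```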

The embedding module performs preliminary feature extraction before the LLM, including CSI attention and positional embeddings. Because text and CSI differ significantly, a pre-trained LLM cannot directly process such non-language data, so the research team instead leverages the general modeling ability of the LLM for the channel prediction task. The embedding module further processes the pre-processed features to align them with the feature space of the pre-trained LLM, bridging the domain gap.
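A minimal embedding stage along these lines might look as follows: a learned projection into the LLM feature width (768 for GPT-2), residual self-attention over CSI patches, and an additive positional embedding. The dimensions and layer choices are assumptions for illustration.

```python
# Sketch of a CSI embedding module: projection + CSI attention + positional embedding.
import torch
import torch.nn as nn

class CSIEmbedding(nn.Module):
    def __init__(self, in_dim, llm_dim=768, n_heads=8, max_tokens=64):
        super().__init__()
        self.proj = nn.Linear(in_dim, llm_dim)
        self.attn = nn.MultiheadAttention(llm_dim, n_heads, batch_first=True)
        self.pos = nn.Parameter(torch.zeros(1, max_tokens, llm_dim))

    def forward(self, patches):                 # patches: [batch, tokens, in_dim]
        tokens = self.proj(patches)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        tokens = tokens + attn_out              # residual CSI attention
        return tokens + self.pos[:, : tokens.size(1)]
```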

In this study, the team chose GPT-2 as the LLM backbone. The GPT-2 backbone consists of a learnable position embedding layer and stacked transformer decoders, and the number of stacked layers and the feature size can be flexibly adjusted as needed. During training, the multi-head attention and feed-forward layers of the pre-trained LLM remain frozen (the blue boxes in the figure above) to retain the general knowledge of the pre-trained model, while the addition, layer normalization, and position embedding are fine-tuned to adapt the LLM to the channel prediction task.
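This selective-freezing strategy can be sketched with the Hugging Face GPT-2 implementation as follows. Treating layer norms and the position embedding as the trainable set is a paraphrase of the description above, not a copy of the authors' code.

```python
# Sketch of freezing GPT-2's attention and feed-forward weights while keeping
# layer norms and the learnable position embedding (wpe) trainable.
from transformers import GPT2Model

backbone = GPT2Model.from_pretrained("gpt2")  # 12 transformer decoder blocks

for name, param in backbone.named_parameters():
    # ln_1 / ln_2 / ln_f and wpe stay trainable; attn and mlp weights are frozen
    # to preserve the general knowledge of the pre-trained model.
    param.requires_grad = ("ln" in name) or ("wpe" in name)

# The text token embedding (wte) is bypassed: CSI features are fed in directly
# through `inputs_embeds`, e.g.
#   hidden = backbone(inputs_embeds=features).last_hidden_state
# where features has shape [batch, tokens, 768].
```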

It is worth noting that, in the proposed method, the GPT-2 backbone can also be flexibly replaced with other large language models.

Finally, the output module transforms the output features of the LLM into the final prediction results.
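One plausible form of such an output head, under assumed prediction length and pilot-subcarrier counts, is a linear projection from the flattened LLM features to the future CSI sequence (real and imaginary parts):

```python
# Illustrative output head: flatten LLM features and project to future CSI.
import torch.nn as nn

class OutputHead(nn.Module):
    def __init__(self, n_tokens, llm_dim=768, pred_len=4, n_pilot_subcarriers=48):
        super().__init__()
        self.pred_len = pred_len
        self.fc = nn.Linear(n_tokens * llm_dim, pred_len * 2 * n_pilot_subcarriers)

    def forward(self, hidden):                  # hidden: [batch, tokens, llm_dim]
        out = self.fc(hidden.flatten(1))        # [batch, pred_len * 2 * F]
        return out.view(hidden.size(0), self.pred_len, -1)
```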

Research results: LLM4CP's prediction accuracy, achievable rate, and bit error rate are better than existing solutions

To verify the superiority of the proposed method, the research team compared LLM4CP with several model-based and deep-learning-based channel prediction methods (PAD, RNN, LSTM, GRU, CNN, and Transformer), along with a no-prediction baseline and an interference-free reference, using three performance metrics: NMSE (normalized mean square error), SE (spectral efficiency), and BER (bit error rate). The results show that LLM4CP outperforms existing channel prediction schemes in prediction accuracy, achievable rate, and bit error rate.
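For concreteness, NMSE as commonly used in channel prediction normalizes the prediction error energy by the energy of the true CSI; the snippet below follows that standard definition, which we assume here rather than take from the paper's code.

```python
# Standard NMSE metric for channel prediction.
import torch

def nmse(h_pred, h_true):
    err = torch.sum(torch.abs(h_pred - h_true) ** 2)
    ref = torch.sum(torch.abs(h_true) ** 2)
    return (err / ref).item()

# e.g. nmse(model(uplink_history), future_downlink_csi)
```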

The research team compared three performance indicators of LLM4CP with other methods in TDD and FDD systems.

In the TDD system, the SE and BER of LLM4CP are 7.036 bit·(s·Hz)⁻¹ and 0.0039, respectively; in the FDD system, they are 6.303 bit·(s·Hz)⁻¹ and 0.0347, as shown in the figures below:

SE and BER Performance of LLM4CP and Other Methods for TDD Systems
SE and BER performance of LLM4CP and other methods for FDD systems

In both TDD and FDD systems, LLM4CP achieves state-of-the-art SE and BER performance.

In the noise robustness test, LLM4CP achieved the lowest NMSE across the tested signal-to-noise ratios of the historical CSI, indicating strong robustness to CSI noise, as shown in the figures below:

NMSE performance and signal-to-noise ratio of historical CSI in TDD systems
NMSE performance and signal-to-noise ratio of historical CSI in FDD systems

Training with a small number of samples is crucial for rapid model deployment. The team tested the few-shot learning capability of the proposed method, using only 10% of the dataset for network training. Compared with full-sample training, the advantage of LLM4CP over other methods is even more evident in the few-shot prediction scenario.

In the frequency generalization test, the team applied the model trained at 2.4 GHz in the TDD system to the 4.9 GHz band under few-shot and zero-shot settings. The results show that LLM4CP needs only a small number of samples (30) to match the prediction performance of the parameterized model, demonstrating its excellent generalization ability. As shown in the figure below:

Relationship between cross-frequency generalization performance of TDD system and sample size

A viable solution with high performance and low cost

Cost is a key factor in bringing a model to real-world scenarios. The study evaluated the difficulty of deploying the proposed method in practice; the comparison is shown in the figure below:

Training parameters and costs

Since PAD is a model-based method, it has relatively few parameters and requires no training, but its high processing complexity gives it the longest inference time. The inference time of LLM4CP is greatly reduced compared with the Transformer baseline, so LLM4CP also has the potential to serve real-time channel prediction.

In addition, the team also evaluated the impact of selecting different numbers of GPT-2 layers on channel prediction performance, parameter cost, and inference time. As shown in the following figure:

NMSE performance, network parameters, and inference time of LLM4CP with different numbers of GPT-2 layers

In tests with 10% of the training dataset in the TDD setting, both the number of network parameters and the inference time grew with the number of GPT-2 layers, and within the tested range the best performance was achieved with 6 GPT-2 layers. This means that more layers are not necessarily better for prediction. In actual deployment, the type and size of the LLM backbone must balance the required prediction accuracy against the constraints of device storage and computing resources.
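One simple way to run such a layer-count ablation is to load the pre-trained model and keep only the first N transformer blocks, as sketched below with the Hugging Face GPT-2 implementation; whether the authors truncated the model this way is an assumption.

```python
# Sketch: truncate GPT-2 to its first N transformer blocks for the ablation.
from transformers import GPT2Model

def truncated_gpt2(num_layers=6):
    model = GPT2Model.from_pretrained("gpt2")  # 12 blocks by default
    model.h = model.h[:num_layers]             # keep the first N blocks
    model.config.n_layer = num_layers
    return model
```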

AI makes wireless communications full of imagination

With the rapid development of wireless communications, especially in the current 5G era and the future 6G era, the importance of combining AI with communications is self-evident. In related technical fields, the application of AI technology has already received extensive attention and research in the industry.

For example, Yang Lihua's team at Nanjing University of Posts and Telecommunications previously published a study titled "A novel deep learning based time-varying channel prediction method", which proposed a deep-learning-based time-varying channel prediction method for high-speed mobile scenarios. The method is based on a back-propagation (BP) neural network that is trained offline and used for online prediction. The paper reports that the method significantly improves the prediction accuracy of time-varying channels while keeping computational complexity low.
* Paper address:

https://www.infocomm-journal.com/dxkx/CN/10.11959/j.issn.1000-0801.2021011

Unlike previous work, this study is the first to apply a large language model to the design of the physical layer of wireless communications, setting a precedent for combining AI with communication technology.

As noted in the paper, there had previously been no successful attempt to apply a pre-trained large language model to such non-natural-language tasks; this work shows that pre-trained large language models can break through the language form and empower wireless communications.

Even more noteworthy, this experiment suggests that large language models will open a new chapter of empowerment. With the unique reasoning abilities of large language models, we can also be more confident that the integration of AI with technologies in vertical industries will accelerate, offering a shortcut for combining AI with a wide range of industries.