Cracking the "Black Box" Problem of Time Series Prediction! Huazhong University of Science and Technology Proposes CGS-Mask to Reveal Key Indicators of Patient Survival

With AI technology now widely used in daily life, the "interpretability" of models has become a pressing issue. In tasks that involve human life and property, "black box" algorithms not only undermine users' trust in AI systems but also raise a series of problems, such as safety risks and discrimination.
This problem is particularly prominent in time series forecasting, which underpins many key industries, including but not limited to stock market, disease, energy, and weather forecasting. In these fields, understanding the reasons behind AI decisions is crucial. Taking disease prediction as an example, doctors and patients need to know not only what the AI predicts but also how it arrived at that prediction. If the model can clearly point out which symptoms played a key role in the diagnosis, doctors and patients will trust AI-assisted diagnosis more.
To make time series prediction more than just an accurate number and turn it into a "visible" process, Lu Feng's team at Huazhong University of Science and Technology, together with the Zomaya team from the University of Sydney and Tongji Hospital, proposed a new method, CGS-Mask. By combining time series forecasting with interpretability, the method delivers accurate results while making forecasts more intuitive and interpretable.
Specifically, a masking mechanism highlights which time points and which data have the greatest impact on the final result, much like road signs that clearly mark what matters while you drive, so that you understand why you decided to turn or slow down. The approach has broad potential in areas such as healthcare, astronomy, sensors, and energy, especially in time series forecasting tasks that involve interaction with users.
This achievement, titled "CGS-Mask: Making Time Series Predictions Intuitive for All", has been accepted for publication in the Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI'24), one of the world's top artificial intelligence conferences.
Research highlights:
* Compared with traditional methods, CGS-Mask shows more clearly which time periods matter most to the forecast and which factors do not, making the forecasting process easier for users to understand.
* CGS-Mask is suitable for various time series forecasting tasks, especially those that require user interaction and explanation of results, such as stock market forecasting, disease prediction, and weather forecasting.
* CGS-Mask is superior to other methods in terms of accuracy, interpretability and intuitiveness. It reduces the "black box" problem and improves the transparency of the model. Through this method, non-professionals can also understand the prediction results of the model, which is more user-friendly and enhances the applicability and credibility of the model.
* In the future, the researchers plan to further improve CGS-Mask and demonstrate its applicability to more time series applications, especially in healthcare, where it can identify salient features in medical records that reveal the onset, development, and deterioration of diseases.

Paper link:
https://ojs.aaai.org/index.php/AAAI/article/view/29325
Datasets: Synthetic + real-world data, covering healthcare, astronomy, sensors, and energy
The researchers selected four synthetic datasets: "rare features", "rare time", "mixture", and "random".
* The rare features and rare time datasets contain a small number of salient features and a small number of salient time points, respectively.
* The mixture dataset is created by combining rare features and rare time.
* In the random dataset, the salient input regions are located at random.
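The four synthetic layouts can be illustrated with a small generator of ground-truth saliency masks. This is a minimal sketch under our own assumptions: the function names `saliency_mask` and `make_sample`, the grid size, the window placement, and treating "mixture" as the union of the other two layouts are all hypothetical and not taken from the paper.

```python
import random

KINDS = ("rare_features", "rare_time", "mixture", "random")


def saliency_mask(kind, n_features=10, n_steps=20, n_salient=2, seed=0):
    """Return a binary n_features x n_steps ground-truth saliency mask.

    Hypothetical layouts for illustration only; the paper's generators may differ.
    """
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind}")
    rng = random.Random(seed)
    mask = [[0] * n_steps for _ in range(n_features)]
    if kind in ("rare_features", "mixture"):
        # A few features are salient over a contiguous window in the middle of the series.
        for f in rng.sample(range(n_features), n_salient):
            for t in range(n_steps // 4, 3 * n_steps // 4):
                mask[f][t] = 1
    if kind in ("rare_time", "mixture"):
        # A few time steps are salient for a contiguous block of features.
        for t in rng.sample(range(n_steps), n_salient):
            for f in range(n_features // 4, 3 * n_features // 4):
                mask[f][t] = 1
    if kind == "random":
        # Salient cells are scattered uniformly at random.
        cells = [(f, t) for f in range(n_features) for t in range(n_steps)]
        for f, t in rng.sample(cells, n_salient * n_steps // 2):
            mask[f][t] = 1
    return mask


def make_sample(mask, signal=1.0, noise=0.1, seed=0):
    """Gaussian background noise, plus a constant signal wherever the mask is 1."""
    rng = random.Random(seed)
    return [[signal * m + rng.gauss(0.0, noise) for m in row] for row in mask]


x = make_sample(saliency_mask("rare_time"))
```

A saliency method can then be scored by how closely its mask overlaps the ground-truth mask used to generate the data.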
For real-world data, the researchers chose the MIMIC-III, LSST, NATOPS, and AE datasets. These cover healthcare, astronomy, sensors, and energy, and are used to evaluate the performance of CGS-Mask across different fields.
MIMIC-III dataset: Contains the health records of 40,000 intensive care unit (ICU) patients, each with 31 features, and is used to predict whether a patient will survive the next 48 hours. This is a binary classification task: the goal is to distinguish patients who will survive from those who will die.
LSST dataset: Simulated astronomical time series data prepared for observations by the Large Synoptic Survey Telescope. The prediction model must classify the data into 14 astronomical categories.
NATOPS dataset: Generated by gesture recognition sensors that record data from the hand, elbow, wrist, and thumb. The data must be classified into 6 different gestures.
AE dataset: The appliances energy prediction dataset from the UCI repository, used to predict a house's total energy usage. This is a regression task: the prediction model outputs a numerical value representing total energy usage.
Model architecture: By optimizing strip masks, CGS-Mask provides clear and intuitive explanations of time series forecasts
CGS-Mask (Cellular Genetic Strip Mask) is a saliency method that uses a cellular genetic algorithm to optimize strip masks, addressing the "black box" problem in time series prediction tasks and improving model interpretability.
* A strip mask treats consecutive time steps as a whole when evaluating a feature's impact, which effectively captures the temporal dependence of time series data. Because every entry of the strip mask is binary (0 or 1), the results are easier to interpret and the saliency scores are more intuitive.
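To make the idea concrete, the sketch below represents a strip mask as a list of strips, each covering one feature over a run of consecutive time steps, and expands it into the binary feature-by-time matrix described above. The `Strip` tuple and `strips_to_mask` function are hypothetical names chosen for illustration, not the authors' code.

```python
from typing import List, NamedTuple


class Strip(NamedTuple):
    feature: int  # row (input feature) the strip covers
    start: int    # first time step covered
    length: int   # number of consecutive time steps covered


def strips_to_mask(strips: List[Strip], n_features: int, n_steps: int) -> List[List[int]]:
    """Expand strips into a binary n_features x n_steps mask (1 = salient, 0 = not)."""
    mask = [[0] * n_steps for _ in range(n_features)]
    for s in strips:
        # Clip the strip to the valid time range so an overhanging strip cannot crash.
        for t in range(max(s.start, 0), min(s.start + s.length, n_steps)):
            mask[s.feature][t] = 1
    return mask


mask = strips_to_mask([Strip(0, 2, 3), Strip(2, 0, 2)], n_features=3, n_steps=6)
# mask == [[0, 0, 1, 1, 1, 0], [0, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0]]
```

Because each strip switches a whole time window on or off, the explanation reads as "feature 0 matters from step 2 to step 4" rather than as a grid of real-valued scores.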
Optimizing the strip mask proceeds as follows: first, create a population of strip masks and map them onto a cellular automaton; then evolve each mask into the next generation using genetic operations (such as crossover, mutation, and translation); after N generations, select the mask with the highest fitness value as the optimal mask. The overall framework of CGS-Mask is shown in the figure below:

Population initialization: A population of strip masks is randomly initialized, and the masks are mapped onto a two-dimensional cellular automaton.
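Population initialization can be sketched as filling a small 2D grid in which each cell holds one candidate strip mask and interacts only with its neighbors. The grid shape, the fixed strip length, the (feature, start, length) tuples, and the von Neumann (up/down/left/right) neighborhood with wrap-around edges are our own assumptions; the paper's cellular automaton may use different settings.

```python
import random


def init_population(grid_h, grid_w, n_features, n_steps, n_strips, strip_len, seed=0):
    """Return a grid_h x grid_w grid; each cell is a list of (feature, start, length) strips."""
    rng = random.Random(seed)
    return [
        [
            [(rng.randrange(n_features), rng.randrange(n_steps - strip_len + 1), strip_len)
             for _ in range(n_strips)]
            for _ in range(grid_w)
        ]
        for _ in range(grid_h)
    ]


def neighbors(i, j, grid_h, grid_w):
    """Von Neumann neighborhood on a torus: the grid wraps around at its edges."""
    return [((i - 1) % grid_h, j), ((i + 1) % grid_h, j),
            (i, (j - 1) % grid_w), (i, (j + 1) % grid_w)]


grid = init_population(4, 4, n_features=3, n_steps=8, n_strips=2, strip_len=2)
```

Limiting interaction to local neighbors is a common motivation for cellular genetic algorithms: good solutions spread gradually instead of taking over the whole population at once, which helps preserve diversity.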
Fitness evaluation: Each strip mask is assigned a fitness value based on a defined perturbation error, which measures how the mask affects the model's predictions.
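One way to read "perturbation error" is as a preservation test: keep the input only where the mask is 1, replace everything else with an uninformative baseline, and measure how far the black-box prediction moves. The sketch below uses each feature's temporal mean as the baseline and adds a penalty on mask area so that a mask covering everything does not win by default. Both choices, the `area_weight` value, and the function names are assumptions, not the paper's exact definition.

```python
from typing import Callable, List

Series = List[List[float]]  # n_features x n_steps


def perturb(x: Series, mask: List[List[int]]) -> Series:
    """Keep cells where mask == 1; replace the rest with that feature's temporal mean."""
    out = []
    for row, mrow in zip(x, mask):
        mean = sum(row) / len(row)
        out.append([v if m else mean for v, m in zip(row, mrow)])
    return out


def fitness(model: Callable[[Series], float], x: Series, mask: List[List[int]],
            area_weight: float = 0.1) -> float:
    """Higher is better: small prediction change and small mask area."""
    error = abs(model(x) - model(perturb(x, mask)))
    area = sum(map(sum, mask)) / (len(mask) * len(mask[0]))
    return -(error + area_weight * area)


def toy_model(series: Series) -> float:
    # Toy black box that only looks at feature 0, time steps 2 and 3.
    return series[0][2] + series[0][3]


x = [[0, 0, 5, 5, 0, 0], [1, 1, 1, 1, 1, 1]]
good = [[0, 0, 1, 1, 0, 0], [0] * 6]
full = [[1] * 6, [1] * 6]
empty = [[0] * 6, [0] * 6]
```

With these numbers the tight mask scores about -0.017, the full mask -0.1, and the empty mask about -6.67, so this fitness favors the smallest mask that still preserves the prediction.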
Genetic operator optimization: Each mask is optimized with genetic operators such as crossover, mutation, and translation.
* Crossover: The algorithm performs crossover between neighboring masks to generate a new mask. In CGS-Mask, the strip is the basic unit of genetic operations, so each strip of the new mask can be inherited from either parent.
* Mutation: Strips in the mask are replaced with a certain probability, which increases genetic diversity and prevents the algorithm from converging prematurely to a local optimum.
* Translation: The strips are shifted along the time axis to fine-tune their positions, so that they align more accurately with the truly salient regions of the input data.
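The three operators act on whole strips rather than on individual cells. The sketch below uses hypothetical (feature, start, length) tuples; the 50/50 uniform crossover, the per-strip mutation rate, and the bounded random shift in `translate` are illustrative assumptions rather than the paper's exact parameters.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Strip-level uniform crossover: each child strip is inherited from either parent."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def mutate(strips, n_features, n_steps, rate, rng):
    """With probability `rate`, replace a strip by a random one of the same length."""
    out = []
    for f, start, length in strips:
        if rng.random() < rate:
            f, start = rng.randrange(n_features), rng.randrange(n_steps - length + 1)
        out.append((f, start, length))
    return out


def translate(strips, n_steps, max_shift, rng):
    """Shift each strip along the time axis by up to `max_shift` steps, staying in range."""
    out = []
    for f, start, length in strips:
        start = min(max(start + rng.randint(-max_shift, max_shift), 0), n_steps - length)
        out.append((f, start, length))
    return out


rng = random.Random(0)
a = [(0, 1, 2), (1, 4, 2)]
b = [(2, 0, 2), (0, 5, 2)]
child = translate(mutate(crossover(a, b, rng), 3, 8, 0.1, rng), 8, 1, rng)
```

Clamping in `translate` keeps every strip inside the series, so the operators never produce an invalid mask.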
Iterative evolution: By repeatedly applying these genetic operators, the masks in the population keep evolving toward higher fitness values.
Optimal mask selection: After N iterations, the mask with the highest fitness value is selected as the optimal mask (M*).
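Putting the steps together, the compact sketch below evolves a grid of strip masks against a black-box `model` callable. It is a hypothetical, self-contained illustration built on assumptions, not the authors' implementation: a preservation-style fitness with an area penalty, a toroidal von Neumann neighborhood, strip-level uniform crossover, random-reset mutation, and bounded translation. It also keeps a parent when its child is less fit (elitist replacement), which is our choice rather than a detail reported for CGS-Mask.

```python
import random


def to_mask(strips, n_features, n_steps):
    """Expand (feature, start, length) strips into a binary feature x time mask."""
    mask = [[0] * n_steps for _ in range(n_features)]
    for f, start, length in strips:
        for t in range(start, start + length):
            mask[f][t] = 1
    return mask


def fitness(model, x, strips, area_weight=0.1):
    """Assumed fitness: -(prediction change when unmasked cells are mean-filled + area penalty)."""
    mask = to_mask(strips, len(x), len(x[0]))
    kept = [[v if m else sum(row) / len(row) for v, m in zip(row, mrow)]
            for row, mrow in zip(x, mask)]
    area = sum(map(sum, mask)) / (len(x) * len(x[0]))
    return -(abs(model(x) - model(kept)) + area_weight * area)


def random_strip(n_features, n_steps, length, rng):
    return (rng.randrange(n_features), rng.randrange(n_steps - length + 1), length)


def cgs_mask_sketch(model, x, n_strips=2, strip_len=2, grid=4, generations=30,
                    mutation_rate=0.1, max_shift=1, seed=0):
    rng = random.Random(seed)
    n_features, n_steps = len(x), len(x[0])

    def score(strips):
        return fitness(model, x, strips)

    # Population initialization on a grid x grid cellular automaton.
    pop = [[[random_strip(n_features, n_steps, strip_len, rng) for _ in range(n_strips)]
            for _ in range(grid)] for _ in range(grid)]
    best = max((cell for row in pop for cell in row), key=score)
    for _ in range(generations):  # Iterative evolution for N generations.
        new_pop = []
        for i in range(grid):
            new_row = []
            for j in range(grid):
                me = pop[i][j]
                # Mate with the fittest von Neumann neighbor (toroidal wrap-around).
                nbrs = [pop[(i - 1) % grid][j], pop[(i + 1) % grid][j],
                        pop[i][(j - 1) % grid], pop[i][(j + 1) % grid]]
                mate = max(nbrs, key=score)
                # Crossover: each strip comes from either parent.
                child = [a if rng.random() < 0.5 else b for a, b in zip(me, mate)]
                # Mutation: occasionally replace a strip with a random one.
                child = [random_strip(n_features, n_steps, strip_len, rng)
                         if rng.random() < mutation_rate else s for s in child]
                # Translation: shift each strip by up to max_shift steps, kept in range.
                child = [(f, min(max(st + rng.randint(-max_shift, max_shift), 0), n_steps - ln), ln)
                         for f, st, ln in child]
                # Elitist replacement (our assumption): keep the parent if the child is worse.
                new_row.append(child if score(child) >= score(me) else me)
            new_pop.append(new_row)
        pop = new_pop
        best = max([best] + [cell for row in pop for cell in row], key=score)
    # Return the fittest mask seen (the optimal mask M*) and its fitness.
    return to_mask(best, n_features, n_steps), score(best)


def toy_model(series):
    # Black box that, unknown to the explainer, depends only on feature 0 at steps 2 and 3.
    return series[0][2] + series[0][3]


x = [[0, 0, 4, 4, 0, 0, 0, 0], [1] * 8, [2, 0, 2, 0, 2, 0, 2, 0]]
best_mask, best_score = cgs_mask_sketch(toy_model, x)
```

Because `cgs_mask_sketch` only ever calls `model` on inputs and never inspects gradients or weights, it mirrors the model-agnostic property of CGS-Mask.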
CGS-Mask combines cellular automata and genetic algorithms to optimize strip masks efficiently and provide clear, intuitive explanations of time series forecasts. Because it does not require access to the model's internals, it applies to any black-box model and can quickly give users meaningful explanations.
Experimental conclusion: CGS-Mask effectively identifies salient features that change over time and reveals key factors in disease development and deterioration
To evaluate the performance of CGS-Mask, the researchers compared it with eight other state-of-the-art saliency methods on synthetic and real-world datasets: Dynamask, DeepLIFT, RISE, FIT, Shapley Value Sampling (SVS), Feature Occlusion (FO), Feature Permutation (FP), and Integrated Gradients (IG). As shown in the figure below, CGS-Mask identifies salient features with higher accuracy, indicating that it is more effective at identifying salient features that change over time.

Taking healthcare as an example, the researchers used the MIMIC-III dataset to predict patient survival over the next 48 hours. The figure below compares the different methods; panel (f) shows the result of CGS-Mask, where green bars mark the key features related to the patient's outcome. The results indicate that decreased blood pressure, tachycardia, and shortness of breath all signal an imminent risk of death, so doctors can intervene in time based on these features. In contrast, the comparison methods in panels (a) to (d) do not clearly identify the periods and features that lead to this outcome.

To evaluate the readability of the generated masks, the researchers surveyed 254 participants aged 5 to 83 with varying levels of domain knowledge. More than 65% of users rated CGS-Mask as the method that best helped them understand salient features and their temporal correlations, and more than 85% ranked it in their top 3.
In addition, the researchers conducted a pilot user study measuring reaction time and accuracy when participants used 3 saliency masks (Q1, Q2, and Q3) to judge the importance of 4 features (A, B, C, and D) over 10 time steps. As shown in the figure below, users of CGS-Mask (Q2) had an average reaction time of 6.26 seconds and an accuracy of 85.4%, whereas users of the numerical masks (Q1 and Q3) averaged 19.22 seconds with an accuracy of only 40.6%. This indicates that CGS-Mask helps users identify feature importance faster and more accurately.
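A quick calculation puts the pilot-study averages reported above in relative terms. The variable names are ours; only the four reported averages are used.

```python
# Averages reported in the pilot user study.
cgs_time_s, numeric_time_s = 6.26, 19.22   # mean reaction time (seconds)
cgs_acc, numeric_acc = 85.4, 40.6          # accuracy (%)

speedup = numeric_time_s / cgs_time_s      # how many times faster CGS-Mask users were
acc_gain = cgs_acc - numeric_acc           # accuracy gap in percentage points
print(f"{speedup:.2f}x faster, +{acc_gain:.1f} percentage points")
# prints: 3.07x faster, +44.8 percentage points
```

In other words, users of the numerical masks took roughly three times as long on average and were about 44.8 percentage points less accurate.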

In summary, CGS-Mask is a model-agnostic saliency method that is intuitive and user-friendly and effectively explains time series forecasts. It outperforms existing solutions on both synthetic and real-world data. In the medical field in particular, CGS-Mask has shown an excellent ability to identify salient features in medical records, which is important for revealing the onset, development, and deterioration of diseases and holds great application potential.
Cutting-edge applications of time series prediction models in the medical field
Time series forecasting analyzes time-ordered data, building models that capture trends, seasonality, and cyclical patterns. These models can both learn the patterns in historical data and project future trends, and they are widely used in fields including finance, meteorology, healthcare, transportation, and energy.
In the medical field, Professor Lu Feng of Huazhong University of Science and Technology, the first author of the CGS-Mask paper, continues to focus on applying sequence prediction models. In addition to the research above, she collaborated with a team from the University of Sydney on the paper "A Composite Multi-Attention Framework for Intraoperative Hypotension Early Warning", published in the Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI'23).
Original paper:
https://ojs.aaai.org/index.php/AAAI/article/view/26681

In this paper, the researchers proposed a multimodal, attention-based early warning framework for intraoperative hypotension. Experiments on two large-scale real-world datasets showed that the method achieves up to 94.1% accuracy in early warning of intraoperative hypotension events while reducing the required signal sampling rate by a factor of 3,000. In addition, on the most challenging task, predicting mean arterial pressure 15 minutes ahead, the multimodal framework achieved a mean absolute error of 4.48 mmHg, 42.9% lower than existing solutions.
Similarly, a research team from Nanjing Medical University developed time series models to predict the incidence of hepatitis. Using a seasonal autoregressive moving average model and a seasonal exponential smoothing model, they analyzed the number of cases of different types of hepatitis.
The study found that March is the peak month for all types of hepatitis each year. Over the past 10 years, the incidence of hepatitis A has generally declined; hepatitis B has fluctuated and risen in recent years; hepatitis C has kept rising; and hepatitis E has remained basically stable. These findings provide an important basis for more effective hepatitis prevention and control measures. The study, titled "Time series analysis and forecasting of four hepatitis epidemic trends in China from 2012 to 2021", was published in the Journal of Nanjing Medical University (Natural Sciences).
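For readers unfamiliar with seasonal exponential smoothing, the sketch below implements the additive Holt-Winters variant, which tracks a level, a trend, and one seasonal component per position in the period. It is a generic textbook-style illustration: the function name, the smoothing constants, the simple initialization, and the choice of the additive form are our assumptions, and the Nanjing study may have used a different variant or a statistics package.

```python
def holt_winters_additive(y, season_len, alpha=0.3, beta=0.1, gamma=0.2, horizon=12):
    """Additive Holt-Winters (seasonal exponential smoothing); returns `horizon` forecasts."""
    m = season_len
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    # Initial level: mean of the first season.
    level = sum(y[:m]) / m
    # Initial trend: average per-step change between the first two seasons.
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    # Initial seasonal offsets relative to the first-season level.
    season = [y[i] - level for i in range(m)]
    for t in range(m, len(y)):
        s = season[t % m]
        prev_level = level
        level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * s
    n = len(y)
    # Forecast for time n - 1 + h reuses the seasonal offset of the same period position.
    return [level + h * trend + season[(n + h - 1) % m] for h in range(1, horizon + 1)]


history = [10, 20, 30, 20] * 6  # perfectly seasonal toy series with period 4
forecast = holt_winters_additive(history, season_len=4, horizon=4)
# forecast is approximately [10, 20, 30, 20]
```

On a perfectly repeating series the level, trend, and seasonal terms stay fixed, so the forecast simply continues the pattern; on real case counts the smoothing constants control how quickly the model adapts to changes such as a rising or falling trend.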
In summary, time series prediction has shown great potential in medicine. As science and technology advance and data become more abundant, we look forward to more innovative time series prediction models and methods that contribute to human health and well-being.