【Included Complete Paper】Interpretation of Recent Highlight Papers from AIR - Tsinghua University Institute for AI Industry Research
### Overview of Recent Highlight Papers from Tsinghua University's Institute for AI Industry Research (AIR)

In the past six months, the Institute for AI Industry Research (AIR) at Tsinghua University has made significant strides in its research on smart transportation, smart healthcare, and smart IoT. The institute has published several high-level papers in prestigious international journals and conferences, including CVPR, ICRA, ICLR, and MobiSys. Notably, AIR has received nominations for the ICLR Outstanding Paper Award and the IEEE Micro Top Picks Award. At the recent ICLR 2023 conference, 14 papers from AIR were accepted.

#### 1. **Bit Allocation using Optimization**

- **Authors**: Tongda Xu, Han Gao, Chenjian Gao, Yuanyuan Wang, Dailan He, Jinyong Pi, Jixiang Luo, Ziyu Zhu, Mao Ye, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang
- **Institutions**: Tsinghua University's Institute for AI Industry Research (AIR); SenseTime; University of Electronic Science and Technology of China; Beihang University
- **Conference**: ICML 2023
- **Summary**: This paper addresses the bit allocation problem in neural video compression (NVC). The authors establish a fundamental relationship between bit allocation in NVC and semi-amortized variational inference (SAVI). They prove that SAVI with a GoP-level (group of pictures) likelihood is equivalent to pixel-level bit allocation with precise rate and quality dependency models. Leveraging this equivalence, they propose a new bit allocation paradigm based on SAVI, which requires no empirical models and is therefore optimal. To extend SAVI to the multi-level latent variables in NVC, they recursively apply backpropagation through gradient ascent. They also introduce an efficient approximation algorithm for practical implementation. The method is optimal for scenarios where performance outweighs encoding speed and serves as an empirical upper bound on the R-D performance of bit allocation.
Experimental results show that the proposed algorithm improves PSNR by about 0.5 dB compared to the current state-of-the-art bit allocation methods. The code is open-sourced at [https://github.com/tongdaxu/Bit-Allocation-Using-Optimization](https://github.com/tongdaxu/Bit-Allocation-Using-Optimization).

#### 2. **Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization**

- **Authors**: Haoran Xu, Li Jiang, Jianxiong Li, Zhuoran Yang, Zhaoran Wang, Xianyuan Zhan
- **Institutions**: Tsinghua University's Institute for AI Industry Research (AIR); Tsinghua University; Yale University; Northwestern University
- **Conference**: ICLR 2023 (oral)
- **Summary**: Most offline reinforcement learning (RL) methods face a trade-off between improving the policy beyond the behavior policy and constraining it to limit deviation from that behavior policy. This paper explores the in-sample learning paradigm, which improves the policy by quantile regression using only data samples, without querying the value function of any unseen action. The authors show that in-sample learning operates under the framework of implicit value regularization, which explains its effectiveness. They propose two practical algorithms, Sparse Q-learning and Exponential Q-learning, which use the same value regularization as existing methods but in a fully in-sample manner. Experiments in various scenarios validate the effectiveness and superiority of the proposed algorithms.

#### 3. **DPF: Learning Dense Prediction Fields with Weak Supervision**

- **Authors**: Xiaoxue Chen, Yuhang Zheng, Yipeng Zheng, Qiang Zhou, Hao Zhao, Guyue Zhou, Ya-Qin Zhang
- **Institutions**: Tsinghua University's Institute for AI Industry Research (AIR); Department of Computer Science, Tsinghua University; Beihang University; Institute of Automation, Chinese Academy of Sciences
- **Conference**: CVPR 2023
- **Summary**: Dense prediction networks are essential for tasks like semantic segmentation and intrinsic image decomposition.
However, pixel-level annotations are expensive and often impractical. This paper introduces Dense Prediction Fields (DPF), a method that uses cheap sparse point supervision to train dense prediction networks. DPF is based on coordinate point queries and uses implicit neural functions to generate analyzable visual features for continuous 2D space positions, allowing for predictions at arbitrary resolutions. The effectiveness of DPF is demonstrated on three large public datasets, PASCAL-Context, ADE20K, and IIW, where it achieves state-of-the-art results and significant improvements over previous methods. The code and models are available at [https://github.com/cxx226/DPF](https://github.com/cxx226/DPF).

#### 4. **ADAPT: Action-aware Driving Caption Transformer**

- **Authors**: Bu Jin, Xinyu Liu, Yipeng Zheng, Pengfei Li, Hao Zhao, Tong Zhang, Yuhang Zheng, Guyue Zhou, Jingjing Liu
- **Institutions**: Tsinghua University's Institute for AI Industry Research (AIR); Institute of Automation, Chinese Academy of Sciences; Department of Computer Science, Tsinghua University; Xidian University; Southern University of Science and Technology; Beihang University
- **Conference**: ICRA 2023
- **Summary**: Autonomous driving has made significant progress, but many methods treat it as a supervised learning problem, which limits transparency and explainability. This paper introduces ADAPT, an end-to-end Transformer-based architecture that provides natural language descriptions and explanations of vehicle decisions. The authors use multi-task learning to jointly train the vehicle decision task and the text description task, reducing the gap between them. ADAPT is validated on the large-scale BDD-X dataset and has shown excellent results in real-world tests. The code and models are available at [https://github.com/jxbbb/ADAPT](https://github.com/jxbbb/ADAPT).

#### 5. **Annotating Covert Hazardous Driving Scenarios Online: Utilizing Drivers' Electroencephalography (EEG) Signals**

- **Authors**: Chen Zheng, Muxiao Zhi, Wenjie Jiang, Mengdi Chu, Yan Zhang, Jirui Yuan, Guyue Zhou, Jiangtao Gong
- **Institutions**: Tsinghua University's Institute for AI Industry Research (AIR)
- **Conference**: ICRA 2023
- **Summary**: With the rise of autonomous driving, the need for fine-grained driving scenario databases is increasing. However, human annotations are expensive, time-consuming, and prone to cognitive and behavioral biases. This paper proposes using drivers' EEG signals to annotate driving risks. The authors conducted experiments with 10 experienced driving instructors, who watched videos of real and simulated driving scenarios. The results show that EEG signals are more sensitive than manual annotations to both overt and covert driving risks, providing more accurate risk annotation. The authors also used Time-Series AI to classify driving risks based on EEG signals, demonstrating the feasibility of this approach. The paper discusses the follow-up work needed to put this technology into practice.

#### 6. **Breaching FedMD: Image Recovery via Paired-Logits Inversion Attack**

- **Authors**: Hideaki Takahashi, Jingjing Liu, Yang Veronica Liu
- **Institutions**: Tsinghua University's Institute for AI Industry Research (AIR)
- **Conference**: CVPR 2023
- **Summary**: Federated learning with model distillation (FedMD) is a privacy-preserving distributed learning paradigm that shares distilled knowledge on a public dataset rather than private model parameters. This paper reveals that even this approach is vulnerable to data exposure through a paired-logits inversion (PLI) attack. The authors demonstrate that a malicious server can train an inversion neural network that exploits the confidence gap between the server and client models, reconstructing clients' private images using only the paired server-client logits on the public dataset.
Experiments on multiple facial recognition datasets validate the high success rate of the attack, highlighting the need for stronger security measures in FedMD.

#### 7. **Conditional Antibody Design as 3D Equivariant Graph Translation**

- **Authors**: Xiangzhe Kong, Wenbing Huang, Yang Veronica Liu
- **Institutions**: Tsinghua University; Gaoling School of Artificial Intelligence, Renmin University of China; Tsinghua University's Institute for AI Industry Research (AIR)
- **Conference**: ICLR 2023
- **Award**: Outstanding Paper Nomination
- **Summary**: Antibody design is crucial for therapeutic and biological research. Existing deep learning methods face challenges in modeling the full context required for generating complementarity-determining regions (CDRs), capturing complete 3D geometry, and efficiently predicting CDR sequences. This paper introduces Multi-Channel Equivariant Attention Networks (MEAN) to co-design the 1D sequences and 3D structures of CDRs. MEAN treats antibody design as a conditional graph translation problem, taking the target antigen and the antibody light chain as additional context. The network uses E(3)-equivariant message passing and a novel attention mechanism to better capture geometric correlations between components. Experiments show that MEAN significantly outperforms existing methods in sequence and structure modeling, antigen-binding CDR design, and binding affinity optimization, with relative improvements of about 23% in antigen-binding CDR design and 34% in affinity optimization.

#### 8. **End-to-End Full-Atom Antibody Design**

- **Authors**: Xiangzhe Kong, Wenbing Huang, Yang Veronica Liu
- **Institutions**: Tsinghua University; Gaoling School of Artificial Intelligence, Renmin University of China; Tsinghua University's Institute for AI Industry Research (AIR)
- **Conference**: ICML 2023
- **Summary**: Antibody design is a challenging task in therapeutic and biological research. Current learning-based methods handle only sub-tasks within the design process and often omit framework regions or side chains, leading to incomplete atomic geometry.
This paper proposes Dynamic Multi-Channel Equivariant Attention Networks (dyMEAN), an end-to-end full-atom model for antibody design. dyMEAN incorporates knowledge-guided antibody structure initialization and introduces shadow sites to bridge the gap between antigens and antibodies. The model uses an adaptive multi-channel equivariant encoder to update 1D sequences and 3D structures while handling protein residues of variable size. Finally, it aligns the shadow sites to dock the updated antibody to the antigen. Experiments show that dyMEAN outperforms existing methods in antigen-binding CDR-H3 design, complex structure prediction, and binding affinity optimization.

#### 9. **Fractional Denoising for 3D Molecular Pretraining**

- **Authors**: Shikun Feng, Yuyan Ni, Yanyan Lan, Zhiming Ma, Weiying Ma
- **Institutions**: Tsinghua University; Academy of Mathematics and Systems Science, Chinese Academy of Sciences; Tsinghua University's Institute for AI Industry Research (AIR)
- **Conference**: ICML 2023
- **Summary**: Coordinate denoising is a popular method for 3D molecular pretraining: Gaussian noise is added to the equilibrium molecular coordinates, and the model learns to remove it. However, this method suffers from low sample coverage and an inaccurate force field approximation, because it assumes an isotropic molecular probability distribution. This paper proposes a hybrid noise strategy that adds Gaussian noise to both rotatable dihedral angles and molecular coordinates. This approach samples more valuable low-energy structures and captures anisotropic molecular probability distributions. However, the authors note that denoising the hybrid noise is not equivalent to force field learning, due to the variance of the dihedral angle noise. To address this, they introduce Fractional Denoising (Frad), which denoises only the coordinate part of the hybrid noise. Experiments on the QM9 and MD17 benchmarks show that Frad outperforms existing methods in molecular representation.

#### 10. **Coarse-to-Fine: a Hierarchical Diffusion Model for Molecule Generation in 3D**

- **Authors**: Bo Qiang, Yuxuan Song, Minkai Xu, Jingjing Gong, Bowen Gao, Hao Zhou, Weiying Ma, Yanyan Lan
- **Institutions**: Tsinghua University; Peking University; Tsinghua University's Institute for AI Industry Research (AIR)
- **Conference**: ICML 2023
- **Summary**: Generating 3D molecular structures is crucial for various applications. However, existing methods are inefficient, especially for large molecules. This paper introduces Hierarchical Diffusion (HierDiff), a model that generates 3D molecular structures without relying on autoregressive modeling. HierDiff first generates coarse-grained molecular geometry using equivariant diffusion, where each node represents the relative position and some physicochemical properties of a molecular fragment. It then decodes these nodes into fine-grained fragments using message passing and a novel iterative refinement sampling module. Finally, it assembles the fragments into complete atomic molecular structures. Experiments show that HierDiff generates more stable, lower-energy molecular conformations with better physicochemical properties, making it a promising tool for drug design.

#### 11. **Weakly Supervised Vision-and-Language Pre-training with Relative Representations**

- **Authors**: Chi Chen, Peng Li, Maosong Sun, Yang Veronica Liu
- **Institutions**: Department of Computer Science, Tsinghua University; Tsinghua University's Institute for AI Industry Research (AIR)
- **Conference**: ACL 2023
- **Summary**: Weakly supervised vision-and-language pre-training (WVLP) can reduce data costs while maintaining good performance on downstream tasks. However, current WVLP methods use only local image descriptions as cross-modal anchors, which affects data quality and pre-training effectiveness.
This paper proposes Relative Representation Learning for Weakly Supervised Vision-and-Language Pre-training (RELIT), a framework that uses a few aligned image-text pairs as anchors and represents unaligned samples by their similarity to those anchors. RELIT collects high-quality weakly aligned image-text pairs from large-scale image-only and text-only data for pre-training. Experiments show that RELIT outperforms existing methods on four downstream tasks in the weakly supervised setting.

#### 12. **AdaptiveNet: Post-deployment Neural Architecture Adaptation for Diverse Edge Environments**

- **Authors**: Hao Wen, Yuanchun Li, Zunshuai Zhang, Shiqi Jiang, Xiaozhou Ye, Ye Ouyang, Ya-Qin Zhang, Yunxin Liu
- **Institutions**: Tsinghua University's Institute for AI Industry Research (AIR); Microsoft Research; AsiaInfo Technologies; Shanghai University; Pujiang Lab
- **Conference**: MobiCom 2023
- **Summary**: Deploying deep learning models on real-time edge devices requires adapting to diverse edge environments to ensure stable service quality. Traditional methods struggle with the diversity of edge environments and the need for edge information. This paper introduces AdaptiveNet, a method for post-deployment neural architecture adaptation. By adapting after deployment, AdaptiveNet measures model quality accurately and preserves the privacy of edge data while generating customized models for different conditions. It combines a cloud-side model elastification method with an on-device architecture search method. The elastification method generates a high-quality model architecture search space guided by developer-specified prediction models. Each sub-network in the space is a valid model with a different environmental affinity, allowing each device to find and maintain the most suitable sub-network. Experiments show that AdaptiveNet achieves significantly better accuracy-latency trade-offs with minimal overhead.

#### 13. **NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors**

- **Authors**: Jianyu Wei, Ting Cao, Shijie Cao, Shiqi Jiang, Shaowei Fu, Mao Yang, Yan Zhang, Yunxin Liu
- **Institutions**: University of Science and Technology of China; Microsoft Research; Tsinghua University's Institute for AI Industry Research (AIR)
- **Conference**: MobiSys 2023
- **Summary**: Mobile devices are increasingly equipped with heterogeneous multi-processors, but most neural network models are single-branch, limiting the utilization of these processors. This paper proposes NN-Stretch, a method that automatically branches a given model to leverage the computational power of heterogeneous processors. NN-Stretch horizontally stretches the model, transforming a long and narrow model into a short and wide one with multiple branches. The method frames branching as an optimization problem over a large design space and narrows the search space by imposing hard latency constraints and by maintaining the model structure and the expressiveness of each branch. Experiments show that NN-Stretch achieves up to 3.85x speedup compared to single-processor execution, with a maximum accuracy improvement of 0.8%.

#### 14. **Multimodal Federated Learning via Contrastive Representation Ensemble**

- **Authors**: Qiying Yu, Yang Veronica Liu, Jingjing Liu
- **Institutions**: Tsinghua University's Institute for AI Industry Research (AIR)
- **Conference**: ICLR 2023
- **Summary**: With the growth of multimedia data on modern mobile systems and IoT devices, utilizing this data without compromising user privacy is a significant challenge. Federated learning is a privacy-preserving distributed learning paradigm, but current multimodal methods rely on single-modal model aggregation, limiting server-side model complexity.
This paper introduces CreamFL, a multimodal federated learning framework that trains larger server-side models from clients with heterogeneous model architectures and data modalities. CreamFL uses a global-local collaborative cross-modal aggregation strategy, together with inter-modal and intra-modal contrastive learning, to regularize client model training. Experiments on image-text retrieval and visual question answering tasks show that CreamFL outperforms state-of-the-art federated learning methods.

#### 15. **ConvReLU++: Reference-based Lossless Acceleration of Conv-ReLU Operations on Mobile CPU**

- **Authors**: Rui Kong, Yuanchun Li, Yizhen Yuan, Linghe Kong
- **Institutions**: Shanghai Jiao Tong University; Tsinghua University's Institute for AI Industry Research (AIR); Pujiang Lab
- **Conference**: MobiSys 2023
- **Summary**: ReLU is a commonly used activation function in convolutional neural networks (CNNs), and it produces many zero-valued output activations. Accelerating CNN inference by identifying and skipping zero-valued neurons can be effective but often results in accuracy loss. This paper proposes ConvReLU++, a lossless acceleration method for CNN inference on mobile CPUs. ConvReLU++ accurately detects and skips zero-valued computations, achieving speedup without any accuracy loss. The method is implemented in popular mobile inference frameworks and evaluated on common deep vision tasks. Experiments show that ConvReLU++ reduces computational load and achieves 2.90% to 8.91% end-to-end inference speedup on real edge devices.

#### 16. **"I am the follower, also the boss": Exploring Different Levels of Autonomy and Machine Forms of Guiding Robots for the Visually Impaired**

- **Authors**: Yan Zhang, Ziaon Li, Hao Yue Guo, Luoyao Wang, Qihe Chen, Wenjie Jiang, Mingming Fan, Guyue Zhou, Jiangtao Gong
- **Institutions**: Tsinghua University's Institute for AI Industry Research (AIR); Hong Kong University of Science and Technology (Guangzhou)
- **Conference**: CHI 2023
- **Summary**: Navigation robots based on autonomous driving technology can help visually impaired individuals achieve independent mobility. This study designs robots in two forms (a smart cane and a smart cart) with switchable autonomy levels, and conducts a controlled laboratory experiment (N=12) and a field test of natural walking (N=9). Results show that while fully autonomous robots performed better in the controlled experiment, participants in natural environments preferred to retain more control. The smart cart provided a higher sense of security and better navigation efficiency than the smart cane. These findings offer empirical evidence for designing assistive robots tailored to the preferences of visually impaired users.

### Conclusion

The recent highlight papers from AIR demonstrate significant advancements in various fields, including neural video compression, reinforcement learning, dense prediction in computer vision, autonomous driving, data security, antibody design, molecular pretraining, and edge computing. These contributions not only push the boundaries of technical innovation but also address practical challenges in real-world applications, providing valuable insights and solutions for future research and development. For more details and to download the papers, follow the instructions on the AIR official website or the provided links.
