HyperAI

Selected for AAAI 2025! The Hong Kong Polytechnic University Team Accurately Predicts the Optoelectronic Properties of Organic Material Molecules Based on Graph Transformer

In 1966, a set of anomalous data from a laboratory at Tohoku University in Japan rewrote the history of energy. When the researchers placed a thin film of viologen dye crystals under sunlight, they were startled by a sudden jump on the current detector. Organic materials could generate photocurrent without relying on silicon crystals! This milestone discovery, published in the Japanese Journal of Applied Physics, was like a stone dropped into a deep pond, creating scientific ripples that lasted half a century.

However, the journey of organic solar cells (OSCs) proved much harder than expected. For the following four decades, researchers were trapped in an "efficiency curse": the diffusion distance of excitons in organic materials is less than 10 nanometers, so the generated electron-hole pairs are annihilated before reaching the electrode. The turning point came in 2005, when Yang Yang's team at the University of California, Los Angeles took inspiration from the photosynthetic system of plants. Imitating the division of labor between photosystems II and I in chloroplasts, they constructed a nanoscale interpenetrating network from two materials, P3HT and PCBM. This "bulk heterojunction" structure raised the exciton separation efficiency to 60%, pushing device efficiency to a historic breakthrough of 5%. The results were published on the cover of Science.

Since then, efficiency records for organic solar cells have been broken repeatedly. However, as organic solar cells approach the 20% efficiency mark, the traditional "trial and error" research and development model has hit a bottleneck. Trillions of structural combinations lie behind each new molecule, which has driven the rapid rise of computational materials science.

The RingFormer framework recently published by a team at the Hong Kong Polytechnic University is setting off a revolution in molecular design. The method builds a hierarchical graph Transformer architecture over atoms and chemical rings, combining local message passing with global attention mechanisms, to accurately predict the optoelectronic properties of molecules. On the test set of Harvard University's Clean Energy Project Database (CEPDB), it achieves a relative improvement of 22.77% over the nearest competing method, which is equivalent to shortening the research and development cycle of new materials from several years to weeks, marking the entry of organic solar cell research into a new era of "computation-guided experiments."

The related results, titled "RingFormer: A Ring-Enhanced Graph Transformer for Organic Solar Cell Property Prediction", were selected for AAAI 2025, the top academic conference in the field of AI.


Paper link:
https://doi.org/10.48550/arXiv.2412.09030

Download address of relevant data sets used in the research:

https://hyper.ai/cn/datasets/37721

GNN changes the lengthy traditional R&D model

As the global energy transition continues to demand renewable energy, organic solar cells (OSCs) have become a hot research topic due to their excellent photoelectric conversion properties. These devices are based on organic small-molecule semiconductor materials and convert light into electricity through the interaction between electron donors and acceptors in a conjugated structure. Their efficiency is closely tied to the complexity of the molecular structure. However, the traditional R&D model relies on extensive trial-and-error experiments and a lengthy synthesis process. The R&D cycle often takes 3-5 years, which severely limits the speed of material innovation.

To screen candidate OSC molecules more efficiently, researchers have begun to use machine learning to predict OSC performance. Fingerprint-based methods are currently the most widely used. They typically take manually designed molecular fingerprints (such as MACCS and ECFP) as molecular features and feed them into existing machine learning models such as random forests and support vector machines. However, these fingerprints are simplified representations of molecular structure that ignore complex molecular information and interactions, a weakness that is especially evident for structurally complex OSC molecules.

Graph neural networks (GNNs) once brought hope to this dilemma by abstracting molecules into topological graphs, with atoms as nodes and chemical bonds as edges, and capturing structural features through deep learning. However, existing models face two challenges in parsing OSC molecules. On the one hand, the "atomic myopia" of GNNs makes it difficult to capture long-range electronic coupling effects across multiple benzene rings. On the other hand, the lack of characterization of the connectivity patterns between ring systems makes it impossible to distinguish key structural differences (e.g., the effects of linear connectivity versus star topology on exciton separation).
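The "atomic myopia" described above can be made concrete with a small sketch. It assumes a toy molecule (four six-membered rings joined in a line, which is not taken from the paper) and uses the standard fact that k rounds of neighbor-to-neighbor message passing can only move information k bonds away. The helper names `chain_of_rings`, `hop_distances` and `receptive_field` are hypothetical.

```python
from collections import deque

def hop_distances(adjacency, source):
    """Breadth-first search: number of bonds from `source` to every reachable atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def chain_of_rings(n_rings, ring_size=6):
    """Toy molecule (an assumption): separate rings joined in a line by single bonds."""
    adjacency = {}

    def add_bond(a, b):
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    for r in range(n_rings):
        base = r * ring_size
        for i in range(ring_size):
            add_bond(base + i, base + (i + 1) % ring_size)
        if r > 0:
            # Link atom 3 of the previous ring to atom 0 of this ring.
            add_bond(base - ring_size + 3, base)
    return adjacency

def receptive_field(dist, k):
    """Atoms whose information can reach the source after k message-passing rounds."""
    return sum(1 for d in dist.values() if d <= k)

mol = chain_of_rings(4)
dist = hop_distances(mol, 0)
farthest = max(dist.values())
reach_3 = receptive_field(dist, 3)
print(f"atoms: {len(dist)}, farthest atom: {farthest} bonds, seen after 3 rounds: {reach_3}")
```

In this toy chain, three rounds of message passing let atom 0 see only its own ring, while the far end of the molecule is 15 bonds away. That gap is the motivation for adding ring-level nodes and global attention.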

In response to these challenges, the research team at the Hong Kong Polytechnic University proposed an innovative framework, RingFormer. It is the first graph Transformer framework that captures the ring systems in OSC molecules. The framework breaks away from the single perspective of traditional atom-level modeling and builds an atom-ring dual-level feature fusion system.

The core of this method is to establish a dynamic interaction mechanism: maintain sensitivity to microscopic features such as chemical bonds and charge distribution at the atomic level, and at the same time establish a cross-ring attention network at the ring level to accurately analyze macroscopic structural features such as shared edges of condensed rings and spatial arrangements of non-condensed rings.

By introducing the inter-ring connection matrix and the intra-ring atom weight assignment algorithm, the model can autonomously identify key ring systems and their interaction patterns. Experimental results show that this two-level modeling strategy improves the power conversion efficiency (PCE) prediction accuracy to 92%, and that it has stronger characterization capabilities for complex molecules containing more than five rings. This breakthrough not only provides a new paradigm for OSC material design, but also opens up a new path for machine learning modeling of complex molecular systems.

RingFormer: Representing the molecular structure of OSCs at both atomic and ring levels

To better evaluate this method, the researchers collated five OSC molecular datasets: the CEPDB dataset, generated with density functional theory (DFT), and the HOPV, PFD, NFA and PD datasets, which consist of different types of OSC molecules. Each dataset is divided into training, validation and test sets in a ratio of 6:2:2.
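A 6:2:2 split can be sketched as follows. This is a minimal illustration that assumes a simple seeded random shuffle. The article does not say whether the split was random or structure-aware, and the function name `split_622` is hypothetical.

```python
import random

def split_622(items, seed=0):
    """Shuffle and split into train/validation/test at a 6:2:2 ratio.

    Assumption: a plain random split; the study's actual splitting
    procedure is not described in the article.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 6 // 10  # integer arithmetic avoids float rounding issues
    n_val = n * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

train, val, test = split_622(range(100))
print(len(train), len(val), len(test))
```

Fixing the seed keeps the three subsets reproducible across runs, which matters when comparing several baseline models on the same test set.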

To accurately capture the atom-level and ring-level structural features of OSC molecules, the study proposed the RingFormer framework. First, a multi-level OSC graph is constructed; this multi-level graph is then encoded as a whole by the RingFormer layers to predict the molecule's performance. As shown in the figure below, the multi-level OSC graph comprises an atom-level graph, a ring-level graph and an inter-level graph.

RingFormer Framework

The atom-level graph describes the atomic bonding structure of an OSC molecule in detail, while the ring-level graph focuses on the rings and their connections to capture complex ring systems. The inter-level graph models the relationship between rings and atoms, thereby fully representing the hierarchical structure of the molecule. Together, these three levels provide a comprehensive description of the OSC molecular structure, making performance predictions more accurate.
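The three levels can be illustrated with a small data-structure sketch. It assumes a hand-written toy molecule (two fused six-membered rings plus a third ring attached by a single bond) and a simple rule for ring-level edges. The edge labels "fused" and "bonded" are hypothetical names, and the paper's actual graph construction may differ.

```python
from itertools import combinations

def ring_bonds(ring):
    """Bonds around a ring given its atoms in cyclic order."""
    return [(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring))]

# Toy molecule (an assumption): rings A and B share the bond 4-5,
# ring C hangs off ring B through the single bond 7-10.
rings = {
    "A": [0, 1, 2, 3, 4, 5],
    "B": [4, 5, 6, 7, 8, 9],
    "C": [10, 11, 12, 13, 14, 15],
}

# Atom-level graph: one undirected edge per chemical bond.
atom_edges = set()
for members in rings.values():
    for a, b in ring_bonds(members):
        atom_edges.add(frozenset((a, b)))
atom_edges.add(frozenset((7, 10)))  # linker bond between rings B and C

# Ring-level graph: connect rings that share a bond or are joined by a bond.
ring_edges = {}
for (r1, m1), (r2, m2) in combinations(rings.items(), 2):
    if len(set(m1) & set(m2)) >= 2:
        ring_edges[(r1, r2)] = "fused"
    elif any(frozenset((a, b)) in atom_edges for a in m1 for b in m2):
        ring_edges[(r1, r2)] = "bonded"

# Inter-level graph: link every atom to each ring it belongs to.
inter_edges = [(atom, ring) for ring, members in rings.items() for atom in members]

print(len(atom_edges), ring_edges, len(inter_edges))
```

Atoms 4 and 5 appear in two inter-level edges each because they sit on the shared bond of the fused pair. This is how ring membership stays visible to the atom-level representation.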

Next, the RingFormer framework combines local message passing with global attention mechanisms to capture the unique structural patterns at each level and learn expressive graph representations. On the atom-level graph, the RingFormer layer uses message-passing GNNs to encode local structural features into atomic node representations.
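As a rough illustration of local message passing, the sketch below performs mean aggregation over bonded neighbors with scalar features. This is a generic, parameter-free stand-in chosen for clarity, not the specific GNN layer used in RingFormer. The function name and the four-atom path are assumptions.

```python
def message_passing_step(features, adjacency):
    """One round: each atom averages its own feature with its bonded neighbors'.

    Generic mean aggregation with no learned weights; real GNN layers
    apply trainable transformations to vector-valued features.
    """
    updated = {}
    for atom, neighbors in adjacency.items():
        values = [features[atom]] + [features[n] for n in neighbors]
        updated[atom] = sum(values) / len(values)
    return updated

# Path of four atoms 0-1-2-3; only atom 0 starts with a nonzero feature.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
h0 = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}
h1 = message_passing_step(h0, adjacency)
h2 = message_passing_step(h1, adjacency)
print(h1)
print(h2)
```

Each round mixes information only between directly bonded atoms, so the signal from atom 0 reaches atom 2 after the second round and has not yet reached atom 3.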

For the ring-level graph, the RingFormer layer introduces an innovative cross-attention mechanism specifically designed to capture global patterns in the ring system, especially the connections between rings. In addition, the RingFormer layer also promotes the interaction between ring nodes and atomic nodes through message passing on the cross-level graph. At the end of each RingFormer layer, a hierarchical fusion strategy is implemented to ensure that information at different levels can complement each other.
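The idea of global attention over ring nodes can be sketched with plain scaled dot-product attention plus an additive bias for connected ring pairs. This is an assumption made for illustration: the article does not give the exact form of RingFormer's cross-attention, and the sketch omits learned query, key and value projections. The names `ring_attention` and `connection_bias` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ring_attention(ring_feats, connection_bias):
    """Every ring attends to every ring.

    Score = scaled dot product of ring features + a pairwise bias
    (assumed here to encode whether two rings are connected).
    """
    n = len(ring_feats)
    dim = len(ring_feats[0])
    outputs, weights = [], []
    for i in range(n):
        scores = [
            sum(q * k for q, k in zip(ring_feats[i], ring_feats[j])) / math.sqrt(dim)
            + connection_bias[i][j]
            for j in range(n)
        ]
        attn = softmax(scores)
        weights.append(attn)
        # Output for ring i is the attention-weighted sum of all ring features.
        outputs.append(
            [sum(attn[j] * ring_feats[j][d] for j in range(n)) for d in range(dim)]
        )
    return outputs, weights

# Three rings with 2-dimensional toy features; rings 0-1 and 1-2 are connected.
ring_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
connection_bias = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
outputs, weights = ring_attention(ring_feats, connection_bias)
for row in weights:
    print([round(w, 3) for w in row])
```

Unlike message passing, every ring receives a nonzero weight from every other ring in a single step, including pairs that are not directly connected, while the bias lets connectivity shape how strongly they interact.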

Finally, after multiple layers of stacking, RingFormer aggregates the node representations of atoms and rings to form a graph representation that comprehensively encodes the molecular structure of OSCs, providing a solid foundation for performance prediction.

Next, to evaluate the effectiveness of RingFormer in OSC performance prediction, the researchers compared it with 11 baseline models on the 5 OSC molecular datasets. RingFormer consistently outperforms the baseline models. Notably, on the large-scale CEPDB dataset, RingFormer achieves a significant relative improvement of 22.77% over the nearest competitor.
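For readers unfamiliar with the term, a relative improvement on an error metric is usually computed as the error reduction divided by the baseline error. The sketch below shows that arithmetic with made-up numbers. The article does not state which metric underlies the 22.77% figure, so both the formula's applicability and the inputs are assumptions.

```python
def relative_improvement(baseline_error, model_error):
    """Percentage reduction in error relative to the baseline (lower error is better)."""
    return (baseline_error - model_error) / baseline_error * 100.0

# Illustrative values only, not results from the paper.
gain = relative_improvement(baseline_error=0.50, model_error=0.40)
print(f"{gain:.2f}% relative improvement")
```

With these illustrative inputs the error drops by a fifth, so the relative improvement is about 20%. The same absolute gap would be a larger relative gain against a stronger baseline.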

As shown in the table below, in predicting power conversion efficiency (PCE), RingFormer performs best on almost all datasets, ranking second only on the PFD dataset. In particular, on the NFA dataset, which has the largest average number of rings, RingFormer outperforms the fingerprint-based method ECFP by 4.96%. In addition, when dealing with larger and more complex OSC molecules, RingFormer still performs well, achieving the best results on 3 of the 4 datasets.

Comparison of prediction performance of RingFormer and other baseline models

The researchers further used the CEPDB dataset to evaluate RingFormer's performance in multi-task learning. The results showed that RingFormer consistently outperforms the competing models on all 6 target properties, usually by a significant margin. In addition, thanks to its fusion of message passing and global attention, GPS also performs well on all target properties, second only to RingFormer. This further confirms the importance of capturing both local and global structural features in OSC molecules.

Finally, the researchers also evaluated RingFormer's performance on OSC molecules with different numbers of rings. As the number of rings in a molecule increases, RingFormer's performance improvement grows accordingly. This indicates a clear positive correlation between RingFormer's advantage and the complexity of the ring system.

In addition, the study used UMAP to visualize the graph representations of the OSC molecules in the CEPDB test set. Compared with the embeddings generated by GPS, the embeddings generated by RingFormer can be clearly separated according to the number of rings in the OSC molecule. These observations further confirm RingFormer's ability to capture the complex structure of ring systems.

AI technology reshapes the future of the industry, and the Chinese power behind OSCs cannot be ignored

In the wave of global energy transformation, organic solar cells (OSCs) are gradually moving from the laboratory to the forefront of industrialization with their light weight, flexibility and low cost, and the research progress of Chinese scientists in the field of organic solar cells is eye-catching.

In 2015, Hou Jianhui's team at the Chinese Academy of Sciences proposed the theory of a "polymer-small molecule synergistic effect" and developed the non-fullerene acceptor ITIC, whose butterfly-shaped three-dimensional configuration enables precise interlocking of molecules. The team's product can still retain 82% of its efficiency under the extreme climate conditions of the Qinghai Plateau at an altitude of 4,200 meters, becoming the world's first proven case in a high-altitude area.

By 2025, Li Yaowen's team at Soochow University had achieved a certified efficiency of 20.82% by regulating the molecular arrangement gradient of the active layer through a "sequential crystallization strategy", and had broken through the bottleneck of industrializing thick-film devices. The efficiency of a 400-nanometer thick film reached 17.93%, laying the foundation for the development of roll-to-roll printing technology.

At the same time, Ge Ziyi's team at the Ningbo Institute of Materials designed a quinoxaline acceptor, SMA, which, through orderly molecular arrangement, raised the efficiency of rigid and flexible OSCs to 20.22% and 18.42% respectively. The devices retained 96% of their performance even after 2,000 bends, setting a new standard for wearable energy devices.

The combination of AI and organic solar cell research appeared as early as 2023. Professor Li Youyong's team from the Institute of Functional Nanomaterials and Soft Matter at Soochow University collaborated with Professor Yuan Jianyu's team to use machine learning for high-throughput screening of organic solar cells. They used DFT calculations to explore the electronic structure properties of organic molecules in depth and used big data technology to build a functional material database, providing a solid foundation for training machine learning models. This research not only improves the efficiency of organic optoelectronic material screening and reduces computational costs, but also provides strong support for the design and optimization of optoelectronic devices.


* Paper Title:

Efficient screening framework for organic solar cells with deep learning and ensemble learning
* Paper link:

https://www.nature.com/articles/s41524-023-01155-9

In 2024, a research team from the University of Illinois and the University of Toronto proposed a breakthrough "Closed-Loop Transfer (CLT)" method that turns AI into an explainable chemical knowledge engine. The method combines physical feature selection with supervised learning. Thirty new molecules were screened in five rounds of closed-loop experiments, including a light-harvesting molecule with a 5-fold increase in photostability. The work revealed a strong correlation between the high-energy triplet density of states (TDOS) and stability, providing a universal design principle for the photodegradation problem.

* Paper Title:

Closed-loop transfer enables artificial intelligence to yield chemical knowledge
* Paper link:

https://doi.org/10.1038/s41586-024-07892-1

Also in 2024, Christoph J. Brabec and Wu Jianchang from the Helmholtz Institute in Germany, Wang Luyao from Xiamen University, Pascal Friederich from the Karlsruhe Institute of Technology in Germany, and Sang Il Seok from the Ulsan National Institute of Science and Technology in South Korea jointly developed a closed-loop automated workflow. By combining machine learning with experiments, the workflow can quickly generate molecular design rules for specific device needs, laying the foundation for the development of next-generation high-performance optoelectronic devices such as organic solar cells.

* Paper Title:

Inverse design workflow discovers hole-transport materials tailored for perovskite solar cells
* Paper link:

https://doi.org/10.1126/science.ads0901

It can be seen that AI technology plays an increasingly important role in the global research of organic solar cells, not only accelerating the discovery of new materials and performance optimization, but also providing new perspectives and solutions for solving long-standing scientific problems. With the continuous maturity of technology and the acceleration of industrialization, China has become a key engine for promoting the development of global organic solar cell technology, and is expected to contribute more Chinese wisdom and solutions to the future energy revolution.