HyperAIHyperAI

Command Palette

Search for a command to run...

منذ عام واحد

التنبؤ بمركّبات البروتين-الليجاند الواعية بالحالة باستخدام ألفافولد3 مع تسلسلات مُنقّاة

Enming Xing Junjie Zhang Shen Wang Xiaolin Cheng

نشر ألفافولد 3 بنقرة واحدة

20 ساعة فقط من موارد حوسبة RTX 5090 $1 (قيمة $7)
الانتقال إلى دفتر

الملخص

العنوان: [غير محدد]

الملخص: شهدت التنبؤات القائمة على التعلم العميق لمركبات البروتين-الليجند تقدماً كبيراً مع تطور معماريات مثل AlphaFold3 وBoltz-1 وChai-1 وProtenix وNeuralPlexer. وقد شكلت المحاذاة المتعددة للتسلسلات (MSA) مدخلاً رئيسياً، حيث توفر معلومات تطورية مشتركة حاسمة للاستدلال البنيوي. ومع ذلك، تكشف المعايير الحديثة عن قيد رئيسي: غالباً ما تحفظ هذه النماذج وضعيات الليجند من بيانات التدريب وتؤدي أداءً ضعيفاً تجاه الأنماط الكيميائية الجديدة أو أحداث الارتباط الديناميكية التي تنطوي على تغييرات تشكيلية كبيرة في جيوب الارتباط. للتغلب على هذا التحدي، قدمنا استراتيجية للتنبؤ بمركبات البروتين-الليجند واعية للحالة، تستفيد من مجموعات تسلسلية مُنقّاة تم إنشاؤها بواسطة AF-ClaSeq—وهي طريقة طوّرها فريقنا سابقاً. يعزل AF-ClaSeq الإشارات التطورية المشتركة ويختار التسلسلات التي تشفر تفضيلياً حالات بنيوية مميزة، كما يتنبأ بها AlphaFold2. ومن خلال تطبيق قيود تشكيلية مستمدة من المحاذاة المتعددة للتسلسلات، لاحظنا تحسناً كبيراً في التنبؤ بوضعيات الليجند. وفي الحالات التي فشل فيها AlphaFold3 سابقاً—مُنتجاً وضعيات غير صحيحة للليجند وتشكيلات بروتينية مقترنة بها—تمكنا من تصحيح التنبؤات باستخدام مجموعات تسلسلية تتوافق مع الحالة الوظيفية ذات الصلة، مثل الشكل غير النشط لإنزيم مرتبط بمعدّل خيفي سلبي.

One-sentence Summary

This study introduces a state-aware protein-ligand complex prediction strategy that integrates AlphaFold3 with AF-ClaSeq purified sequence subsets to isolate coevolutionary signals for distinct structural states, applying MSA-derived conformational restraints to overcome ligand pose memorization and correct AlphaFold3 prediction failures in dynamic binding events.

Key Contributions

  • A state-aware protein-ligand prediction strategy integrates AF-ClaSeq-derived purified sequence subsets into deep learning structure prediction pipelines. This approach filters multiple sequence alignments to retain only sequences encoding distinct structural conformations, applying evolutionary restraints that guide folding algorithms toward ligand-compatible states.
  • Accurate conformational sampling depends on sequence purity rather than alignment depth, with state-specific evolutionary signals distributed across diverse phylogenetic clades. This finding circumvents the signal-averaging limitations of attention-based transformers when processing heterogeneous evolutionary data.
  • Incorporating purified sequence subsets into the AlphaFold3 framework produces more accurate ligand placements and binding pocket geometries than default predictions. The method corrects prior modeling failures on targets requiring substantial conformational rearrangements or novel chemotype binding.

Introduction

Accurate prediction of protein-ligand complexes is a cornerstone of computational drug discovery and molecular modeling, with recent deep learning architectures like AlphaFold3 achieving breakthrough accuracy by leveraging coevolutionary signals from multiple sequence alignments. Despite these advances, current models frequently struggle with novel chemotypes and fail to capture binding pocket plasticity, often defaulting to a single static conformation or relying on memorized structural patterns from training data. To overcome these barriers, the authors leverage a sequence purification framework called AF-ClaSeq to isolate evolutionary subsets that specifically encode distinct protein conformations. By integrating these state-specific sequence subsets into AlphaFold3, they introduce conformational restraints that guide the model toward functionally relevant structural states, significantly improving ligand pose accuracy and enabling reliable predictions for dynamic binding events.

Method

The authors leverage a sequence purification strategy to improve the prediction of protein-ligand complexes involving allosteric inhibitors, particularly for proteins like EGFR and IL-1β where the target conformation is distinct from the default state predicted by AlphaFold3 (AF3). The core of the method involves biasing the multiple sequence alignment (MSA) used by AF3 to favor a specific protein conformation, thereby guiding the model toward the desired structural state. The process begins with a deep MSA generated from a large pool of homologous sequences. For EGFR, the initial MSA of 49,743 sequences was filtered to 43,365 using a coverage threshold. To assess the conformational bias encoded within this MSA, the authors performed a series of M-fold sampling experiments, predicting structures from randomly shuffled groups of sequences. The initial analysis revealed a strong bias toward the active state, which was attributed to the model's tendency to predict well-represented conformations.

To correct this bias, an iterative enrichment procedure was applied. The method uses the root-mean-square deviation (RMSD) of key structural elements—the αC helix and activation loop (A-Loop)—relative to a reference structure of the desired inactive state (PDB 2GS7) as a metric. Sequences are grouped into sets of six, and after prediction, the sequences yielding structures with the lowest RMSD to the inactive state are selected for the next iteration. This process was repeated four times, progressively enriching the sequence pool for those encoding the inactive conformation. The enriched sequence set from the final iteration was then used for M-fold sampling, generating a large number of predictions to map the conformational landscape. The results, visualized in a scatter plot, show a distribution of predicted structures along a conformational axis, with the local pLDDT score indicating the confidence of the predicted secondary structures.

To identify the most strongly biased sequences, a voting mechanism was employed. The prediction results were binned based on their RMSD to the active and inactive states. A "normal voting" scheme selected sequences that were most frequent in a given bin. A more stringent "enforced voting" scheme required not only the highest frequency but also that the frequency exceeded a threshold of 0.15, isolating sequences with a very strong conformational preference. This process identified a subset of sequences, designated as "pure sequences," that were highly biased toward the inactive state. These purified sequences were used to generate a final MSA for AF3 predictions of the protein-ligand complex, with the ligand's SMILES provided as input.

The method was validated on the EGFR-4 allosteric inhibitor complexes. Default AF3 predictions were highly divergent and failed to reproduce the correct ligand pose. However, predictions using the purified inactive-state sequences showed dramatically improved accuracy and consistency. The ligand RMSD dropped to near or below 2.5 Å, and the ligand atomic pLDDT scores increased significantly, indicating a reliable prediction of the binding mode. A similar approach was applied to the IL-1β/ligand system, where the key conformational change involves the displacement of the β4-5 loop. A similar iterative enrichment based on RMSD to the displaced loop state was used to generate an enriched sequence set. Predictions using the top 20 most frequent sequences from the target conformational region achieved perfect alignment with the experimental structure, demonstrating the power of conformational restraints derived from the MSA.

Experiment

The study evaluated two allosteric inhibitor systems, EGFR mutants and IL-1β cryptic pocket antagonists, by comparing default AlphaFold3 predictions against those guided by iterative sequence purification based on generalized conformational references. While default models consistently failed to capture the necessary allosteric or cryptic binding conformations and tended to revert to memorized training patterns, the purification approach successfully biased multiple sequence alignments toward functionally relevant inactive or allosteric states. Consequently, the refined predictions closely matched experimental crystal structures for both protein backbones and ligand poses, demonstrating that conformational constraint-guided sequence enrichment provides a broadly applicable framework for accurately modeling novel allosteric drug-target interactions.


بناء الذكاء الاصطناعي بالذكاء الاصطناعي

من الفكرة إلى الإطلاق — سرّع تطوير الذكاء الاصطناعي الخاص بك مع المساعدة البرمجية المجانية بالذكاء الاصطناعي، وبيئة جاهزة للاستخدام، وأفضل أسعار لوحدات معالجة الرسومات.

البرمجة التعاونية باستخدام الذكاء الاصطناعي
وحدات GPU جاهزة للعمل
أفضل الأسعار

HyperAI Newsletters

اشترك في آخر تحديثاتنا
سنرسل لك أحدث التحديثات الأسبوعية إلى بريدك الإلكتروني في الساعة التاسعة من صباح كل يوم اثنين
مدعوم بواسطة MailChimp