Achieving Fine-Grained Characterization of TCR Sequences: The Deep Learning Framework DeepTCR Expands Immunology Research Methods; Backed by Data From 50,000 Patients, the Lung Cancer Risk Dataset Details Lung Cancer Risk Factors

10 months ago

T cell receptor sequencing (TCR-Seq) is an important application of next-generation sequencing (NGS) that lets researchers systematically characterize the diversity of adaptive immune responses. Traditional methods for analyzing TCR-Seq data, such as motif searching and sequence alignment, have produced useful results, but their limitations have gradually become apparent. In particular, the signal from low-frequency antigen-specific T cell responses is often overwhelmed by a large background of nonspecific T cells, which shows how hard it is for these methods to separate signal from noise.

As demand for finer-grained characterization of TCR sequences grows, researchers have turned to deep learning techniques such as convolutional neural networks (CNNs). DeepTCR emerged from this work as a deep learning-based framework for analyzing immune receptor sequencing data. It learns features from CDR3 sequences, V/D/J gene usage, and MHC molecule type in TCR immune repertoire data, and builds a joint representation to model highly complex TCR sequencing data.

By systematically applying deep learning to TCR sequence analysis, DeepTCR expands the analytical toolkit of immunology research and further demonstrates how widely deep learning applies across fields.
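As a rough illustration of what a joint representation can look like, the minimal Python sketch below one-hot encodes a CDR3 sequence and concatenates it with one-hot V and J gene indicators. It is not DeepTCR's actual API or architecture: the function names, the padding length of 20, the example CDR3 string, and the two-entry gene vocabularies are all assumptions made for illustration. In a CNN-based model, convolutional layers would learn from inputs of this kind rather than use them directly.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot_cdr3(seq, max_len=20):
    """Return a max_len x 20 one-hot matrix; rows past the sequence end stay all-zero."""
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, residue in enumerate(seq[:max_len]):
        # str.index raises ValueError for a non-standard residue letter
        matrix[pos][AMINO_ACIDS.index(residue)] = 1
    return matrix


def one_hot_gene(gene, vocabulary):
    """One-hot encode a gene name against a fixed, ordered vocabulary."""
    return [1 if gene == known else 0 for known in vocabulary]


def joint_features(cdr3, v_gene, j_gene, v_vocab, j_vocab, max_len=20):
    """Concatenate the flattened CDR3 matrix with the V and J gene encodings."""
    flat_cdr3 = [value for row in one_hot_cdr3(cdr3, max_len) for value in row]
    return flat_cdr3 + one_hot_gene(v_gene, v_vocab) + one_hot_gene(j_gene, j_vocab)


# Hypothetical inputs, chosen only to exercise the functions.
V_VOCAB = ["TRBV7-9", "TRBV12-3"]
J_VOCAB = ["TRBJ2-7", "TRBJ1-1"]
features = joint_features("CASSLGQAYEQYF", "TRBV7-9", "TRBJ2-7", V_VOCAB, J_VOCAB)
print(len(features), sum(features))
```

The feature vector has one slot per (position, amino acid) pair plus one slot per known gene, so its length depends only on the padding length and vocabulary sizes, not on the input sequence.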

HyperAI's official website has launched "DeepTCR: Predicting TCR-Peptide Affinity Using Deep Learning." Come and try it out!

Try it online: https://go.hyper.ai/gKmgi

From September 8th to September 12th, here’s a quick overview of the hyper.ai official website updates:

* High-quality public datasets: 10

* Selected high-quality tutorials: 2

* This week's recommended papers: 5

* Community article interpretation: 5 articles

* Popular encyclopedia entries: 5

* Top conferences with deadlines in September: 5

Visit the official website: hyper.ai

Selected public datasets

1. New Plant Diseases Plant Disease Image Dataset

New Plant Diseases is an image dataset for plant disease identification and leaf classification research, covering healthy leaves and a range of disease types. It is well suited to developing and evaluating machine learning and deep learning models for crop health monitoring, disease identification, and precision agriculture, and it serves as a useful benchmark for academic research.

Direct use: https://go.hyper.ai/RKYtW

2. Intel Image Classification Natural Scene Image Classification Dataset

Intel Image Classification is an image classification dataset released by Intel that aims to classify images of natural and man-made scenes. The dataset contains approximately 25,000 color images distributed across six categories, including buildings and forests.

Direct use: https://go.hyper.ai/qgbeX

3. LongPage novel reasoning dataset

LongPage is the first comprehensive dataset for training artificial intelligence models to write complete novels with complex reasoning capabilities. It supports training pipelines that run from cold-start supervised fine-tuning through reinforcement learning, and it is suited to training large language models with hierarchical reasoning capabilities and to improving the coherence and planning of long-form writing.

Direct use: https://go.hyper.ai/odoKA

4. Lung Cancer Risk Dataset

Lung Cancer Risk is a tabular dataset for lung cancer risk prediction and health factor analysis. It aims to explore the association between smoking habits, lifestyle, and lung cancer risk through multidimensional features. It is suitable for lung cancer risk modeling, medical machine learning research, health prediction system development, and teaching experiments. It is particularly valuable in classification modeling and risk assessment scenarios.

Direct use: https://go.hyper.ai/YGFzG

5. IFEval-Inverse Reverse Instruction Evaluation Dataset

IFEval-Inverse is an adversarial instruction evaluation dataset for large language models released by ByteDance Seed in collaboration with Nanjing University, Tsinghua University, and other institutions. It aims to test whether the model can break the training inertia and achieve true instruction compliance when faced with reverse or abnormal instructions.

Direct use: https://go.hyper.ai/IcTqj

6. FinReflectKG Financial Knowledge Graph Dataset

FinReflectKG is a large-scale knowledge graph dataset for the financial sector. It aims to extract structured semantic relationships from corporate regulatory documents and to advance knowledge graph research in finance. It is suited to entity recognition, relation extraction, knowledge graph construction, and time series analysis, as well as to evaluating large language model-driven information extraction and building downstream intelligent financial applications.

Direct use: https://go.hyper.ai/EB5em

7. WenetSpeech Yue Cantonese corpus dataset

WenetSpeech Yue is a large, multi-dimensionally annotated speech corpus for Cantonese speech recognition (ASR) and text-to-speech synthesis (TTS). It aims to fill the gap in resources in the Cantonese field and promote the training and evaluation of high-quality Cantonese models.

Direct access: https://go.hyper.ai/cICOv

8. UCIT Continual Instruction Tuning Dataset

UCIT is a benchmark dataset for continual instruction tuning of multimodal large language models. Each sample consists of a task description (prompt/instruction) and the expected correct output (ground-truth response), which is used to measure the model's performance under zero-shot conditions.

Direct use: https://go.hyper.ai/TZPwY

9. LoongBench Multi-Domain Reasoning Benchmark Dataset

LoongBench is a multi-domain reasoning evaluation dataset designed to provide LLMs with a multi-domain, verifiable training and evaluation resource. The dataset contains 8,729 natural language questions covering 12 reasoning-intensive domains, including advanced mathematics and advanced physics.

Direct use: https://go.hyper.ai/AcFOZ

10. CA-1 Human Preference Alignment Dataset

CA-1 focuses on humans' value judgments and preferences for the default behaviors of AI models. It is a human feedback behavior dataset that combines model-generated content and annotator evaluations. It is suitable for studying group alignment differences, guiding model behavior norms, and developing value-sensitive reward mechanisms.

Direct use: https://go.hyper.ai/mXznO

Selected Public Tutorials

1. Wan2.2-S2V-14B: Film-grade audio-driven video generation

Wan2.2-S2V-14B is an open-source audio-driven video generation model developed by the Alibaba Tongyi Wanxiang team. From a single still image and an audio clip, it can generate cinematic-quality digital human videos that run for minutes, and it supports a variety of image types and sizes. The model integrates multiple innovative technologies to handle audio-driven video generation in complex scenes, with support for long video generation and multi-resolution training and inference.

Run online: https://go.hyper.ai/TlSai

2. DeepTCR: Deep Learning to Predict TCR-Peptide Affinity

DeepTCR is a deep learning-based framework for analyzing immune receptor sequencing data. It learns features from TCR CDR3 sequences, V/D/J gene usage, and MHC molecule type in TCR immune repertoire data, builds a joint representation of each TCR to model highly complex sequencing data, and can be used to predict TCR-peptide affinity. It can also extract antigen-specific TCRs from noisy single-cell RNA-Seq data and from T cell culture-based assays.

Run online: https://go.hyper.ai/gKmgi


This week's paper recommendation

1. Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing

This paper proposes Swarm sAmpling Policy Optimization (SAPO), a fully decentralized and asynchronous reinforcement learning post-training algorithm. SAPO is designed for decentralized networks of heterogeneous computing nodes. Each node autonomously manages its own policy model while "sharing" its trajectory with other nodes. The algorithm does not rely on explicit assumptions about latency, model homogeneity, or hardware configuration, and nodes can operate independently on demand.

Paper link: https://go.hyper.ai/MWeWF
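The experience sharing described for SAPO above can be pictured with a small Python sketch in which one node builds a training batch from its own rollouts plus rollouts received from other nodes. This is only an illustration under stated assumptions: the function name, the string stand-ins for rollouts, the fixed 4/4 split, and the uniform random sampling are hypothetical and are not taken from the paper.

```python
import random


def build_training_batch(local_rollouts, shared_rollouts, n_local, n_shared, rng):
    """Mix a node's own rollouts with rollouts received from other nodes."""
    own = rng.sample(local_rollouts, min(n_local, len(local_rollouts)))
    received = rng.sample(shared_rollouts, min(n_shared, len(shared_rollouts)))
    return own + received


rng = random.Random(0)  # seeded so the sketch is reproducible

# Hypothetical rollouts: strings stand in for full trajectories.
local = [f"node-A rollout {i}" for i in range(8)]
shared = [f"node-B rollout {i}" for i in range(8)] + [f"node-C rollout {i}" for i in range(8)]

batch = build_training_batch(local, shared, n_local=4, n_shared=4, rng=rng)
print(len(batch))
```

Because each node only reads from a pool of received rollouts, nothing in this sketch requires the other nodes to run the same model or hardware, which matches the decentralized setting the summary describes.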

2. Why Language Models Hallucinate

This paper argues that language models hallucinate mainly because their training and evaluation procedures reward guessing over acknowledging uncertainty, and it analyzes the statistical roots of hallucinations in modern training pipelines. Because mainstream benchmarks systematically penalize uncertain responses, the paper suggests revising their biased scoring methods rather than introducing additional metrics to assess hallucinations.

Paper link: https://go.hyper.ai/eXoOR
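The incentive problem summarized above can be worked through with simple arithmetic. The Python sketch below assumes a grading scheme in which a correct answer scores 1, abstaining scores 0, and a wrong answer costs a configurable penalty. The function names and the penalty values are illustrative assumptions, not the paper's exact proposal.

```python
def expected_score(p_correct, wrong_penalty):
    """Expected score of answering: +1 if right, -wrong_penalty if wrong (abstaining scores 0)."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def guess_threshold(wrong_penalty):
    """Smallest confidence at which answering scores no worse than abstaining."""
    # Solve p - (1 - p) * c >= 0 for p, which gives p >= c / (1 + c).
    return wrong_penalty / (1.0 + wrong_penalty)


# A model that is only 25% sure of its answer.
binary = expected_score(0.25, wrong_penalty=0.0)     # binary grading: wrong answers cost nothing
penalized = expected_score(0.25, wrong_penalty=1.0)  # a wrong answer costs as much as a right one earns
print(binary, penalized, guess_threshold(1.0))
```

Under binary grading the expected score of guessing (0.25) always beats abstaining (0), so a model optimized for that benchmark learns to guess. With a penalty of 1, guessing at 25% confidence has a negative expected score, and answering only pays off above 50% confidence.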

3. Reverse-Engineered Reasoning for Open-Ended Generation

This paper proposes a new paradigm, Reverse-Engineered Reasoning (REER), that changes how reasoning is constructed. Unlike traditional methods that build reasoning processes forward through trial and error or imitation, REER works in reverse: starting from known high-quality solutions, it computationally discovers the latent, step-by-step deep reasoning paths that could have produced them.

Paper link: https://go.hyper.ai/xFygJ

4. Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

This paper presents Parallel-R1, the first reinforcement learning (RL) framework for complex real-world reasoning tasks that enables parallel thinking behaviors. This framework uses a progressive curriculum design to explicitly address the cold-start problem of training parallel thinking in RL.

Paper link: https://go.hyper.ai/s2OlH

5. WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents

Using a carefully constructed, high-quality dataset, this paper trains a state-of-the-art web agent model, WebExplorer-8B, through supervised fine-tuning followed by reinforcement learning. The model supports context lengths of up to 128K and can execute up to 100 tool calls, which lets it solve long-horizon problems. On multiple information retrieval benchmarks, WebExplorer-8B achieves state-of-the-art performance among models of similar size.

Paper link: https://go.hyper.ai/NusbG

More AI frontier papers: https://go.hyper.ai/iSYSZ

Community article interpretation

1. By correlating gene expression data with cell morphology images, the Chinese University of Hong Kong and others developed a transcriptome-guided diffusion model to accelerate phenotypic drug development.

Researchers from the Chinese University of Hong Kong, the Mohamed bin Zayed University of Artificial Intelligence, and other institutions have proposed MorphDiff, a scalable transcriptome-guided diffusion model designed to simulate, with high fidelity, how cell morphology responds to perturbations. Built on the Latent Diffusion Model (LDM) architecture, it uses L1000 gene expression profiles as the conditioning input for denoising training.

View the full report: https://go.hyper.ai/f7WeP

2. From "blind screening" to "precise positioning," a team from China University of Petroleum has launched AlphaPPIMI, which surpasses existing methods in predicting modulators of protein-protein interaction (PPI) interfaces.

A joint research team from China University of Petroleum and Yonsei University has integrated multiple advanced technologies to develop a new framework, called AlphaPPIMI. Combining a large-scale pre-trained model with an adaptive learning mechanism, this tool aims to address the core challenge of discovering modulators that specifically target the PPI interface, providing strong support for the future development of PPI-targeted drugs.

View the full report: https://go.hyper.ai/4tp0M

3. Apple Intelligence is fully implemented, and core product AI features are upgraded: real-time translation/visual intelligence/hypertension monitoring

At 1:00 AM Beijing time on September 10th, Apple's 2025 fall event focused heavily on AI, announcing AI upgrades for three core products: the iPhone 17, Apple Watch Series 11, and AirPods Pro 3. Apple Intelligence has moved from last year's concept showcase to full-scale implementation, covering scenarios such as real-time translation, health monitoring, and visual intelligence. The next-generation A19 and A19 Pro chips provide the underlying computing power.

View the full report: https://go.hyper.ai/IimjS

4. From ethical safeguards to medical history management, Wuhan University and others have proposed the Healthcare Agent, whose proactive and relevant consultations surpass closed-source models such as GPT-4.

Research teams from Wuhan University and Nanyang Technological University jointly released a Healthcare Agent consisting of three components: dialogue, memory, and processing. It can identify patients' medical purposes and automatically detect medical ethics and safety issues.

View the full report: https://go.hyper.ai/AdG2j

5. From Apple acquisition rumors to ASML's €1.3 billion investment to become a major shareholder, uncovering Mistral AI's technical and business secrets

In early September, Apple was reportedly interested in acquiring French startup Mistral AI. Semiconductor giant ASML then led the company's Series C funding round with a €1.3 billion investment. Mistral AI's valuation has now soared to $14 billion, making it a leading force in the European AI field.

View the full report: https://go.hyper.ai/zsQBu


