HyperAI

HyperAI Super Neural X Apache|Community Over Code Asia 2025 AI Special


From July 25th to 27th, Community Over Code Asia 2025, the official global series of conferences of the Apache Software Foundation (ASF), will open at the Conference Center of Zhongguancun National Independent Innovation Demonstration Zone in Beijing!

The AI track of this conference is co-produced by Wang Chenhan, founder and CEO of OpenBayes Bayesian Computing; Tan Zhongyi, member of the Apache Software Foundation; and Du Junping, founder and CEO of Datastrato. From 14:00 to 17:15 on July 25, they will also share the latest news and cutting-edge practices in building and growing the Apache community. HyperAI will take part in the event as a partner community and host a booth. Everyone is welcome to stop by!

Special Topic

The AI forum is a professional communication platform focusing on the combination of artificial intelligence (AI) technology and Apache open source projects. The forum aims to bring together global developers, researchers and industry users to discuss the application and development of AI technology in the open source ecosystem, showcase cutting-edge technologies, share practical experience, and promote the implementation of open source AI solutions in various industries.

Topics suitable for this forum are:

  • AI-related projects under the Apache Foundation (for example, focusing on open source AI frameworks and basic library projects such as Apache TVM, Mahout, Singa, SystemML, etc.)
  • Optimization of individual Apache projects in AI scenarios (such as Spark MLlib, Flink ML, etc.)
  • AI solutions for industrial scenarios based on the combination of multiple Apache projects, such as the AI business construction method of a certain company

Producer

Wang Chenhan

Founder and CEO of OpenBayes Bayesian Computing, one of the translators of the Chinese Apache TVM documentation, and deputy director of the Tianjin University Bayesian Computing Joint Research Center. He has worked for technology companies such as Walt Disney Interactive Media Group and AVOS Systems, and served as secretary-general of the open source organization CLUE Benchmarks Foundation.

Tan Zhongyi

Deputy Secretary-General of COPU, 20+ years of open source veteran, member of the Apache Software Foundation.

Du Junping

Founder and CEO of Datastrato, director of the LF AI & Data Foundation, member of the Apache Software Foundation, and an expert in big data technology and open source. He is a committer and PMC member of Apache Hadoop, Ozone, YuniKorn and other projects, and a mentor of Apache Gravitino, NuttX and other projects. He previously served as chairman of the open source committee, director of big data platform R&D, and general manager of open source business at a Fortune 500 company, and formerly led the Hadoop computing team at Hortonworks.

Agenda highlights

📅 July 25th 14:00 – 17:15

Speech topic: Maximizing heterogeneous GPU utilization in a cloud-native way | Unleashing the power of HAMi 

Sharing time: July 25, 14:00-14:30

Topic introduction: As AI becomes more widespread, Kubernetes has become the de facto standard for AI infrastructure. However, the growing number of clusters that contain multiple kinds of AI devices (such as NVIDIA, Intel, Huawei Ascend, Hygon, MetaX, Cambricon, Iluvatar CoreX, Enflame, etc.) brings major challenges. AI devices are expensive, so how can resource utilization be improved? How can they be better integrated with K8s clusters? How can heterogeneous AI devices be managed uniformly, with flexible scheduling strategies and observability? The HAMi project was created to address these questions. This talk covers:

  • How Kubernetes manages heterogeneous AI devices (unified scheduling, observability)
  • Improving device utilization through GPU sharing
  • Ensuring QoS for high-priority tasks in GPU sharing scenarios
  • Supporting flexible GPU scheduling policies (NUMA affinity/anti-affinity, binpack/spread, etc.)
  • Integration with other projects (such as Volcano, scheduler-plugin, etc.)
  • Real case sharing of production-level users
  • Current challenges and future plans

Speakers:

Xiao Zhang|dynamia.ai founder, a cloud-native enthusiast and community maintainer, focusing on the AI infrastructure

Xiao Zhang is the founder of dynamia.ai, focusing on infrastructure, AI, multi-cluster management, cluster lifecycle management (LCM) and the Open Container Initiative (OCI). He is an active community contributor and a cloud native technology enthusiast. He is currently a member of the Kubernetes and Kubernetes-sigs organizations and a maintainer of the Karmada, kubean and cloudtty projects. He is also a co-initiator and maintainer of the CNCF HAMi project. His GitHub ID is wawa0210.

Yu Yin|Product Owner @dynamia.ai, Open Source Maintainer @HAMi, Driving GPU Virtualization & AI Infra Innovation on Kubernetes

Yu Yin is the product owner of dynamia.ai and a core maintainer of HAMi, an open source project for GPU virtualization and heterogeneous computing on Kubernetes. With practical experience in building AI infrastructure, Yu focuses on enabling scalable GPU sharing, device pooling, and intelligent scheduling for multi-architecture environments. He has helped enterprise users in logistics, telecommunications, and finance adopt heterogeneous resource management in production. Yu is also an active advocate of open source adoption in China and leads the internationalization of the HAMi community.


Speech topic: Apache Doris's exploration and practice in the field of AI

Sharing time: July 25, 14:30-15:00

Topic introduction: As a popular real-time OLAP analytical database, Apache Doris has built, or is planning, a growing set of AI-related features and peripheral components in response to the fast-moving AI wave, including vector retrieval, MCP, and RAG modules. This talk will present Doris's current progress in AI through discussion and live demonstration.

Speakers:

YiJia Su|Apache Doris Committer, SelectDB Solutions Architect, PowerData Sponsor

Apache Doris Committer, Apache Doris community evangelist, Doris-MCP contributor, SelectDB senior solutions architect, and PowerData community initiator. He has helped hundreds of companies in the Apache Doris community build, optimize and evolve their real-time data warehouses.


Speech topic: Apache Gravitino | Metadata management solutions in the AI era 

Sharing time: July 25, 15:00-15:30

Topic Introduction:

Metadata management has become the cornerstone of the AI era. This talk will explore how Apache Gravitino can achieve large-scale unstructured data and model management, and how Xiaomi uses Gravitino for large language model (LLM) data processing and model lifecycle management.

Speech outline:

1. Challenges of dataset and model management in AI workflows, and how Gravitino addresses these issues through its Fileset Catalog (structured AI dataset governance) and Model Catalog (unified model lifecycle management)

2. Maximize operational efficiency and governance compliance by leveraging Gravitino’s tagging system, lineage tracking, and credential management capabilities

3. Fileset’s practice in Xiaomi’s data processing: In AI scenarios, data processing involves multiple stages such as downloading, extracting, filtering, deduplication, and training. Using Fileset improves the pipeline efficiency between data and AI engines, implements end-to-end data set management, and establishes a unified metadata view.

4. Xiaomi's large model management practice: How Xiaomi manages large model metadata, deploys model services, and our future plans for integration with Gravitino

Speakers:

Xiaojing Fang|Apache Gravitino PPMC & datastrato software engineer

Apache Gravitino PPMC member, focusing on data and AI infrastructure systems.

Han Zhang|Software R&D Engineer at Xiaomi

Apache Gravitino contributor, responsible for the research and development of Xiaomi AI development platform.


Speech topic: Catalog as context | Using metadata to drive and manage the next wave of AI development 

Sharing time: July 25, 15:45-16:15

Topic Introduction: Developing powerful AI tools is this year's theme, and intelligent agents and foundation models have made significant progress in various fields. But the core questions remain: How do we provide these applications with data they can use effectively? How can enterprise-level scale be achieved? What is the essence of context? This talk will explore the current state of the big data ecosystem, the challenges facing AI data platforms, and why data catalogs and metadata are the only viable path to efficient and controllable AI development. We will use the open source framework Apache Gravitino as an example to explain why such solutions must remain vendor neutral.

Speakers:

Jerry Shao|Datastrato, CTO

Jerry Shao is the co-founder and CTO of Datastrato, and has been working in the field of open source big data for more than a decade. As an Apache member, he is a committer and PMC member of Apache Spark and Apache InLong, and the founder of the Apache Gravitino (incubating) project.


Speech topic: From data to AI | Building a unified analysis platform based on Apache Cloudberry

Sharing time: July 25, 16:15-16:45

Topic Introduction:

Enterprises today have difficulty realizing the full potential of AI due to fragmented data systems, inefficient processing flows, and the gap between analytics and machine learning. Apache Cloudberry, an open source MPP data warehouse, redefines this paradigm by deeply integrating data processing and AI capabilities, removing barriers and accelerating innovation.

This talk will show how Cloudberry can:

  • Unified execution: Run native AI/ML models (such as PyTorch, Scikit-learn) directly on the data warehouse
  • Multimodal analysis: Processing structured and unstructured data (PDF, images, etc.) in a unified framework
  • Intelligent data applications: Building RAG-enhanced question-answering systems, conversational BI, and multimodal search

You’ll learn how to converge data and intelligence into a unified platform to simplify your architecture while scaling AI workloads.

Speakers:

Chuanxin Bian|HashData, Data & AI Engineer

Dr. Chuanxin Bian is a data scientist and applied mathematician specializing in deep learning, natural language processing, and time series modeling. He holds a PhD in Applied Mathematics from the Hong Kong Polytechnic University. He currently works at HashData, leading the development of AI tools such as HashML and ChatData, as well as AIGC applications. He was previously a senior R&D engineer at Baidu, where he participated in the development of the Wenxin large model, built time series models based on PaddleTS, and drove the upgrade of the user profiling system. He is proficient in Python and deep learning frameworks, and is skilled at connecting theory with practice to advance AI innovation.


Speech topic: Analysis of Apache Doris hybrid search technology

Sharing time: July 25, 16:45-17:15

Topic Introduction:

Apache Doris's hybrid search capability combines traditional full-text search (keyword-based lexical search) with vector search (semantic search) to provide more accurate search results. This capability is particularly suitable for complex search scenarios that require both keyword matching and semantic understanding, such as e-commerce, content recommendation, and knowledge base search.

1. Core Principles of Hybrid Retrieval

Hybrid search combines the strengths of both search methods:

  • Full-text search (BM25): Based on inverted index and keyword matching, it is good at accurately matching the query terms entered by the user. Doris uses the BM25 algorithm (default) to calculate the relevance score between the document and the query, which is suitable for structured text search.
  • Vector retrieval (semantic search): By converting text into vectors (embedding), a machine learning model is used to calculate the semantic similarity between the query and the document. It is good at understanding the query intent and context.
  • Fusion mechanism: Use specific scoring and ranking techniques (such as Reciprocal Rank Fusion/RRF or Convex Combination/CC) to integrate the search results of the two methods and balance the lexical relevance and semantic relevance.

2. Technical Implementation Architecture

Doris hybrid search relies on the following technical components and workflows:

1. Field type support

  • Text field: Generate an inverted index through a word segmenter to support full-text search
  • Vector field: Use the model to convert text into vector type storage

2. Composite Index

  • Supports storing both text and vector fields
  • Enable hybrid query functionality

3. Query execution process

  • Lexical query: use a match query to retrieve documents matching keywords (based on the BM25 algorithm)
  • Vector query: use knn query or ANN index to retrieve semantically similar documents (based on cosine similarity, etc.)
  • Hybrid query: execute two queries in parallel and combine the results through a fusion algorithm

4. Result fusion strategy

  • RRF (Reciprocal Rank Fusion): Calculates a combined score based on each document's rank in the different query results, favoring documents that rank high in multiple search methods
  • CC (Convex Combination): Integrate BM25 and vector query scores through weighted summation, and manually adjust the weight balance
  • Supports further optimization of result ranking through script_score or Rerank model
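
The two fusion strategies named above can be sketched in a few lines of Python. This is a minimal, generic illustration and not Doris's implementation: the function names, the smoothing constant k = 60, the min-max normalization step, and the sample scores are all assumptions made for the example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over result lists of 1 / (k + rank).

    `rankings` is a list of ranked doc-id lists (best first).
    k=60 is a commonly used smoothing constant, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def min_max(scores):
    """Rescale raw scores to [0, 1] so BM25 and vector scores are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def cc_fuse(bm25_scores, vector_scores, alpha=0.5):
    """Convex Combination: alpha * lexical + (1 - alpha) * semantic.

    A document missing from one result list contributes 0 for that side.
    """
    bm25_n, vec_n = min_max(bm25_scores), min_max(vector_scores)
    docs = set(bm25_n) | set(vec_n)
    fused = {
        d: alpha * bm25_n.get(d, 0.0) + (1 - alpha) * vec_n.get(d, 0.0)
        for d in docs
    }
    return sorted(fused, key=lambda d: (-fused[d], d))


if __name__ == "__main__":
    # Hypothetical scores from a lexical query and a vector query.
    bm25 = {"doc_a": 12.0, "doc_b": 9.0, "doc_c": 3.0}
    vec = {"doc_b": 0.92, "doc_d": 0.88, "doc_a": 0.40}
    bm25_rank = sorted(bm25, key=bm25.get, reverse=True)
    vec_rank = sorted(vec, key=vec.get, reverse=True)
    print(rrf_fuse([bm25_rank, vec_rank]))
    print(cc_fuse(bm25, vec, alpha=0.9))
```

RRF only needs ranks, so it works without any score normalization. CC uses the raw scores and therefore depends on both the normalization and the weight alpha, which is the manual balance described above.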

Speakers:

Lee Happen|SelectDB Senior-RD

Apache Doris PMC Member


📅 July 26th 14:00 – 16:45

Speech topic: Integrating large language models into CI/CD pipelines | Practical cases for improving Apache project code quality

Sharing time: July 26, 14:00-14:30

Topic introduction: This talk will explore how to systematically integrate large language models (LLMs) into GitHub Actions to improve the code quality and security of Apache projects. The content is based on the real case of apache/brpc#2911. It is particularly suitable for developers and maintainers who are looking for actionable, low-overhead strategies to design, implement, and deploy AI agents to ensure code quality.

We will guide the audience to think about the following aspects:

  • Human-machine collaboration: Comparing the traditional “copilot” model (human-driven, synchronous) with asynchronous AI agent workflows in pipelines, highlighting efficiency gains and trade-offs
  • Practice: Learn how to use LLMs to perform targeted tasks such as code robustness scanning and CVE detection under resource constraints - without relying on RAG, fine-tuning or MCP

Speakers:

Yi Yuan|software developer

Maintainer of the CNCF Kepler project, mainly responsible for the project's pipeline-related work.


Speech topic: Lance | Data formats for the cutting edge of multimodal AI

Sharing time: July 26, 14:30-15:00

Topic Introduction: Cutting-edge training of multimodal models requires processing PB-level multimodal AI data, including videos, images, and long texts. The complexity and scale of new AI data pose challenges to existing data infrastructure.

The Apache-licensed Lance format is built on Apache Arrow and Apache DataFusion, with its core written in Rust. The development team includes PMC members from Apache Hadoop, Apache HBase, Apache Iceberg, Apache Arrow, and Delta Lake. Lance is a new AI-focused columnar storage format and table format, deeply inspired by the Apache Parquet, Apache Iceberg, and Apache Hudi projects. Its salient features are random access and zero-cost schema evolution, two favorites of AI engineers. These features distinguish Lance from Apache Parquet, Apache ORC, or Apache Iceberg, making it better suited to feature engineering and training for multimodal AI.

The Lance format has been adopted by many leading AI companies such as MidJourney, WorldLabs, Runway ML, Character AI, etc.

This talk will be jointly presented by LanceDB CTO Xu Lei (Apache Hadoop PMC member) and ByteDance Volcano Engine expert Yang Hua (Apache Hudi PMC member), and will cover:

  • Infrastructure Challenges Supporting Workloads at Leading-Edge Multimodal AI Companies
  • The core design principles behind the Lance format
  • How ByteDance Volcano Engine builds the Lance data lake based on the Lance format and supports the world's top AI companies

Speakers:

Lei Xu|CTO @ LanceDB

Chief Technology Officer of LanceDB. Member of Apache Hadoop/HDFS PMC. Previously led the machine learning platform and data infrastructure team at Cruise Automation.

Vino Yang|Volcano Engine Technical Expert, Lance Committer

Volcano Engine technical expert, Lance Committer. Apache Hudi/Kyuubi PMC member.


Speech topic: Quantum AI | The dawn of the era of super intelligence

Sharing time: July 26, 15:00-15:30

Topic Introduction: The integration of quantum computing and artificial intelligence is about to break the limits of classical computing and usher in the era of super intelligence. When quantum-enhanced models can achieve exponential learning, solve complex problems in seconds, and redefine the decision-making process, are we entering a new era where AI will surpass the scope of human understanding?

This talk will explore the next frontier of artificial intelligence, focusing on:

  • Quantum Machine Learning (QML): How AI can use quantum mechanics to achieve unprecedented problem-solving capabilities
  • Quantum neural networks: Can AI learn at an unimaginable scale?
  • Quantum superposition and parallelism: Will AI evolve from sequential reasoning to multidimensional thinking?
  • Theoretical inspiration: Will quantum AI become the cornerstone of artificial super intelligence (ASI)?

As we move toward AI-driven scientific discovery, posthuman intelligence, and a potential knowledge singularity, this talk will challenge traditional AI paradigms and explore the possibilities when AI no longer thinks like humans—but thinks far better than humans.

Speakers:

Prakul Hiremath|Undergraduate B.Tech student, Visvesvaraya Technological University; CEO and founder, Bioloop

Prakul Hiremath is a researcher, technologist, and innovator from VTU Belagavi, India, working at the intersection of artificial intelligence, cybersecurity, and system optimization. With a deep passion for artificial intelligence, computing systems, and future technologies, he is actively involved in research on AI-driven cybersecurity, medical signal analysis, and Industry 4.0 innovations.

His work covers AI-driven threat detection, predictive analytics, and high-performance computing, with a focus on pushing the boundaries of intelligent systems and autonomous decision-making. He is also exploring AI-enhanced life, posthuman intelligence, and knowledge evolution, and is committed to contributing breakthrough insights into future technologies.

In addition to AI and cybersecurity research, Prakul is actively involved in Bioloop, an innovative research project that combines biotechnology and artificial intelligence to develop cutting-edge solutions in the fields of sustainability, healthcare, and industrial automation. Bioloop aims to revolutionize bio-AI systems by creating a new generation of intelligent ecosystems that optimize biological and technological processes.

At Community Over Code Asia 2025, Prakul will discuss the unstoppable rise of AI, the challenges it brings, and its profound implications for the future of technology, social development, and human intelligence.


Speech topic: To use MCP or not to use MCP? | Designing composable AI systems using open protocols

Sharing time: July 26, 15:45-16:15

Topic introduction: As AI applications diversify, point-to-point integration of custom tools and services has led to fragmentation and high maintenance costs. As an open protocol, the Model Context Protocol (MCP) establishes a unified way to connect AI agents with external tools through standardized discovery, invocation and interaction flows. This talk will introduce the interoperability challenges in the current AI ecosystem and, taking llama-nexus (github.com/LlamaEdge/llama-nexus) as an example, show how natively developed MCP services enable composable and flexibly orchestrated AI systems. Finally, it will discuss the strategic significance of open protocols in promoting interoperability between AI systems, reducing integration complexity and fostering innovation, as well as the future direction of the composable AI ecosystem.

Speakers:

Miley Fu|CNCF Ambassador, Founding member of open source runtime WasmEdge

Miley is a developer evangelist who is passionate about empowering developers to build and contribute to open source projects. She is the co-chair and keynote speaker for KubeCon+Open Source Summit 2024 and AI Dev China 2024. She is a founding member of the WasmEdge runtime, a CNCF sandbox project she has worked on for more than 6 years, and has spoken at events such as KubeCon, KCD, CloudDay Italy, DevRelCon, Japan Open Source Summit, AWS User Group, Global AI Note, KubeDay Singapore, etc. Miley writes technical content and organizes developer events, including KCD Beijing, KCD Shenzhen, and WebAssembly & Rust meetups in Taipei, Singapore, and elsewhere.


Speech topic: Why do we need an open source AI gateway?

Sharing time: July 26, 16:15-16:45

Topic introduction: As AI applications proliferate, API traffic has surged, yet challenges such as cost control, security compliance, and multi-model management remain. Apache APISIX, the world's most active open source API gateway, will officially launch AI gateway capabilities in 2025 to provide developers and enterprises with a one-stop solution.

Why choose APISIX AI Gateway?

  • Unified AI service management: Seamlessly proxy requests to mainstream large models such as OpenAI, DeepSeek, and Qwen, avoid vendor lock-in, and optimize cost and performance through dynamic traffic orchestration.
  • Security and compliance: Built-in AI protection plug-ins (such as ai-prompt-guard to filter malicious inputs and ai-rate-limiting to implement token-based rate limiting) ensure data privacy and compliance.
  • Developer-first experience: Hot-reloadable plugins, multi-language support (Java/Python/Go), and native integration with the microservices and Kubernetes ecosystems.

Whether you are a developer or an enterprise, APISIX AI Gateway can accelerate the implementation of AI applications and unleash the potential for innovation.

Speakers:

Yuansheng Wang|API7.ai, CTO

Apache APISIX PMC member and Apache Software Foundation member.


🌟 Scan the QR code below to purchase tickets

Tickets are limited, so sign up soon 👆