# Intel Launches AutoRound for LLMs, VLMs; Advances Graph Neural Networks
## Qwen3: The Latest Advancement in Large Language Models

Recently, the Qwen team announced the release of their newest generation of large language models, Qwen3, marking a significant step forward in the field. The flagship model, Qwen3-235B-A22B, performs strongly on coding, mathematics, and general-capability benchmarks, matching or surpassing top models such as DeepSeek-R1, Grok-3, and Gemini-2.5-Pro. The smaller mixture-of-experts (MoE) model, Qwen3-30B-A3B, achieves performance comparable to QwQ-32B while activating only about one-tenth as many parameters. Even the compact Qwen3-4B matches the performance of the much larger Qwen2.5-72B-Instruct.

### Main Features

#### Dual Thinking Modes

Qwen3 supports two distinct operation modes:

- **Thinking Mode**: The model performs step-by-step reasoning, ideal for complex problems requiring deep analysis.
- **Non-Thinking Mode**: The model responds quickly to simple queries, prioritizing speed over depth.

This flexible design lets users choose the mode that best fits each task, balancing cost against quality.

#### Multilingual Support

Qwen3 supports 119 languages and dialects spanning multiple language families, including Indo-European, Sino-Tibetan, and Afroasiatic, making it well suited to global applications.

### Pre-training Process

Compared with Qwen2.5, Qwen3's pre-training corpus has nearly doubled to about 36 trillion tokens. The data includes web content and text extracted from PDF documents, with Qwen2.5-VL used to extract the text and Qwen2.5 used to improve its quality. To strengthen mathematical and coding abilities, the team used Qwen2.5-Math and Qwen2.5-Coder to generate synthetic data.

Pre-training proceeds in three stages:

1. **Initial stage (S1)**: The model is trained on over 30 trillion tokens to acquire basic language skills and general knowledge.
2. **Enhancement stage (S2)**: An additional 5 trillion tokens of knowledge-intensive data (e.g., STEM, coding, and reasoning tasks) are used for further training.
3. **Extension stage (S3)**: High-quality long-context data extends the context length to 32K tokens, enabling the model to handle longer inputs.

### Post-training Process

To build a model capable of both step-by-step reasoning and fast responses, the Qwen team implemented a four-phase post-training pipeline:

1. **Long chain-of-thought cold start**: The model is fine-tuned on diverse long chain-of-thought data covering mathematics, coding, logical reasoning, and STEM.
2. **Reasoning-based reinforcement learning**: Compute is scaled up with rule-based rewards to improve exploration and exploitation.
3. **Thinking-mode fusion**: Long chain-of-thought data is blended with ordinary instruction-tuning data so the two modes integrate seamlessly.
4. **General reinforcement learning**: Reinforcement learning is applied across more than 20 general-domain tasks to boost overall performance and correct undesirable behaviors.

### Usage Guidelines

Qwen3 weights are available on Hugging Face, ModelScope, and Kaggle, all released under the Apache 2.0 license. The models can be served with frameworks such as SGLang and vLLM, while local development is supported by tools such as Ollama, LM Studio, llama.cpp, and KTransformers. In multi-turn dialogues, the thinking behavior can be toggled dynamically with `/think` and `/no_think` tags in prompts or system messages.
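To make the two modes concrete, here is a minimal sketch using the Hugging Face `transformers` chat-template API; the `enable_thinking` flag follows the convention in Qwen3's published model cards, and the prompt and generation settings are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-0.6B"  # smallest variant, chosen for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many primes are there below 30?"}]

# enable_thinking=True makes the chat template emit a reasoning block before
# the answer; enable_thinking=False switches to fast, non-thinking responses.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

In a multi-turn conversation, appending `/no_think` to a user message has the same effect as disabling the flag for that turn.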
### Conclusion

Qwen3 represents a major step toward artificial general intelligence (AGI) and artificial superintelligence (ASI). The team plans to keep enhancing the model by scaling pre-training data, model parameters, context length, and modality support, and by improving long-horizon reasoning through environmental feedback. This shift marks a new era of training that focuses on agents rather than models alone.

### Industry Evaluation

Industry observers have praised Qwen3 for its innovative and practical features. As a flagship project of Alibaba Cloud, Qwen3 is expected to lead the coming round of competition among large language models; its combination of high performance and cost-effectiveness positions it as a strong contender in the evolving AI landscape.

## AutoRound: Advancing Post-Training Quantization

Intel has introduced AutoRound, its latest generation of post-training quantization (PTQ) tooling, aimed at high accuracy and efficiency in low-bit quantization. The tool optimizes weight rounding and clipping ranges via gradient descent and is particularly effective from 2-bit to 8-bit quantization, where it outperforms existing baselines. For example, at INT2 precision AutoRound shows up to a 2.1x relative accuracy improvement over common PTQ methods.

### Key Advantages

#### High Accuracy in Low-Bit Quantization

Evaluation results show AutoRound's superior performance across a range of tasks, especially at 2-bit precision. At 4-bit precision it remains competitive, as reflected in low-bit open LLM leaderboards.

#### Broad Compatibility

- **Models supported**: AutoRound works with almost all popular large language model architectures, including Qwen, LLaMA, and DeepSeek; pre-quantized versions are available in repositories such as Hugging Face's OPEA, Kaitchup, and fbaldassarri.
- **Devices supported**: It runs on CPUs, Intel GPUs, and CUDA devices, covering a wide range of deployment scenarios.

#### Flexible and Efficient Quantization

AutoRound needs only 200 tuning steps and a small calibration set of 128 samples to reach high accuracy. For instance, quantizing a 72-billion-parameter model takes just 37 minutes on an Nvidia A100 GPU.

### Using AutoRound

#### Installation

```bash
pip install auto-round
```

#### Command-Line Usage

AutoRound offers three configurations: `auto-round` (default), `auto-round-best` (highest accuracy), and `auto-round-light` (fastest quantization). Choose the configuration based on model size and precision needs.

```bash
auto-round \
    --model Qwen/Qwen3-0.6B \
    --bits 4 \
    --group_size 128 \
    --format "auto_round,auto_awq,auto_gptq" \
    --output_dir ./tmp_autoround
```

For 2-bit precision, `auto-round-best` or `auto-round` is recommended.

#### API Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen3-0.6B"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 4-bit symmetric quantization with per-group scaling (group size 128)
bits, group_size, sym = 4, 128, True
autoround = AutoRound(
    model,
    tokenizer,
    bits=bits,
    group_size=group_size,
    sym=sym,
)

output_dir = "./tmp_autoround"
autoround.quantize_and_save(output_dir, format='auto_round,auto_awq,auto_gptq')
```

#### Inference

AutoRound automatically selects the best available backend for inference and suggests installing a more optimal backend when one exists, improving deployment efficiency across devices.
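As a rough sketch of that inference path (reusing the `./tmp_autoround` directory from the example above; exact backend selection depends on your environment), a saved AutoRound checkpoint can typically be loaded back through `transformers`:

```python
from auto_round import AutoRoundConfig  # registers auto-round quantization support
from transformers import AutoModelForCausalLM, AutoTokenizer

quantized_dir = "./tmp_autoround"  # directory written by quantize_and_save above

model = AutoModelForCausalLM.from_pretrained(quantized_dir, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(quantized_dir)

inputs = tokenizer("There is a girl who likes adventure,", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```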
### Converting GPTQ/AWQ Models to the AutoRound Format

Most GPTQ/AWQ models can be converted to the AutoRound format to improve compatibility and support on Intel devices. Note that the quantization configuration changes after conversion.

### Conclusion

AutoRound makes significant strides in post-training quantization, combining high accuracy, efficiency, and broad compatibility. Whether for large-scale LLM deployment or edge computing, AutoRound provides the tools needed to reach strong performance with minimal overhead. Users are encouraged to try AutoRound and join its growing community to push the boundaries of efficient AI deployment.

### Industry Evaluation

Experts in the AI community have praised AutoRound's approach to quantization, noting substantial improvements in model accuracy alongside reduced resource consumption during quantization, with significant implications for large-scale AI deployment and edge computing. The release reinforces Intel's commitment to AI technology and its position in the field.

## Link Prediction: From Heuristics to Graph Neural Networks

Link prediction is crucial in many domains, such as social networks, e-commerce recommendation systems, and protein-interaction analysis in biology. This article surveys techniques for link prediction, from simple heuristics to graph neural networks (GNNs).

### Heuristic Methods

#### Local Heuristics

Local heuristics rely on the immediate neighborhood of a node pair:

- **Common Neighbors**: Nodes with many shared neighbors are likely to connect. For example, nodes A and B share 3 neighbors.
- **Jaccard Coefficient**: The ratio of common neighbors to the total number of distinct neighbors. If A and B share 3 of 5 distinct neighbors, the coefficient is 3/5 = 0.6.
- **Adamic-Adar Index**: Weights each common neighbor inversely by the log of its degree, so rarer shared neighbors count more; this can reveal a link between A and B that is less evident between C and D.
- **Preferential Attachment**: High-degree nodes are more likely to form new connections; the score is the product of the two degrees. For A and B with degrees 5 and 3, the score is 5 x 3 = 15.

#### Global Heuristics

Global heuristics consider the entire graph structure:

- **Katz Index**: Sums over all paths between two nodes, weighting a path of length l by beta^l for a damping factor beta < 1, so shorter paths contribute more (e.g., to the Katz index between nodes C and E).
- **Rooted PageRank**: Runs a random walk that restarts at a root node and scores candidate nodes by their visit probability; pairs with higher probability are more likely to link.

All of these scores can be computed in a few lines, as sketched below.
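This minimal sketch is not part of the original article's code; it uses `networkx` and the built-in karate-club graph purely for illustration, with an arbitrary node pair as the link candidate.

```python
import networkx as nx
import numpy as np

G = nx.karate_club_graph()  # any undirected graph works
u, v = 0, 33                # candidate node pair

# Local heuristics
cn = len(list(nx.common_neighbors(G, u, v)))
jac = next(nx.jaccard_coefficient(G, [(u, v)]))[2]
aa = next(nx.adamic_adar_index(G, [(u, v)]))[2]
pa = next(nx.preferential_attachment(G, [(u, v)]))[2]

# Katz index: sum_l beta^l A^l = (I - beta*A)^-1 - I, valid for beta < 1/lambda_max
A = nx.to_numpy_array(G)
beta = 0.05
katz = np.linalg.inv(np.eye(len(G)) - beta * A) - np.eye(len(G))

# Rooted PageRank: random walk with restart at u; score of v is its visit probability
rpr = nx.pagerank(G, alpha=0.85, personalization={u: 1.0})[v]

print(f"CN={cn}, Jaccard={jac:.3f}, Adamic-Adar={aa:.3f}, PA={pa}, "
      f"Katz={katz[u, v]:.4f}, RootedPR={rpr:.4f}")
```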
### Machine Learning and GNN Methods

#### Machine Learning Approaches

Link prediction can be framed as binary classification over node pairs, with feature vectors built from heuristics, node degrees, and embeddings. A logistic regression model trained on the Cora dataset using only the Jaccard coefficient as a feature yielded AUC and AP scores of 0.6958 and 0.6890, respectively.

```python
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score
from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import RandomLinkSplit
from torch_geometric.utils import to_dense_adj

# Data preparation
dataset = Planetoid('path', name='Cora')
data_all = dataset[0]
transform = RandomLinkSplit(num_val=0.05, num_test=0.1,
                            is_undirected=True, split_labels=True)
train_data, val_data, test_data = transform(data_all)

# Dense adjacency built from training edges only, to avoid test leakage
adj = to_dense_adj(train_data.edge_index, max_num_nodes=data_all.num_nodes)[0]

# Feature extraction: one Jaccard-coefficient feature per node pair
def jaccard(u, v, adj):
    u_neighbors = set(adj[u].nonzero().view(-1).tolist())
    v_neighbors = set(adj[v].nonzero().view(-1).tolist())
    union = len(u_neighbors | v_neighbors)
    return len(u_neighbors & v_neighbors) / union if union > 0 else 0.0

def extract_features(pairs, adj):
    return [[jaccard(u, v, adj)] for u, v in pairs]

X_train = extract_features(train_data.pos_edge_label_index.T.tolist() +
                           train_data.neg_edge_label_index.T.tolist(), adj)
y_train = [1] * train_data.pos_edge_label_index.size(1) + \
          [0] * train_data.neg_edge_label_index.size(1)
X_test = extract_features(test_data.pos_edge_label_index.T.tolist() +
                          test_data.neg_edge_label_index.T.tolist(), adj)
y_test = [1] * test_data.pos_edge_label_index.size(1) + \
         [0] * test_data.neg_edge_label_index.size(1)

# Model training and evaluation
model = LogisticRegression().fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]
print(f"[ML Heuristic] AUC: {roc_auc_score(y_test, probs):.4f}, "
      f"AP: {average_precision_score(y_test, probs):.4f}")
```

#### GNN Approaches

**VGAE (Variational Graph Autoencoder)**: Uses graph convolutional network (GCN) layers to encode each node into the mean and log-variance of a Gaussian, samples node embeddings from that distribution, and scores candidate edges with an inner-product decoder. On the Cora dataset, VGAE achieved markedly better AUC and AP scores (0.9032 and 0.9179) than the heuristic-feature baseline.

```python
import torch
import numpy as np
from torch_geometric.nn import GCNConv, VGAE

class VGAEEncoder(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, 2 * out_channels)
        self.conv_mu = GCNConv(2 * out_channels, out_channels)
        self.conv_logstd = GCNConv(2 * out_channels, out_channels)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        return self.conv_mu(x, edge_index), self.conv_logstd(x, edge_index)

vgae = VGAE(VGAEEncoder(dataset.num_features, 32))
optimizer = torch.optim.Adam(vgae.parameters(), lr=0.01)

# Training: reconstruction loss on positive edges plus a KL regularizer
x = data_all.x
edge_index = train_data.edge_index
for epoch in range(1, 101):
    vgae.train()
    optimizer.zero_grad()
    z = vgae.encode(x, edge_index)
    loss = vgae.recon_loss(z, train_data.pos_edge_label_index)
    loss += (1 / data_all.num_nodes) * vgae.kl_loss()
    loss.backward()
    optimizer.step()

# Evaluation: score held-out positive and negative test edges
vgae.eval()
z = vgae.encode(x, edge_index)

def score_edges(edges):
    return vgae.decoder(z, torch.tensor(edges).t().to(z.device)).view(-1).detach().cpu().numpy()

vgae_scores = np.concatenate([
    score_edges(test_data.pos_edge_label_index.T.tolist()),
    score_edges(test_data.neg_edge_label_index.T.tolist())
])
vgae_labels = np.array([1] * test_data.pos_edge_label_index.size(1) +
                       [0] * test_data.neg_edge_label_index.size(1))
print(f"[VGAE] AUC: {roc_auc_score(vgae_labels, vgae_scores):.4f}, "
      f"AP: {average_precision_score(vgae_labels, vgae_scores):.4f}")
```

**SEAL (learning from Subgraphs, Embeddings, and Attributes for Link prediction)**: Extracts a local enclosing subgraph around each candidate node pair and trains a GNN to classify whether the link exists, which makes it effective on sparse graphs and diverse network types.
On the Cora dataset, SEAL performed similarly to VGAE, with AUC and AP scores of 0.9038 and 0.9176, respectively. PyTorch Geometric does not ship a turnkey SEAL module; its repository provides a complete reference implementation (`examples/seal_link_pred.py`). The simplified sketch below conveys the core pipeline, extracting an enclosing subgraph per candidate pair, labeling the target nodes, and classifying the pooled subgraph; a full SEAL implementation additionally removes the target link from each training subgraph and uses DRNL node labels.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool
from torch_geometric.utils import k_hop_subgraph

class SEALClassifier(torch.nn.Module):
    def __init__(self, in_channels, hidden=32):
        super().__init__()
        self.conv1 = GCNConv(in_channels + 1, hidden)  # +1 for the target-node flag
        self.conv2 = GCNConv(hidden, hidden)
        self.lin = torch.nn.Linear(hidden, 1)

    def forward(self, x, edge_index, batch):
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index).relu()
        return self.lin(global_mean_pool(x, batch)).view(-1)

def enclosing_subgraph(u, v, num_hops=1):
    # Extract the 1-hop enclosing subgraph around the pair; flagging the two
    # target nodes is a simplified stand-in for SEAL's DRNL node labeling.
    nodes, sub_ei, mapping, _ = k_hop_subgraph(
        [u, v], num_hops, edge_index, relabel_nodes=True)
    flag = torch.zeros(nodes.size(0), 1)
    flag[mapping] = 1.0
    batch = torch.zeros(nodes.size(0), dtype=torch.long)
    return torch.cat([x[nodes], flag], dim=1), sub_ei, batch

model = SEALClassifier(dataset.num_features)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
pairs = train_data.pos_edge_label_index.T.tolist() + \
        train_data.neg_edge_label_index.T.tolist()
labels = [1.0] * train_data.pos_edge_label_index.size(1) + \
         [0.0] * train_data.neg_edge_label_index.size(1)

# Training: classify each candidate pair from its enclosing subgraph
for epoch in range(1, 4):
    model.train()
    for (u, v), yv in zip(pairs, labels):
        optimizer.zero_grad()
        logit = model(*enclosing_subgraph(u, v))
        loss = F.binary_cross_entropy_with_logits(logit, torch.tensor([yv]))
        loss.backward()
        optimizer.step()

# Evaluation on held-out test pairs
model.eval()
test_pairs = test_data.pos_edge_label_index.T.tolist() + \
             test_data.neg_edge_label_index.T.tolist()
seal_labels = np.array([1] * test_data.pos_edge_label_index.size(1) +
                       [0] * test_data.neg_edge_label_index.size(1))
with torch.no_grad():
    seal_scores = np.array([torch.sigmoid(model(*enclosing_subgraph(u, v))).item()
                            for u, v in test_pairs])
print(f"[SEAL] AUC: {roc_auc_score(seal_labels, seal_scores):.4f}, "
      f"AP: {average_precision_score(seal_labels, seal_scores):.4f}")
```

### Conclusion

Link prediction methods range from simple heuristics to sophisticated GNN models. Heuristics are quick and easy but limited in accuracy; GNN-based methods such as VGAE and SEAL substantially improve prediction quality by learning complex graph structure. The right choice depends on the task requirements and the scale of the data.

### Industry Evaluation

Link prediction is a vital task in graph data processing with extensive applications. This article's progression from basic heuristics to GNN models offers useful insight into what graph neural networks can do. The gains demonstrated by VGAE and SEAL highlight the advantage of GNNs on intricate graph structures, and tools like PyTorch Geometric make these models easier to implement and optimize, fostering their adoption in research and industry.

### Background

The Cora dataset is a citation network widely used to benchmark graph neural networks in academic research. PyTorch Geometric is an open-source Python library for graph deep learning, offering a rich collection of GNN models and data-processing tools used across academia and industry.