
NumExpr: The Fast Library Most Data Scientists Overlook

A data scientist recently uncovered a library called "NumExpr," claiming that it outperforms NumPy by up to 15 times in certain complex numerical computations. NumPy, a cornerstone of Python numerical computing, is widely used in data science, machine learning, and model training. The discovery of NumExpr has sparked significant interest, prompting a series of tests to validate its performance claims.

What is NumExpr?

NumExpr is a fast numerical expression evaluator designed to complement NumPy. It optimizes memory usage and leverages multi-threading to speed up array operations, making it particularly effective on multi-core CPUs. According to its GitHub page, NumExpr can significantly reduce computation time and memory usage, which is especially beneficial for large-scale tasks.

Setting Up the Development Environment

To test NumExpr, it is recommended to set up a dedicated Python environment using a tool such as conda or Miniconda. Create a new environment, activate it, and install the required packages:

```shell
(base) $ conda create -n numexpr_test python=3.12 -y
(base) $ conda activate numexpr_test
(numexpr_test) $ pip install numexpr
(numexpr_test) $ pip install jupyter
```

Then start Jupyter Notebook by entering `jupyter notebook` on the command line. If a browser window does not open automatically, use the URL printed in the terminal to access it manually.

Performance Comparison Tests

Example 1: Simple Array Addition

The first test involved simple addition of large arrays, run 5000 times.

NumPy version:

```python
time_np_expr = timeit.timeit(lambda: 2*a + 3*b, number=5000)
print(f"NumPy execution time: {time_np_expr} seconds")
```

Result: 12.036807 seconds

NumExpr version:

```python
time_ne_expr = timeit.timeit(lambda: ne.evaluate("2*a + 3*b"), number=5000)
print(f"NumExpr execution time: {time_ne_expr} seconds")
```

Result: 1.807596 seconds

NumExpr delivered a roughly 6x speedup in this test.
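The snippets above omit their setup. A self-contained version of Example 1 might look like the following sketch; the array size, dtype, and random seed are assumptions (the article never shows how `a` and `b` were created), and the iteration count is reduced here for a quick run.

```python
import timeit

import numexpr as ne
import numpy as np

# Assumed inputs: the article does not state the array size or dtype.
rng = np.random.default_rng(0)
a = rng.random(1_000_000)
b = rng.random(1_000_000)

# The article uses number=5000; a smaller count keeps this sketch quick.
time_np_expr = timeit.timeit(lambda: 2 * a + 3 * b, number=500)
time_ne_expr = timeit.timeit(lambda: ne.evaluate("2*a + 3*b"), number=500)

print(f"NumPy execution time:   {time_np_expr:.3f} seconds")
print(f"NumExpr execution time: {time_ne_expr:.3f} seconds")
```

Absolute timings depend heavily on core count and array size: NumExpr's edge grows once arrays are large enough that memory bandwidth, not arithmetic, dominates, because it evaluates the whole expression in cache-sized blocks instead of materializing the temporaries `2*a` and `3*b`.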
Example 2: Monte Carlo Simulation to Calculate π

The second test used a Monte Carlo simulation to estimate π, run 1000 times.

NumPy version:

```python
time_np_expr = timeit.timeit(lambda: monte_carlo_pi_numpy(num_samples), number=1000)
print(f"NumPy execution time: {time_np_expr} seconds")
```

Result: 10.642844 seconds

NumExpr version:

```python
time_ne_expr = timeit.timeit(lambda: monte_carlo_pi_numexpr(num_samples), number=1000)
print(f"NumExpr execution time: {time_ne_expr} seconds")
```

Result: 8.077501 seconds

While not as dramatic as the first example, NumExpr still cut the runtime by roughly 24%. The smaller gain is largely because NumExpr's sum() reduction is less optimized than its element-wise operations.

Example 3: Sobel Image Filter

The third test focused on image edge detection using a Sobel filter, run 100 times.

NumPy version:

```python
time_np_sobel = timeit.timeit(lambda: sobel_filter_numpy(image), number=100)
print(f"NumPy execution time: {time_np_sobel} seconds")
```

Result: 8.093792 seconds

NumExpr version:

```python
time_ne_sobel = timeit.timeit(lambda: sobel_filter_numexpr(image), number=100)
print(f"NumExpr execution time: {time_ne_sobel} seconds")
```

Result: 4.938702 seconds

NumExpr ran about 1.6x faster than NumPy in this scenario, a significant gain.
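The article never shows the bodies of the benchmarked functions. The sketch below gives plausible implementations under stated assumptions: the Monte Carlo estimator samples the unit square, and the Sobel filter is computed with array slicing (a valid-mode 3x3 convolution) rather than a convolution library. Function names match the article; everything else is an assumption.

```python
import numpy as np
import numexpr as ne

def monte_carlo_pi_numpy(num_samples):
    # Estimate pi from the fraction of random points inside the unit circle.
    x = np.random.rand(num_samples)
    y = np.random.rand(num_samples)
    inside = np.sum(x**2 + y**2 <= 1.0)
    return 4.0 * inside / num_samples

def monte_carlo_pi_numexpr(num_samples):
    x = np.random.rand(num_samples)
    y = np.random.rand(num_samples)
    # NumExpr evaluates the whole expression, including the sum reduction,
    # which is the part the article identifies as less optimized.
    inside = ne.evaluate("sum(where(x**2 + y**2 <= 1.0, 1, 0))")
    return 4.0 * inside / num_samples

def sobel_filter_numpy(image):
    # Horizontal and vertical gradients via shifted slices (valid region only).
    gx = (image[:-2, :-2] + 2 * image[1:-1, :-2] + image[2:, :-2]
          - image[:-2, 2:] - 2 * image[1:-1, 2:] - image[2:, 2:])
    gy = (image[:-2, :-2] + 2 * image[:-2, 1:-1] + image[:-2, 2:]
          - image[2:, :-2] - 2 * image[2:, 1:-1] - image[2:, 2:])
    return np.sqrt(gx**2 + gy**2)

def sobel_filter_numexpr(image):
    gx = (image[:-2, :-2] + 2 * image[1:-1, :-2] + image[2:, :-2]
          - image[:-2, 2:] - 2 * image[1:-1, 2:] - image[2:, 2:])
    gy = (image[:-2, :-2] + 2 * image[:-2, 1:-1] + image[:-2, 2:]
          - image[2:, :-2] - 2 * image[2:, 1:-1] - image[2:, 2:])
    # Only the gradient-magnitude expression is handed to NumExpr here.
    return ne.evaluate("sqrt(gx**2 + gy**2)")

print(monte_carlo_pi_numpy(1_000_000))    # ≈ 3.14
print(monte_carlo_pi_numexpr(1_000_000))  # ≈ 3.14
```

Note the pattern: NumExpr accelerates the element-wise expression, while the slicing that builds `gx` and `gy` still runs in plain NumPy.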
Example 4: Fourier Series Approximation

The final test evaluated the Fourier series approximation of a complex periodic function.

NumPy version:

```python
# Setup (assumed; the article does not show it), e.g.:
# t = np.linspace(0, 1, 1_000_000); n_terms = 100
start_time = time.time()
approx_np = np.zeros_like(t)
for n in range(1, n_terms + 1, 2):
    approx_np += (4 / (np.pi * n)) * np.sin(2 * np.pi * n * 5 * t)
numpy_time = time.time() - start_time
print(f"NumPy Fourier series time: {numpy_time:.6f} seconds")
```

Result: 7.765800 seconds

NumExpr version:

```python
start_time = time.time()
approx_ne = np.zeros_like(t)
for n in range(1, n_terms + 1, 2):
    approx_ne = ne.evaluate(
        "approx_ne + (4 / (pi * n)) * sin(2 * pi * n * 5 * t)",
        local_dict={"pi": np.pi, "n": n, "approx_ne": approx_ne, "t": t},
    )
numexpr_time = time.time() - start_time
print(f"NumExpr Fourier series time: {numexpr_time:.6f} seconds")
```

Result: 1.553160 seconds

In this case, NumExpr achieved a 5x speedup, further highlighting its capabilities.

Summary

Across these performance tests, it is evident that NumExpr can run noticeably faster than NumPy for certain numerical computing tasks, especially on multi-core CPUs. While it may not always reach the claimed 15x acceleration, several-fold improvements are still substantial. Data scientists and researchers who need high-performance numerical computation should consider NumExpr. Although it does not support every NumPy operation, the performance gains can be well worth the trade-off.

Industry Insider Evaluation and Company Background

NumExpr is primarily developed and maintained by members of the Python scientific computing community, known for continuously enhancing Python's performance and functionality. Its optimizations in multi-threaded processing and memory management have earned positive reviews. Senior data scientists note its usefulness in handling large datasets, offering significant speed improvements without compromising code readability.

Qwen Team Releases Qwen3

Recently, the Qwen team announced the release of their latest large language model, Qwen3.
This marks another significant advancement in the Qwen series: the flagship model, Qwen3-235B-A22B, demonstrates impressive performance across coding, mathematics, and general tasks, rivaling top models such as DeepSeek-R1, Grok-3, and Gemini-2.5-Pro. Notably, the smaller Mixture-of-Experts (MoE) model, Qwen3-30B-A3B, achieves similar or even superior performance to QwQ-32B while activating only a fraction of the parameters. Even the smallest variant, Qwen3-4B, can match the capabilities of Qwen2.5-72B-Instruct.

Key Features

Dual thinking modes. Qwen3 supports two operational modes:

- Thinking mode: the model performs step-by-step reasoning before providing an answer, ideal for complex problems requiring deep analysis.
- Non-thinking mode: for simpler tasks, the model responds quickly, prioritizing speed over depth.

This flexibility allows users to balance computational cost against inference quality.

Multilingual support. Qwen3 supports 119 languages and dialects, covering language families such as Indo-European, Sino-Tibetan, and Afroasiatic, broadening its global application potential.

Pre-Training Process

Qwen3 was pre-trained on significantly more data than its predecessor, Qwen2.5: approximately 36 trillion tokens. This extensive corpus included web-sourced content and text from PDF documents, enhanced and optimized using Qwen2.5-VL and Qwen2.5. To enrich the dataset with more math and coding examples, the team used Qwen2.5-Math and Qwen2.5-Coder to generate synthetic data. Pre-training occurred in three stages:

1. Initial stage (S1): the model was trained on over 30 trillion tokens, acquiring foundational language skills and common knowledge.
2. Enhancement stage (S2): an additional 5 trillion tokens of knowledge-intensive data, such as STEM, coding, and reasoning tasks, were added.
3. Extension stage (S3): using high-quality data, the context length was extended to 32K tokens, enabling better handling of longer inputs.

Post-Training Process

To develop a model capable of both step-by-step reasoning and rapid responses, Qwen3 underwent a four-stage fine-tuning process:

1. Long-chain reasoning cold start: the model was fine-tuned on diverse long-chain reasoning data, including math, coding, logic, and STEM problems.
2. Reinforcement learning based on reasoning: rule-based rewards and expanded computational resources improved the model's exploration and exploitation capabilities.
3. Thinking-mode integration: fine-tuning on a combination of long-chain reasoning data and general instruction-tuning data ensured seamless switching between the two modes.
4. General reinforcement learning: applied across more than 20 general task domains to enhance overall performance and correct adverse behaviors.

Usage Guidelines

Weights for Qwen3 models are available on platforms such as Hugging Face, ModelScope, and Kaggle, released under the Apache 2.0 license. The models can be deployed with serving frameworks such as SGLang and vLLM, while local tools such as Ollama, LMStudio, llama.cpp, and KTransformers are recommended for ease of use. Model behavior can be controlled dynamically in multi-turn conversations through the /think and /no_think tags in prompts or system messages.
For instance:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer


class QwenChatbot:
    def __init__(self, model_name="Qwen/Qwen3-30B-A3B"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name)
        self.history = []

    def generate_response(self, user_input):
        messages = self.history + [{"role": "user", "content": user_input}]
        text = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )
        inputs = self.tokenizer(text, return_tensors="pt")
        response_ids = self.model.generate(**inputs, max_new_tokens=32768)[0][len(inputs.input_ids[0]):].tolist()
        response = self.tokenizer.decode(response_ids, skip_special_tokens=True)

        # Update the conversation history.
        self.history.append({"role": "user", "content": user_input})
        self.history.append({"role": "assistant", "content": response})

        return response


if __name__ == "__main__":
    chatbot = QwenChatbot()

    # First turn: thinking mode is enabled by default.
    user_input_1 = "How many r's are in strawberry?"
    print(f"User: {user_input_1}")
    response_1 = chatbot.generate_response(user_input_1)
    print(f"Bot: {response_1}")
    print("----------------------")

    # /no_think switches thinking mode off.
    user_input_2 = "Then, how many r's are in blueberry? /no_think"
    print(f"User: {user_input_2}")
    response_2 = chatbot.generate_response(user_input_2)
    print(f"Bot: {response_2}")
    print("----------------------")

    # /think switches thinking mode back on.
    user_input_3 = "Really? /think"
    print(f"User: {user_input_3}")
    response_3 = chatbot.generate_response(user_input_3)
    print(f"Bot: {response_3}")
```

Agent Functionality

Qwen3 also excels at tool calling, which is best exploited through Qwen-Agent. This package encapsulates tool-invocation templates and parsers, reducing programming complexity. Users can define available tools to extend Qwen3's capabilities, with both built-in and custom tools supported.

Community Support

The development of Qwen3 is heavily supported by the community. The team acknowledges all contributors and invites more individuals and organizations to join, contributing to the model's continuous improvement.
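As the dialogue above suggests, the most recent soft-switch tag in the conversation determines the active mode. A minimal sketch of that rule follows; this is an illustration of the behavior described here, not Qwen3's actual template logic, and the function name `thinking_enabled` is made up.

```python
def thinking_enabled(messages, default=True):
    """Return True if thinking mode is active after scanning user messages
    for /think and /no_think soft switches (the latest tag wins).

    Illustrative only: mimics the behavior described in the text, not
    Qwen3's real chat-template parsing.
    """
    mode = default
    for message in messages:
        if message["role"] != "user":
            continue
        content = message["content"]
        # Check /no_think first, since "/think" is a substring of it.
        if "/no_think" in content:
            mode = False
        elif "/think" in content:
            mode = True
    return mode


history = [
    {"role": "user", "content": "How many r's are in strawberry?"},
    {"role": "user", "content": "Then, how many r's are in blueberry? /no_think"},
]
print(thinking_enabled(history))  # False

history.append({"role": "user", "content": "Really? /think"})
print(thinking_enabled(history))  # True
```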
Future Outlook

Qwen3 represents a crucial step toward Artificial General Intelligence (AGI) and Artificial Superintelligence (ASI). The team plans to enhance the model along several axes: increasing data scale, adding parameters, extending context length, expanding modality support, and improving long-horizon reasoning through environment feedback. This evolution signifies a shift from training models to training agents, promising more meaningful assistance to users in future iterations. Industry experts commend Qwen3's innovative and practical features, foreseeing a strong position for it in the competitive landscape of large language models. As an Alibaba Cloud flagship project, Qwen3 is poised to make significant contributions to the field of AI.
