# NumExpr: The High-Performance Library Outshining NumPy in Complex Numerical Computations
Recently, a lesser-known library called NumExpr caught the attention of the data science community with claims of speeds up to 15 times faster than NumPy, the dominant library for numerical computing in Python. NumPy is foundational for tasks ranging from exploratory data analysis to machine learning and model training, so any significant performance improvement from a competing library could be highly valuable. I decided to put NumExpr's claims to the test by running a series of common numerical operations and comparing the results.

## What is NumExpr?

NumExpr is a fast numerical expression evaluator designed specifically for NumPy arrays. It optimizes array operations, reduces memory usage, and leverages multiple CPU cores to scale performance efficiently. This makes it particularly useful for computationally intensive tasks where speed and resource management are critical.

## Setting Up the Development Environment

To get started, I created a separate Python environment using Conda, a popular package and environment management system, so that any changes or installations don't affect the rest of the system. Here are the steps:

Create and activate a new environment:

```bash
(base) $ conda create -n numexpr_test python=3.12
(base) $ conda activate numexpr_test
```

Install the necessary packages:

```bash
(numexpr_test) $ pip install numexpr
(numexpr_test) $ pip install jupyter
```

Start Jupyter Notebook:

```bash
(numexpr_test) $ jupyter notebook
```

## Comparing NumExpr and NumPy Performance

### Example 1: Simple Array Addition

I began with a straightforward vectorized addition of two large arrays, each containing a million random numbers, repeated 5,000 times.
Using NumPy:

```python
import numpy as np
import timeit

a = np.random.rand(1000000)
b = np.random.rand(1000000)

time_np_expr = timeit.timeit(lambda: 2 * a + 3 * b, number=5000)
print(f"Execution time (NumPy): {time_np_expr} seconds")
```

Using NumExpr:

```python
import numpy as np
import numexpr as ne
import timeit

a = np.random.rand(1000000)
b = np.random.rand(1000000)

time_ne_expr = timeit.timeit(lambda: ne.evaluate("2 * a + 3 * b"), number=5000)
print(f"Execution time (NumExpr): {time_ne_expr} seconds")
```

The results were striking: NumExpr completed the task in 1.81 seconds, while NumPy took 12.04 seconds, roughly a 6-fold improvement.

### Example 2: Monte Carlo Simulation to Estimate Pi

Next, I ran a Monte Carlo simulation to estimate the value of Pi, a computationally intensive task often used in numerical analysis.

Using NumPy:

```python
import numpy as np
import timeit

def monte_carlo_pi_numpy(num_samples):
    x = np.random.rand(num_samples)
    y = np.random.rand(num_samples)
    inside_circle = (x**2 + y**2) <= 1.0
    pi_estimate = (np.sum(inside_circle) / num_samples) * 4
    return pi_estimate

num_samples = 1000000
time_np_expr = timeit.timeit(lambda: monte_carlo_pi_numpy(num_samples), number=1000)
pi_estimate = monte_carlo_pi_numpy(num_samples)
print(f"Estimated Pi (NumPy): {pi_estimate}")
print(f"Execution Time (NumPy): {time_np_expr} seconds")
```

Using NumExpr:

```python
import numpy as np
import numexpr as ne
import timeit

def monte_carlo_pi_numexpr(num_samples):
    x = np.random.rand(num_samples)
    y = np.random.rand(num_samples)
    inside_circle = ne.evaluate("(x**2 + y**2) <= 1.0")
    pi_estimate = (np.sum(inside_circle) / num_samples) * 4
    return pi_estimate

num_samples = 1000000
time_ne_expr = timeit.timeit(lambda: monte_carlo_pi_numexpr(num_samples), number=1000)
pi_estimate = monte_carlo_pi_numexpr(num_samples)
print(f"Estimated Pi (NumExpr): {pi_estimate}")
print(f"Execution Time (NumExpr): {time_ne_expr} seconds")
```

NumExpr was faster by 2.56 seconds, a roughly 20% improvement.
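Timing aside, both versions should converge on the same value. Here is a quick NumPy-only sanity check of the estimator itself; the seeded generator is my addition for reproducibility and is not part of the timed benchmarks above:

```python
import numpy as np

def monte_carlo_pi(num_samples, seed=42):
    # Seeded generator so the check is reproducible (an assumption,
    # not part of the benchmarks above).
    rng = np.random.default_rng(seed)
    x = rng.random(num_samples)
    y = rng.random(num_samples)
    # Fraction of points falling inside the unit quarter-circle,
    # scaled by 4, estimates Pi.
    inside_circle = (x**2 + y**2) <= 1.0
    return 4.0 * np.sum(inside_circle) / num_samples

estimate = monte_carlo_pi(1_000_000)
print(estimate)  # approximately 3.14
```

With a million samples, the estimate typically lands within a few thousandths of Pi, so a large discrepancy between the NumPy and NumExpr runs would signal a translation error in the expression string.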
However, it's worth noting that NumExpr doesn't provide an optimized sum() function, which is why NumPy was still used for that part of the calculation.

### Example 3: Sobel Filter for Image Edge Detection

I then implemented a Sobel filter, a common image-processing technique for edge detection, using both libraries.

Using NumPy:

```python
import numpy as np
import timeit
from PIL import Image
from scipy.ndimage import convolve

# Standard 3x3 Sobel kernels for horizontal and vertical gradients
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])

def sobel_filter_numpy(image):
    # Convert to grayscale as float to avoid uint8 overflow when squaring
    img_array = np.array(image.convert('L'), dtype=np.float64)
    gradient_x = convolve(img_array, sobel_x)
    gradient_y = convolve(img_array, sobel_y)
    gradient_magnitude = np.sqrt(gradient_x**2 + gradient_y**2)
    gradient_magnitude *= 255.0 / gradient_magnitude.max()
    return Image.fromarray(gradient_magnitude.astype(np.uint8))

image = Image.open("/mnt/d/test/taj_mahal.png")
time_np_sobel = timeit.timeit(lambda: sobel_filter_numpy(image), number=100)
sobel_image_np = sobel_filter_numpy(image)
sobel_image_np.save("/mnt/d/test/sobel_taj_mahal_numpy.png")
print(f"Execution Time (NumPy): {time_np_sobel} seconds")
```

Using NumExpr:

```python
import numpy as np
import numexpr as ne
import timeit
from PIL import Image
from scipy.ndimage import convolve

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])

def sobel_filter_numexpr(image):
    img_array = np.array(image.convert('L'), dtype=np.float64)
    gradient_x = convolve(img_array, sobel_x)
    gradient_y = convolve(img_array, sobel_y)
    gradient_magnitude = ne.evaluate("sqrt(gradient_x**2 + gradient_y**2)")
    gradient_magnitude *= 255.0 / gradient_magnitude.max()
    return Image.fromarray(gradient_magnitude.astype(np.uint8))

image = Image.open("/mnt/d/test/taj_mahal.png")
time_ne_sobel = timeit.timeit(lambda: sobel_filter_numexpr(image), number=100)
sobel_image_ne = sobel_filter_numexpr(image)
sobel_image_ne.save("/mnt/d/test/sobel_taj_mahal_numexpr.png")
print(f"Execution Time (NumExpr): {time_ne_sobel} seconds")
```

NumExpr was almost twice as fast, reducing the execution time from 8.09 seconds to 4.94 seconds.

### Example 4: Fourier Series Approximation

Finally, I compared the two libraries on approximating a square wave with a Fourier series, a method that sums sine waves to simulate complex periodic functions.
Using NumPy:

```python
import numpy as np
import time

pi = np.pi
t = np.linspace(0, 1, 1000000)
signal = np.sign(np.sin(2 * np.pi * 5 * t))
n_terms = 10000

start_time = time.time()
approx_np = np.zeros_like(t)
for n in range(1, n_terms + 1, 2):
    approx_np += (4 / (np.pi * n)) * np.sin(2 * np.pi * n * 5 * t)
numpy_time = time.time() - start_time
```

Using NumExpr:

```python
import numpy as np
import numexpr as ne
import time

pi = np.pi
t = np.linspace(0, 1, 1000000)
signal = np.sign(np.sin(2 * np.pi * 5 * t))
n_terms = 10000

start_time = time.time()
approx_ne = np.zeros_like(t)
for n in range(1, n_terms + 1, 2):
    approx_ne = ne.evaluate(
        "approx_ne + (4 / (pi * n)) * sin(2 * pi * n * 5 * t)",
        local_dict={"pi": pi, "n": n, "approx_ne": approx_ne, "t": t},
    )
numexpr_time = time.time() - start_time
```

Again, NumExpr outperformed NumPy, completing the task in 1.86 seconds compared to NumPy's 9.31 seconds, a 5-fold improvement.

## Summary

Both NumPy and NumExpr are robust libraries for numerical computation in Python. While I didn't observe the full 15x speedup claimed by NumExpr, the gains were substantial across the tests: for simple array operations, NumExpr was nearly 6 times faster, and in more complex tasks such as estimating Pi and applying a Sobel filter, it delivered improvements of 20% to 100% over NumPy.

If you frequently perform heavy numerical computations and need to optimize performance, NumExpr is worth considering. Its ability to use multiple CPU cores and reduce memory usage can yield significant speedups with little downside. Some operations, like sum(), are still better handled by NumPy, but overall, integrating NumExpr into your workflow can improve your computational efficiency. For more detailed information on NumExpr, including documentation and examples, visit the library's GitHub page.

## Industry Insights and Company Profiles

Industry insiders praise NumExpr for its efficiency and ease of integration with existing NumPy-based workflows.
The library, maintained by a dedicated community, has gained traction among high-performance computing professionals and data scientists looking to push the boundaries of their projects. Companies like Enthought, a leader in scientific computing software, have integrated NumExpr into their tools, further validating its effectiveness. While NumPy remains the go-to library for general numerical computation, NumExpr offers a compelling alternative for specialized, performance-critical tasks.
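As a closing practical note: if you want NumExpr's speedups where it is available without making it a hard dependency, one pattern is a guarded import with a NumPy fallback. This is my own sketch, not an official API; `weighted_sum` is a hypothetical helper name:

```python
import numpy as np

try:
    import numexpr as ne
    HAVE_NUMEXPR = True
except ImportError:
    HAVE_NUMEXPR = False

def weighted_sum(a, b):
    """Compute 2*a + 3*b, using NumExpr when installed and NumPy otherwise."""
    if HAVE_NUMEXPR:
        # NumExpr parses the expression string and evaluates it in
        # multi-threaded, cache-friendly chunks.
        return ne.evaluate("2 * a + 3 * b")
    return 2 * a + 3 * b

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)
result = weighted_sum(a, b)
print(np.allclose(result, 2 * a + 3 * b))
```

Code written this way gets the fast path on machines where NumExpr is installed and degrades gracefully to plain NumPy everywhere else.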