HyperAI

Tool Recommendation: A High-Performance Query Engine Built Specifically for GPUs

6 years ago
Dao Wei

By Super Neuro

GPUs and databases each have their own strengths: GPUs excel at workloads such as machine learning, while databases excel at specialized computations such as complex join operations.

Several database products already offer GPU acceleration, including the well-known MapD and Kinetica. Today we introduce a young open-source product: BlazingSQL.

BlazingSQL is a GPU-accelerated database query tool built on RAPIDS. It extends RAPIDS so that users can run SQL queries directly on Apache Arrow data held in GPU memory.

Beyond being built for GPUs and running much faster than comparable products, BlazingSQL also differs in how it accesses data. Most SQL data warehouses require enterprises to extract and copy data into the warehouse themselves, whereas BlazingDB can read data directly from Apache Parquet files. This simplifies the data pipeline architecture while still supporting high-performance workloads.

More importantly, BlazingSQL has received investment from NVIDIA and Samsung and maintains a close working relationship with NVIDIA.

Performance Evaluation

To compare the tools' performance, we ran a benchmark built around an end-to-end analytical workload:

* The steps are: Data Lake > ETL/Feature Engineering > XGBoost Training

* We built two comparably priced clusters on GCP, using Apache Spark and BlazingSQL respectively.

* The result: BlazingSQL ran the workload 5 times faster than Apache Spark.

(The new version runs 20 times faster than Apache Spark on the same workload.)
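The methodology above times the whole pipeline rather than only the SQL stage. A small harness for that kind of measurement could look like the following hypothetical Python sketch; it is not the harness used in the original comparison. The stage names, the stand-in workloads, and the `run_pipeline`/`speedup` helpers are assumptions, and the "training" stage is a placeholder rather than XGBoost. The speedup figure is simply the baseline's end-to-end time divided by the candidate's.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical stage signature: takes the previous stage's output, returns the next stage's input.
Stage = Tuple[str, Callable[[object], object]]


def run_pipeline(stages: List[Stage], data: object,
                 clock: Callable[[], float] = time.perf_counter) -> Dict[str, float]:
    """Runs every stage in order; records seconds per stage plus the end-to-end total."""
    timings: Dict[str, float] = {}
    for name, stage in stages:
        start = clock()
        data = stage(data)  # each stage consumes the previous stage's output
        timings[name] = clock() - start
    timings["total"] = sum(timings.values())
    return timings


def speedup(baseline: Dict[str, float], candidate: Dict[str, float]) -> float:
    """End-to-end speedup of candidate over baseline (5.0 means five times faster)."""
    return baseline["total"] / candidate["total"]


# Placeholder stages standing in for the real workload (not Spark, BlazingSQL, or XGBoost).
stages: List[Stage] = [
    ("data_lake_load", lambda _: list(range(100_000))),
    ("etl_feature_engineering", lambda rows: [(x, x * x % 7) for x in rows]),
    ("training", lambda feats: sum(f for _, f in feats) / len(feats)),
]

print(run_pipeline(stages, None))
```

The injectable `clock` parameter lets the harness be checked with fake timestamps. In a real comparison, each engine would supply its own stage functions over the same data, and the two timing dictionaries would be passed to `speedup`.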

A good horse deserves a good saddle

Part of the reason BlazingSQL runs so efficiently is the hardware: it uses GCP's T4 GPU, a new entry-level GPU that is inexpensive yet powerful.

Using the new T4 GPUs cut our costs in half, so we reduced the Apache Spark cluster to 4 CPU nodes to keep the two clusters comparably priced.

The end result: even with half the GPU memory, the entire workload ran significantly faster.

BlazingSQL engineers have also developed a GPU execution kernel built specifically for GPU DataFrames (GDF), called the “SIMD Expression Interpreter.”

A full description of the SIMD expression interpreter would take a long time, so here I will share only a few details about how it works and where its performance gains come from.

The SIMD expression interpreter achieves its performance gains mainly through the following key steps:

1. The interpreter, which works as a virtual machine, accepts multiple inputs. These inputs can be GDF columns, literals, or functions.

2. When loading these inputs, the SIMD expression interpreter optimizes how registers are allocated on the GPU, which raises GPU utilization and, in turn, performance.

3. The virtual machine then processes these inputs and produces multiple outputs at the same time. For example, consider the following SQL query, which produces two output columns: SELECT colA + colB * 10, sin(colA) - cos(colD) FROM tableA
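To make steps 1 to 3 more concrete, here is a minimal Python sketch of the general idea; it is not BlazingSQL's actual kernel, which runs as GPU code. It compiles the two expressions from the example query into one flat instruction list, lets column reads and literals act as operands directly, reuses a register as soon as its value is dead, and then evaluates both output columns in a single pass over the rows. The tuple-based instruction format, the `Compiler` class, and the `run` function are hypothetical names introduced for illustration, and the per-row loop stands in for what a GPU would do with one thread per row.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Hypothetical IR, for illustration only:
#   expression nodes: ("col", name) | ("lit", value) | (op, child, ...)
#   operands:         ("col", name) | ("lit", value) | ("reg", index)
BINARY = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "mul": lambda a, b: a * b}
UNARY = {"sin": math.sin, "cos": math.cos}
Instr = Tuple[str, int, list]  # (op, destination register, source operands)


class Compiler:
    """Flattens several SELECT expressions into one instruction list sharing a register file."""

    def __init__(self) -> None:
        self.code: List[Instr] = []
        self.free: List[int] = []  # min-heap of registers whose values are dead
        self.num_regs = 0

    def _alloc(self) -> int:
        # Reuse the lowest dead register before growing the register file.
        if self.free:
            return heapq.heappop(self.free)
        self.num_regs += 1
        return self.num_regs - 1

    def _emit(self, node: tuple) -> tuple:
        if node[0] in ("col", "lit"):
            return node  # inputs are read in place and never occupy a register
        srcs = [self._emit(child) for child in node[1:]]
        # In an expression tree each temporary is read exactly once, so it dies here.
        for kind, value in srcs:
            if kind == "reg":
                heapq.heappush(self.free, value)
        dst = self._alloc()
        self.code.append((node[0], dst, srcs))
        return ("reg", dst)

    def compile(self, exprs: List[tuple]) -> List[tuple]:
        # Result registers are never freed, so every output survives to the end.
        return [self._emit(e) for e in exprs]


def _read(operand: tuple, table: Dict[str, List[float]], row: int, regs: List[float]) -> float:
    kind, value = operand
    if kind == "col":
        return table[value][row]
    if kind == "lit":
        return value
    return regs[value]


def run(code: List[Instr], outputs: List[tuple], num_regs: int,
        table: Dict[str, List[float]]) -> List[List[float]]:
    """Evaluates all outputs in one pass over the rows (one GPU thread per row in practice)."""
    n_rows = len(next(iter(table.values())))
    results: List[List[float]] = [[] for _ in outputs]
    for row in range(n_rows):
        regs = [0.0] * num_regs
        for op, dst, srcs in code:
            args = [_read(s, table, row, regs) for s in srcs]  # read sources before writing dst
            regs[dst] = BINARY[op](*args) if op in BINARY else UNARY[op](*args)
        for operand, column in zip(outputs, results):
            column.append(_read(operand, table, row, regs))
    return results


# SELECT colA + colB * 10, sin(colA) - cos(colD) FROM tableA
query = [
    ("add", ("col", "colA"), ("mul", ("col", "colB"), ("lit", 10))),
    ("sub", ("sin", ("col", "colA")), ("cos", ("col", "colD"))),
]
table_a = {"colA": [0.0, 1.0], "colB": [1.0, 2.0], "colD": [0.0, math.pi]}

compiler = Compiler()
outputs = compiler.compile(query)
results = run(compiler.code, outputs, compiler.num_regs, table_a)
print(len(compiler.code), "instructions,", compiler.num_regs, "registers")  # 5 instructions, 3 registers
print(results)
```

Because dead temporaries are recycled, the five instructions need only three registers rather than one per instruction. On a GPU, fewer registers per thread generally lets more threads run concurrently, which is the kind of utilization gain step 2 describes; this sketch only illustrates the bookkeeping, not real GPU scheduling.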