Data Lake
A data lake is a central location in a cloud architecture that can store large amounts of raw data in its native format. Unlike a data warehouse or a data silo, a data lake uses a flat architecture built on object storage, in which each file is stored as an object together with its metadata.
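As an illustration, the sketch below shows raw data being landed in a data lake in its native format with descriptive metadata attached to the object. It is a minimal example that assumes an S3-compatible object store accessed through boto3; the bucket name, key, and metadata fields are hypothetical.

```python
import boto3

# Assumed S3-compatible object store; bucket and key names are hypothetical.
s3 = boto3.client("s3")

# Land a raw CSV file in its native format; no schema is imposed on write.
with open("clickstream-2024-06-01.csv", "rb") as raw_file:
    s3.put_object(
        Bucket="example-data-lake",
        Key="raw/clickstream/2024/06/01/events.csv",
        Body=raw_file,
        # The metadata travels with the object itself rather than living
        # in a separate folder hierarchy or catalog.
        Metadata={
            "source": "web-clickstream",
            "format": "csv",
            "ingested-by": "nightly-batch",
        },
    )
```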
The term “data lake” was coined in 2010, and the concept has been in practical use for well over a decade. Data lakes address the need for a scalable data repository that can store large volumes of data of varied types, from varied sources, for later analysis.
A data lake can be thought of as a centralized repository that holds petabytes of data in its original, native format. Whereas hierarchical data warehouses store data in files and folders, data lakes use a flat architecture with object-based storage. Because every object carries metadata tags and a unique identifier, big data workloads can locate and retrieve data across regions more easily and with better performance. This format-agnostic design also allows multiple applications to work with the data in their own formats.
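To make the metadata-driven retrieval concrete, here is a minimal sketch, again assuming an S3-compatible store accessed through boto3 and the hypothetical bucket from the previous example. It lists objects under a key prefix and filters them by their metadata tags before downloading only the matching objects.

```python
import boto3

s3 = boto3.client("s3")
bucket = "example-data-lake"  # hypothetical bucket from the earlier sketch

# List objects under a prefix; the "folder" path is simply part of each object's key.
listing = s3.list_objects_v2(Bucket=bucket, Prefix="raw/clickstream/2024/06/")

for entry in listing.get("Contents", []):
    key = entry["Key"]

    # Read the object's metadata without downloading its body.
    head = s3.head_object(Bucket=bucket, Key=key)
    meta = head.get("Metadata", {})

    # Select only CSV objects that came from the clickstream source.
    if meta.get("format") == "csv" and meta.get("source") == "web-clickstream":
        obj = s3.get_object(Bucket=bucket, Key=key)
        data = obj["Body"].read()
        print(f"retrieved {key} ({len(data)} bytes)")
```

Because the object keys and metadata are the only organizing structure, each consuming application can apply its own filtering and interpretation to the same underlying objects.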