HyperAI

Data Preprocessing

Data preprocessing refers to the manipulation, filtering, or enhancement of data before it is analyzed, and is usually an important step in the data mining process. Its goal is to improve data quality and make the data better suited to a specific data mining task.

Common steps in data preprocessing

Preprocessing cleans and transforms raw data so that it is suitable for analysis. Common steps include:

  • Data cleaning: This involves identifying and correcting errors or inconsistencies in the data, such as missing values, outliers, and duplicates. Common techniques include imputation (filling in missing values), deletion (dropping bad or duplicate records), and transformation (for example, rescaling or clipping extreme values).
  • Data integration: This involves combining data from multiple sources to create a unified data set. Data integration can be challenging because the sources may differ in format, structure, and semantics. Techniques such as record linkage and data fusion can be used for data integration.
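
The two steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not a reference implementation: the function names `clean` and `integrate`, the field names, and the sample records are all hypothetical, missing values are assumed to be encoded as `None`, and the 1.5 × IQR clipping rule and lowercase key matching are just one possible choice for outlier handling and record linkage.

```python
from statistics import median, quantiles


def clean(records, field):
    """Deduplicate, impute missing values with the median, and clip outliers."""
    # Deletion: drop exact duplicate records while preserving order.
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))

    # Imputation: replace missing values (None) with the median of observed ones.
    observed = [r[field] for r in unique if r[field] is not None]
    fill = median(observed)
    for r in unique:
        if r[field] is None:
            r[field] = fill

    # Transformation: clip values lying more than 1.5 * IQR beyond the quartiles.
    q1, _, q3 = quantiles([r[field] for r in unique], n=4, method="inclusive")
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    for r in unique:
        r[field] = min(max(r[field], low), high)
    return unique


def integrate(left, right, key):
    """Join two sources on a normalized key (a very simple form of record linkage)."""
    def normalize(value):
        return str(value).strip().lower()

    index = {normalize(r[key]): r for r in right}
    merged = []
    for rec in left:
        match = index.get(normalize(rec[key]), {})
        # Fields from `left` win on conflict; unmatched records are kept as-is.
        merged.append({**match, **rec})
    return merged


# Made-up sample data: one duplicate, one missing value, one extreme value.
sales = [
    {"customer": "Alice", "amount": 120},
    {"customer": "Bob", "amount": None},
    {"customer": "Alice", "amount": 120},
    {"customer": "Cara", "amount": 100},
    {"customer": "Dan", "amount": 110},
    {"customer": "Eve", "amount": 5000},
]
# A second source whose keys are formatted differently.
customers = [
    {"customer": " alice ", "region": "EU"},
    {"customer": "BOB", "region": "US"},
]

cleaned = clean(sales, "amount")
merged = integrate(cleaned, customers, "customer")
for row in merged:
    print(row)
```

In practice a library such as pandas would usually handle these operations; the standard-library version is used here only to keep each step visible.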
