t-Distributed Stochastic Neighbor Embedding (t-SNE)
t-distributed stochastic neighbor embedding (t-SNE) is a machine learning method for dimensionality reduction that can be used to identify patterns among related data points. Its main advantage is that it preserves local structure: points that are close to each other in the high-dimensional data space remain close when projected into the low-dimensional space.
t-SNE Features
In the low-dimensional space, t-SNE models similarities with a t-distribution. Its heavier tails mitigate the crowding problem and make the optimization easier.
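To make the heavy-tail point concrete, the sketch below compares an unnormalized Gaussian kernel with an unnormalized Student-t kernel with one degree of freedom. The function names are hypothetical and chosen only for illustration, and the normalization that turns these kernels into probabilities is left out.

```python
import math

def gaussian_kernel(d: float) -> float:
    """Unnormalized Gaussian similarity, exp(-d^2)."""
    return math.exp(-d * d)

def student_t_kernel(d: float) -> float:
    """Unnormalized Student-t similarity with one degree of freedom, 1 / (1 + d^2)."""
    return 1.0 / (1.0 + d * d)

# Print both kernels at a few low-dimensional distances.
for d in (0.5, 1.0, 2.0, 4.0):
    print(f"d={d}: gaussian={gaussian_kernel(d):.6f}  student_t={student_t_kernel(d):.6f}")
```

At larger distances the Student-t value decays much more slowly than the Gaussian one, so moderately dissimilar points can sit farther apart in the map without their similarity collapsing to zero. That extra room is what relieves crowding.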
t-SNE Gradient Advantages
- Dissimilar points that are placed close together in the low-dimensional space produce a large gradient that pushes them apart;
- This repulsion is bounded rather than infinite, which keeps dissimilar points from being pushed too far apart.
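Both properties can be read off the gradient of the t-SNE cost. The notation is an assumption, since it does not appear in the text above: C is the Kullback-Leibler cost, y_i is the low-dimensional position of point i, and p_ij and q_ij are the pairwise similarities in the high- and low-dimensional spaces.

```latex
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},
\qquad
\frac{\partial C}{\partial y_i}
  = 4 \sum_{j} \left(p_{ij} - q_{ij}\right)
    \frac{y_i - y_j}{1 + \lVert y_i - y_j \rVert^2}
```

For a dissimilar pair, p_ij is close to zero, so the term is negative and a gradient descent step moves y_i away from y_j. Because q_ij is at most 1 and d / (1 + d^2) is at most 1/2 for any distance d, each pairwise term stays finite.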
Limitations of t-SNE
- t-SNE is mainly used for visualization and is poorly suited to other tasks, such as reducing the dimensionality of a test set. Because it learns no explicit mapping from the input space to the embedding, it cannot be applied directly to new data.
- t-SNE tends to preserve local structure. A data set with high intrinsic dimensionality cannot be mapped faithfully into a 2- or 3-dimensional space.
- t-SNE has no unique optimal solution (its cost function is non-convex) and no prediction step. To make predictions, you first need to reduce the dimensionality and then build a separate model, such as a regression, on the result.
- Training is slow, and many tree-based algorithms, such as Barnes-Hut t-SNE, have been developed to speed it up.
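Since t-SNE provides no mapping for unseen data, one possible workaround is to fit a separate model on an existing embedding, in the spirit of the regression mentioned above. The sketch below is an assumption rather than part of t-SNE: a plain k-nearest-neighbour regression that places a new point at the average embedded position of its closest training points. The function `knn_embed` and the toy data are hypothetical, and `Y_train` merely stands in for coordinates that a real t-SNE run would produce.

```python
import math

def knn_embed(x_new, X_train, Y_train, k=3):
    """Place x_new in an existing embedding by averaging the embedded
    coordinates of its k nearest training points (k-NN regression)."""
    # Rank training points by Euclidean distance to x_new in the original space.
    order = sorted(range(len(X_train)), key=lambda i: math.dist(x_new, X_train[i]))
    nearest = order[:k]
    dim = len(Y_train[0])
    # Average each embedding coordinate over the selected neighbours.
    return [sum(Y_train[i][c] for i in nearest) / len(nearest) for c in range(dim)]

# Hypothetical data: three nearby points and one distant point in 3-D,
# with made-up 2-D coordinates standing in for a t-SNE result.
X_train = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [5.0, 5.0, 5.0]]
Y_train = [[-1.0, 0.0], [-1.2, 0.1], [-0.8, -0.1], [3.0, 2.0]]

print(knn_embed([0.05, 0.05, 0.0], X_train, Y_train, k=3))
```

This only approximates where a new point would fall. It does not re-optimize the embedding, so its quality depends on how well the training set covers the region around the new point.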