HyperAIHyperAI

Command Palette

Search for a command to run...

il y a 3 ans

K+ Means : Une amélioration de l'algorithme de clustering K-Means

Srikanta Kolay Kumar S. Ray Abhoy Chand Mondal

Algorithme K-means

20 heures de calcul sur RTX 5090 pour seulement $1 (valeur $7)
Aller à Notebook

Résumé

Le K-means (MacQueen, 1967) [1] est l'un des algorithmes d'apprentissage non supervisé les plus simples qui résolvent le problème bien connu du clustering. La procédure suit une méthode simple et facile pour classifier un ensemble de données donné en un nombre prédéfini, disons K, de clusters. La détermination de K est une tâche difficile et il n'est pas connu quelle valeur de K peut partitionner les objets selon notre intuition. Pour surmonter ce problème, nous proposons l'algorithme K+ Means. Cet algorithme est une amélioration de l'algorithme K-Means.

One-sentence Summary

To address the difficulty of selecting an appropriate number of clusters in traditional K-means, the authors propose K+ Means, an enhanced unsupervised clustering algorithm that simplifies the data partitioning process.

Key Contributions

  • The K+ Means algorithm enhances standard K-Means clustering by automatically determining the final number of clusters based on data characteristics rather than requiring a predefined count.
  • The method calculates minimum, maximum, and average intra-cluster dissimilarity to detect outliers, subsequently promoting high-distance objects as new centroids until cluster representatives and assignments stabilize.
  • Evaluations against standard K-Means demonstrate that the approach isolates anomalous points into separate clusters while preserving the O(tk²n) time complexity and eliminating sensitivity to outlier placement.

Introduction

No source text was provided. Please share the abstract or body snippet so I can draft a concise summary that outlines the technical context and its practical significance, identifies the limitations of existing approaches, and explains the authors' main contribution from an explainer perspective.

Dataset

  • Dataset composition and sources: The authors rely on a small illustrative dataset presented in a single table to demonstrate the algorithm.
  • Key details for each subset: Only one example subset is provided. It contains a limited number of objects chosen specifically for a worked walkthrough rather than large scale modeling.
  • How the paper uses the data: The dataset functions exclusively as a practical example to walk readers through the K+ Means clustering steps. The excerpts do not mention formal training splits, mixture ratios, or evaluation protocols.
  • Processing and visualization details: The raw table values are directly mapped to a 2D coordinate space for visual demonstration. No cropping, metadata construction, or additional preprocessing pipelines are described.

Method

The authors leverage the K-Means algorithm as a foundational component to develop the K+ Means algorithm, which addresses key limitations of the original method. The framework begins by initializing K centroids and assigning data points to the nearest cluster, following the standard K-Means procedure. After convergence, the algorithm computes intra-cluster dissimilarity metrics—minimum, maximum, and average—for each cluster. The core enhancement lies in dynamically identifying outliers based on these metrics. Specifically, clusters exhibiting a high average intra-cluster distance, particularly when accompanied by a large maximum distance, are flagged for further analysis. The data point corresponding to the maximum distance within such a cluster is identified as an outlier and treated as a new cluster representative. This process iteratively increases the number of clusters by one, reassigning data points and recalculating centroids until no new representatives are formed and cluster assignments stabilize.

Experiment

The evaluation compares the standard K-means algorithm with the K+ Means approach using a two-cluster configuration on a dataset of objects. The experiments validate that while K-means establishes initial groupings based on selected centroids, the K+ Means algorithm delivers more accurate and robust cluster assignments, particularly in effectively managing outlier objects. Although the K+ Means method entails slightly higher computational complexity, it consistently achieves superior clustering outcomes overall.

The authors apply the K-means algorithm with initial centroids at p1 and p5, resulting in two clusters where one contains points p1, p2, and p3, and the other contains the remaining points. A comparison is made between K-means and K+ Means algorithms, with the latter producing better clustering results despite higher complexity, particularly in handling outlier objects. K-means clustering groups points into two clusters based on initial centroid selection. K+ Means algorithm produces superior clustering compared to K-means, especially for outlier objects. The K+ Means algorithm has higher computational complexity than K-means.

The evaluation compares standard K-means clustering against the K+Means algorithm using defined initial centroids to partition data into distinct groups. This setup validates the comparative effectiveness of both methods in forming coherent clusters while assessing their handling of anomalous data points. The findings demonstrate that K+Means consistently yields superior clustering quality, particularly in managing outlier objects, despite incurring greater computational overhead.


Créer de l'IA avec l'IA

De l'idée au lancement — accélérez votre développement IA avec le co-codage IA gratuit, un environnement prêt à l'emploi et le meilleur prix pour les GPU.

Codage assisté par IA
GPU prêts à l’emploi
Tarifs les plus avantageux

HyperAI Newsletters

Abonnez-vous à nos dernières mises à jour
Nous vous enverrons les dernières mises à jour de la semaine dans votre boîte de réception à neuf heures chaque lundi matin
Propulsé par MailChimp